Methodology: Social Listening Benchmarks for Higher Ed: May 2021

Campus Sonar collected one year of historical online conversation data for 92 higher education institutions.


To determine the sample size for this report, we identified the population size, confidence level, and confidence interval. These values were carefully selected: as our sample size increases, confidence in our observations and estimates increases, too. We followed a specific protocol to identify a sample representative of the population and calculated confidence intervals and levels to further contextualize our findings. Based on these values, the calculated sample size is 92 higher education institutions. The key decisions behind the population size, the confidence level, and the confidence interval are described below.

Population Size

The entire group about which information is to be ascertained.

Confidence Level

An expression of how confident a researcher can be in the data obtained from the sample.

  • Value: 95%

Confidence Interval

Also called the Margin of Error, the Confidence Interval is a range of values, above and below a finding, in which the actual value is likely to fall; it represents the accuracy or precision of an estimate.

  • Value: 10%
  • Rationale: A 95% Confidence Level is a common research selection. Together with the Confidence Interval of 10%, the goal of this research is to report that 95% of the time, the true percentage of the population falls within a range of plus or minus 10% for any metric.
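
The report does not state which formula produced the sample size. One common choice, shown here only as a sketch, is Cochran's formula with a finite population correction. The z-score of 1.96 (for 95% confidence) and the maximum-variability proportion p = 0.5 are assumptions, as is the function name; the population of 2,657 institutions is the figure given under Sample Selection.

```python
def sample_size(population: int, z: float = 1.96, margin: float = 0.10, p: float = 0.5) -> float:
    """Unrounded sample size from Cochran's formula with finite population correction."""
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink the estimate to account for the finite population.
    return n0 / (1 + (n0 - 1) / population)


n = sample_size(2657)
print(round(n, 1))  # about 92.7
```

Under these assumptions the estimate is about 92.7, which is in line with the 92 institutions used in this report.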

Sample Procedure

We segmented the population by four institutional characteristics, which supports analysis of our dataset across these segments in our sample. These characteristics were sourced from the 2015 Carnegie Classification of Institutions of Higher Education® (CCIHE) and the National Center for Education Statistics' Integrated Postsecondary Education Data System (IPEDS) and were used to select and segment a sample proportionate to the population.

Program Type (CCIHE)

  • BASIC2018
  • 2018 Basic Classification (e.g., Doctoral, Master’s, etc.)

Institution Type (IPEDS)

  • Control of Institution (i.e., public or private)

Size by Enrollment (CCIHE)

  • SIZESET2018
  • 2018 Size and Setting Classification

Geographic Region (IPEDS)

  • Region Code

A fifth set of variables from IPEDS (NAME, CITY, and STABBR) gives the institution name, city, and state abbreviation and is used to identify each higher education institution.

Sample Selection

A quota sampling procedure was used to select the 92 higher education institutions for the study, ensuring that the assembled sample has the same proportions as the population for the four characteristics outlined.

Quota sampling is the ideal technique to select and segment the sample and investigate each characteristic based on the data collected for each identified metric. This type of sampling appears representative of the population; however, it is only representative for these four characteristics. Other unidentified characteristics may be unintentionally under- or over-represented in the sample.

Campus Sonar used the following process to select 92 institutions representative of the 2,657 institutions that do not primarily confer Associate’s degrees across the four characteristics.

  1. Determine the proportion of each characteristic (Program Type, Institution Type, Size by Enrollment, and Geographic Region) across the entire population (2,657 institutions).

    For example, to identify how many sample institutions should fall in each category of Program Type, we found that 411 of 2,657 institutions are primarily doctoral (15.5%). We then multiplied 92 by 15.5% to determine that 14 institutions in our sample should be doctoral.
  2. Define the distribution of the 92 sample institutions within each characteristic, rounding to the nearest whole number, to identify how many institutions should fall within each category of each characteristic.

    We determined the final sampling of the 92 institutions for each characteristic as objectively as possible, with the following exceptions.
    • All but 8 of the 65 schools used in the 2019 Online Conversation Benchmarks for Higher Education were used in the sample again this year. With the addition of 28 schools to this year's report, these 8 schools from the original sample were removed in order to best ensure the sample was representative this year.
    • With all other characteristics being equal, in the event of a tie between two geographic areas, we selected the area containing more private, nonprofit institutions with enrollment under 10,000 students because Campus Sonar works most closely with these types of schools.
    • Institutions located in outlying areas such as Puerto Rico and the Virgin Islands were not selected due to language limitations in collecting comparable social listening data.
    • Institutions with multiple campuses or shared names were avoided as much as possible because they make data collection and analysis challenging (e.g., validating data takes an undue amount of time when it’s nearly impossible to determine campus location from a tweet without campus knowledge).
    • Institutions unclassified by size, classified as two-year, and U.S. service schools were not selected.
    • Institutions for which Campus Sonar had previously collected data (e.g., prospects and clients) were selected whenever possible within the sampling parameters. This made data collection and validation more efficient.
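
The proportional allocation in steps 1 and 2 can be sketched as follows. Only the doctoral count (411 of 2,657) comes from this report; the `all_other` grouping and the function name are hypothetical and stand in for the remaining Program Type categories, whose counts are not listed here.

```python
def quota_allocation(category_counts: dict[str, int], sample_size: int) -> dict[str, int]:
    """Allocate sample slots to categories in proportion to the population."""
    population = sum(category_counts.values())
    # Each category's share of the population, scaled to the sample and
    # rounded to the nearest whole institution.
    return {
        category: round(sample_size * count / population)
        for category, count in category_counts.items()
    }


# 411 doctoral institutions is from the report; "all_other" is a placeholder.
program_type = {"doctoral": 411, "all_other": 2657 - 411}
quotas = quota_allocation(program_type, 92)
print(quotas)  # {'doctoral': 14, 'all_other': 78}
```

Because each category is rounded independently, the quotas are not guaranteed to sum to the sample size in general, so a manual adjustment may be needed when there are many categories.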

Data Collection

Social listening as a research tool is limited by privacy restrictions from individuals, online platforms, and social listening software. Individuals can set their social media posts to private, and online platforms (e.g., Facebook, LinkedIn, Instagram) may block or limit social listening software's access to their data. As a result, social listening software is unable to collect that social data from those individuals or sites.

  • In January 2021, Campus Sonar Social Media Data Analysts used enterprise-level social listening software to collect publicly available online conversations for the 92 higher education institutions in our sample.
  • Social listening data was collected from public online conversation occurring from July to December 2020. Analysts gathered 185 metrics aligned with the objectives of this study using the following data collection process.

Data Collection Steps

  1. Update Queries: Unique queries using Boolean operators were built for each institution for the report analyzing 2018–2019 conversation and were re-used for this report. Key terms in two Boolean operators were removed because they collected irrelevant conversation. The Boolean string with authors was narrowed to search Twitter only to collect relevant author data. The social listening software uses the query to search the internet for matching content.
  2. Re-use Hashtag List in Query: Identify hashtags that are unique to each campus and unlikely to be used outside of that campus.
  3. Re-use Owned Author List: Create a list of Twitter, Instagram, YouTube, Flickr, Reddit, and Tumblr owned authors for each institution.
  4. Categorize Online Mentions: Write Boolean to categorize an institution’s collected online mentions and content.
  5. Validate Data: Manually validate data to check its relevancy and accuracy to the institution.
  6. Capture and Calculate Metrics: Capture and calculate 286 pre-defined metrics for each institution.

General Data Collection Process Overview

Write Query

Identify and write a query including the following components.

  • General names and terms for each institution, including nicknames and athletic teams
  • URLs for and links to the main institution website and athletic site or referencing institutional terms
  • Hashtags unique to the institution
  • Owned social media accounts
  • Exclude duplicative terms or phrases
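
As a rough illustration of how the components above might be combined, the sketch below joins them into a single OR query and drops duplicate terms. The institution, handles, and domains are hypothetical, and the plain `OR` syntax is an assumption; the operators supported by the actual social listening software are not described in this report.

```python
def dedupe(terms):
    """Drop case-insensitive duplicates while preserving order."""
    seen = set()
    unique = []
    for term in terms:
        key = term.lower()
        if key not in seen:
            seen.add(key)
            unique.append(term)
    return unique


def build_query(names, urls, hashtags, owned_accounts):
    """Combine all query components into one OR-joined Boolean string."""
    terms = dedupe([*names, *urls, *hashtags, *owned_accounts])
    # Quote multi-word phrases so they are matched as a unit.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


# Hypothetical institution used only for illustration.
query = build_query(
    names=["Example State University", "Example State", "ESU Owls"],
    urls=["example.edu", "athletics.example.edu"],
    hashtags=["#ESUOwls", "#esuowls"],
    owned_accounts=["@ExampleState"],
)
print(query)
```

Here the second spelling of the hashtag is treated as duplicative and dropped, leaving seven terms in the query.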

Create Hashtag List

Add any unique hashtags discovered when researching and building an institutional query to the query. For example, #GoDogs is used by multiple campuses and is not unique to any school in the sample, so it was not included in the query.
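
One simple way to screen candidate hashtags, sketched below, is to keep only those claimed by a single campus. The campus names and hashtags other than #GoDogs are hypothetical, and the check only covers the campuses supplied; it cannot show that a hashtag is unused elsewhere on the internet, which still requires manual research.

```python
from collections import Counter


def unique_hashtags(hashtags_by_campus):
    """Keep, per campus, only hashtags that no other listed campus uses."""
    usage = Counter()
    for tags in hashtags_by_campus.values():
        # Count each hashtag once per campus, ignoring case.
        usage.update({tag.lower() for tag in tags})
    return {
        campus: sorted(t for t in {tag.lower() for tag in tags} if usage[t] == 1)
        for campus, tags in hashtags_by_campus.items()
    }


candidates = {
    "campus_a": ["#GoDogs", "#CampusAProud"],
    "campus_b": ["#GoDogs", "#CampusBNation"],
}
unique = unique_hashtags(candidates)
print(unique)  # {'campus_a': ['#campusaproud'], 'campus_b': ['#campusbnation']}
```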

Create Owned Author List

Create two author lists for each institution—owned and athletic.

  • Owned Authors: For the sites searched, accounts for the main campus, alumni and admissions departments, and other campus accounts that appear to be controlled by the institution when available; student associations were not included.
  • Athletic Authors: Owned accounts that are primarily focused on institutional athletic teams. Accounts for intramural teams were not included unless affiliated with the institution’s athletic department and featured on their website.

Categorize Online Mentions

Categorize the data collected using the following rules.

  • Owned content created by authors on the Owned Author List, retweets of owned content (by any author), and content from owned websites.
  • Athletics conversation.
  • Prospective student mentions, including inquiry and application mentions, on social media and forums.
  • Admitted student mentions, including scholarship recipients and athletes, on social media and forums.
  • Alumni first-person mentions on social media.
  • Mentions of alumni (from anyone) on news sites.

Prospective student, admitted student, and alumni mentions are identified using a proprietary taxonomy created by Campus Sonar.
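
The owned-content rule is the only one above that is fully specified in this report, so it is the only one sketched here; the prospective student, admitted student, and alumni rules depend on the proprietary taxonomy. The `Mention` fields, account handles, and domains below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    author: str                       # handle of the posting account
    domain: str                       # site the content came from
    retweet_of: Optional[str] = None  # original author's handle, if a retweet


def is_owned(mention, owned_authors, owned_domains):
    """Apply the owned-content rule: owned author, retweet of one, or owned site."""
    authors = {a.lower() for a in owned_authors}
    domains = {d.lower() for d in owned_domains}
    if mention.author.lower() in authors:
        return True
    # Retweets of owned content count as owned, whoever retweeted them.
    if mention.retweet_of is not None and mention.retweet_of.lower() in authors:
        return True
    return mention.domain.lower() in domains


owned_authors = {"ExampleState", "ExampleStateAlumni"}
owned_domains = {"example.edu"}

post = Mention(author="ExampleState", domain="twitter.com")
retweet = Mention(author="proud_parent", domain="twitter.com", retweet_of="ExampleState")
news = Mention(author="local_news", domain="news.example.com")

print(is_owned(post, owned_authors, owned_domains))     # True
print(is_owned(retweet, owned_authors, owned_domains))  # True
print(is_owned(news, owned_authors, owned_domains))     # False
```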

Validate Data

Assess relevancy of each institution’s social data collected by beginning with some assumptions of relevant data. Data was assumed to be relevant if it met the following criteria.

  • Content created or shared by authors on the Owned Author List, or by earned authors who @ mentioned an owned author.
  • Content from owned websites (e.g., .edu or athletic sites).
  • Content used the full name of the institution.
  • Content from higher education specific forums, like College Confidential.
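
These starting assumptions could be expressed as a simple pre-filter, sketched below. The mention fields, account handles, and institution are hypothetical, and the matching is deliberately naive (exact domain match, substring match on the full name); it is not the validation tooling used for the report.

```python
import re


def assumed_relevant(mention, owned_authors, owned_domains, full_name, forum_domains):
    """Return True if a mention meets any of the assumed-relevant criteria."""
    owned = {a.lower() for a in owned_authors}
    author = mention["author"].lower()
    domain = mention["domain"].lower()
    text = mention["text"].lower()
    # Handles that the mention text @ mentions.
    mentioned = {m.lower() for m in re.findall(r"@(\w+)", mention["text"])}
    return (
        author in owned                                   # owned author
        or bool(mentioned & owned)                        # @ mention of an owned author
        or domain in {d.lower() for d in owned_domains}   # owned website
        or full_name.lower() in text                      # full institution name
        or domain in {d.lower() for d in forum_domains}   # higher ed forum
    )


rules = dict(
    owned_authors={"ExampleState"},
    owned_domains={"example.edu"},
    full_name="Example State University",
    forum_domains={"collegeconfidential.com"},
)

tagged = {"text": "Congrats @ExampleState grads!", "author": "proud_parent", "domain": "twitter.com"}
vague = {"text": "Go State!", "author": "fan123", "domain": "twitter.com"}
forum = {"text": "Is it worth applying?", "author": "hs_senior", "domain": "collegeconfidential.com"}

print(assumed_relevant(tagged, **rules))  # True
print(assumed_relevant(vague, **rules))   # False
print(assumed_relevant(forum, **rules))   # True
```

Mentions that fail this check are not discarded; they are the ones that go on to manual review.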

From there, social data that did not meet these criteria was assessed via a number of tactics, such as reviewing Top Phrases and Hashtags and mentions by location, author, and site, to identify large swaths of the data that may be either relevant or irrelevant.

Special categories, like Admissions and Alumni, were reviewed carefully to ensure that any irrelevant mentions were removed.

Capture and Calculate Metrics

Use a standard data collection dashboard to gather the required metrics and enter the data into a Google Sheet for each institution in our sample.

Data Analysis and Year-to-Year Changes

Metrics Gathered

The metrics gathered spanned several categories, including the following, for both athletic and non-athletic conversation.

  • Athletic affiliation
  • Total conversation volume (e.g., over time and by mention type)
  • Number of unique authors contributing to the conversation
  • Sentiment
  • Owned versus earned conversation volume and type
  • Content sources (e.g., social media, news, blogs, forums, etc.)
  • Admissions conversation volume and content sources
  • Alumni conversation volume and content sources

Year-to-Year Changes

  • Query updates (as outlined above) were minimal and included in order to decrease irrelevancies in data collected
  • Data was not collected this year for the Stevens Institute of Technology, which was included in the last report
  • Athletic content for this report includes retweets of owned athletic content