
Methodology: Social Listening Benchmarks for Higher Ed: January 2021

Campus Sonar collected one year of historical online conversation data for 93 higher education institutions.


To determine the sample size for this report, we identified the population size, confidence level, and confidence interval. These values were carefully selected; as sample size increases, confidence in our observations and estimates increases, too. We followed a specific protocol to identify a sample representative of the population and calculated confidence intervals and levels to further contextualize our findings. Based on these values, the calculated sample size is 93 higher education institutions.

Population Size

The entire group about which information is to be ascertained.

Value: 2,657

Rationale: The number of higher education institutions in the U.S. according to the 2018 Carnegie Classification of Institutions of Higher Education® (CCIHE), excluding institutions that entirely or predominantly confer only Associate’s degrees, due to low conversation volume.

Confidence Level

An expression of how confident a researcher can be in the data obtained from the sample.

Value: 95%

Confidence Interval

Also called the Margin of Error, the Confidence Interval is a range of values, above and below a finding, in which the actual value is likely to fall; it represents the accuracy or precision of an estimate.

Value: 10%

Rationale: A 95% Confidence Level is a common research selection. Together with the Confidence Interval of 10%, the goal of this research is to report that 95% of the time, the true percentage of the population falls within a range of plus or minus 10% for any metric.
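Under these assumptions, a sample size of 93 can be reproduced with Cochran's formula plus a finite population correction. The report does not state which formula was used, so this is an illustrative reconstruction from the stated inputs.

```python
import math

# Cochran's sample-size formula with a finite population correction.
# Inputs from the report: population 2,657, 95% confidence (z = 1.96),
# confidence interval of +/-10%, and maximum variability (p = 0.5).
def sample_size(population, z, margin, p=0.5):
    """Return the minimum sample size for estimating a proportion."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)       # finite population correction
    return math.ceil(n)

print(sample_size(2657, 1.96, 0.10))  # -> 93
```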

Sample Procedure

We segmented the population by four institutional characteristics, which supports analysis of our dataset across these segments. These characteristics were sourced from the 2018 Carnegie Classification of Institutions of Higher Education® (CCIHE) and the National Center for Education Statistics' Integrated Postsecondary Education Data System (IPEDS) to select and segment a sample proportionate to the population.

Program Type (CCIHE)

  • BASIC2018
  • 2018 Basic Classification (e.g., Doctoral, Master’s)

Institution Type (IPEDS)

  • Control of Institution (i.e., public or private)

Size by Enrollment (CCIHE)

  • SIZESET2018
  • 2018 Size and Setting Classification

Geographic Region (IPEDS)

  • Region Code

A fifth set of IPEDS variables (NAME, CITY, STABBR) provides the institution name, city, and state abbreviation to identify each higher education institution.

Sample Selection

A quota sampling procedure was used to select the 93 higher education institutions for the study, ensuring that the assembled sample has the same proportions as the population for the four characteristics outlined.

Quota sampling is well suited to selecting and segmenting the sample and investigating each characteristic based on the data collected for each identified metric. This type of sampling appears representative of the population; however, it is representative only for these four characteristics. Other unidentified characteristics may be unintentionally under- or over-represented in the sample.

Campus Sonar used the following process to select 93 institutions representative, across the four characteristics, of the 2,657 institutions that do not primarily confer associate’s degrees.

  1. Determine the proportion of each characteristic (Program Type, Institution Type, Size by Enrollment, and Geographic Region) across the entire population (2,657 institutions).

    Example: To identify how many institutions from the sample should represent each category of Program Type, we identified that 411 of 2,657 institutions are primarily doctoral (15.5%). We then multiplied 93 by 15.5% to determine that 14 institutions in our sample should be doctoral.
  2. Define the distribution of the 93 sample institutions within each characteristic, rounding to the nearest whole number, to identify how many institutions should fall within each category of each characteristic.

    We determined the final sampling of the 93 institutions for each characteristic as objectively as possible, with the following exceptions.
    • Of the 65 schools used in the 2019 Online Conversation Benchmarks for Higher Education, all but eight were used in the sample again this year. With the addition of 28 schools to this year's report, those eight schools from the original sample were removed to ensure the sample remained representative this year.
    • With all other characteristics being equal, in the event of a tie between two geographic areas, we selected the area containing more private, nonprofit institutions with enrollment under 10,000 students because Campus Sonar works most closely with these types of schools.
    • Institutions located in outlying areas such as Puerto Rico and the Virgin Islands were not selected due to language limitations in collecting comparable social listening data.
    • Institutions with multiple campuses or shared names were avoided as much as possible because they complicate data collection and analysis; for example, it’s nearly impossible to determine campus location from a tweet without campus knowledge, creating an undue validation burden.
    • Institutions unclassified by size, classified as two-year, and U.S. service schools were not selected.
    • Institutions for which Campus Sonar previously collected data (e.g., prospects and clients) were selected whenever possible within the sampling parameters, resulting in more efficient data collection and validation efforts.
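The proportional allocation in steps 1 and 2 can be sketched as follows. Only the Doctoral count (411 of 2,657) comes from the report; the other Program Type counts are hypothetical placeholders for illustration.

```python
# Allocate a quota sample of n = 93 proportionally to population counts.
population = {
    "Doctoral": 411,        # from the report
    "Master's": 660,        # hypothetical
    "Baccalaureate": 580,   # hypothetical
    "Special Focus": 1006,  # hypothetical
}
total = sum(population.values())  # 2,657
sample_n = 93

# Round each category to the nearest whole number, as in the report.
# (Simple rounding can leave the total off by one; any remainder would
# need a manual adjustment.)
allocation = {
    category: round(sample_n * count / total)
    for category, count in population.items()
}
print(allocation["Doctoral"])  # -> 14 (93 x 15.5%, rounded)
```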

Data Collection

The nature of social listening as a research tool is limited by privacy restrictions from individuals, online platforms, and social listening software. Individuals can set their social media posts to private, and online platforms (e.g., Facebook, LinkedIn, Instagram) may limit or disallow software access to their data; as a result, social listening software is unable to collect that social data from those individuals or sites.

  • In August and September 2020, Campus Sonar Social Media Data Analysts used enterprise-level social listening software to collect publicly available online conversations for the 93 higher education institutions in our sample.
  • Social listening data was collected from public online conversation occurring from August 2018 to July 2019. Analysts gathered 185 metrics aligned with the objectives of this study using the following data collection process.

Data Collection Steps

  1. Write Query: Create a unique query using Boolean operators for each institution. The social listening software uses the query to search the internet for matching content.
  2. Create Hashtag List: Identify hashtags that are unique to each campus and unlikely to be used outside of that campus.
  3. Create Owned Author List: Create a list of Twitter, Instagram, YouTube, Flickr, Reddit, and Tumblr owned authors for each institution.
  4. Categorize Online Mentions: Write Boolean to categorize an institution’s collected online mentions and content.
  5. Validate Data: Manually validate data to check its relevancy and accuracy to the institution.
  6. Capture and Calculate Metrics: Capture and calculate 286 pre-defined metrics for each institution.

      Data Collection Process

      Write Query

      Identify and write a query including the following components.

      • General names and terms for each institution, including nicknames and athletic teams
      • URLs for and links to the main institution website and athletic site or referencing institutional terms
      • Hashtags unique to the institution
      • Owned social media accounts
      • Exclude duplicative terms or phrases
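A query assembled from the components above might look like the following sketch for a fictional Example University. Operator syntax varies by social listening platform, so this is illustrative pseudo-Boolean rather than the exact syntax of any specific tool.

```python
# Hypothetical query for a fictional "Example University"; every name,
# URL, and hashtag below is invented for illustration.
query = (
    '("Example University" OR "Example U" OR "Example Eagles")'   # names, nickname, athletics
    ' OR links:"exampleu.edu" OR links:"athletics.exampleu.edu"'  # main and athletic sites
    ' OR "#GoExampleEagles"'                                      # campus-unique hashtag
    ' NOT "Example University Press"'                             # exclude a duplicative term
)
print(query)
```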

      Create Hashtag List

      Add to the query any unique hashtags discovered while researching and building an institutional query (e.g., #GoDogs is used by multiple campuses, so it is not unique to any school in the sample and was not included).

      Create Owned Author List

      Create two author lists for each institution—owned and athletic.

      • Owned Authors: For the sites searched, accounts for the main campus, alumni and admissions departments, and other campus accounts that appear to be controlled by the institution, when available; student associations were not included.
      • Athletic Authors: Owned accounts primarily focused on institutional athletic teams. Accounts for intramural teams were not included unless they were affiliated with the institution’s athletic department and featured on its website.

      Categorize Online Mentions

      Categorize the data collected using the following rules.

      • Owned content created by authors on the Owned Author List, retweets of owned content (by any author), and content from owned websites.
      • Athletics conversation.
      • Prospective student mentions, including inquiry and application mentions, on social media and forums.
      • Admitted student mentions, including scholarship recipients and athletes, on social media and forums.
      • Alumni first-person mentions on social media.
      • Mentions of alumni (from anyone) on news sites.

      Prospective student, admitted student, and alumni mentions are identified using a proprietary taxonomy created by Campus Sonar.
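The categorization rules above can be sketched as a simple rule-based classifier. The handles and keyword lists here are simplified, hypothetical stand-ins; Campus Sonar's proprietary taxonomy is not reproduced.

```python
# Sketch of rule-based mention categorization with invented handles/terms.
OWNED_AUTHORS = {"@exampleu", "@exampleu_alumni"}      # hypothetical handles
ATHLETIC_TERMS = ("eagles", "game day", "athletics")   # hypothetical terms

def categorize(author, text):
    lowered = text.lower()
    if author.lower() in OWNED_AUTHORS:
        return "owned"                # content from the Owned Author List
    if any(term in lowered for term in ATHLETIC_TERMS):
        return "athletics"            # athletics conversation
    if "applied" in lowered or "accepted" in lowered:
        return "prospective/admitted" # stand-in for the real taxonomy
    return "uncategorized"            # left for manual review

print(categorize("@fan123", "Big win for the Eagles!"))  # -> athletics
```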

      Validate Data

      Assess the relevancy of each institution’s collected social data, beginning with some assumptions about relevant data. Data was assumed to be relevant if it met any of the following criteria.

      • Content created or shared by authors on the Owned Author List, or by earned authors who @ mentioned an owned author.
      • Content from owned websites (e.g., .edu or athletic sites).
      • Content used the full name of the institution.
      • Content from higher education specific forums, like College Confidential.

      From there, social data that did not meet these criteria was assessed via a number of tactics, such as reviewing Top Phrases and Hashtags, and mentions by location, author, and site, to identify large swathes of data that may be either relevant or irrelevant.

      Special categories, like Admissions and Alumni, were reviewed carefully to ensure that any irrelevant mentions were removed.
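The relevance assumptions above amount to a first-pass filter, sketched below with hypothetical field names and handles. Mentions failing every check would go on to manual review rather than being discarded outright.

```python
# First-pass relevance filter mirroring the assumed-relevant criteria.
# The field names, handle, and institution name are invented for illustration.
OWNED_AUTHORS = {"@exampleu"}
HIGHER_ED_FORUMS = {"collegeconfidential.com"}

def assumed_relevant(mention, full_name="Example University"):
    return (
        mention.get("author") in OWNED_AUTHORS                    # owned author
        or mention.get("domain", "").endswith(".edu")             # owned website
        or full_name.lower() in mention.get("text", "").lower()   # full name used
        or mention.get("domain") in HIGHER_ED_FORUMS              # higher ed forum
    )

print(assumed_relevant({"author": "@someone", "domain": "twitter.com",
                        "text": "Touring Example University today!"}))  # -> True
```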

      Capture and Calculate Metrics

      Use a standard data collection dashboard to gather the required metrics and enter the data into a Google Sheet for each institution in our sample.

      Data Analysis and Year-to-Year Changes

      Metrics Gathered

      The metrics gathered spanned several categories, including the following, for both athletic and non-athletic conversation.

      • Athletic affiliation
      • Total conversation volume (e.g., over time and by mention type)
      • Number of unique authors contributing to the conversation
      • Sentiment
      • Owned versus earned conversation volume and type
      • Content sources (e.g., social media, news, blogs, forums)
      • Admissions conversation volume and content sources
      • Alumni conversation volume and content sources

      Year-to-Year Changes

      • A proprietary definition of each content source was used this year, grouping mentions from certain sites under a standard content source (e.g., Social Media includes mentions from Twitter, YouTube, and Instagram).
      • The social listening software used underwent a significant migration after the completion of last year’s report and prior to collecting data for this year. It’s expected that this resulted in changes in how sentiment was registered for each mention, which is accounted for in our reporting.
      • Due to this migration, data was also collected slightly differently: two large queries were used instead of 93 separate queries. The impact of this was minimal, requiring only additional segmentation of collected query data at the dashboard level so that each institution’s data could be reviewed individually.