The Trending Terms widget reveals the most significant terms in any healthcare conversation, as ranked by Symplur's NLP algorithm.

Comparing to the Past

By default, the trending terms of the selected time period are compared to those of the preceding time period of the same length. The 40 most significant terms from each period are compared: terms common to both periods are displayed in the middle, while terms unique to each period's top 40 are displayed on either side.
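The split into common and unique terms can be pictured as a set comparison. The following is only a minimal sketch: the function name `compare_top_terms`, the `top_n` parameter, and the dictionary-of-scores input are assumptions made for illustration, not Symplur's implementation.

```python
def compare_top_terms(current_scores, previous_scores, top_n=40):
    """Split two periods' top terms into common and unique groups.

    Both arguments are hypothetical dicts mapping term -> algorithm score.
    """
    def top(scores):
        # Rank terms by score, highest first, and keep the first top_n.
        ranked = sorted(scores, key=scores.get, reverse=True)
        return set(ranked[:top_n])

    current, previous = top(current_scores), top(previous_scores)
    return {
        "previous_only": previous - current,  # shown on one side
        "common": current & previous,         # shown in the middle
        "current_only": current - previous,   # shown on the other side
    }
```

The same comparison would apply when two stakeholder voices are compared instead of two time periods.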

The sizes of the bubbles in each group are determined by the algorithm score rather than by the frequency count.

In the Compare View, each group's most significant term is scaled up to the maximum bubble size to improve legibility and convey its significance, so bubble sizes are only comparable within a group. The bubble sizes are proportionally correct across groups only in the Combine View.
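One way to read the difference between the two views is as a choice of normalization. The sketch below is an interpretation, not the widget's actual rendering code: the function `bubble_sizes`, the `view` argument, and the relative size between 0 and 1 are all assumptions.

```python
def bubble_sizes(groups, view="compare"):
    """Return a relative size (0 to 1) for every term in every group.

    `groups` is a hypothetical dict: group name -> {term: algorithm score}.
    """
    overall_max = max(s for scores in groups.values() for s in scores.values())
    sizes = {}
    for name, scores in groups.items():
        # Compare View: scale against the group's own top score.
        # Combine View: scale against the top score across all groups.
        reference = max(scores.values()) if view == "compare" else overall_max
        sizes[name] = {term: s / reference for term, s in scores.items()}
    return sizes
```

With per-group scaling, every group's top term reaches full size regardless of its score; with a shared reference, sizes stay proportional across groups.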

Comparing Healthcare Stakeholder Voices

Selecting a single stakeholder will compare that stakeholder's 40 most significant terms in the selected time period to those of all stakeholders (including the selected stakeholder). When two stakeholders are selected, the two stakeholder voices are compared for the selected time period.

The Algorithm

The texts of all the tweets included in the query are first converted to a list of word groupings. The groupings are consecutive regular words that aren't broken up by things like punctuation, screen names, hashtags, or URLs. We also include any regular words that occur in isolation.
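As a rough illustration of this first step, the sketch below splits a tweet into word groupings with a regular expression. The pattern, the lowercasing, and the function name `word_groupings` are assumptions for the example; the actual tokenization rules are not documented here.

```python
import re

# Anything matching this pattern ends the current grouping: URLs,
# screen names, hashtags, and runs of punctuation (apostrophes are
# kept so that words like "aren't" stay whole). Assumed pattern.
BREAK = re.compile(r"https?://\S+|[@#]\w+|[^\w\s']+")

def word_groupings(text):
    """Return lists of consecutive regular words found in `text`."""
    groupings = []
    # Lowercasing is an assumption, made so that counts ignore case.
    for chunk in BREAK.split(text.lower()):
        words = chunk.split()
        if words:  # isolated single words are kept as 1-word groupings
            groupings.append(words)
    return groupings
```

For example, a tweet mentioning a screen name, a URL, and a hashtag would yield one grouping for each stretch of plain words between them.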

Then we convert these groupings into lists of all possible 4-word phrases and count how many times each phrase occurs. We remove any phrase that occurred only once, contains a 1-character word that isn't a digit, contains a word in our Polygram Stopwords list, or occurs less frequently than the mean among all 4-word phrases. We then compute a Pointwise Mutual Information (PMI) score for each phrase, and we remove any that score negatively for co-occurrence.
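The filtering described above could be sketched as follows. The exact PMI formula is not given in this article, so the code assumes a common n-gram generalization: the log of the phrase probability divided by the product of its words' probabilities. The function names and the `stopwords` argument are likewise hypothetical.

```python
import math
from collections import Counter

def ngrams(groupings, n):
    """Yield every run of n consecutive words inside each grouping."""
    for words in groupings:
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])

def filter_phrases(groupings, n, stopwords):
    """Return {phrase: frequency} for n-word phrases that pass all filters."""
    counts = Counter(ngrams(groupings, n))
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    total_phrases = sum(counts.values())
    word_counts = Counter(w for g in groupings for w in g)
    total_words = sum(word_counts.values())

    kept = {}
    for phrase, freq in counts.items():
        if freq <= 1 or freq < mean:
            continue  # occurred only once, or below the mean frequency
        if any(len(w) == 1 and not w.isdigit() for w in phrase):
            continue  # contains a 1-character word that isn't a digit
        if any(w in stopwords for w in phrase):
            continue  # contains a stopword
        # Assumed PMI: log of observed vs. independent probability.
        p_phrase = freq / total_phrases
        p_independent = math.prod(word_counts[w] / total_words for w in phrase)
        if math.log(p_phrase / p_independent) < 0:
            continue  # words co-occur less often than chance predicts
        kept[phrase] = freq
    return kept
```

The same function works for any phrase length, for example `filter_phrases(groupings, 4, stopwords)` for 4-word phrases.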

Then we make another pass with the above steps, looking at 3-word phrases, and then 2-word phrases. At the end of each pass, any phrases that were not filtered out are removed from consideration in subsequent passes. This way, the same words aren't counted multiple times.
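The pass structure might look like the sketch below. How kept phrases are "removed from consideration" is not spelled out, so the code assumes they are cut out of the groupings, splitting each grouping around them. The `repeated_phrases` filter is a deliberately simplified stand-in for the real per-pass filters, and all names are hypothetical.

```python
from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def repeated_phrases(groupings, n):
    """Stand-in filter: keep n-word phrases seen more than once."""
    counts = Counter(p for g in groupings for p in ngrams(g, n))
    return {p: c for p, c in counts.items() if c > 1}

def remove_phrases(groupings, kept, n):
    """Cut kept n-word phrases out of each grouping (assumed behavior)."""
    result = []
    for words in groupings:
        current, i = [], 0
        while i < len(words):
            if tuple(words[i:i + n]) in kept:
                if current:
                    result.append(current)
                current = []
                i += n  # skip the words already claimed by this phrase
            else:
                current.append(words[i])
                i += 1
        if current:
            result.append(current)
    return result

def multi_pass(groupings, select=repeated_phrases):
    """Run the 4-, 3-, then 2-word passes; return phrases and leftovers."""
    found = {}
    for n in (4, 3, 2):
        kept = select(groupings, n)
        found.update(kept)
        groupings = remove_phrases(groupings, kept, n)
    return found, groupings
```

Because a kept 4-word phrase is cut out before the shorter passes run, its words cannot be counted again as part of a 3-word or 2-word phrase.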

We then do a similar pass with single words, except that instead of filtering based on PMI scores, we remove all words that are shorter than 3 characters, consist solely of digits, or match a term in either our Polygram or our Monogram Stopwords lists.
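A sketch of the single-word pass follows. It assumes that the frequency filters from the phrase passes (more than one occurrence, and at least the mean frequency) carry over, since the pass is described as similar; that carry-over, along with the function and parameter names, is an assumption.

```python
from collections import Counter

def single_word_terms(groupings, polygram_stopwords, monogram_stopwords):
    """Return {word: frequency} for single words that pass the filters."""
    counts = Counter(w for g in groupings for w in g)
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    stopwords = set(polygram_stopwords) | set(monogram_stopwords)
    return {
        word: freq
        for word, freq in counts.items()
        if freq > 1 and freq >= mean   # assumed carry-over from phrase passes
        and len(word) >= 3             # drop words shorter than 3 characters
        and not word.isdigit()         # drop words made only of digits
        and word not in stopwords      # drop terms in either stopword list
    }
```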

Then we merge the four lists of terms and compute the final score for each term. For each list, we multiply the frequencies in the list by a scale factor. The scale factor is different for each list and is calculated to equalize the standard deviations of all four lists, based on an exponential curve fit. Finally, single-word terms are penalized by an arbitrary value of 30% in order to emphasize multi-word phrases, which are typically more meaningful.
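The merge step can be illustrated with a simplified stand-in. The details of the exponential curve fit are not described here, so this sketch swaps in a plainer technique: each list is rescaled so that its standard deviation matches the largest one among the lists. The function name, the input shape, and the choice of target are assumptions; only the 30% single-word penalty comes from the description above.

```python
import statistics

def merge_and_score(lists_by_length, single_word_penalty=0.30):
    """Merge term lists into one {term: score} dict.

    `lists_by_length` is a hypothetical dict: phrase length (1 to 4)
    -> {term: frequency}.
    """
    spreads = {
        n: statistics.pstdev(list(terms.values()))
        for n, terms in lists_by_length.items() if terms
    }
    nonzero = [s for s in spreads.values() if s > 0]
    target = max(nonzero) if nonzero else 1.0

    scores = {}
    for n, terms in lists_by_length.items():
        spread = spreads.get(n, 0)
        # Simplified scale factor (not the exponential fit): bring this
        # list's standard deviation up to the target.
        scale = target / spread if spread > 0 else 1.0
        if n == 1:
            scale *= 1 - single_word_penalty  # penalize single words by 30%
        for term, freq in terms.items():
            scores[term] = freq * scale
    return scores
```

Without rescaling, single words would dominate simply because they occur far more often than any multi-word phrase.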

Filters

  • Filter Healthcare Stakeholder voices by selecting 1 or 2 stakeholders from the toolbar. Selecting 2 stakeholders will create a comparison view between them.

Engagements

  • Toggle between the Compare View and the Combine View with the toolbar button.
  • Select a bubble to reveal the term's frequency count, algorithm score, and the option to list the tweets with this term (Note: the Signals report that opens uses a different algorithm that may give a different count).
