The Trending Terms section reveals the most significant terms in any healthcare conversation according to Symplur's Natural Language Processing (NLP) algorithm. See below for an explanation of this algorithm.

The sizes of the bubbles in each group are determined by the algorithm score rather than by the frequency count. The larger the bubble, the more significant the term. Bubble sizes are scaled relative only to the other bubbles in the same grouping, so sizes should not be compared across groupings.

As with other sections, click on stakeholder group(s) to view only the data for the group(s). Click on the stakeholder group a second time to deselect it.


Click on a bubble to reveal the term's frequency count and the algorithm score, as well as three options:

Tweet Transcript will open the list of tweets associated with this term. Within this list, you can select specific stakeholders, include or exclude retweets, and expand the list from 100 to 1,500 tweets.

Filter By This Term will filter this section by the selected term, updating the chart and bubbles. You can remove this filter by clicking on the "X" next to the highlighted term located in the upper right of the chart (as shown below).

Note: You can only filter this chart by one term at a time. Selecting a second term will override the first term.

Remove will remove this term from the chart and produce an updated chart that excludes it. You will be presented with the option to remove the selected term from the entire dashboard (as shown below). Multiple terms can be removed, and the pop-up below will update to show all currently removed terms.


Clicking on the sliding bars icon above will display the three options seen in the above screenshot.

You can include or exclude retweets, turn on or off the ability to drag bubbles (similar to the Network Analysis section), and adjust the colors of the bubbles.

Note: The Randomize colors option only applies when the Associated Terms chart option is selected.


By default, the terms are displayed using the Compare option, which compares the trending terms of the selected time period to those of the preceding time period of the same length. The forty most significant terms from each period are compared: terms common to both periods are displayed in the middle, while the terms unique to each period's top forty are displayed on either side.
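The Compare split can be pictured as a simple set comparison. The following is a minimal sketch, not Symplur's implementation; the function name `compare_periods` and the sample terms are hypothetical, and each input is assumed to be a list of terms already ranked from most to least significant.

```python
def compare_periods(current, previous, top_n=40):
    """Split two ranked term lists into (only_previous, common, only_current)."""
    # Only the top_n most significant terms of each period are compared.
    top_current = current[:top_n]
    top_previous = previous[:top_n]
    current_set, previous_set = set(top_current), set(top_previous)

    # Terms in both top lists go in the middle; the rest go on either side.
    common = [t for t in top_current if t in previous_set]
    only_current = [t for t in top_current if t not in previous_set]
    only_previous = [t for t in top_previous if t not in current_set]
    return only_previous, common, only_current


# Hypothetical ranked terms for two time periods of equal length.
previous = ["flu", "vaccine", "clinic"]
current = ["vaccine", "telehealth", "flu"]
only_previous, common, only_current = compare_periods(current, previous)
```

Here `common` holds the terms for the middle of the chart, and `only_previous` and `only_current` hold the terms for the two sides.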

The Associated Terms option will display the terms in groups as defined by terms that are used with each other. This is similar to the Communities option in the Network Analysis section.

The date range option will display the most significant terms for the selected time period.

The Combine option displays the most significant terms used over both of the time periods.


The icons in the upper right provide several additional options:

Table - The table icon changes the display to an Excel-like table. Click on the icon a second time to change the display back to the original format.

Rank By - Toggle between ranking algorithms by clicking the algorithm name in the header. The available options are SymplurRank and Count.

Export as CSV - Download this section's data as a CSV file, which can be opened in Excel.

Read Help Article - The associated Help article will open in a new browser tab.

API Query - The code for the associated API query will display in a pop-up box.

Refresh - The section data will refresh.

Remove - This section of the dashboard will be hidden. To view this section again, scroll to the bottom of the entire dashboard and click on the icon associated with this section.

Note - When restored, the section will display at the bottom of the dashboard and not in its original location.


The Algorithm

The texts of all the tweets included in the query are first converted to a list of word groupings. The groupings are consecutive regular words that aren't broken up by things like punctuation, screen names, hashtags, or URLs. We also include any regular words that occur in isolation.
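As an illustration of this first step, here is a minimal sketch of how a tweet might be split into word groupings. The regular expressions and the function name `word_groupings` are assumptions made for this example; the actual rules for what counts as a "regular word" are not published here.

```python
import re

# Assumed patterns for items that break a grouping: URLs, screen names, hashtags.
BREAKERS = re.compile(r"https?://\S+|www\.\S+|[@#]\w+")
# Assumed definition of punctuation: anything that is not a word character,
# whitespace, or an apostrophe.
PUNCTUATION = re.compile(r"[^\w\s']+")


def word_groupings(text):
    """Return the runs of consecutive regular words found in one tweet."""
    # Replace each breaker with a marker that the punctuation split will cut on.
    text = BREAKERS.sub(" | ", text.lower())
    groupings = []
    for chunk in PUNCTUATION.split(text):
        words = chunk.split()
        if words:  # a single isolated word still forms its own grouping
            groupings.append(words)
    return groupings


sample = "Great talk by @drsmith on patient safety! #hcsm https://t.co/x"
groupings = word_groupings(sample)
```

In this sketch the screen name, the exclamation mark, the hashtag, and the URL each end the current grouping, leaving two groupings of three words each.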

Then we convert these groupings into lists of all possible 4-word phrases and count how many times each phrase occurs. We remove any phrase that occurs only once, contains a 1-character word that isn't a digit, contains a word in our Polygram Stopwords list, or occurs less frequently than the mean frequency of all 4-word phrases. We then compute a Pointwise Mutual Information (PMI) score for each remaining phrase, and we remove any phrase with a negative score for co-occurrence.
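The counting, filtering, and PMI steps above can be sketched as follows. This is an approximation under stated assumptions: the exact PMI formula used for multi-word phrases is not given in this article, so the sketch uses a common generalization, log2 of the phrase probability divided by the product of its word probabilities. The names `score_phrases` and `stopwords` are hypothetical, and the demo uses 2-word phrases only to keep the sample data small.

```python
import math
from collections import Counter


def ngrams(grouping, n):
    """All runs of n consecutive words within one grouping."""
    return [tuple(grouping[i:i + n]) for i in range(len(grouping) - n + 1)]


def score_phrases(groupings, n, stopwords=frozenset()):
    """Return {phrase: (count, pmi)} for the n-word phrases that pass every filter."""
    word_counts = Counter(w for g in groupings for w in g)
    phrase_counts = Counter(p for g in groupings for p in ngrams(g, n))
    if not phrase_counts:
        return {}
    total_words = sum(word_counts.values())
    total_phrases = sum(phrase_counts.values())
    mean_count = total_phrases / len(phrase_counts)

    kept = {}
    for phrase, count in phrase_counts.items():
        # Drop phrases seen only once or less often than the mean frequency.
        if count <= 1 or count < mean_count:
            continue
        # Drop phrases containing a 1-character word that isn't a digit.
        if any(len(w) == 1 and not w.isdigit() for w in phrase):
            continue
        # Drop phrases containing a stopword.
        if any(w in stopwords for w in phrase):
            continue
        # Assumed PMI form: observed phrase probability versus the probability
        # expected if its words occurred independently of each other.
        p_phrase = count / total_phrases
        p_independent = math.prod(word_counts[w] / total_words for w in phrase)
        pmi = math.log2(p_phrase / p_independent)
        if pmi >= 0:  # negative scores are removed
            kept[phrase] = (count, pmi)
    return kept


sample_groupings = [
    ["patient", "safety", "matters"],
    ["patient", "safety", "first"],
    ["a", "plan"],
    ["a", "plan"],
]
kept_phrases = score_phrases(sample_groupings, n=2)
```

In the sample data, "a plan" occurs twice but is dropped because it contains a 1-character word, while "patient safety" passes every filter.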

Then we make another pass with the above steps, looking at 3-word phrases, and then 2-word phrases. At the end of each pass, any phrases that survived the filters are kept as terms, and their words are removed from consideration in subsequent passes. This way, the same words aren't counted multiple times.
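One way to picture the passes described above is the sketch below. To stay short, it replaces the full set of filters with a single placeholder rule (keep any phrase seen at least twice), so it illustrates only how kept phrases are cut out of the groupings before the next pass. The names `split_out` and `multi_pass` are hypothetical.

```python
from collections import Counter


def split_out(grouping, phrases, n):
    """Cut each occurrence of a kept n-word phrase out of a grouping."""
    pieces, current, i = [], [], 0
    while i < len(grouping):
        if tuple(grouping[i:i + n]) in phrases:
            # The phrase's words are consumed; what came before it is a piece.
            if current:
                pieces.append(current)
            current = []
            i += n
        else:
            current.append(grouping[i])
            i += 1
    if current:
        pieces.append(current)
    return pieces


def multi_pass(groupings, keep=lambda count: count >= 2):
    """Run the 4-, 3-, then 2-word passes; return (terms, leftover groupings)."""
    terms = {}
    for n in (4, 3, 2):
        counts = Counter(
            tuple(g[i:i + n]) for g in groupings for i in range(len(g) - n + 1)
        )
        # Placeholder filter standing in for the frequency, stopword, and PMI checks.
        kept = {phrase: count for phrase, count in counts.items() if keep(count)}
        terms.update(kept)
        # Words belonging to kept phrases are not available to later passes.
        groupings = [piece for g in groupings for piece in split_out(g, kept, n)]
    return terms, groupings


sample_groupings = [
    ["social", "determinants", "of", "health", "data"],
    ["social", "determinants", "of", "health", "equity"],
    ["health", "data"],
    ["health", "data"],
]
terms, leftovers = multi_pass(sample_groupings)
```

In this example "health data" is counted twice rather than three times, because the word "health" in the first grouping was already consumed by the 4-word phrase. The leftover groupings are what a single-word pass would then examine.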

We then do a similar pass with single words, except that instead of filtering based on PMI scores, we remove all words that are shorter than 3 characters, consist solely of digits, or match a term in either our Polygram or our Monogram Stopwords list.
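The single-word filters named above might look like this in code. This is a sketch only: the stopword lists shown are invented placeholders, the function name `single_word_terms` is hypothetical, and the frequency filters carried over from the earlier passes are omitted for brevity.

```python
from collections import Counter


def single_word_terms(groupings, polygram_stopwords, monogram_stopwords):
    """Count single words and drop those failing the three stated filters."""
    stopwords = set(polygram_stopwords) | set(monogram_stopwords)
    counts = Counter(word for grouping in groupings for word in grouping)
    return {
        word: count
        for word, count in counts.items()
        if len(word) >= 3           # at least 3 characters long
        and not word.isdigit()      # not made up solely of digits
        and word not in stopwords   # not in either stopword list
    }


# Hypothetical leftover groupings and placeholder stopword lists.
leftover_groupings = [["data"], ["equity"], ["2020"], ["of"], ["the", "data"]]
single_terms = single_word_terms(
    leftover_groupings,
    polygram_stopwords={"the"},
    monogram_stopwords={"equity"},
)
```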

Then we merge the four lists of terms and compute the final score for each term. For each list, we multiply the frequencies in the list by a scale factor. The scale factor is different for each list and is calculated to equalize the standard deviations of all four lists, based on an exponential curve fit. Finally, the scores of single-word terms are reduced by an arbitrary value of 30% in order to emphasize multi-word phrases, which typically are more meaningful.
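The merge step can be sketched roughly as follows. This is a simplification, not the actual scoring: the exponential curve fit is not reproduced, and each list is instead divided by its own population standard deviation so that all lists end up with a standard deviation of 1. The function name `merge_scores` and the sample frequencies are hypothetical.

```python
from statistics import pstdev


def merge_scores(lists_by_length, single_word_penalty=0.30):
    """Merge per-length frequency lists into one {term: score} dictionary.

    lists_by_length maps phrase length (1 to 4) to a {term: frequency} dict.
    """
    scores = {}
    for length, frequencies in lists_by_length.items():
        if not frequencies:
            continue
        spread = pstdev(frequencies.values())
        # Simplified scale factor: bring this list to unit standard deviation.
        # A list with no spread is left unscaled to avoid dividing by zero.
        scale = 1.0 / spread if spread > 0 else 1.0
        for term, frequency in frequencies.items():
            score = frequency * scale
            if length == 1:
                # Single words are penalized to emphasize multi-word phrases.
                score *= 1.0 - single_word_penalty
            scores[term] = score
    return scores


# Hypothetical frequencies for a 2-word list and a single-word list.
sample_lists = {
    2: {"patient safety": 2, "health data": 6},
    1: {"data": 10, "care": 30},
}
final_scores = merge_scores(sample_lists)
ranked = sorted(final_scores, key=final_scores.get, reverse=True)
```

In this example "health data" outranks "care" even though "care" is five times more frequent, because each list is scaled separately and single words carry the 30% penalty.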
