Text mining for topics is not just topic modeling…

I recently published a journal article showing that identifying topics of conversation in social media posts is not as easy as some make it out to be. I have often heard people say that you can just use “topic modeling”.


As always with data science methods (in their current state), you have to understand a bit more about how they work before you draw conclusions from the output. Since journal articles are sometimes (many times) so difficult to read, you can get the gist of my paper from the following tables. I took a Twitter timeline of a brand (FitBit) and compared various techniques to humans reading the text and identifying topics of conversation.

See, it is not that straightforward. And there are many more techniques than the ones I have listed.

If you want a copy of the article, let me know. You can also read the almost identical pre-print (not as nicely formatted) here.

Data science is not magic and you actually have to understand a bit of what is going on.
