If you are interested in understanding trends in what is posted, then you will want to classify the text of posts or comments (or both).
One way to classify what is posted is to assign classes based on specific words or phrases that appear in the post or comment text. This kind of classification works well when you have a small list of keywords and phrases that are almost always associated with the class you are interested in. For example, it is useful for classifying text that contains hate speech, hashtags, group identifiers, specific events, or places.
On the Phoenix platform, we call this a keyword text classifier. A keyword text classifier will automatically add classes to a post and/or comment when its text contains certain keywords.
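To make the idea concrete, here is a minimal sketch of keyword-based classification in Python. It is not Phoenix's implementation: the `CLASS_KEYWORDS` mapping, the class names, the keywords, and the `compile_patterns` and `classify` helpers are all hypothetical and only illustrate the general technique of matching whole keywords or phrases, case-insensitively, against post text.

```python
import re

# Hypothetical class-to-keyword mapping for illustration only;
# this is NOT the Phoenix configuration format.
CLASS_KEYWORDS = {
    "election": ["ballot", "polling station", "#vote2024"],
    "flooding": ["flood", "river burst"],
}


def compile_patterns(class_keywords):
    """Build one case-insensitive regex per class from its keyword list."""
    patterns = {}
    for cls, keywords in class_keywords.items():
        alternatives = "|".join(re.escape(k) for k in keywords)
        # The lookarounds require that the keyword is not embedded in a longer
        # word, so "flood" does not match "floodlights". Note that it will not
        # match "floods" either; add inflected forms as separate keywords.
        patterns[cls] = re.compile(
            rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE
        )
    return patterns


def classify(text, patterns):
    """Return the sorted list of classes whose keywords appear in the text."""
    return sorted(cls for cls, pattern in patterns.items() if pattern.search(text))


PATTERNS = compile_patterns(CLASS_KEYWORDS)
```

Calling `classify("Long queue at the polling station today", PATTERNS)` would tag the post with the hypothetical `election` class. Matching on whole words is a design choice that trades some missed posts (for example, plurals) for fewer accidental matches inside unrelated words.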
<aside> 📌
A keyword text classifier will almost certainly produce some false positives (posts that contain a keyword but do not belong to the class) and will miss some relevant posts. In some contexts these issues will be overwhelming and the classification won’t be useful (e.g. some hate terms have multiple uses, including in common parlance). In these cases, you may want to consider applying a more complex model.
</aside>
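One way to judge whether false positives and missed posts are tolerable is to hand-label a small sample of posts and compare those labels with the classifier's output. The sketch below computes precision (the share of flagged posts that truly belong to the class) and recall (the share of true posts that were flagged). The function name `precision_recall` and its inputs are assumptions made for illustration and are not part of Phoenix.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for one class.

    predicted: list of bools, True where the keyword classifier assigned the class.
    actual:    list of bools, True where a human reviewer assigned the class.
    Both lists must describe the same posts in the same order.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)

    # Report 0.0 when the denominator is zero (nothing flagged or nothing to find).
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall
```

Low precision suggests that the keywords are too ambiguous for the class, while low recall suggests that the keyword list is missing terms or variants.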
If a keyword text classifier seems appropriate, you can create one in three steps:
<aside> 💡
Building the list of keywords and features for each class will be the most time-consuming part of this process, and may require iteration. As a starting point, you can try applying a “lexicon” that someone else has already built.
</aside>