If you are interested in understanding trends in what is posted, then you will want to classify the text of posts or comments (or both).

One way to classify what is posted is to create classifications based on specific words or phrases that appear in the post or comment text. This kind of classification works well where we have a small list of keywords and phrases that are almost always associated with the class we are interested in. For example, this is useful when you want to classify text that contains hate speech, hashtags, group identifiers, specific events, and places.

On the Phoenix platform, we call this a keyword text classifier. A keyword text classifier will automatically add classes to a post and/or comment when its text contains certain keywords.

<aside> <img src="/icons/thought-dialogue_blue.svg" alt="/icons/thought-dialogue_blue.svg" width="40px" />

We have noticed that some people confuse a gather that uses keywords with a classifier that uses keywords. Here are some ways to help you distinguish:

</aside>

To create a keyword text classifier from the data that you have gathered:

  1. Give your classifier a name and a description. For example, name: “Political Position” and description: “Noting the political position of authors”.
  2. Create all the classes that you will need. For example, you could add “violent extremism” and “peacekeeping”, and include a description of what each one means.
  3. Once you have created these classes, you can then add the keywords or phrases you want to include in each class. You can add keywords to a class in two ways:
    1. If you add them together in one text box, separated by spaces, then there will be an “AND” operator between them. This means you will only classify text that contains all of these words, in any order.
      1. For example, if I write war on terror then I will classify posts that contain all three words war, on, and terror, but not posts that only contain the word war.
      2. If I want to classify the phrase war on terror then I need to write it in quotation marks: “war on terror”. This will classify posts that contain the exact expression war on terror, but not posts that have those three words in a different order or separated by other words (e.g. This war is terror on Earth).
    2. If you add them separately, using the “+” to add a new text box, then there will be an “OR” operator between them. For example, if I write “war on terror” and terrorism in two separate boxes, then I will classify posts that contain the phrase war on terror, the word terrorism, or both.

<aside> 💡

Getting to the list of keywords and features for each class will be the most time-consuming part of this process, and may require iteration. As a starting point, you can try applying a “lexicon” that someone else has already built.

</aside>

  4. Once you are done adding keywords to the classes, you can finish and view the classifier. You will then see a button to run the classifier. The classifier will be applied to all gathered data. It will add a class (or label) to all data that it matches. This will allow you to, for example, find out how many posts contained the words “war on terror” or “terrorism”.
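
To make the “AND”, “OR”, and quoted-phrase rules above concrete, here is a minimal Python sketch of how such matching could work. It is not the Phoenix implementation: the function names (`tokenize`, `box_matches`, `classify`) are hypothetical, and case-insensitive whole-word matching is an assumption made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed behaviour)."""
    return re.findall(r"\w+", text.lower())


def box_matches(box, text):
    """Return True if the text satisfies one keyword text box.

    Quoted phrases must appear as consecutive words; every other word
    in the box must appear somewhere in the text (the "AND" rule).
    """
    tokens = tokenize(text)
    # Treat curly and straight quotation marks the same way.
    box = box.replace("“", '"').replace("”", '"')
    phrases = re.findall(r'"([^"]+)"', box)
    loose_words = tokenize(re.sub(r'"[^"]+"', " ", box))
    if not phrases and not loose_words:
        return False  # an empty box matches nothing

    def has_phrase(phrase):
        words = tokenize(phrase)
        n = len(words)
        return n > 0 and any(
            tokens[i:i + n] == words for i in range(len(tokens) - n + 1)
        )

    return all(w in tokens for w in loose_words) and all(
        has_phrase(p) for p in phrases
    )


def classify(text, classes):
    """Return the classes with at least one matching box (the "OR" rule)."""
    return [
        name
        for name, boxes in classes.items()
        if any(box_matches(box, text) for box in boxes)
    ]


# One class with two separate boxes: a quoted phrase OR a single word.
classes = {"violent extremism": ["“war on terror”", "terrorism"]}
posts = [
    "The war on terror continues",
    "This war is terror on Earth",
    "A new report on terrorism",
]
# Count how many posts received each class.
counts = Counter(label for post in posts for label in classify(post, classes))
print(counts)
```

In this sketch the second post is not classified, because the phrase is quoted; typing war on terror without quotation marks in one box would match it, since all three words are present.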

<aside> 📌

A keyword text classifier will almost certainly produce some false positives and miss some relevant posts. There may be some contexts where these issues will be overwhelming and the classification won’t be useful (e.g. some hate terms can have multiple uses, including in common parlance). In these cases, you may want to consider applying a more complex model.

</aside>
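
One way to judge whether false positives and missed posts are becoming overwhelming is to hand-check a small sample of posts and compare your own judgement with the classifier's labels. The sketch below is illustrative only: `precision_recall` is a hypothetical helper, not a Phoenix feature, and the sample values are made up for the example.

```python
def precision_recall(predicted, actual):
    """Compare classifier labels with hand-checked labels.

    predicted, actual: lists of booleans, one entry per sampled post.
    Precision falls as false positives rise; recall falls as posts are missed.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up sample: what the classifier said vs. what a human reviewer decided.
predicted = [True, True, False, False]
actual = [True, False, True, False]
print(precision_recall(predicted, actual))
```

Low values on a sample like this suggest the keyword list needs another iteration, or that a keyword approach does not suit the class.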

Once you have run the classifier, it will also be applied to any new data that you gather.