Keyword text classifier

If you are interested in understanding trends in what is posted, then you will want to classify the text of posts or comments (or both).

One way to classify what is posted is to create classifications based on specific words or phrases that appear in the post or comment text. This kind of classification works well where we have a small list of keywords and phrases that are almost always associated with the class we are interested in. For example, this is useful when you want to classify text that contains hate speech, hashtags, group identifiers, specific events, and places.

On the Phoenix platform, we call this a keyword text classifier. A keyword text classifier will automatically add classes to a post and/or comment when its text contains certain keywords.

We have noticed that some people confuse a gather that uses keywords with a classifier that uses keywords. Here are some ways to help you distinguish:

A gather collects data; a classifier applies a class (or label) to the data you have collected.
You do not collect any new data when you apply a keyword classifier; you simply apply a label to each item of data you have already collected that contains the keywords.
If you gather data from a list of accounts and then apply a keyword classifier to that data, you will find out when the accounts in your list post using the keywords you defined in your classifier. For example, when does the Build Up Facebook page post the word “peacebuilding”?

To create a keyword text classifier from the data that you have gathered:

Give your classifier a name and a description. For example, name: “Political Position” and description: “Noting the political position of authors”.
Create all the classes that you will need. For example, you could add “violent extremism” and “peacekeeping”, and include a description of what each of those means.
Once you have created these classes, you can then add the keywords or phrases you want to include in each class. You can add keywords to a class in two ways:
1. If you add them together, separated by spaces, in one text box, then there will be an “AND” operator between them. This means you will only classify text that contains all of these words, in any order.
  1. For example, if I write war on terror then I will classify posts that have all three of the words war, on, and terror, but not posts that only have the word war in them.
  2. If I want to classify the phrase war on terror then I need to write it in quotation marks “war on terror”. This will classify posts that have the exact expression war on terror, but not posts that have those three words in a different order or separated by other words (e.g. This war is terror on Earth).
2. If you add them separately, using the “+” to add a new text box, then there will be an “OR” operator between them. For example, if I write “war on terror” and terrorism in two separate boxes, then I will classify posts that have either the phrase war on terror or the word terrorism or both in them.
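
The matching rules above can be sketched in a few lines of Python. This is only an illustration of the AND, OR, and quoted-phrase logic as described here, not Phoenix's actual implementation. The function names (`tokenize`, `box_matches`, `class_matches`) are hypothetical, and the sketch assumes that matching ignores case and works on whole words, which the platform may handle differently.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (an assumption of this sketch)."""
    return re.findall(r"\w+", text.lower())

def box_matches(box, text):
    """Return True if the text satisfies every term in one text box (AND)."""
    tokens = tokenize(text)
    # Quoted phrases (straight or curly quotes) must appear as an exact word sequence.
    phrases = re.findall(r'["“]([^"”]+)["”]', box)
    # Anything outside the quotes is treated as individual words.
    loose_words = tokenize(re.sub(r'["“][^"”]+["”]', " ", box))
    for phrase in phrases:
        words = tokenize(phrase)
        n = len(words)
        if not any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            return False
    return all(word in tokens for word in loose_words)

def class_matches(boxes, text):
    """Return True if any one text box matches (OR between boxes)."""
    return any(box_matches(box, text) for box in boxes)

# One box without quotes: all three words are required, in any order.
print(box_matches("war on terror", "The terror of war weighs on us"))
# One box with quotes: the exact phrase is required.
print(box_matches('"war on terror"', "This war is terror on Earth"))
# Two boxes: either the phrase or the single word is enough.
print(class_matches(['"war on terror"', "terrorism"], "Terrorism is rising"))
```

Under these assumptions, the first and third calls match and the second does not, which mirrors the examples given in the list above.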

<aside> 💡

Building the list of keywords and features for each class will be the most time-consuming part of this process, and may require iteration. You can try to apply a “lexicon” that someone else has already built, as a starting point.

</aside>

Once you are done adding keywords to the classes, you can finish and view the classifier. You will then see a button to run the classifier. The classifier will be applied to all gathered data. It will add a class (or label) to all data that it matches. This will allow you to, for example, find out how many posts contained the words “war on terror” or “terrorism”.
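
As a rough illustration of what running the classifier produces, the sketch below labels a handful of made-up posts and counts how many received each class. Everything in it is hypothetical: the `classifier` dictionary, the `classify` function, the example posts, and the keyword “ceasefire” are placeholders, and the matching is simplified to a case-insensitive whole-word or whole-phrase search with OR between keywords.

```python
import re
from collections import Counter

# Hypothetical classifier: each class maps to a list of keywords or phrases (OR between them).
classifier = {
    "violent extremism": ["war on terror", "terrorism"],
    "peacekeeping": ["peacebuilding", "ceasefire"],
}

def classify(text, classifier):
    """Return the list of classes whose keywords appear in the text."""
    labels = []
    for class_name, keywords in classifier.items():
        if any(
            re.search(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE)
            for keyword in keywords
        ):
            labels.append(class_name)
    return labels

# Made-up stand-ins for data that has already been gathered.
posts = [
    "The war on terror dominated the debate.",
    "New peacebuilding workshop announced.",
    "Terrorism and peacebuilding were both discussed.",
    "Weather update for the weekend.",
]

# Apply the classifier to every post, then count posts per class.
labelled = [(post, classify(post, classifier)) for post in posts]
counts = Counter(label for _, labels in labelled for label in labels)

for class_name, count in counts.items():
    print(f"{class_name}: {count} posts")
```

Note that a single post can receive more than one class, and a post that matches no keywords receives none.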

<aside> 📌

A keyword text classifier will almost certainly produce some false positives (posts labelled with a class they do not belong to), and it will miss some relevant posts. There may be some contexts where these issues will be overwhelming and the classification won’t be useful (e.g. some hate terms can have multiple uses, including in common parlance). In these cases, you may want to consider applying a more complex model.

</aside>
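
One way to judge whether false positives and missed posts are a problem is to check a small sample of posts by hand and compare your judgment with the classifier's labels. The sketch below is a generic calculation, not a Phoenix feature, and the sample data is invented for illustration. It computes precision (the share of labelled posts that were correctly labelled) and recall (the share of relevant posts that the classifier found).

```python
# Hypothetical hand-checked sample. Each pair is:
# (did the classifier apply the class?, does a human reviewer say the post belongs in the class?)
sample = [
    (True, True),    # correctly labelled
    (True, False),   # false positive
    (False, True),   # missed post
    (True, True),    # correctly labelled
    (False, False),  # correctly left unlabelled
]

true_positives = sum(1 for predicted, actual in sample if predicted and actual)
false_positives = sum(1 for predicted, actual in sample if predicted and not actual)
missed = sum(1 for predicted, actual in sample if not predicted and actual)

# Guard against division by zero when the sample has no labelled or no relevant posts.
labelled_total = true_positives + false_positives
relevant_total = true_positives + missed
precision = true_positives / labelled_total if labelled_total else 0.0
recall = true_positives / relevant_total if relevant_total else 0.0

print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```

If either figure is low for your sample, that is a sign the keyword list needs another iteration or that keywords alone are not enough for this class.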

Once you run this classifier, any new data that you gather will have this classifier applied to it.

You can archive a classifier, and all of its classes (labels) will be removed from your data.
You can restore a classifier you have archived, and its classes (labels) will be applied to your data again.