Why apply a complex model?

The main challenge of a keyword classifier, in summary, is that the classification does not account for context. We can overcome this limitation to a certain extent by applying a classifier that uses a large language model (LLM). LLMs are trained on large amounts of data to learn how language works. They can then use this knowledge to perform a variety of natural language processing (NLP) tasks, including classifying text.

Put simply for our use case: LLMs are much better at classifying complicated things such as broad topics (“politics” or “gender”) or tone (“intimidation” or “negative sentiment” or “toxicity”).

Complex models available to apply in Phoenix

When you click “create” in the Classify tab on the Phoenix platform, you will see options to create an author classifier and a keyword text classifier (these two options are described in the previous sections). You will also see options to apply complex models to classify the text of posts or comments, each listed with the model’s name and a short description. The models listed below are currently available on Phoenix.

<aside> ➡️

We aim to add more classifier models as we grow. Do you have access to a model that you think would help Phoenix? Email [email protected] to let us know and we’ll try to integrate it!

</aside>

Perspective API

Perspective API is a free classifier that uses machine learning to identify toxic language. You can use Perspective API to classify the text of posts and/or comments. Classification will be applied to all posts and/or comments that have already been gathered, as well as to data gathered in the future.
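Phoenix handles the Perspective API calls for you, but for readers curious about what happens behind the scenes, here is a minimal sketch of the request body that Perspective’s `comments:analyze` endpoint expects. The attribute name `TOXICITY` comes from Perspective’s public documentation; sending the request for real would additionally require your own API key, which is not shown here.

```python
import json


def build_analyze_request(text, attributes=("TOXICITY",)):
    """Build the JSON body for Perspective API's comments:analyze endpoint.

    `attributes` can include other Perspective attributes (e.g. INSULT,
    THREAT); TOXICITY is used as the default here for illustration.
    """
    return {
        "comment": {"text": text},
        "requestedAttributes": {attr: {} for attr in attributes},
    }


# Example: the body Phoenix might send for a single comment.
body = build_analyze_request("example comment text")
print(json.dumps(body, indent=2))
```

The response contains a `summaryScore` between 0 and 1 for each requested attribute, which is the continuous score that Phoenix then buckets as described below.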

To facilitate user interpretation and application of these scores, we have implemented a system that clusters the continuous scores into four discrete probability buckets. This manual outlines the methodology for this clustering process and provides guidance on how to interpret and utilize these buckets effectively.
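As a rough sketch of how such a clustering could work, the snippet below maps a continuous score in [0, 1] to one of four buckets. The equal-width quartile boundaries (0.25, 0.5, 0.75) are an assumption for illustration, chosen to match the equal-distribution approach described in the methodology section; Phoenix’s exact boundaries may differ.

```python
def score_to_bucket(score):
    """Map a continuous score in [0, 1] to one of four equal-width buckets.

    Assumed boundaries (an illustration, not Phoenix's exact values):
    bucket 1: [0.00, 0.25), bucket 2: [0.25, 0.50),
    bucket 3: [0.50, 0.75), bucket 4: [0.75, 1.00].
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("scores must lie in [0, 1]")
    # min(...) keeps the upper edge (score == 1.0) in bucket 4.
    return min(int(score * 4) + 1, 4)
```

A user who only wants clearly toxic content could then filter for bucket 4, while a more cautious review workflow might include buckets 3 and 4.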

Clustering Methodology

Given the absence of definitive thresholds that delineate the severity of attributes like toxicity, we have adopted an equal distribution approach to categorize the scores. This method ensures that users can select and apply thresholds that best suit their specific needs. The four probability buckets are defined as follows: