What is SAMbot?

Samara Areto Monitorbot, aka SAMbot, is a machine learning bot that detects and tracks toxic sentiment on Twitter. 

During election periods where SAMbot is deployed, it tracks candidates’ mentions 24/7 throughout the campaign and provides us with weekly data and insights about online abuse. SAMbot tracks the Twitter mentions of all candidates and registered political parties with a Twitter account.

SAMbot is bilingual! It can monitor tweets in both English and French.


Why do we need SAMbot?

While it is commonly understood that toxic online spaces are harming our democracy, there is little data that illustrates the extent of the problem in detail. Consequently, effective regulations, policies and social expectations for online conduct in Canadian political contexts are lacking.

Online political discourse during campaign periods can be extremely toxic. SAMbot is deployed during Canadian elections to examine the current state of Canada’s online political conversations. SAMbot tracks all English and French tweets directed at all candidates with a public Twitter account. Each tweet that SAMbot tracks — whether it’s a reply, quote tweet or mention — is assessed for how likely it is to fall under each of seven toxicity attributes: toxicity, severe toxicity, identity attacks, insults, profanity, threats, and sexually explicit comments. A tweet may contain a single attribute (e.g. include a threat) or a combination (e.g. include a threat and an identity attack, and be profane).

This information will help to inform important conversations and nuanced approaches to reducing the toxicity of online political spaces in Canada.

Toxicity Attributes

TOXICITY: A rude, disrespectful, or unreasonable comment that is likely to make people leave a discussion.
SEVERE TOXICITY: A very hateful, aggressive, disrespectful comment or otherwise very likely to make a user leave a discussion or give up on sharing their perspective. This attribute is much less sensitive to more mild forms of toxicity, such as comments that include positive uses of curse words.
IDENTITY ATTACK: Negative or hateful comments targeting someone because of their identity.
INSULT: Insulting, inflammatory, or negative comment towards a person or a group of people.
PROFANITY: Swear words, curse words, or other obscene or profane language.
THREAT: Describes an intention to inflict pain, injury, or violence against an individual or group.
SEXUALLY EXPLICIT: Contains references to sexual acts, body parts, or other lewd content.
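
The seven attributes above closely match the attribute definitions published for Google Jigsaw’s Perspective API. The document does not say which model or service SAMbot relies on, so the Python sketch below is purely illustrative: it shows how a single tweet could be scored against these attributes, assuming a Perspective-style endpoint and a placeholder API key.

    import requests

    # Hypothetical placeholder; a real key would come from a cloud console.
    API_KEY = "YOUR_API_KEY"
    URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

    ATTRIBUTES = [
        "TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK", "INSULT",
        "PROFANITY", "THREAT", "SEXUALLY_EXPLICIT",
    ]

    def score_tweet(text: str, lang: str = "en") -> dict:
        """Return a 0-1 likelihood score for each of the seven attributes."""
        payload = {
            "comment": {"text": text},
            "languages": [lang],  # "en" or "fr", matching SAMbot's bilingual scope
            "requestedAttributes": {attr: {} for attr in ATTRIBUTES},
        }
        response = requests.post(URL, params={"key": API_KEY}, json=payload, timeout=10)
        response.raise_for_status()
        scores = response.json()["attributeScores"]
        return {attr: scores[attr]["summaryScore"]["value"] for attr in ATTRIBUTES}

In this sketch, a tweet that scores high on more than one attribute corresponds to the combination case described earlier (e.g. a threat that is also profane).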

What does "likely toxic" mean?

SAMbot uses a natural language processing machine learning model to make predictions about whether someone would consider a piece of text toxic or not. SAMbot is trained and tested on millions of language data points to understand colloquialisms, natural language syntax and more. The algorithm looks at things like the specific words used in a tweet and the order in which they appear to make a prediction. Each tweet that SAMbot analyzes is assessed for how likely a person receiving the text would be to view it as toxic.

A “likely toxic” tweet is a tweet assessed as being greater than or equal to 51% likely to be toxic, meaning that the text is likely to be considered uncivil, insulting, or hostile, and may even be threatening or profane. Language is highly nuanced and SAMbot can never replace a human, which is why the word “likely” is used in reports.

A different filter is used to make predictions about whether a tweet is “severely toxic” or not. This filter is less sensitive to milder forms of toxicity and helps to cull false positives (e.g. when people use swear words in a positive way).
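
To make this concrete, here is a minimal sketch of that thresholding. The 51% cutoff for “likely toxic” comes from the text above; applying the same cutoff to the severe toxicity filter is an assumption made for this illustration, as is the label_tweet helper itself.

    LIKELY_TOXIC_THRESHOLD = 0.51  # "greater than or equal to 51% likely"

    def label_tweet(scores: dict) -> dict:
        """Turn raw attribute scores into the 'likely ...' labels used in reports."""
        return {
            "likely_toxic": scores["TOXICITY"] >= LIKELY_TOXIC_THRESHOLD,
            # The severe-toxicity filter is less sensitive to milder toxicity and
            # helps cull false positives such as positive uses of curse words.
            "likely_severely_toxic": scores["SEVERE_TOXICITY"] >= LIKELY_TOXIC_THRESHOLD,
        }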

SAMbot does not provide absolute or definitive conclusions but rather offers important insights about the state of online political conversations.

How does SAMbot detect toxicity?

SAMbot is a machine learning bot — a software application that runs automated tasks over the Internet. SAMbot tracks all English and French tweets sent to all candidates with an active Twitter account. Each message that SAMbot tracks — whether that’s a reply, quote tweet or mention — is analyzed and scored on seven toxicity attributes.

SAMbot can collect and analyze tweets in real time during any election campaign period. SAMbot provides a tweet analysis and stores each tweet that mentions the Twitter handle of at least one of the monitored candidates.

SAMbot does not track or store retweets, as counting the same tweet more than once would skew the analysis. Twitter data is collected and used in line with Twitter’s terms of use.
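
As a rough sketch of this filtering, the function below keeps only tweets that mention at least one monitored candidate’s handle and drops retweets. The field names follow Twitter’s v1.1 tweet object and the helper itself is hypothetical; the document does not describe SAMbot’s actual collection code.

    def is_tracked(tweet: dict, monitored_handles: set) -> bool:
        """Keep tweets that mention a monitored handle; drop retweets."""
        if "retweeted_status" in tweet:  # retweets are not tracked or stored
            return False
        mentions = tweet.get("entities", {}).get("user_mentions", [])
        mentioned = {m["screen_name"].lower() for m in mentions}
        return bool(mentioned & {h.lower() for h in monitored_handles})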

SAMbot analyzes and stores tweets by:

  1. analyzing the tweet information to identify and extract the tweet text
  2. scoring the tweet on each of the seven toxicity attributes, and
  3. storing the tweet text and its toxicity score in a database
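
Put together, the three steps might look something like the sketch below, which reuses the hypothetical score_tweet and is_tracked helpers from the earlier sketches and writes to a local SQLite file; the actual storage backend SAMbot uses is not specified.

    import json
    import sqlite3

    conn = sqlite3.connect("sambot.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS tweets (
                        tweet_id TEXT PRIMARY KEY,
                        text     TEXT,
                        scores   TEXT)""")  # scores kept as a JSON blob

    def process_tweet(tweet: dict, monitored_handles: set) -> None:
        if not is_tracked(tweet, monitored_handles):
            return
        # 1. identify and extract the tweet text
        text = tweet["text"]
        # 2. score the tweet on the seven toxicity attributes
        scores = score_tweet(text)
        # 3. store the tweet text and its toxicity scores
        conn.execute("INSERT OR REPLACE INTO tweets VALUES (?, ?, ?)",
                     (tweet["id_str"], text, json.dumps(scores)))
        conn.commit()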

Read more about the technology behind SAMbot and how it has been used in other elections:

View PDF

How machine learning helps SAMbot evolve

A helpful analogy for building a machine learning model is baking a cake from scratch using your own recipe. This requires a lot of testing – trying different ingredients, temperature settings for the oven, baking times and techniques. You will bake a lot of cakes. Some will turn out great and others less so. There are many variables to consider as you refine your recipe.

Once you have landed on your best recipe, you document it and label it “Best Cake Recipe,” or Cake V1. You share this cake with others and they give you great feedback (e.g. “delicious!”).

Over time, you discover new ingredients and techniques. You expand your network, meet other bakers, and learn about new tools and methods. You apply this learning: you are eager to test new ingredients and tools as you set about making the “Best Cake Ever Recipe,” or Cake V2. You take the view that even the best cake can be better and start to fine-tune. The basic ingredients stay the same, but you try a different order and start adjusting the presentation.

And so it goes with machine learning and SAMbot. As we continue to deploy SAMbot to track different elections, we’ll continue to improve and iterate on the SAMbot machine learning “recipe” to attempt to refine, improve, and increase the accuracy of our results.

Where can I read SAMbot’s findings?

The Samara Centre for Democracy and Areto Labs continuously review the large amount of data that SAMbot collects during election periods to conduct further analysis. The goal is to develop informed conclusions about online toxicity, civic engagement and democracy in Canada.

Visit our elections page to review weekly snapshots and reports from elections where SAMbot was deployed.

Elections Page

I have feedback for SAMbot!

Share your thoughts