Today we will look at tactics and techniques used by threat actors as reported in “AI’s coming home: How Artificial Intelligence Can Help Tackle Racist Emoji in Football” by Hannah Rose Kirk for the Oxford Internet Institute on 16 July 2021.
The intention of this series is to make it easier to understand why the article has been tagged with particular tactics or techniques. Associating reporting of real-world attacks with DISARM tactics and techniques helps us better understand how they have practically been used, who has used them, and who they have been used against. To do this, a relevant quote from the article will be provided under the title of the associated technique. If the technique exists in DISARM, its identifier will be included too.
Decide to Act: Harass People Based on Identities (T0048.002)
Produce: Harassment (T0048)
England’s final game in the Euros on Sunday night was an emotional occasion. The dark side of disappointment fuelled racist abuse targeted at England players Bukayo Saka, Marcus Rashford and Jadon Sancho.
Calibrate: Use “Algospeak” to avoid automated content moderation
While Twitter and Instagram’s algorithms claim to detect hateful and abusive content, there appears to be a large emoji-shaped hole in their defences. Despite users repeatedly flagging racist use of the monkey ([🐒]), banana ([🍌]) and watermelon emoji ([🍉]), abuse expressed in emoji form was not flagged by Instagram or Twitter’s content moderation algorithms.
On why it’s hard for algorithms to detect racist emoji use:
The lack of emoji in training datasets causes a model to err when faced with real-world social media data, either by missing hateful emoji content (false negatives) or by incorrectly flagging innocuous uses (false positives). Internet slang is constantly evolving, and in the specific context of abuse detection, algorithms and malicious users engage in a cat-and-mouse game in which new and creative ways are devised to circumvent censorship filters. The use of emoji for racist abuse is an example of a change which has outpaced current detection models, allowing offensive messages to continue being shared even when equivalent abuse expressed in text would be removed.
There are two problems when it comes to detecting online abuse expressed in emoji. First, models have rarely been shown hateful emoji examples in their training. Second, models are rarely evaluated against emoji-based hate, so their weaknesses in detection remain unknown. Our ongoing research at the University of Oxford and The Alan Turing Institute marks the first attempt to address these problems.
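To make the false-negative failure described above concrete, here is a minimal sketch (not from the article or the authors' research) of the kind of text-only filter that emoji substitution slips past. The blocklist contents and function name are illustrative assumptions, not a real moderation system.

```python
# Minimal sketch of a naive, text-only keyword filter. It catches abuse it has
# seen in word form, but the emoji-substituted variant of the same message
# passes straight through: the false negative the quote describes.

BLOCKLIST = {"monkey", "ape"}  # illustrative text-only keywords, not a real list

def naive_filter(message: str) -> bool:
    """Return True if the message should be flagged as abusive."""
    tokens = message.lower().split()
    return any(token.strip(".,!?") in BLOCKLIST for token in tokens)

if __name__ == "__main__":
    text_abuse = "You play like a monkey"   # keyword present -> flagged
    emoji_abuse = "You play like a 🐒"       # same meaning, no keyword -> missed
    print(naive_filter(text_abuse))   # True
    print(naive_filter(emoji_abuse))  # False (false negative)
```

A model trained only on text examples behaves much like this filter: it has never been shown the emoji form, so it has no basis for flagging it.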
On a taxonomy of emoji-based hate:
Without much existing research or datasets telling us how emoji are used in online abuse, we constructed the first taxonomy of emoji-based hate. The taxonomy tests specific ways in which emoji are used on social media. For example, an emoji can be substituted for a protected identity, e.g. “I hate [👦🏿]”, or it can be substituted for a violent action towards these protected groups, e.g. “I will [🔪] Muslims”. In line with the predominant ways emoji were used abusively in the Euros 2020 context, a person or group of people can be described as similar to a dehumanising emoji, e.g. “Black footballers are like [🐒]”. Alternatively, a number of these emoji substitutions can be combined in double swaps to create ‘emoji-only’ hate, e.g. “[👦🏿]=[🐒]” or “[👦🏿][🍌]”.
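The article does not publish code, but the substitution patterns it describes lend themselves to templated test cases. The sketch below is a hypothetical illustration of how the four patterns quoted above (identity swap, action swap, dehumanising comparison, emoji-only double swap) could be expanded into examples for evaluating a moderation model; all names and template wording are assumptions made for illustration.

```python
# Hypothetical illustration: turn the taxonomy's substitution patterns into
# templated test sentences. Each template encodes one pattern from the quote;
# filling the placeholders yields examples a robust moderation model should flag.

IDENTITY_EMOJI = "👦🏿"   # stands in for a protected identity
ANIMAL_EMOJI = "🐒"      # dehumanising comparison emoji
ACTION_EMOJI = "🔪"      # stands in for a violent action

TEMPLATES = {
    "identity_swap": "I hate {identity}",
    "action_swap":   "I will {action} {group}",
    "dehumanising":  "{group} are like {animal}",
    "emoji_only":    "{identity}={animal}",
}

def build_test_cases(group: str = "Black footballers") -> list[str]:
    """Expand each taxonomy template into a concrete test sentence."""
    values = {
        "identity": IDENTITY_EMOJI,
        "animal": ANIMAL_EMOJI,
        "action": ACTION_EMOJI,
        "group": group,
    }
    return [template.format(**values) for template in TEMPLATES.values()]

if __name__ == "__main__":
    for case in build_test_cases():
        print(case)  # each expanded case should be flagged by a moderation model
```

Evaluating a classifier against generated cases like these is one way to expose the "unknown weaknesses" the researchers point to: if the model misses the emoji-substituted forms while catching their text equivalents, the gap is made measurable.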