Crowdsourced hate speech database

4/8/2013

Crowdsourced hate speech database could spot early signs of genocide

"Hatebase" can help distinguish between angry noise and systematic hate speech.

by KADHIM SHUBBER, WIRED.CO.UK | APRIL 7, 2013

The use of hate speech to dehumanize people is widely recognized as one of the first steps towards genocide. From Rwanda, where Hutu radio stations blared out propaganda referring to Tutsis as "cockroaches," to Nazi Germany, where Jews were likened to a disease that needed to be cleansed from society, hate speech has been a clear warning sign of terrible things to come.

Hatebase, a new crowdsourced database of multilingual hate speech from The Sentinel Project, is an attempt to create a repository of words and phrases that researchers can use to detect the early stages of genocide.

"How many people outside of Sri Lanka know that 'sakkiliya' is a Sinhala term used to refer to a Tamil person as 'a very unhygienic or uncultured person'?" Christopher Tuckwood, executive director of The Sentinel Project, told Wired.co.uk. "Hatebase helps us to know what to look for and to make sense of what we see."

Front-end users can log on to the website and add examples of hate speech from their communities, and also record location-specific "sightings," while developers can use an authenticating API that allows them to mesh Hatebase data with other tools for genocide prevention.

"Our intention with Hatebase was for the data to be used as a contextual layer on top of other monitoring datasets and infrastructure," Hatebase's developer Timothy Quinn told Wired.co.uk. "It's essentially acting as a sort of Z-axis to escalate or lessen threat severity and allow NGOs to redeploy resources accordingly."
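To make the "contextual layer" idea concrete, here is a minimal sketch of how a count of hate speech sightings might raise or lower a threat score produced by some other monitoring system. Everything in it is an assumption made for illustration: the `RegionReport` type, the 0-to-1 severity scale, the `weight` parameter and the linear scaling rule are hypothetical and are not taken from Hatebase or The Sentinel Project.

```python
from dataclasses import dataclass


@dataclass
class RegionReport:
    """Hypothetical record combining outside monitoring data with sightings."""
    region: str
    base_severity: float  # score from other monitoring data, assumed 0.0 to 1.0
    sightings: int        # hate speech sightings recorded in the same period


def adjusted_severity(report: RegionReport,
                      baseline_sightings: float,
                      weight: float = 0.2) -> float:
    """Escalate or lessen severity by how far sightings sit from a baseline.

    A ratio above 1 (more sightings than usual) pushes the score up, a ratio
    below 1 pulls it down. The result is clamped to the assumed 0-1 scale.
    """
    if baseline_sightings <= 0:
        # No usable baseline: leave the original score untouched.
        return report.base_severity
    ratio = report.sightings / baseline_sightings
    adjusted = report.base_severity * (1 + weight * (ratio - 1))
    return max(0.0, min(1.0, adjusted))


if __name__ == "__main__":
    busy = RegionReport("Region A", base_severity=0.5, sightings=20)
    quiet = RegionReport("Region B", base_severity=0.5, sightings=5)
    print(adjusted_severity(busy, baseline_sightings=10))
    print(adjusted_severity(quiet, baseline_sightings=10))
```

In this sketch the sightings never set the score on their own. They only nudge a score that already exists, which mirrors the description of the data as a layer on top of other datasets.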

Anyone who's spent a short amount of time online will know that hate speech isn't in short supply. The challenge is distinguishing between low-level background noise and systematic hate speech that could be the beginning of something worse.

"Hatebase gives us a reference point for what we should be listening for—picking that signal out of the noise—and then help with quantifying it. The real trick is to then connect those hate speech trends with other real-world phenomena," says Tuckwood. "For example, we've seen some hint of a possible correlation between when Iranian officials make anti-Baha'i statements and when there are upticks in attacks such as vandalism or arson on members of that religious minority."
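One simple way to picture "picking that signal out of the noise" is to count lexicon matches per day and flag days that sit well above the usual level. The sketch below does this with a z-score. It is an illustration only: the function names, the placeholder terms and the threshold of 2.0 are assumptions, and nothing here reflects how Hatebase or The Sentinel Project actually quantify hate speech.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def count_sightings(messages, lexicon):
    """Count whole-word lexicon matches per day.

    messages: iterable of (day, text) pairs.
    lexicon:  iterable of terms to look for (case-insensitive).
    """
    terms = [re.escape(term) for term in lexicon]
    counts = Counter()
    if not terms:
        return counts  # an empty lexicon matches nothing
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    for day, text in messages:
        counts[day] += len(pattern.findall(text))
    return counts


def flag_spikes(daily_counts, threshold=2.0):
    """Return the days whose count is more than `threshold` standard
    deviations above the mean of all days (a plain z-score test)."""
    values = list(daily_counts.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every day looks the same, so nothing stands out
    return sorted(day for day, count in daily_counts.items()
                  if (count - mu) / sigma > threshold)


if __name__ == "__main__":
    # Placeholder terms stand in for real lexicon entries.
    sample = [("day1", "term_a appears here, and TERM_A again"),
              ("day1", "an unrelated message"),
              ("day2", "term_b")]
    print(count_sightings(sample, ["term_a", "term_b"]))
```

A spike flagged this way is only a prompt for closer review. Connecting it to real-world events, as Tuckwood describes, would still need separate incident data and human judgment.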

Launched on 25 March, the database is still in its early stages, but the developers say that further functionality will be added in the coming months. In the future Hatebase may become a valuable tool for NGOs trawling through vast amounts of online communication, providing "a layer of relevance which complements other context-based information sources, not unlike traffic congestion layered onto a city map."
