Measuring and characterizing hate speech on news websites
MetadataShow full item record
Citation (published version)Savvas Zannettou, Mai Elsherief, Elizabeth Belding, Shirin Nilizadeh, Gianluca Stringhini. 2020. "Measuring and Characterizing Hate Speech on News Websites." 12th ACM Conference on Web Science, https://doi.org/10.1145/3394231.3397902
The Web has become the main source for news acquisition. At the same time, news discussion has become more social: users can post comments on news articles or discuss news articles on other platforms like Reddit. These features empower and enable discussions among the users; however, they also act as the medium for the dissemination of toxic discourse and hate speech. The research community lacks a general understanding on what type of content attracts hateful discourse and the possible effects of social networks on the commenting activity on news articles. In this work, we perform a large-scale quantitative analysis of 125M comments posted on 412K news articles over the course of 19 months. We analyze the content of the collected articles and their comments using temporal analysis, user-based analysis, and linguistic analysis, to shed light on what elements attract hateful comments on news articles. We also investigate commenting activity when an article is posted on either 4chan’s Politically Incorrect board (/pol/) or six selected subreddits. We find statistically significant increases in hateful commenting activity around real-world divisive events like the “Unite the Right” rally in Charlottesville and political events like the second and third 2016 US presidential debates. Also, we find that articles that attract a substantial number of hateful comments have different linguistic characteristics when compared to articles that do not attract hateful comments. Furthermore, we observe that the post of a news articles on either /pol/ or the six subreddits is correlated with an increase of (hateful) commenting activity on the news articles.
This paper was presented at the 12th ACM Conference on Web Science, 5-9 October 2020. This is the accepted manuscript version of the paper. The final version is available online from the Association for Computing Machinery at: https://doi.org/10.1145/3394231.3397902.
Rights© 2020 Association for Computing Machinery.