Identifying Coronavirus Disinformation Online

Team

DSSG Fellows:

  1. George Hope Chidziwisano, Michigan State University
  2. Kseniya Husak, University of Michigan
  3. Maya Luetke, Indiana University
  4. Richa Gupta, Columbia University

Project Leads:

  1. Maggie Engler, Lead Data Scientist, Global Disinformation Index
  2. Lucas Wright, Senior Researcher, Global Disinformation Index

Data Science Leads:

  1. Noah Benson, Senior Data Scientist, eScience Institute, University of Washington
  2. Vaughn Iverson, Research Scientist, eScience Institute, University of Washington

Abstract

Websites that disseminate disinformation can harm the general public by spreading rumors and publishing untrustworthy content. In particular, disinformation about the coronavirus could discourage people from taking necessary prevention measures, increasing transmission of the virus and, in turn, morbidity and mortality worldwide.

Developing a method to identify disinformation sites could mitigate these harms by allowing advertisers to avoid funding such sites. The purpose of this project was to develop an open-source natural language processing model that could accurately classify news articles according to their risk of containing disinformation about the coronavirus.

We developed a neural network model that correctly identified 93.7% of disinformation articles and misclassified genuine articles as disinformation only 2.8% of the time.
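
The two figures reported above correspond to what are commonly called the true positive rate (the share of disinformation articles that are flagged) and the false positive rate (the share of genuine articles that are wrongly flagged). The following is a minimal sketch of how such rates can be computed from labels and predictions. The function name `detection_rates` and the label counts are hypothetical, chosen only to illustrate the arithmetic; they are not the project's code or evaluation data.

```python
def detection_rates(y_true, y_pred):
    """Return (true_positive_rate, false_positive_rate).

    Labels are binary: 1 = disinformation, 0 = genuine.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # disinformation, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # disinformation, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # genuine, wrongly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # genuine, passed

    tpr = tp / (tp + fn)  # share of disinformation articles identified
    fpr = fp / (fp + tn)  # share of genuine articles misclassified
    return tpr, fpr


# Hypothetical test set (an assumption for illustration only):
# 1000 disinformation articles followed by 1000 genuine articles.
y_true = [1] * 1000 + [0] * 1000
# Hypothetical predictions: 937 of the disinformation articles flagged,
# 28 of the genuine articles wrongly flagged.
y_pred = [1] * 937 + [0] * 63 + [1] * 28 + [0] * 972

tpr, fpr = detection_rates(y_true, y_pred)
print(f"true positive rate: {tpr:.1%}, false positive rate: {fpr:.1%}")
```

Reporting both rates together matters because either one alone can be made to look good: a model that flags every article has a perfect true positive rate but would also flag every genuine article.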