Data Science Leads:
Websites that disseminate disinformation can harm the general public by spreading rumors and publishing untrustworthy content. In particular, disinformation about the coronavirus could discourage people from taking the necessary prevention measures, which would increase transmission of the virus and, in turn, morbidity and mortality worldwide.
Developing a method to identify disinformation sites could mitigate these harmful effects by allowing advertisers to avoid funding such sites. The purpose of this project was to develop an open-source natural language processing model that could accurately classify news articles according to their risk of containing disinformation about the coronavirus.
We developed a neural network model that correctly identified 93.7% of disinformation articles and misclassified genuine articles as disinformation only 2.8% of the time.
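The two figures reported above correspond to what is usually called the true positive rate (the share of disinformation articles that are flagged) and the false positive rate (the share of genuine articles that are wrongly flagged). As a minimal sketch of how such rates are computed from labels and predictions, consider the Python snippet below. The function name `detection_rates` and the label counts are hypothetical, chosen only so the arithmetic is easy to follow; they are not the project's code or evaluation data.

```python
def detection_rates(y_true, y_pred):
    """Return (true_positive_rate, false_positive_rate) for binary labels.

    Convention assumed here: 1 = disinformation, 0 = genuine.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # disinformation, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # disinformation, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # genuine, wrongly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # genuine, passed

    # Guard against an empty class so the function never divides by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical test set: 1000 disinformation and 1000 genuine articles.
# These counts are illustrative only, not the project's real data.
y_true = [1] * 1000 + [0] * 1000
y_pred = [1] * 937 + [0] * 63 + [1] * 28 + [0] * 972

tpr, fpr = detection_rates(y_true, y_pred)
print(f"true positive rate:  {tpr:.1%}")
print(f"false positive rate: {fpr:.1%}")
```

Reporting both rates matters because either one alone can be misleading: a model that flags every article reaches a 100% true positive rate, but it also has a 100% false positive rate.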