Mining Online Data for Early Identification of Unsafe Food Products

This project uses product reviews from Amazon.com to identify potentially unsafe food products. Foods that are mislabeled, contaminated, or spoiled are recalled through a time-consuming process that can leave consumers at risk of allergic reactions, injury, and illness for months. We aim to use the reviews that consumers post online to predict whether a product will be recalled. Specifically, we:

  1. Mine and integrate a large corpus of data posted online to understand trends and features in unsafe food product reports

  2. Develop preliminary classification models for early identification of unsafe foods

This is one of four projects from the 2016 Data Science for Social Good summer fellowship at the University of Washington eScience Institute.

The Team

Project Lead: Elaine Nsoesie, Institute for Health Metrics and Evaluation, Department of Global Health, UW

Data Scientists: Valentina Staneva, Joe Hellerstein, Jes Ford

DSSG Fellows: Michael Munsell, Miki Verma, Cynthia Vint, Kara Woo

Explore the Reviews

We created an exploratory tool for viewing reviews of recalled products. The plot below shows reviews and ratings for a recalled product over time, as well as the date the product was recalled (if no date appears, the recall happened outside the date range of our Amazon review data). Hover over the points to view the text of the review. In this case, a reviewer noted a labeling issue in 2011, long before the product was recalled for mislabeling. The reviews in this tool provide some support for the idea that product reviews can be a fruitful data source for identifying unsafe foods.
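
The kind of early signal described above can be illustrated with a minimal sketch. It is not the project's tool or model. It assumes each review is a dictionary with `date`, `rating`, and `text` fields, and it uses a hypothetical keyword list (`SAFETY_TERMS`) and toy reviews invented for illustration. It simply keeps reviews that were posted before the recall date and mention a safety-related term.

```python
from datetime import date

# Hypothetical keyword stems, chosen only for illustration;
# the features the project actually uses are not described here.
SAFETY_TERMS = ("allerg", "label", "sick", "mold", "spoiled")


def early_warning_reviews(reviews, recall_date, terms=SAFETY_TERMS):
    """Return reviews posted before recall_date that mention a safety term.

    Each review is assumed to be a dict with 'date' (datetime.date),
    'rating' (int), and 'text' (str) keys.
    """
    flagged = []
    for review in reviews:
        text = review["text"].lower()
        posted_early = review["date"] < recall_date
        mentions_issue = any(term in text for term in terms)
        if posted_early and mentions_issue:
            flagged.append(review)
    # Oldest first, so the earliest warning appears at the top.
    return sorted(flagged, key=lambda r: r["date"])


# Toy reviews and recall date, invented for this example (not real data).
reviews = [
    {"date": date(2011, 3, 14), "rating": 2,
     "text": "The label does not list nuts, but it contains them."},
    {"date": date(2012, 6, 1), "rating": 5,
     "text": "Tasty and arrived quickly."},
    {"date": date(2014, 1, 9), "rating": 1,
     "text": "Made me sick. Bought it after the recall notice."},
]
recall_date = date(2013, 5, 20)

flagged = early_warning_reviews(reviews, recall_date)
for r in flagged:
    lead_days = (recall_date - r["date"]).days
    print(f"{r['date']} (rating {r['rating']}, {lead_days} days before recall): {r['text']}")
```

Substring matching like this is deliberately crude: it will miss paraphrased complaints and flag harmless mentions of a label, which is one reason a trained classifier is a more plausible route than a fixed keyword list.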