Data Processing



Gathering Geospatial Datasets

To build an impactful index, we gathered data from a variety of sources (cited below). These data capture the indicators specified in the driver-indicator structure for the drivers of water reuse described on the Motivation page.

We aimed for data with national coverage of the United States wherever possible, although some variables are sparse in certain areas. We acknowledge that these gaps in data availability introduce bias into the calculated index and its mapped representation.

Most of the data we used is publicly available. Data that are not publicly available were provided by other researchers and used with their permission.


Merge Data to Target Scale (“Crosswalk”)

Because we used data at several geographic scales (for example, census data at the county or census-tract level combined with environmental data at the watershed level), we had to scale values up and down using area or population weighting. The right approach depended on whether a variable is best represented by a mean (e.g., population density) or a sum (e.g., total population). We also had to match the boundaries of overlapping areas by calculating their intersecting areas from their coordinate geometries. Processes like these are common in GIS work. Our final dataframe for the Water Reuse Index is at the drinking water service area level.
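The area-weighting step can be sketched as follows. This is a minimal Python illustration, not the project's R functions: the `Box` class, `areal_crosswalk`, and the toy geometries are hypothetical, and axis-aligned rectangles stand in for real polygons so the overlap area is easy to compute (a GIS library would intersect the actual polygon geometries). Sums are split in proportion to the share of each source unit's area that falls inside the target; means are averaged using overlap area as the weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for a polygon geometry (illustrative only)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two boxes; 0.0 when they do not overlap."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return width * height if width > 0 and height > 0 else 0.0


def areal_crosswalk(sources, targets, values, kind):
    """Move one variable from source units to target units using area overlap.

    kind="sum":  split each source total in proportion to the share of the
                 source's area that falls inside the target (e.g. population).
    kind="mean": average source values weighted by overlap area
                 (e.g. population density).
    """
    if kind not in ("sum", "mean"):
        raise ValueError("kind must be 'sum' or 'mean'")
    result = {}
    for t_id, t_geom in targets.items():
        weighted_total = 0.0
        overlap_total = 0.0
        for s_id, s_geom in sources.items():
            overlap = intersection_area(s_geom, t_geom)
            if overlap == 0.0:
                continue
            if kind == "sum":
                weighted_total += values[s_id] * overlap / s_geom.area()
            else:
                weighted_total += values[s_id] * overlap
                overlap_total += overlap
        if kind == "sum":
            result[t_id] = weighted_total
        else:
            # Targets that touch no source get None rather than a misleading 0.
            result[t_id] = weighted_total / overlap_total if overlap_total > 0 else None
    return result


# Two side-by-side "tracts" and two "service areas" that straddle them.
tracts = {"A": Box(0, 0, 2, 2), "B": Box(2, 0, 4, 2)}
service_areas = {"S": Box(1, 0, 3, 2), "T": Box(3, 0, 5, 2)}

population = {"A": 100, "B": 300}
density = {"A": 25.0, "B": 75.0}  # people per unit area

print(areal_crosswalk(tracts, service_areas, population, "sum"))  # {'S': 200.0, 'T': 150.0}
print(areal_crosswalk(tracts, service_areas, density, "mean"))    # {'S': 50.0, 'T': 75.0}
```

Population weighting follows the same pattern, except that each overlap is weighted by its estimated population rather than its area. In R, `sf::st_interpolate_aw` performs a comparable area-weighted interpolation on real polygons, with its `extensive` argument distinguishing sums from means.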

We created functions for “crosswalking” data between geospatial units. Some join on a common ID column (for nested geometries), and others rely on more involved area calculations. The functions accept point (longitude, latitude), raster, polygon, and tabular (with an ID column) data. They are detailed in this tutorial, and the associated R scripts (complete with docstrings) are located in this folder. For examples of how we used these functions in our data cleaning and crosswalking process, see our cleaning files here.
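For nested geometries, the shared ID is enough: no geometry math is needed. Census tracts nest within counties, and an 11-digit tract GEOID is the 2-digit state FIPS code, the 3-digit county FIPS code, and a 6-digit tract code, so its first 5 digits are the county GEOID. The minimal Python sketch below uses that prefix to aggregate tract values to counties. The function name, variables, and tract IDs are made up for illustration and are not the project's R functions.

```python
from collections import defaultdict


def tracts_to_counties(tract_values, kind="sum", weights=None):
    """Aggregate census-tract values to counties through the shared ID prefix.

    A tract GEOID has 11 digits: state FIPS (2) + county FIPS (3) +
    tract code (6), so its first 5 digits are the county GEOID.
    kind="sum":  add up tract totals (e.g. population).
    kind="mean": average tract values, weighted by `weights` (e.g. tract
                 population) or unweighted when `weights` is None.
    """
    if kind not in ("sum", "mean"):
        raise ValueError("kind must be 'sum' or 'mean'")
    totals = defaultdict(float)
    weight_sums = defaultdict(float)
    for geoid, value in tract_values.items():
        if len(geoid) != 11 or not geoid.isdigit():
            raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
        county = geoid[:5]
        if kind == "sum":
            totals[county] += value
        else:
            w = 1.0 if weights is None else weights[geoid]
            totals[county] += value * w
            weight_sums[county] += w
    if kind == "sum":
        return dict(totals)
    return {c: totals[c] / weight_sums[c] for c in totals if weight_sums[c] > 0}


# Made-up tract IDs in two counties (06037 and 06059).
population = {"06037101110": 4000, "06037101122": 6000, "06059001101": 5000}
pct_renter_households = {"06037101110": 40.0, "06037101122": 60.0, "06059001101": 20.0}

print(tracts_to_counties(population))  # {'06037': 10000.0, '06059': 5000.0}
print(tracts_to_counties(pct_renter_households, "mean", weights=population))
# {'06037': 52.0, '06059': 20.0}
```

Population-weighting the percentage keeps a large tract from counting the same as a small one. Going the other way (county to tract) would require an assumption about how to split county totals, which is where area or population weighting comes back in.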

Additional resources and tips for GIS processing have been compiled here.

Notes and examples: For the Provisional Water Reuse Index, we imputed null values as 0. We converted all values to percentiles before performing a Principal Component Analysis (PCA). We dropped any water service areas that had no associated geometry and therefore could not be mapped.
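The preprocessing described in these notes can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the project's code: the function and column names are hypothetical, rows without geometry are dropped before percentiles are computed (the order is our assumption), and percentiles use average ranks for ties (the same convention as pandas' `rank(method="average", pct=True)`). The PCA itself is not shown; in R it could be run on the resulting matrix with `stats::prcomp`.

```python
import math


def is_null(v):
    """Treat None and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def percentile_ranks(values):
    """Percentile rank of each value in (0, 1], averaging tied ranks."""
    n = len(values)
    ranks = []
    for v in values:
        below = sum(1 for u in values if u < v)
        tied = sum(1 for u in values if u == v)
        ranks.append((below + (tied + 1) / 2) / n)
    return ranks


def prepare_indicators(rows, indicator_cols):
    """Drop unmappable rows, impute nulls as 0, then convert indicators to percentiles.

    `rows` is a list of dicts with a "geometry" key plus indicator columns.
    Returns new dicts; the input rows are not modified.
    """
    mappable = [r for r in rows if not is_null(r.get("geometry"))]
    cleaned = [
        {**r, **{c: 0.0 if is_null(r.get(c)) else r[c] for c in indicator_cols}}
        for r in mappable
    ]
    for c in indicator_cols:
        column = [row[c] for row in cleaned]
        for row, pct in zip(cleaned, percentile_ranks(column)):
            row[c] = pct
    return cleaned


poly = "POLYGON ((0 0, 1 0, 1 1, 0 0))"  # placeholder WKT geometry
rows = [
    {"id": "A", "geometry": poly, "drought_risk": 3.0, "reuse_facilities": None},
    {"id": "B", "geometry": None, "drought_risk": 9.0, "reuse_facilities": 4},
    {"id": "C", "geometry": poly, "drought_risk": float("nan"), "reuse_facilities": 2},
    {"id": "D", "geometry": poly, "drought_risk": 7.0, "reuse_facilities": 2},
]

result = prepare_indicators(rows, ["drought_risk", "reuse_facilities"])
print([(r["id"], round(r["drought_risk"], 3), round(r["reuse_facilities"], 3)) for r in result])
# [('A', 0.667, 0.333), ('C', 0.333, 0.833), ('D', 1.0, 0.833)]
```

Imputing 0 before ranking places every missing value at the bottom of its indicator's distribution, which is one concrete way the data gaps mentioned above can shape the index.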

Assign Variables (“Indicators”) to Categories (“Drivers”)

We knew it was important to maintain the conceptual indicator/driver structure shown on the Motivation page. To do this, we created a list mapping each driver to the indicator values (columns) that measure it. This mapping affects both how the index is calculated and which menu options appear on the final website. It is detailed in the following pages and in create_website.Rmd.
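A driver-to-indicator list like this can be kept as a plain mapping and reused for validation, column selection, and menus. The minimal Python sketch below is illustrative only. The driver names, column names, and helper functions are hypothetical and are not taken from the project, whose own list is described in create_website.Rmd.

```python
# Hypothetical driver names and indicator columns, for illustration only.
DRIVER_INDICATORS = {
    "Water Scarcity": ["drought_risk", "groundwater_depletion"],
    "Water Quality": ["npdes_violations", "impaired_waters"],
    "Community Need": ["plumbing_poverty", "pct_renter_households"],
}


def missing_indicators(mapping, columns):
    """Drivers whose listed indicator columns are absent from the data, with the missing names."""
    available = set(columns)
    missing = {}
    for driver, cols in mapping.items():
        absent = [c for c in cols if c not in available]
        if absent:
            missing[driver] = absent
    return missing


def columns_for(mapping, selected_drivers):
    """Indicator columns for the selected drivers, in mapping order, without duplicates."""
    cols = []
    for driver in selected_drivers:
        for c in mapping[driver]:
            if c not in cols:
                cols.append(c)
    return cols


def menu_items(mapping):
    """Flatten the mapping into (driver, indicator) pairs, e.g. to build a nested menu."""
    return [(driver, c) for driver, cols in mapping.items() for c in cols]


data_columns = ["drought_risk", "groundwater_depletion", "npdes_violations",
                "plumbing_poverty", "pct_renter_households"]
print(missing_indicators(DRIVER_INDICATORS, data_columns))
# {'Water Quality': ['impaired_waters']}
print(columns_for(DRIVER_INDICATORS, ["Water Scarcity", "Community Need"]))
# ['drought_risk', 'groundwater_depletion', 'plumbing_poverty', 'pct_renter_households']
```

Keeping the mapping in one place means a renamed or dropped column surfaces as a validation error instead of silently disappearing from the index or the menus.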

Data Citations

Dataset Name | Citation
2000-2020 Public Supply Water Use Reanalysis | Luukkonen, C.L., Alzraiee, A.H., Larsen, J.D., Martin, D.J., Herbert, D.M., Buchwald, C.A., Houston, N.A., Valseth, K.J., Paulinski, S., Miller, L.D., Niswonger, R.G., Stewart, J.S., and Dieter, C.A., 2023, Public supply water use reanalysis for the 2000-2020 period by HUC12, month, and year for the conterminous United States: U.S. Geological Survey data release, https://doi.org/10.5066/P9FUL880.
WaSSI | USDA/USFS
2015 Water Use Survey | USGS
2024 EJScreen Data - Census Tracts | U.S. Environmental Protection Agency. Risk-Screening Environmental Indicators (RSEI), https://www.epa.gov/rsei (2019).
ACS 2018-2022 | Steven Manson, Jonathan Schroeder, David Van Riper, Katherine Knowles, Tracy Kugler, Finn Roberts, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 18.0 [dataset]. Minneapolis, MN: IPUMS. 2023. http://doi.org/10.18128/D050.V18.0
ATTAINS Dataset - Data Only | US EPA
ATTAINS Dataset - Geopackage | US EPA
Bureau of Labor Statistics | Bureau of Labor Statistics
CDC Waterborne Disease Outbreaks | CDC
Clean Watersheds Needs Survey (CWNS) | US EPA
Climate Vulnerability Index (CVI) |
Coastal Submarine Flows | Sawyer, David, and Famiglietti 2016
County-Scale Rainwater Harvesting Potential | Ennenbach, M.W., Concha Larrauri, P. and Lall, U., 2018. County-scale rainwater harvesting feasibility in the United States: Climate, collection area, density, and reuse considerations. JAWRA Journal of the American Water Resources Association, 54(1), pp.255-274.
CSO Inventory | US Environmental Protection Agency
CVI -> 2018 Highway Performance Monitoring System (HPMS) | U.S. Department of Transportation - Federal Highway Administration. Highway Performance Monitoring System (HPMS) Functional System 1-4, https://www.fhwa.dot.gov/policyinformation/hpms.cfm (2018).
CVI -> 2020 Presidential Voter Turnout | MIT Election Data and Science Lab. Voter Turnout (2020 Presidential), https://electionlab.mit.edu/research/voter-registration (2020).
CVI -> 2050 Wildfire Factor | First Street Foundation. Flood Factor Risk Statistics V 2.0, https://firststreet.org/press/press-release-flood-factor-v2-0-launch/ (2022).
CVI -> Agricultural Pesticides | Wieben, C. M. Estimated Annual Agricultural Pesticide Use for Counties of the Conterminous United States, 2013-17 (ver. 2.0, May 2020). U.S. Geological Survey data release (2019). https://doi.org/10.5066/P9F2SRYH.
CVI -> Change in Flood Factor, 2020-2050 | First Street Foundation. Flood Factor Risk Statistics V 2.0, https://firststreet.org/press/press-release-flood-factor-v2-0-launch/ (2022).
CVI -> CHAS | U.S. Department of Housing and Urban Development. Office of Policy Development and Research Consolidated Planning/CHAS Data. Database https://www.huduser.gov/portal/datasets/cp.html#2006-2018_query (2014-2018).
CVI -> Consecutive Dry Days | Iturbide, M. et al. Repository supporting the implementation of FAIR principles in the IPCC-WG1 Atlas. (2021).
CVI -> County Health Rankings & Roadmaps (CHR&R) | University of Wisconsin Population Health Institute and the Robert Wood Johnson Foundation. County Health Rankings & Roadmaps (CHR&R), https://www.countyhealthrankings.org/2022-measures (2022).
CVI -> Flood Factor | First Street Foundation. Flood Factor Risk Statistics V 2.0, https://firststreet.org/press/press-release-flood-factor-v2-0-launch/ (2022).
CVI -> Medically Underserved Areas | Health Resources and Services Administration (HRSA). Locations of US Dept of Health and Human Services Health Resources and Services Administration Nursing Facilities; Locations of hospitals and critical access hospitals; Medically Underserved Areas; Counts and rates of health resources; and Health Professional Shortage Areas, https://data.hrsa.gov/tools/data-explorer (2022).
CVI -> National Environmental Public Health Tracking Network | Centers for Disease Control and Prevention. National Environmental Public Health Tracking Network, https://ephtracking.cdc.gov/ (2022).
CVI -> National Neighborhood Data Archive (NaNDA) | Finlay, J. L., M. Esposito, M. Gomez-Lopez, I. Khan, A. Clarke, P. and Chenoweth, M. National Neighborhood Data Archive (NaNDA): Religious, Civic, and Social Organizations by Census Tract, United States, 2003-2017. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. (2020-10-20). https://doi.org/10.3886/E115967V2
CVI -> Yields (% change) | Hsiang, S. et al. Estimating economic damage from climate change in the United States. Science 356, 1362-1369 (2017). https://doi.org/10.1126/science.aal4369
Drinking Water Service Area Boundaries | SimpleLab, Environmental Policy Innovation Center (EPIC), 2022
Drinking Water Service Area Boundaries | US EPA
Drought Risk to Water Supply | Devineni et al.
Global Land Subsidence Probability | Hasan, M. F., R. Smith, S. Vajedian, R. Pommerenke, S. Majumdar (2023). Global Land Subsidence Mapping Reveals Widespread Loss of Aquifer Storage Capacity Datasets, HydroShare, https://doi.org/10.4211/hs.dc7c5bfb3a86479b889d3b30ab0e4ef7
Global Wastewater Dataset | Jones et al.
GRACE/GLDAS | USGS
Hurricane/Wind Factor | First Street Foundation
National Land Cover Database (NLCD) |
NPDES Discharge Monitoring Report (DMR) Loading Tool Summaries | US EPA
NPDES Discharge Monitoring Reports | US EPA
NPDES Violations | US EPA
Plumbing Poverty | Meehan
Principal Aquifers of the US | U.S. Geological Survey, 2003, Principal Aquifers of the 48 Conterminous United States, Hawaii, Puerto Rico, and the U.S. Virgin Islands: U.S. Geological Survey data release, https://doi.org/10.5066/P9Y2HOUJ.