We created three point-in-time household definitions, each using only a single month of data. They start simple, using just address information, and then increase in complexity as we add last name information in two different ways.
Naive
The naive coresidence definition groups the entire population by their imputed address for a given month. It takes the resulting residences, for example a house with five people shown in the diagram below, and labels them as households. In this definition, a household refers to people who live together at the same time.
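The grouping step described above can be sketched in a few lines. This is a minimal illustration, not the actual pipeline: the record layout (`person_id`, `address`, `last_name`) and the sample addresses and names are assumptions made up for the example.

```python
from collections import defaultdict


def naive_households(people):
    """Group one month of person records by imputed address.

    Each distinct address becomes one household, regardless of
    how the residents are related.
    """
    households = defaultdict(list)
    for person in people:
        households[person["address"]].append(person["person_id"])
    return dict(households)


# Hypothetical records for a single month (field names are assumptions).
people = [
    {"person_id": 1, "address": "12 Oak St", "last_name": "Garcia"},
    {"person_id": 2, "address": "12 Oak St", "last_name": "Garcia"},
    {"person_id": 3, "address": "12 Oak St", "last_name": "Garcia"},
    {"person_id": 4, "address": "12 Oak St", "last_name": "Lee"},
    {"person_id": 5, "address": "12 Oak St", "last_name": "Lee"},
    {"person_id": 6, "address": "9 Elm Ave", "last_name": "Nguyen"},
]

households = naive_households(people)
```

Here the five people at the first address form a single household even though they carry two last names, which is the limitation the later definitions try to address.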
The naive coresidence definition cannot tell whether these people are roommates or family. We are more interested in families, because we are trying to identify resource-sharing units for the purpose of measuring poverty.
Why Last Names?
Limited Options:
Pros of Last Names:
(color = last name)
Last Names Deterministic
For the last name deterministic definition, we keep the naive definition for the residences where everyone shares a last name. When there are multiple last names, we split the residence into multiple households, one for each last name.
(color = last name)
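One way to sketch the deterministic rule is to group on the pair of address and last name. A residence where everyone shares a last name yields a single group, so the naive household is kept; a residence with several last names yields one group per name. The record layout and sample data are assumptions for illustration only.

```python
from collections import defaultdict


def deterministic_households(people):
    """Split each residence into one household per last name."""
    households = defaultdict(list)
    for person in people:
        # Keying on (address, last name) keeps single-name residences
        # whole and splits multi-name residences by last name.
        key = (person["address"], person["last_name"])
        households[key].append(person["person_id"])
    return dict(households)


# Hypothetical records for a single month (field names are assumptions).
people = [
    {"person_id": 1, "address": "12 Oak St", "last_name": "Garcia"},
    {"person_id": 2, "address": "12 Oak St", "last_name": "Garcia"},
    {"person_id": 3, "address": "12 Oak St", "last_name": "Garcia"},
    {"person_id": 4, "address": "12 Oak St", "last_name": "Lee"},
    {"person_id": 5, "address": "12 Oak St", "last_name": "Lee"},
    {"person_id": 6, "address": "9 Elm Ave", "last_name": "Nguyen"},
    {"person_id": 7, "address": "9 Elm Ave", "last_name": "Nguyen"},
]

households = deterministic_households(people)
```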
Last Names Probabilistic
Because many families contain multiple last names, we flip a weighted coin for each residence with multiple last names to decide whether to keep the naive definition or split the people apart by last name.
(color = last name)
The coin is weighted by the likelihood that a residence is a family household according to Census data about household size and geographical tract. In the example below, two residences with the same household composition (five people, two last names) will have different probabilities of being kept together: in County A, 80% of cases like this are families, compared to only 20% in County B.
(color = last name)
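The weighted coin can be sketched as a single random draw per multi-name residence. This is a minimal sketch under stated assumptions: the record layout, the `county` field, and the `family_probability` lookup are hypothetical stand-ins for the Census-derived weights, and the 0.8 and 0.2 values are taken from the County A and County B example above.

```python
import random
from collections import defaultdict

# Stand-in for Census-derived weights (values from the example above).
FAMILY_SHARE_BY_COUNTY = {"A": 0.8, "B": 0.2}


def family_probability(members):
    """Hypothetical lookup: chance that this residence is a family."""
    return FAMILY_SHARE_BY_COUNTY[members[0]["county"]]


def probabilistic_households(people, p_family, rng):
    """Keep or split each multi-name residence using a weighted coin."""
    residences = defaultdict(list)
    for person in people:
        residences[person["address"]].append(person)

    households = []
    for members in residences.values():
        last_names = {m["last_name"] for m in members}
        # Single-name residences are always kept; otherwise flip the coin.
        keep = len(last_names) == 1 or rng.random() < p_family(members)
        if keep:
            households.append([m["person_id"] for m in members])
        else:
            by_name = defaultdict(list)
            for m in members:
                by_name[m["last_name"]].append(m["person_id"])
            households.extend(by_name.values())
    return households


def make_residence(first_id, address, county, names):
    """Build hypothetical records; names is a list of (last_name, count)."""
    records, pid = [], first_id
    for last_name, count in names:
        for _ in range(count):
            records.append({"person_id": pid, "address": address,
                            "county": county, "last_name": last_name})
            pid += 1
    return records


# Two residences with the same composition: five people, two last names.
people = (make_residence(1, "12 Oak St", "A", [("Garcia", 3), ("Lee", 2)])
          + make_residence(6, "9 Elm Ave", "B", [("Nguyen", 3), ("Patel", 2)]))

rng = random.Random(2024)  # seeded only so the sketch is reproducible
households = probabilistic_households(people, family_probability, rng)
```

Because the outcome depends on a random draw, repeated runs with different seeds will keep the County A residence together about 80% of the time and the County B residence about 20% of the time.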