Respondent-driven sampling (RDS) is a method for both data collection and statistical inference. The unit of analysis in RDS is a network structure rather than an individual, and the analysis generalizes to the networks of the sampled population. Unlike in traditional sampling methods, this network basis shapes the estimators we use, the way we understand the variance around the estimates, and how we interpret the findings.
RDS is a combined survey and statistical method that provides a quasi-probability sample for hard-to-reach populations, such as people experiencing homelessness. The statistical model compensates for the fact that the sample was collected through peer referral rather than a traditional random sampling approach such as an address-based sample. The method relies on multiple waves of peer-to-peer recruitment to approximate random sampling within “hard-to-reach” populations (see Figure 1). As a type of chain-referral sampling, RDS is particularly useful when traditional probability sampling methods are not feasible (e.g., when a sampling frame does not exist). It integrates social network theory to reduce known biases, such as oversampling people with large personal networks in the target population. RDS population estimates generalize to the target population (e.g., the population of people living unsheltered in King County).
Theory supporting RDS requires:
Current estimators for RDS analysis are primarily developed to describe proportions in a network and make inference about an entire network based on information about the known part of the network (the part in your sample). For more details, see this article.
This method uses the inclusion probabilities of sample members, together with their reported network sizes, to adjust estimates. It allows one set of weights to be applied to the entire sample rather than to each variable separately. The approach models the sampling process as a first-order Markov random walk through the network of the target population, which accounts for the complex social structures inherent in RDS. The population proportion is approximated by weighting the sample under a repeated-sampling model for RDS, assuming that each respondent's inclusion probability is proportional to their degree (reported network size).
Under these assumptions, it is evident that at every step of the sampling process, each tie has an equal probability of being sampled. For more details, see this article.
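To make the weighting idea concrete, the following is a minimal Python sketch of an inverse-degree weighted proportion, the general form of estimator described above. It is an illustration under stated assumptions, not the code used in this project: the function name `rds_ii_proportion` and the example data are hypothetical, and reported network sizes are assumed to be positive.

```python
def rds_ii_proportion(degrees, has_trait):
    """Inverse-degree weighted proportion (illustrative sketch only).

    degrees   -- reported network size of each respondent (must be > 0)
    has_trait -- True/False for each respondent, same order as degrees
    """
    if len(degrees) != len(has_trait):
        raise ValueError("degrees and has_trait must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("reported network sizes must be positive")

    # If inclusion probability is proportional to degree, each respondent
    # is weighted by the inverse of their reported network size.
    weights = [1.0 / d for d in degrees]
    total_weight = sum(weights)
    trait_weight = sum(w for w, t in zip(weights, has_trait) if t)
    return trait_weight / total_weight


# Hypothetical example data: five respondents.
degrees = [2, 10, 5, 1, 4]
has_trait = [True, False, False, True, False]

estimate = rds_ii_proportion(degrees, has_trait)
naive = sum(has_trait) / len(has_trait)  # unweighted sample proportion
print(f"weighted: {estimate:.3f}, unweighted: {naive:.3f}")
```

In this made-up example the respondents with the trait report small networks, so the weighted estimate is larger than the unweighted sample proportion. Respondents with large networks are more likely to be recruited, and the weights compensate by counting them less.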
Before initiating the research, the research team secured approval from the University of Washington Institutional Review Board (IRB) for this study involving human subjects.
Seed selection is a critical step in Respondent Driven Sampling (RDS) as it initiates the recruitment chain. For the 2023 UW King County Understanding Homelessness Project, the seed selection process was designed to ensure diversity and representation within the target population of individuals experiencing homelessness.
Seeds were recruited through a combination of outreach efforts, including collaborations with local service providers, shelters, and community organizations. Potential seeds were approached with detailed information about the study and were selected based on their willingness to participate and their ability to recruit others.
Ensuring informed consent and maintaining participant confidentiality were paramount throughout the study.
Data collection began in late April and concluded in early June, lasting just over a month. Volunteers and site managers were active at eleven distinct site locations.
Figure 1: Recreated from Almquist et al. (in press). GIS representation of all 11 hub sites used in the survey with US Census urban areas in red and US Census tracts in gray.
We prepared the data by creating functions to clean it and impute additional columns where needed. We shortened column names to enhance clarity and readability, for example by using the prefixes ego and alter and by replacing terms like “household” with hr. These efforts resulted in a more streamlined and user-friendly dataset, facilitating accurate and efficient analysis. Additionally, we standardized common columns in both the RDS (Respondent-Driven Sampling) and PSD (Puget Sound Data) datasets to ensure consistency.
For the analysis, we used the RDS package (version 0.9-9), which provides functionality for carrying out estimation with data collected using respondent-driven sampling. It includes Heckathorn’s RDS-I and RDS-II estimators as well as Gile’s Sequential Sampling estimator. This package is part of the “RDS Analyst” suite of packages designed for the analysis of respondent-driven sampling data.
The specific functions used in our analysis include:
bootstrap.contingency.test: This function performs a bootstrap test of independence between two categorical variables, offering a robust method for hypothesis testing, especially when sample sizes are small or distributions are unknown.
RDS.II.estimates: This function computes RDS-II estimates for categorical or numeric variables, adjusting for network size and sample inclusion probabilities to provide accurate estimates.
For additional details, you can visit the RDS Package Documentation.
To ensure that the results were representative of the target population, data weighting and adjustment were applied.
Statistical analysis was performed to derive insights and draw conclusions from the data.
By accounting for varying probabilities of selection, RDS-II provides weighted percentage estimates from our survey data. When estimating the total number of unsheltered individuals, it corrects for biases introduced by the network-based sampling method used in our survey. The RDS-II weight type adjusts for variations in network size and recruitment chains, so that each individual’s contribution to the estimate reflects their representation in the population. Additionally, the bootstrap.contingency.test function was used to draw bootstrap RDS samples from the population distribution and calculate the chi-squared statistic on the weighted contingency table, with weights calculated using the RDS-II estimator, providing a robust method for hypothesis testing.
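As a rough illustration of the test statistic only, the Python sketch below builds a weighted contingency table from inverse-degree weights and computes a chi-squared statistic on it. It does not reproduce the bootstrap reference distribution that bootstrap.contingency.test generates, and it is not the package's implementation. The function name `weighted_chi_squared`, the rescaling of weights to sum to the sample size, and the example data are all assumptions made for this sketch.

```python
from collections import defaultdict


def weighted_chi_squared(rows, cols, degrees):
    """Chi-squared statistic on an inverse-degree weighted two-way table.

    rows, cols -- category labels for the two variables, one per respondent
    degrees    -- reported network sizes (must be > 0)
    """
    n = len(rows)
    raw = [1.0 / d for d in degrees]
    # Assumption of this sketch: rescale weights so they sum to n,
    # keeping the statistic on the scale of the sample size.
    scale = n / sum(raw)
    weights = [w * scale for w in raw]

    table = defaultdict(float)
    row_tot = defaultdict(float)
    col_tot = defaultdict(float)
    for r, c, w in zip(rows, cols, weights):
        table[(r, c)] += w
        row_tot[r] += w
        col_tot[c] += w

    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            observed = table.get((r, c), 0.0)
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical data: two binary variables that are perfectly associated.
stat = weighted_chi_squared(
    rows=["a", "a", "b", "b"],
    cols=["x", "x", "y", "y"],
    degrees=[3, 3, 3, 3],
)
print(f"weighted chi-squared statistic: {stat:.3f}")
```

In a bootstrap test, the observed statistic is compared with the statistics computed on resampled data sets rather than with the standard chi-squared table, since the RDS design violates the usual independence assumptions.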
To analyze differences in a continuous variable between two groups, we used RDS estimation techniques to provide adjusted mean estimates and standard errors for each group. The RDS.II.estimates function was used to extract weighted mean estimates and their standard errors. We then conducted a Welch t-test to compare the means between the two groups, which is appropriate for cases with unequal variances. This method computed the t-statistic, degrees of freedom, p-value, and confidence intervals based on the weighted mean estimates and standard errors, offering a robust statistical comparison of the weighted means between the two groups.
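The comparison described above can be sketched from summary statistics alone. The Python example below computes the Welch t-statistic and the Welch-Satterthwaite degrees of freedom from two means and their standard errors. It is a sketch under assumptions rather than the project's code: the function name `welch_from_summaries` and the input values are hypothetical, and the group sample sizes `n1` and `n2` are assumed to be the quantities used in the degrees-of-freedom formula, which the text does not specify.

```python
import math


def welch_from_summaries(mean1, se1, n1, mean2, se2, n2):
    """Welch t-statistic and degrees of freedom from summary statistics.

    mean1, mean2 -- (weighted) group means
    se1, se2     -- standard errors of those means
    n1, n2       -- group sample sizes (assumed inputs for the df formula)
    """
    var_sum = se1 ** 2 + se2 ** 2
    t_stat = (mean1 - mean2) / math.sqrt(var_sum)
    # Welch-Satterthwaite approximation, written in terms of standard errors.
    df = var_sum ** 2 / (se1 ** 4 / (n1 - 1) + se2 ** 4 / (n2 - 1))
    return t_stat, df


# Hypothetical weighted means and standard errors for two groups.
t_stat, df = welch_from_summaries(
    mean1=10.0, se1=3.0, n1=11,
    mean2=6.0, se2=4.0, n2=21,
)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The p-value and confidence interval then follow from the t distribution with `df` degrees of freedom, for example via `scipy.stats.t.sf` if SciPy is available. That step is omitted here to keep the sketch free of third-party dependencies.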
This methods section outlines the critical components of the data collection, processing, and analysis methodologies employed in the 2023 UW King County Understanding Homelessness Project. The use of Respondent Driven Sampling (RDS) provides a robust framework for studying hard-to-reach populations, ensuring that the insights drawn from this research are both valid and reliable.