How Many Uninsured Are in the Coverage Gap and How Many Could Be Eligible if All States Adopted the Medicaid Expansion?
Technical Appendix B: Immigration Status Imputation
To impute documentation status, we draw on the methods underlying the 2013 analysis by the State Health Access Data Assistance Center (SHADAC) and the recommendations of Van Hook et al.1,2 This approach uses the 2023 KFF/LA Times Survey of Immigrants to develop a model that predicts immigration status for each person in the sample.3 We then apply the model to a second data source, controlling to state-level estimates from the Pew Research Center of the total undocumented population and of the undocumented population in the labor force.4 Below we describe how we developed the regression model and applied it to the American Community Survey (ACS), and how the model may be applied to other datasets. The programming code, written in R version 4.3.1, is available upon request for anyone interested in replicating this approach in their own analysis.
We used the 2023 KFF/LA Times Survey of Immigrants data to build the regression model. The dataset contains person-level questions on citizenship and legal status. The KFF/LA Times Survey of Immigrants5 is a probability-based survey exploring the immigrant experience in the US. It draws on three sampling frames: an address-based sample (ABS), a random digit dial (RDD) sample of prepaid cell phone numbers, and callbacks to RDD sample members who did not speak English or Spanish. The survey includes interviews with 3,358 immigrant adults and was offered in ten languages.
The regression model is designed to be applied to other datasets in order to impute legal immigration status in surveys that do not ask about it. The code mentioned above includes programming to apply the model to the Survey of Income and Program Participation (SIPP) Core files, the ACS, or the Current Population Survey (CPS). Because the SIPP Core files contain different survey questions and variable specifications from the ACS and CPS, we specify a separate regression model for each dataset. For the analysis underlying this brief and other KFF estimates of eligibility for ACA coverage, we apply the regression model to the 2013 ACS and then to each subsequent year of the ACS.
Because legal immigration status is underreported in survey datasets, we control the imputation to state- and national-level estimates from the Pew Research Center of both the total undocumented population and the undocumented population in the labor force. Pew reports these estimates for all states and the District of Columbia.6
Construction of Regression Model
We use the 2023 Survey of Immigrants to create a binary dependent variable that identifies a respondent as a potential unauthorized immigrant. A respondent is coded as potentially unauthorized if all of the following apply:
- Respondent was not a United States (US) citizen,
- Respondent did not have permanent resident status or a valid work or student visa, and
- Respondent did not have other indicators that imply legal status.7
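The three criteria combine into a single yes/no outcome. The following is a minimal Python sketch (the analysis itself was done in R), assuming hypothetical yes/no fields for each survey item; the actual Survey of Immigrants variable names and skip patterns differ.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical yes/no fields standing in for Survey of Immigrants items.
    is_citizen: bool
    is_permanent_resident: bool
    has_valid_work_or_student_visa: bool
    has_other_legal_indicator: bool


def potentially_unauthorized(r: Respondent) -> int:
    """Binary outcome: 1 only if none of the legal-status conditions apply."""
    return int(
        not r.is_citizen
        and not r.is_permanent_resident
        and not r.has_valid_work_or_student_visa
        and not r.has_other_legal_indicator
    )


if __name__ == "__main__":
    respondents = [
        Respondent(True, False, False, False),   # citizen
        Respondent(False, False, False, False),  # no documented status reported
        Respondent(False, False, True, False),   # valid visa holder
    ]
    print([potentially_unauthorized(r) for r in respondents])  # [0, 1, 0]
```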
We use the following independent variables to predict unauthorized immigrant status:
- Year of US entry,
- Job industry classification,
- State of residence,
- Household income,
- Ownership or rental of residence,
- Number of occupants in the household (fewer than six vs. six or more),
- Whether all household occupants are related,
- Health insurance coverage status,
- Sex, and
The regression model was estimated on a subpopulation that excludes respondents who could not be unauthorized, namely US citizens and people with other indicators that imply legal status.
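The text does not name the link function; a logistic model is the usual choice for a binary outcome. The sketch below assumes a survey-weighted logistic regression fit by plain gradient ascent (pure Python, for self-containment) and a hypothetical `can_be_unauthorized` flag marking the subpopulation. A production fit would use a proper survey-regression routine with design-based variance estimation rather than this loop.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_prob(beta: Sequence[float], x: Sequence[float]) -> float:
    # beta[0] is the intercept; beta[1:] pair with the predictors in x.
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_weighted_logit(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    w: Sequence[float],
    lr: float = 0.5,
    iters: int = 2000,
) -> List[float]:
    """Maximize the person-weighted log-likelihood by plain gradient ascent."""
    beta = [0.0] * (len(X[0]) + 1)
    total_w = sum(w)
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for xi, yi, wi in zip(X, y, w):
            resid = wi * (yi - predict_prob(beta, xi))
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / total_w for b, g in zip(beta, grad)]
    return beta


def fit_on_subpopulation(rows: List[Dict]) -> List[float]:
    """Drop respondents who cannot be unauthorized, then fit on the rest."""
    sub = [r for r in rows if r["can_be_unauthorized"]]
    return fit_weighted_logit(
        [r["x"] for r in sub], [r["y"] for r in sub], [r["weight"] for r in sub]
    )
```

Filtering before fitting means citizens and others with implied legal status never enter the likelihood, so they cannot pull the coefficients toward "not unauthorized."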
Imputing Unauthorized Immigrants in Other Datasets
We use the Pew estimates as targets for the total number of unauthorized immigrants that the imputation generates. We first apply this strategy to the 2013 ACS, which contains health insurance information from before the ACA's coverage expansions. We stratify the targets by state (including the District of Columbia) and by labor force participation, and we impute immigration status within each of the resulting 102 strata (51 jurisdictions × 2 labor force categories).8
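The 102 strata are the cross of 51 jurisdictions with two labor force categories. A small sketch of that grouping, assuming each record carries hypothetical `state` (state FIPS code) and `in_labor_force` fields:

```python
from itertools import product
from typing import Dict, List, Tuple

# State FIPS codes for the 50 states plus the District of Columbia (11).
STATE_FIPS = [
    1, 2, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23,
    24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
    42, 44, 45, 46, 47, 48, 49, 50, 51, 53, 54, 55, 56,
]

Stratum = Tuple[int, bool]  # (state FIPS, in labor force)

# Every (jurisdiction, labor force status) pair: 51 x 2 = 102 strata.
STRATA: List[Stratum] = list(product(STATE_FIPS, (True, False)))


def group_by_stratum(records: List[Dict]) -> Dict[Stratum, List[int]]:
    """Map each stratum to the row indices of the records that fall in it."""
    groups: Dict[Stratum, List[int]] = {s: [] for s in STRATA}
    for i, rec in enumerate(records):
        groups[(rec["state"], rec["in_labor_force"])].append(i)
    return groups
```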
To generate the imputed immigration status variable, we first calculated each person's probability of being unauthorized using the 2023 Survey of Immigrants regression model. Next, we split the dataset into the strata described above. Within each stratum, we sampled records using each person's probability of being unauthorized. After sampling, we summed the person weights until reaching the Pew population estimate for that stratum; the records that fell within the Pew estimate were considered unauthorized immigrants. We repeated this sampling and weight-summing process five times, creating five imputed unauthorized-status variables per record. These five variables were then incorporated into a standard multiple imputation algorithm, closely matching the techniques the Centers for Disease Control and Prevention uses to analyze imputed variables in the National Health Interview Survey.9
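The per-stratum draw and the pooling step can be sketched as follows. The text says records are sampled using their predicted probabilities but does not name the sampling scheme. This sketch assumes weighted sampling without replacement via Efraimidis-Spirakis random keys, so higher-probability records tend to be reached first, and it stops once the next person weight would push the running total past the stratum target. Pooling assumes Rubin's combining rules as the "standard multiple imputation algorithm." All function names are hypothetical.

```python
import math
import random
from statistics import mean, variance
from typing import List, Sequence, Set, Tuple


def draw_unauthorized(
    probs: Sequence[float], weights: Sequence[float], target: float, rng: random.Random
) -> Set[int]:
    """One imputation draw within a single stratum; returns flagged row indices."""
    keys = []
    for i, p in enumerate(probs):
        u = 1.0 - rng.random()  # u in (0, 1], so log(u) is defined
        # Efraimidis-Spirakis key log(u)/p: larger p gives keys closer to 0.
        keys.append((math.log(u) / p if p > 0 else -math.inf, i))
    keys.sort(reverse=True)

    flagged: Set[int] = set()
    running = 0.0
    for _, i in keys:
        # Stop at zero-probability records or when the target would be exceeded.
        if probs[i] <= 0 or running + weights[i] > target:
            break
        running += weights[i]
        flagged.add(i)
    return flagged


def multiply_impute(
    probs: Sequence[float], weights: Sequence[float], target: float,
    m: int = 5, seed: int = 2023,
) -> List[Set[int]]:
    """Repeat the draw m times with one RNG so each draw differs."""
    rng = random.Random(seed)
    return [draw_unauthorized(probs, weights, target, rng) for _ in range(m)]


def rubin_pool(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Rubin's rules: mean estimate; total variance = W + (1 + 1/m) * B."""
    m = len(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance across the m draws
    return mean(estimates), within + (1 + 1 / m) * between
```

Each stratum would be passed separately with `target` set to its Pew estimate. An estimate such as the uninsured rate is then computed once per draw and combined with `rubin_pool`.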
We used this first pass on the 2013 ACS to inform our sampling targets for the latest available microdata (the 2022 ACS). From the 2013 results, we calculated the share of undocumented immigrants lacking health insurance within each of the 102 strata before the ACA's coverage expansions, and we carried that information into the 2022 ACS as a new dimension of the sampling strata. Splitting each of the 102 strata into insured and uninsured categories yields 204 sampling strata for subsequent years. We then repeated the imputation on the 2022 ACS with these strata, allowing for a small decline in the undocumented uninsured rate based on the percent drop in the uninsured rate among citizens.10
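The insured/uninsured split of each target is simple arithmetic. The text does not spell out how the citizen decline enters, so this sketch assumes one reading: with relative drop d = (r2013 − r2022) / r2013 in the citizen uninsured rate, the uninsured target for a stratum is T × s2013 × (1 − d), where T is the stratum's Pew target and s2013 its 2013 undocumented uninsured share, and the rest of T goes to the insured cell. A single national d and the stratum keys are also assumptions.

```python
from typing import Dict, Tuple

Stratum = Tuple[int, bool]  # hypothetical (state FIPS, in labor force) key


def relative_drop(rate_before: float, rate_after: float) -> float:
    """Percent drop as a fraction, e.g. 0.10 -> 0.08 gives about 0.2."""
    return (rate_before - rate_after) / rate_before


def expand_strata(
    targets: Dict[Stratum, float],
    uninsured_share_2013: Dict[Stratum, float],
    citizen_drop: float,
) -> Dict[Tuple[int, bool, str], float]:
    """Split each stratum target into uninsured and insured targets (102 -> 204)."""
    out: Dict[Tuple[int, bool, str], float] = {}
    for key, total in targets.items():
        # Assumption: scale the 2013 uninsured share by the citizens' relative drop.
        uninsured = total * uninsured_share_2013[key] * (1.0 - citizen_drop)
        out[key + ("uninsured",)] = uninsured
        out[key + ("insured",)] = total - uninsured  # cells still sum to the Pew target
    return out
```

Keeping the insured cell as the remainder preserves each stratum's Pew total, so the split changes only where unauthorized records are drawn, not how many.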
To make the model easy to apply to other datasets, we created a function that implements this approach for a chosen dataset. The function first loads the dataset, then standardizes its variables to match the independent variables in the 2023 Survey of Immigrants regression model, and finally runs the multiple imputation to generate a legal immigration status variable.
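The R function itself is not reproduced here, so the following is only a Python sketch of its three steps (load, standardize, impute). The field names follow ACS PUMS conventions (`ST`, `TEN`, `NP`, `HICOV`, and so on) for illustration and should be checked against the PUMS data dictionary. The output names, `standardize_acs`, and `apply_model` are hypothetical, and the `impute` step is passed in as a callable.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def standardize_acs(rec: Record) -> Record:
    """Rename ACS PUMS-style fields to hypothetical model predictor names."""
    return {
        "year_of_entry": rec.get("YOEP"),
        "industry": rec.get("INDP"),
        "state": rec.get("ST"),
        "household_income": rec.get("HINCP"),
        "renter": rec.get("TEN") == 3,          # TEN 3 = rented
        "six_plus_occupants": rec.get("NP", 0) >= 6,
        "insured": rec.get("HICOV") == 1,       # HICOV 1 = with coverage
        "sex": rec.get("SEX"),
        # "All occupants related" needs household-level relationship codes; omitted.
    }


def apply_model(
    load: Callable[[], List[Record]],
    standardize: Callable[[Record], Record],
    impute: Callable[[List[Record]], List[Record]],
) -> List[Record]:
    """Load a dataset, harmonize its variables, then run the imputation step."""
    raw = load()
    harmonized = [standardize(r) for r in raw]
    return impute(harmonized)
```

Passing the standardizer as a parameter is what lets one pipeline serve the SIPP, ACS, and CPS: only the mapping changes per dataset.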