In print and forthcoming

Lundberg, Ian, Rebecca Johnson, and Brandon M. Stewart. Forthcoming. “What is your estimand? Defining the target quantity connects statistical evidence to theory.” American Sociological Review. [replication]

Our framework grounds methodological choices in a clear statement of the estimand: the goal an empirical analysis hopes to achieve.

We make only one point in this article. Every quantitative study must be able to answer the question: what is your estimand? The estimand is the target quantity, the purpose of the statistical analysis. Much attention is already placed on how to do estimation; a similar degree of care should be given to defining the thing we are estimating. We advocate that authors state the central quantity of each analysis (the theoretical estimand) in precise terms that exist outside of any statistical model. In our framework, researchers do three things: (1) set a theoretical estimand, clearly connecting this quantity to theory, (2) link to an empirical estimand, which is informative about the theoretical estimand under some identification assumptions, and (3) learn from data. Adding precise estimands to research practice expands the space of theoretical questions, clarifies how evidence can speak to those questions, and unlocks new tools for estimation. By grounding all three steps in a precise statement of the target quantity, our framework connects statistical evidence to theory.

Lundberg, Ian. 2020. “Does opportunity skip generations? Reassessing evidence from sibling and cousin correlations.” Demography. [open access] [replication]

  • 2020 Graduate Student Paper Award, American Sociological Association Section on Inequality, Poverty, and Mobility.

What do various data generating processes imply for the similarities of siblings' and cousins' income attainments? This paper links empirical evidence to a series of candidate Markov processes unfolding over many generations.

Sibling and cousin correlations are empirically straightforward: they capture the degree to which siblings' or cousins' outcomes are similar. The meaning of these quantities, however, is complicated. A multitude of theoretical processes can produce any particular set of sibling and cousin correlations. Using multigenerational mobility as a substantive example, I show that sibling and cousin correlations in published research are equally consistent with several theoretical interpretations. While some prior authors have concluded that opportunity must skip parents to directly link the outcomes of grandparents and offspring, I show that this evidence is often consistent with alternative theories of latent transmission (measurement error) or of dynamic transmission (a parent-to-child transmission process that changes over generations). I clarify that point estimates which seem to contradict a given theory may also arise from estimation error. I develop a Bayesian procedure to estimate sibling and cousin correlations and quantify uncertainty about the statistic central to the argument. I conclude by outlining how future research might use sibling and cousin correlations as effective descriptive quantities while remaining cognizant that these quantities could arise from a variety of distinct theoretical processes.
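The latent-transmission point can be illustrated with a small arithmetic sketch. This is not the paper's Bayesian procedure, and it uses parent and grandparent correlations rather than sibling and cousin correlations to keep the algebra short. It assumes a stationary latent AR(1) process in which each observed outcome equals the latent outcome plus independent noise; the function name and the numbers 0.8 and 0.5 are hypothetical.

```python
def observed_correlation(latent_persistence, reliability, generations_apart):
    """Correlation between observed outcomes of relatives k generations apart.

    Assumes a stationary latent AR(1) process with persistence
    `latent_persistence`, and observed outcome = latent outcome + independent
    noise, where `reliability` is the share of observed variance that is latent.
    """
    return reliability * latent_persistence ** generations_apart


# Hypothetical values chosen only for illustration.
persistence = 0.8   # parent-to-child persistence of the latent outcome
reliability = 0.5   # share of observed variance due to the latent outcome

parent_child = observed_correlation(persistence, reliability, 1)
grandparent_child = observed_correlation(persistence, reliability, 2)

# What a researcher would expect for grandparents if the OBSERVED outcome
# itself followed a first-order Markov process.
naive_markov_prediction = parent_child ** 2

print(parent_child, grandparent_child, naive_markov_prediction)
```

Under these assumptions the observed grandparent correlation exceeds the square of the observed parent correlation even though nothing skips a generation in the latent process, so that pattern alone cannot distinguish a direct grandparent effect from measurement error.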

Lundberg, Ian, Sarah L. Gold, Louis Donnelly, Jeanne Brooks-Gunn, and Sara S. McLanahan. 2020. “Government assistance protects low-income families from eviction.” Journal of Policy Analysis and Management. [open access] [replication]

America faces an affordable housing crisis. Eviction is alarmingly common. Public policies can help.

A lack of affordable housing is a pressing issue for many low-income American families and can lead to eviction from their homes. Housing assistance programs to address this problem include public housing and other assistance, including vouchers, through which a government agency offsets the cost of private market housing. This paper assesses whether the receipt of either category of assistance reduces the probability that a family will be evicted from their home in the subsequent six years. Because no randomized trial has assessed these effects, we use observational data and formalize the conditions under which a causal interpretation is warranted. Families living in public housing experience less eviction conditional on pre-treatment variables. We argue that this evidence points toward a causal conclusion that assistance, particularly public housing, protects families from eviction.

Salganik, Matthew J., Ian Lundberg, Alex Kindel, Sara McLanahan, and 108 others. 2020. “Measuring the predictability of life outcomes with a scientific mass collaboration.” Proceedings of the National Academy of Sciences (latest articles). [replication] [project website]

The Fragile Families Challenge is a scientific mass collaboration designed to measure and understand the predictability of life trajectories using the common task method.

Participants in the Challenge created predictive models of six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. We evaluated these predictions on holdout data not available to participants. This paper reports the predictive performance of the Fragile Families Challenge and presents implications for scientists and policymakers. We are not posting results of this project online until the paper is published.

Salganik, Matthew J., Ian Lundberg, Alexander T. Kindel, and Sara S. McLanahan. 2019. “Introduction to the special collection on the Fragile Families Challenge.” Socius. [replication]

Hundreds of social scientists and data scientists participated in a scientific mass collaboration to predict life outcomes. This paper introduces a special collection of articles written by some who participated.

Participants in the Challenge created predictive models of six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. This Special Collection includes twelve articles describing participants' approaches to predicting these six outcomes, as well as three articles describing methodological and procedural insights from running the Challenge. This introduction will help readers interpret the individual articles and help researchers interested in running future projects similar to the Fragile Families Challenge.

Lundberg, Ian, Arvind Narayanan, Karen E.C. Levy, and Matthew J. Salganik. 2019. “Privacy, ethics, and data access: A case study of the Fragile Families Challenge.” Socius.

Data access is a key barrier to knowledge creation. Our framework for ethical data access, developed in a social science setting, could serve as a model for other data science settings.

Stewards of social science data face a fundamental tension. On one hand, they want to make their data accessible to as many researchers as possible to facilitate new discoveries. At the same time, they want to restrict access to their data as much as possible in order to protect the people represented in the data. In this paper, we provide a case study addressing this common tension in an uncommon setting: the Fragile Families Challenge, a scientific mass collaboration designed to yield insights that could improve the lives of disadvantaged children in the United States. We describe our process of threat modeling, threat mitigation, and third-party guidance. We also describe the ethical principles that formed the basis of our process. We are open about our process and the trade-offs that we made in the hopes that others can improve on what we have done.

Eviction is alarmingly common among American families, suggesting a failure of social policies.

A growing body of research suggests that housing eviction is more common than previously recognized and may play an important role in the reproduction of poverty. The proportion of children affected by housing eviction, however, remains largely unknown. We estimate that one in seven children born in large U.S. cities in 1998–2000 experienced at least one eviction for nonpayment of rent or mortgage between birth and age 15. Rates of eviction were substantial across all cities and demographic groups studied, but children from disadvantaged backgrounds were most likely to experience eviction. Among those born into deep poverty, we estimate that approximately one in four were evicted by age 15. Given prior evidence that forced moves have negative consequences for children, we conclude that the high prevalence and social stratification of housing eviction are sufficient to play an important role in the reproduction of poverty and warrant greater policy attention.

Killewald, Alexandra, and Ian Lundberg. 2017. “New evidence against a causal marriage wage premium.” Demography. [open access] [replication]

Marriage does not cause men's hourly wages to increase. Their wages are already rising before the marriage date.

Recent research has shown that men’s wages rise more rapidly than expected prior to marriage, but interpretations diverge on whether this indicates selection or a causal effect of anticipating marriage. We seek to adjudicate this debate by bringing together literatures on (1) the male marriage wage premium; (2) selection into marriage based on men’s economic circumstances; and (3) the transition to adulthood, during which both union formation and unusually rapid improvements in work outcomes often occur. Using data from the National Longitudinal Survey of Youth 1979, we evaluate these perspectives. We show that wage declines predate rather than follow divorce, indicating no evidence that staying married benefits men’s wages. We find that older grooms experience no unusual wage patterns at marriage, suggesting that the observed marriage premium may simply reflect co-occurrence with the transition to adulthood for younger grooms. We show that men entering shotgun marriages experience similar premarital wage gains as other grooms, casting doubt on the claim that anticipation of marriage drives wage increases. We conclude that the observed wage patterns are most consistent with men marrying when their wages are already rising more rapidly than expected and divorcing when their wages are already falling, with no additional causal effect of marriage on wages.


Lundberg, Ian, and Brandon M. Stewart. 2020. “Comment: Summarizing income mobility with multiple smooth quantiles instead of parameterized means.” Sociological Methodology. [open access] [replication]

Our methodological comment proposes a visualization to pack more information into summaries of economic mobility.

Studies of economic mobility summarize the distribution of offspring incomes for each level of parent income. Mitnik and Grusky (2020) highlight that the conventional intergenerational elasticity (IGE) targets the geometric mean and propose a parametric strategy for estimating the arithmetic mean. We decompose the IGE and their proposal into two choices: (1) the summary statistic for the conditional distribution and (2) the functional form. These choices lead us to a different strategy: visualizing several quantiles of the offspring income distribution as smooth functions of parent income. Our proposal solves the problems Mitnik and Grusky highlight with geometric means, avoids the sensitivity of arithmetic means to top incomes, and provides more information than is possible with any single number. Our proposal has broader implications: the default summary (the mean) used in many regressions is sensitive to the tail of the distribution in ways that may be substantively undesirable.
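The contrast among summary statistics can be seen in a toy calculation. The sketch below is not the comment's replication code; it compares the arithmetic mean, the geometric mean, and the median for a made-up set of offspring incomes at one level of parent income, then replaces the highest income with a very large one. The income values and function names are hypothetical.

```python
import math
import statistics


def geometric_mean(values):
    """Exponentiated mean of logs; undefined if any value is zero or negative."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(incomes):
    """Three candidate summaries of one conditional income distribution."""
    return {
        "arithmetic_mean": statistics.mean(incomes),
        "geometric_mean": geometric_mean(incomes),
        "median": statistics.median(incomes),
    }


# Hypothetical offspring incomes (in thousands) at one level of parent income.
incomes = [20, 30, 40, 50, 60]
# Same distribution, except the highest income is replaced by a top earner.
with_top_earner = incomes[:-1] + [6000]

base = summarize(incomes)
shifted = summarize(with_top_earner)

for name in base:
    print(name, base[name], shifted[name])
```

In this example the median is unchanged by the top earner, while both means move, the arithmetic mean most of all. The geometric mean cannot be computed at all if any income is zero. Reporting several quantiles, as the comment proposes, conveys the shape of the distribution rather than a single number.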

Working papers

Goals that appear vaguely descriptive are often actually causal. Framing them as such points to new machine learning estimators and provides language for clear interpretation.

Disparities across social categories such as race, gender, and class are central in social stratification. The complexity of these constructs, however, hinders their placement within a causal framework. On one hand, it is difficult to imagine a manipulation to alter the category to which one is assigned. On the other hand, categories themselves may be mutable across time and place as a result of social forces such as government definitions of racial categories. This paper advances gap-closing estimands that define precise causal research goals without reifying the definitions of social categories or appealing to a hypothetical world in which one's categorization were different. Instead, a gap-closing estimand directs attention to a manipulable treatment variable and asks a causal question: what gap across categories would persist under a local intervention to equalize the treatment? The proposal extends related work from epidemiology in three ways. First, I clarify that the hypothetical intervention is local rather than global in nature; there is no appeal to simultaneously equalize the treatments of the entire population. Second, I formalize equalization at a single treatment value or at a stochastic rule for treatment assignment. Third, I connect these estimands to doubly-robust estimators that combine treatment and outcome modeling. I illustrate with an example about the gap in pay by class origins under an intervention to equalize occupational class destinations. The paper concludes with implications for practice: gap-closing estimands provide tools for the rigorous study of inequality across social categories that could inform policies to close gaps.
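A toy calculation may help fix ideas about what a gap-closing estimand compares. The sketch below is not the paper's estimator. It assumes a made-up population in which both potential outcomes of every unit are known, which is never true in real data; there, only one potential outcome is observed per unit, so identification assumptions and an estimator (such as the doubly-robust estimators mentioned above) are needed. All names and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical units: category, observed treatment, and BOTH potential
# outcomes (y0 if untreated, y1 if treated). Real data reveal only one.
units = (
    [{"category": "A", "treated": t, "y0": y0, "y1": y0 + 4}
     for t, y0 in [(1, 10), (1, 10), (1, 12), (0, 12)]]
    + [{"category": "B", "treated": t, "y0": y0, "y1": y0 + 4}
       for t, y0 in [(0, 9), (0, 9), (0, 11), (1, 11)]]
)


def observed_outcome(unit):
    """Outcome actually realized under the unit's observed treatment."""
    return unit["y1"] if unit["treated"] == 1 else unit["y0"]


def factual_gap(units):
    """Descriptive gap: mean observed outcome in A minus that in B."""
    a = mean(observed_outcome(u) for u in units if u["category"] == "A")
    b = mean(observed_outcome(u) for u in units if u["category"] == "B")
    return a - b


def gap_closing_estimand(units, treatment_value):
    """Gap that would persist if every unit received `treatment_value`."""
    key = "y1" if treatment_value == 1 else "y0"
    a = mean(u[key] for u in units if u["category"] == "A")
    b = mean(u[key] for u in units if u["category"] == "B")
    return a - b


print(factual_gap(units))              # gap under treatments as observed
print(gap_closing_estimand(units, 1))  # gap if everyone were treated
```

In this made-up population, category A is treated more often than category B, so part of the factual gap reflects unequal treatment. The remainder is the gap that would persist after equalizing the treatment, which is the quantity the estimand targets. No unit's category is changed in the calculation, only its treatment.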

By applying the gap-closing estimand, this paper presents new evidence about the degree to which occupational segregation contributes to racial health disparities.

Racial disparities in health are widely understood as the consequences of systemic racism across spheres of American life. This paper focuses on one aspect of structural racial inequality: the racial segregation of occupations. Linked panel data from the Current Population Survey show racial disparities in the onset of work-limiting disabilities among the employed. Further, the onset of work-limiting disability is particularly common in occupations with many non-Hispanic Black or Hispanic employees. I therefore examine a causal gap-closing estimand: the racial disparity that would persist in a counterfactual setting where people were allocated to occupations equitably. Through a causal identification strategy that involves adjustment for lagged measures of demographic, health, and human capital variables, I estimate racial disparities in work-limiting disability that would persist in that counterfactual setting. Eliminating occupational segregation would reduce but not eliminate racial disparities in the onset of work-limiting disabilities: the Black-white disparity would reduce by one-third. This result illustrates the interconnectedness of the labor market and public health: disparate exposures to hazardous occupational conditions become embodied in population health disparities.