In print and forthcoming

Salganik, Matthew J., Ian Lundberg, Alex Kindel, Sara McLanahan, and 108 others. “Measuring the predictability of life outcomes with a scientific mass collaboration.” Proceedings of the National Academy of Sciences (latest articles). [replication] [project website]

The Fragile Families Challenge is a scientific mass collaboration designed to measure and understand the predictability of life trajectories using the common task method.

Participants in the Challenge created predictive models of six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. We evaluated these predictions on holdout data not available to participants. This paper reports the predictive performance of the Fragile Families Challenge and presents implications for scientists and policymakers. We are not posting results of this project online until the paper is published.
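
As a rough illustration of evaluating submissions on holdout data, here is a minimal Python sketch. The scoring rule (R² relative to predicting the training mean) and the name `holdout_r2` are assumptions made for illustration; the abstract above does not state which metric was used.

```python
from statistics import mean


def holdout_r2(y_train, y_holdout, predictions):
    """Score predictions on holdout outcomes that participants never saw.

    Assumed metric: 1 - SSE(model) / SSE(baseline), where the baseline
    predicts the training-set mean for every holdout case.
    """
    baseline = mean(y_train)
    sse_model = sum((y - p) ** 2 for y, p in zip(y_holdout, predictions))
    sse_baseline = sum((y - baseline) ** 2 for y in y_holdout)
    return 1 - sse_model / sse_baseline


# Hypothetical toy data: three training outcomes, two holdout outcomes.
y_train = [1, 2, 3]
y_holdout = [1, 3]
perfect = holdout_r2(y_train, y_holdout, [1, 3])       # predictions match exactly
mean_only = holdout_r2(y_train, y_holdout, [2, 2])     # no better than the baseline
```

Under this assumed metric, a score near zero means a submission does no better than guessing the training mean for everyone.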

Lundberg, Ian, Arvind Narayanan, Karen E.C. Levy, and Matthew J. Salganik. 2020. “Privacy, ethics, and data access: A case study of the Fragile Families Challenge.” Forthcoming in Socius.

Data access is a key barrier to knowledge creation. Our framework for ethical data access, developed in a social science setting, could serve as a model for other data science settings.

Stewards of social science data face a fundamental tension. On one hand, they want to make their data accessible to as many researchers as possible to facilitate new discoveries. At the same time, they want to restrict access to their data as much as possible in order to protect the people represented in the data. In this paper, we provide a case study addressing this common tension in an uncommon setting: the Fragile Families Challenge, a scientific mass collaboration designed to yield insights that could improve the lives of disadvantaged children in the United States. We describe our process of threat modeling, threat mitigation, and third-party guidance. We also describe the ethical principles that formed the basis of our process. We are open about our process and the trade-offs that we made in the hopes that others can improve on what we have done.

Salganik, Matthew J., Ian Lundberg, Alexander T. Kindel, and Sara S. McLanahan. 2020. “Introduction to the special collection on the Fragile Families Challenge.” Forthcoming in Socius. [This is a companion to a paper under review. Email me for a copy.] [replication]

Hundreds of social scientists and data scientists participated in a scientific mass collaboration to predict life outcomes. This paper introduces a special collection of articles written by some who participated.

Participants in the Challenge created predictive models of six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. This Special Collection includes twelve articles describing participants' approaches to predicting these six outcomes, as well as three articles describing methodological and procedural insights from running the Challenge. This introduction will help readers interpret the individual articles and help researchers interested in running future projects similar to the Fragile Families Challenge.

What do various data generating processes imply for the similarities of siblings' and cousins' income attainments? This paper links empirical evidence to a series of candidate Markov processes unfolding over many generations.

Sibling and cousin correlations are empirically straightforward: they capture the degree to which siblings' or cousins' outcomes are similar. The meaning of these quantities, however, is complicated. A multitude of theoretical processes can produce any particular set of sibling and cousin correlations. Using multigenerational mobility as a substantive example, I show that sibling and cousin correlations in published research are equally consistent with several theoretical interpretations. While some prior authors have concluded that opportunity must skip parents to directly link the outcomes of grandparents and offspring, I show that this evidence is often consistent with alternative theories of latent transmission (measurement error) or of dynamic transmission (a parent-to-child transmission process that changes over generations). I clarify that point estimates which seem to contradict a given theory may also arise from estimation error. I develop a Bayesian procedure to estimate sibling and cousin correlations and quantify uncertainty about the statistic central to the argument. I conclude by outlining how future research might use sibling and cousin correlations as effective descriptive quantities while remaining cognizant that these quantities could arise from a variety of distinct theoretical processes.

Lundberg, Ian, Sarah L. Gold, Louis Donnelly, Jeanne Brooks-Gunn, and Sara S. McLanahan. “Government assistance protects low-income families from eviction.” Forthcoming in Journal of Policy Analysis and Management. [replication]

America faces an affordable housing crisis. Eviction is alarmingly common. Public policies can help.

A lack of affordable housing is a pressing issue for many low-income American families and can lead to eviction from their homes. Housing assistance programs to address this problem include public housing and other assistance, including vouchers, through which a government agency offsets the cost of private market housing. This paper assesses whether the receipt of either category of assistance reduces the probability that a family will be evicted from their home in the subsequent six years. Because no randomized trial has assessed these effects, we use observational data and formalize the conditions under which a causal interpretation is warranted. Families living in public housing experience less eviction conditional on pre-treatment variables. We argue that this evidence points toward a causal conclusion that assistance, particularly public housing, protects families from eviction.

Lundberg, Ian, and Brandon M. Stewart. 2020. “Comment: Summarizing income mobility with multiple smooth quantiles instead of parameterized means.” Comment forthcoming in Sociological Methodology. [replication]

Our methodological comment proposes a visualization to pack more information into summaries of economic mobility.

Studies of economic mobility summarize the distribution of offspring incomes for each level of parent income. Mitnik and Grusky (2020) highlight that the conventional intergenerational elasticity (IGE) targets the geometric mean and propose a parametric strategy for estimating the arithmetic mean. We decompose the IGE and their proposal into two choices: (1) the summary statistic for the conditional distribution and (2) the functional form. These choices lead us to a different strategy: visualizing several quantiles of the offspring income distribution as smooth functions of parent income. Our proposal solves the problems Mitnik and Grusky highlight with geometric means, avoids the sensitivity of arithmetic means to top incomes, and provides more information than is possible with any single number. Our proposal has broader implications: the default summary (the mean) used in many regressions is sensitive to the tail of the distribution in ways that may be substantively undesirable.
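
The contrast between summary statistics can be seen in a small Python sketch. The income values are made up for illustration and are not from the paper; the sketch only shows that changing one top income moves the arithmetic mean a great deal while leaving the quartiles where they were.

```python
from statistics import geometric_mean, mean, quantiles

# Hypothetical offspring incomes at one level of parent income.
incomes = [20_000, 35_000, 50_000, 65_000, 80_000]
# Same incomes, except the top earner is replaced by an extreme value.
with_top = incomes[:-1] + [8_000_000]


def summarize(values):
    """Return several summaries of one conditional income distribution."""
    q25, q50, q75 = quantiles(values, n=4, method="inclusive")
    return {
        "arithmetic_mean": mean(values),
        "geometric_mean": geometric_mean(values),
        "quartiles": (q25, q50, q75),
    }


base = summarize(incomes)
shifted = summarize(with_top)

# The arithmetic mean responds strongly to the single top income,
# while the three quartiles are unchanged in this example.
print(base)
print(shifted)
```

Repeating this summary at each level of parent income and smoothing the results would give curves in the spirit of the proposed visualization, though the smoothing method is not specified here.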

Eviction is alarmingly common among American families, suggesting a failure of social policies.

A growing body of research suggests that housing eviction is more common than previously recognized and may play an important role in the reproduction of poverty. The proportion of children affected by housing eviction, however, remains largely unknown. We estimate that one in seven children born in large U.S. cities in 1998–2000 experienced at least one eviction for nonpayment of rent or mortgage between birth and age 15. Rates of eviction were substantial across all cities and demographic groups studied, but children from disadvantaged backgrounds were most likely to experience eviction. Among those born into deep poverty, we estimate that approximately one in four were evicted by age 15. Given prior evidence that forced moves have negative consequences for children, we conclude that the high prevalence and social stratification of housing eviction are sufficient to play an important role in the reproduction of poverty and warrant greater policy attention.

Killewald, Alexandra, and Ian Lundberg. 2017. “New evidence against a causal marriage wage premium.” Demography. [open access] [replication]

Marriage does not cause men's hourly wages to increase. They are already increasing prior to the marriage date.

Recent research has shown that men’s wages rise more rapidly than expected prior to marriage, but interpretations diverge on whether this indicates selection or a causal effect of anticipating marriage. We seek to adjudicate this debate by bringing together literatures on (1) the male marriage wage premium; (2) selection into marriage based on men’s economic circumstances; and (3) the transition to adulthood, during which both union formation and unusually rapid improvements in work outcomes often occur. Using data from the National Longitudinal Survey of Youth 1979, we evaluate these perspectives. We show that wage declines predate rather than follow divorce, indicating no evidence that staying married benefits men’s wages. We find that older grooms experience no unusual wage patterns at marriage, suggesting that the observed marriage premium may simply reflect co-occurrence with the transition to adulthood for younger grooms. We show that men entering shotgun marriages experience similar premarital wage gains as other grooms, casting doubt on the claim that anticipation of marriage drives wage increases. We conclude that the observed wage patterns are most consistent with men marrying when their wages are already rising more rapidly than expected and divorcing when their wages are already falling, with no additional causal effect of marriage on wages.

Working papers

Lundberg, Ian, Rebecca Johnson, and Brandon M. Stewart. “Setting the target: Precise estimands and the gap between theory and empirics.” [replication]

Our framework grounds methodological choices in a clear statement of the estimand: the goal an empirical analysis hopes to achieve.

The link between theory and quantitative empirical evidence is a longstanding hurdle in sociological research. Ambiguity about the role that statistical evidence plays in an argument may produce misleading conclusions and poor methodological practice. This ambiguity could be reduced if researchers would state the theoretical estimand (the central quantity at the core of a given paper) in precise language. Our approach envisions three choices in the research process: (1) choice of a theoretical estimand, which will be informative for theory, (2) choice of an empirical estimand, which is informative about the theoretical estimand under some identification assumptions, and (3) choice of an estimation strategy to learn the empirical estimand from data. Key advantages of this approach include improved clarity on the object of interest, transparency about how empirical evidence contributes to knowledge of that quantity, and the ability to easily plug in new statistical tools for estimation.
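
To make the three choices concrete, here is one standard example written in LaTeX. The example (an average causal effect of a binary treatment $T$ on outcome $Y$, with covariates $X$) and all notation are assumptions chosen for illustration; they are not taken from the paper.

```latex
% (1) Theoretical estimand: average causal effect over a target population of n units
\tau = \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i(1) - Y_i(0) \bigr)

% (2) Empirical estimand: a function of observable quantities only.
% It equals \tau if Y(t) is independent of T given X (conditional
% ignorability) and 0 < P(T = 1 \mid X) < 1 (positivity).
\theta = \frac{1}{n} \sum_{i=1}^{n}
  \bigl( \mathrm{E}[Y \mid T = 1, X = x_i] - \mathrm{E}[Y \mid T = 0, X = x_i] \bigr)

% (3) Estimation strategy: plug in any fitted outcome model \hat{\mu}_t(x)
\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n}
  \bigl( \hat{\mu}_1(x_i) - \hat{\mu}_0(x_i) \bigr)
```

Because step (3) is separate from steps (1) and (2), the fitted model $\hat{\mu}_t$ could be swapped for a different learner without changing what is being estimated.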

Goals that appear vaguely descriptive are often actually causal. Framing them as such points to new machine learning estimators and provides language for clear interpretation.

Gaps in socioeconomic outcomes by race, gender, and class are central to the study of social stratification and inequality. Studies of these gaps often avoid causal claims because these demographic categories are non-manipulable. Drawing on literature in epidemiology, this paper reviews a causal estimand that may be of interest: the gap across demographic categories which would persist under a local intervention to equalize a treatment. Unlike most causal estimands, this gap-closing estimand involves potential outcomes under a single treatment, instead of two treatment levels. After reviewing the identification assumptions outlined in prior literature, I formalize a double machine learning gap-closing estimator. This estimator uses observed cases in the treatment level of interest to learn a function mapping pre-treatment covariates to the potential outcome under this treatment, and it combines this model with a propensity score model to yield efficient estimates. The paper concludes with implications for practice: a gap-closing estimator directly targets the research goal of greatest substantive interest, thereby providing tools for the rigorous study of categorical inequality across social groups.
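
The estimand and one possible estimator can be sketched in LaTeX as follows. The notation is an assumption for illustration (group $G$ with categories $g$ and $g'$, treatment $T$ set to level $t$, pre-treatment covariates $L$), and the estimator shown is the generic augmented inverse probability weighting form that combines an outcome model with a propensity score model; the paper's exact formulation may differ.

```latex
% Gap-closing estimand: the gap that would remain if everyone received treatment t
\tau(t) = \mathrm{E}[\,Y(t) \mid G = g\,] - \mathrm{E}[\,Y(t) \mid G = g'\,]

% Nuisance functions, each estimated by a machine learning model
% (in double machine learning, fit on folds that exclude unit i):
%   outcome model:     \hat{\mu}_t(g, \ell) \approx \mathrm{E}[Y \mid T = t, G = g, L = \ell]
%   propensity score:  \hat{\pi}_t(g, \ell) \approx \mathrm{P}(T = t \mid G = g, L = \ell)

% Doubly robust estimate of the mean potential outcome in group g,
% where n_g is the number of sampled units with G_i = g
\hat{\mathrm{E}}[\,Y(t) \mid G = g\,] = \frac{1}{n_g} \sum_{i : G_i = g}
  \left[ \hat{\mu}_t(g, L_i)
  + \frac{\mathbf{1}\{T_i = t\}}{\hat{\pi}_t(g, L_i)}
    \bigl( Y_i - \hat{\mu}_t(g, L_i) \bigr) \right]

% Estimated gap-closing estimand
\hat{\tau}(t) = \hat{\mathrm{E}}[\,Y(t) \mid G = g\,] - \hat{\mathrm{E}}[\,Y(t) \mid G = g'\,]
```

Only units observed at treatment level $t$ contribute to the correction term, which matches the description above of learning the outcome function from observed cases at the treatment level of interest.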