Ian Lundberg

Assistant Professor, Department of Information Science, Cornell University

Will join UCLA Sociology as of 1 July 2024

I study topics in stratification and inequality. I strive to produce substantive findings that are conceptually precise and rely on credible assumptions. These principles often lead me to computational and machine learning methods and to the development of new approaches.

My CV contains links to all papers and replication files. Scroll down for an overview of my research targeting estimands that are predictive, descriptive, and causal.

Estimands are the starting point for methodological choices

In a vision of social science research laid out with Rebecca Johnson and Brandon Stewart, I argue that social scientists should state the research goal in precise terms: about which target population do we want to draw inferences, and under what factual or counterfactual condition?

By stating the goal outside of any regression model, our framework improves the precision of theory and strengthens its link to empirical evidence.
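As a hedged illustration, in notation chosen here rather than necessarily the paper's, a causal goal of this kind averages a unit-specific counterfactual contrast over an explicitly defined target population S, where Y_i(t) denotes the outcome unit i would realize under condition t:

```latex
\tau = \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} \big[\, Y_i(1) - Y_i(0) \,\big]
```

Neither the population nor the contrast mentions a regression coefficient; a model enters only later, as one possible way to estimate this quantity from data.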

Our framework uncovers new questions

To what degree would gaps across race, class, and gender close if we intervened to equalize a treatment variable? In the gap-closing estimand, I show how to answer this type of question in a causal framework. Drawing on epidemiological methods and sociological theory, I conceptualize race, class, and gender as labels that partition the population into subgroups. I show that this way of conceptualizing these categories yields new insights about how to close gaps.
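A minimal sketch of one way to estimate this type of quantity, under loudly stated assumptions: the data-generating process below (group label R, confounder X, binary treatment T, outcome Y) is entirely hypothetical, and the estimator standardizes over X within each group, which is valid only if T is as good as randomly assigned given X within each group. It illustrates the idea of a gap under intervention and is not necessarily the estimator used in the paper.

```python
import random
from statistics import mean

def simulate(n, seed=0):
    """Hypothetical data: group label R, confounder X, treatment T, outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        r = int(rng.random() < 0.5)
        x = int(rng.random() < (0.6 if r else 0.3))
        t = int(rng.random() < 0.2 + 0.3 * x + 0.3 * r)
        y = 0.5 * r + 1.0 * t + 0.5 * x + rng.gauss(0, 1)
        rows.append((r, x, t, y))
    return rows

def factual_gap(rows):
    """Observed gap: E[Y | R=1] - E[Y | R=0]."""
    return (mean(y for r, _, _, y in rows if r == 1)
            - mean(y for r, _, _, y in rows if r == 0))

def mean_if_assigned(rows, group, t_value):
    """Estimate E[Y(t) | R=group]: average the outcome among units with T = t
    in each X stratum, weighted by the group's X distribution. Valid only if
    T is as good as random given X within the group (an assumption)."""
    members = [row for row in rows if row[0] == group]
    estimate = 0.0
    for x in (0, 1):
        stratum = [row for row in members if row[1] == x]
        weight = len(stratum) / len(members)
        estimate += weight * mean(y for _, _, t, y in stratum if t == t_value)
    return estimate

def gap_if_assigned(rows, t_value=1):
    """Gap that would remain if everyone were assigned T = t_value."""
    return (mean_if_assigned(rows, 1, t_value)
            - mean_if_assigned(rows, 0, t_value))

rows = simulate(20_000)
print(f"observed gap:          {factual_gap(rows):.2f}")
print(f"gap if everyone T = 1: {gap_if_assigned(rows):.2f}")
```

In this made-up process, part of the observed gap comes from the groups' different treatment rates, so equalizing treatment shrinks the gap; what remains reflects the direct group term and the group difference in X.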

A pivot away from regression coefficients and toward more general nonparametric estimands empowers social scientists to convey more information to readers under more credible assumptions. A methodological comment written with Brandon Stewart proposes a new visualization to summarize economic mobility.

Predictive estimands call for the direct application of data science

It is well known that social science models do not predict well. But is this just for lack of trying?

A new way of doing science. We collaborated with hundreds of social scientists and data scientists in a research design optimized for prediction. Teams trained predictive models on a standard social science dataset. We evaluated them on a holdout set locked away until the end.

We learned new things. The best observed predictive performance carries new weight because (1) it was evaluated on holdout data and (2) it represents the best out of many diverse attempts.
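A minimal sketch of the scoring logic, assuming an R²-style metric that compares each submission's squared error on the holdout set with a benchmark that predicts the training-set mean for every case. The metric choice, team names, and numbers are all hypothetical.

```python
from statistics import mean

def r2_holdout(y_true, y_pred, train_mean):
    """1 - SSE / SST on holdout data, where the benchmark predicts the
    training-set mean for every holdout case (an assumed metric)."""
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - train_mean) ** 2 for y in y_true)
    return 1 - sse / sst

def best_submission(submissions, y_true, train_mean):
    """Score every team's holdout predictions and return the top (name, score)."""
    scores = {name: r2_holdout(y_true, preds, train_mean)
              for name, preds in submissions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical toy data: three holdout cases and three teams.
train_y = [1.0, 2.0, 3.0]
holdout_y = [1.0, 2.0, 3.0]
teams = {
    "team_a": [2.0, 2.0, 2.0],  # predicts the training mean: R^2 = 0
    "team_b": [1.5, 2.0, 2.5],  # partial signal
    "team_c": [3.0, 2.0, 1.0],  # worse than the mean: negative R^2
}
print(best_submission(teams, holdout_y, mean(train_y)))  # ('team_b', 0.75)
```

Because the holdout outcomes are never seen during training, no team can tune toward them, which is what gives the top score its meaning.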

Descriptive estimands show where policies are failing

Housing eviction is more common than you think

Demographers often summarize events per person-year. By this metric, eviction is rare: only 2 to 3 percent of households per year.

A new goal. But often we care whether someone ever experiences an event over a longer period, such as at any point in childhood. It only takes one eviction to upend a child's life.

The goal matters. More than 1 in 4 children born into poverty in a large U.S. city from 1998 to 2000 experienced eviction by age 15.
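The gap between the two goals is partly arithmetic. If h_a is the probability that a child not yet evicted is first evicted at age a, the probability of ever being evicted by age 15 is one minus the product of (1 - h_a) over ages. The sketch below uses a hypothetical constant hazard chosen only for illustration; it does not reproduce the study's estimate, real hazards vary by age and population, and an events-per-person-year rate also counts repeat events, so it is not identical to a first-event hazard.

```python
from math import prod

def cumulative_risk(annual_hazards):
    """Probability of at least one event across years, given the probability
    of a first event in each year among those not yet affected."""
    return 1 - prod(1 - h for h in annual_hazards)

# Hypothetical: a constant 2 percent annual hazard from birth to age 15.
hazards = [0.02] * 15
print(f"ever evicted by age 15: {cumulative_risk(hazards):.1%}")
```

With these made-up inputs the cumulative probability is about 26.1 percent, more than ten times the annual figure.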

Causal estimands prescribe policy solutions

Public housing protects families from eviction

Public housing provides tenants with reduced rent as well as an internal grievance procedure to resolve conflicts with the housing authority. Does public housing reduce eviction?

A new goal. An explicitly causal estimand clarifies the precise assumptions under which observational data point to a policy solution.

The goal matters. In our target population, public housing reduces eviction from 11 percent to 3 percent. It is difficult to argue that this large difference arises from confounding alone: a causal effect is more plausible.
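One standard way to quantify a "confounding alone" argument is the E-value of VanderWeele and Ding (2017): the minimum association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed risk ratio. It is swapped in here as an illustration, not necessarily the analysis in the study, and it is applied naively to the two headline percentages, ignoring sampling uncertainty and any covariate adjustment behind them.

```python
from math import sqrt

def e_value(risk_treated, risk_untreated):
    """E-value for a risk ratio (VanderWeele and Ding, 2017).

    Protective ratios (below 1) are inverted first, so the result is always >= 1.
    """
    rr = risk_treated / risk_untreated
    if rr < 1:
        rr = 1 / rr
    return rr + sqrt(rr * (rr - 1))

# Headline figures from the text: 3 percent with public housing, 11 percent without.
with_housing, without_housing = 0.03, 0.11
print(f"risk difference: {(without_housing - with_housing) * 100:.0f} percentage points")
print(f"risk ratio:      {with_housing / without_housing:.2f}")
print(f"E-value:         {e_value(with_housing, without_housing):.1f}")
```

An E-value near 6.8 means a hidden confounder would need a risk ratio of roughly 6.8 with both public housing residence and eviction to account for the entire difference; whether such a confounder is plausible remains a substantive judgment.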

Interpretation of complex empirical quantities often requires sharpened theory

Cousins' incomes are sometimes similar. This does not imply a direct grandparent effect.

Cousins' incomes are sometimes remarkably similar. This might suggest something about how family background constrains life chances.

A new goal. What do we really mean by the "influence" of family background? We can formalize our theoretical model mathematically.

The goal matters. Several plausible theoretical models could generate any given set of sibling and cousin correlations.
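To make this concrete, the sketch below writes down one hypothetical model (an illustration, not necessarily any model in the paper) in which grandparents have no direct effect: a latent trait passes from parent to child, and income is a noisy reading of that trait. Solving for its two parameters shows it can match made-up target sibling and cousin correlations of 0.4 and 0.2, and a simulation checks the algebra. A simple chain in income itself would instead imply a cousin correlation equal to the squared sibling correlation (0.16 here).

```python
import random
from math import sqrt

def latent_params(sib_corr, cousin_corr):
    """Solve a hypothetical latent-trait model for (lam, k).

    Model (standardized variables, no direct grandparent effect):
      latent trait: A_child = lam * A_parent + noise
      income:       Y = k * A + noise
    implies sibling corr = k^2 * lam^2 and cousin corr = k^2 * lam^4.
    """
    lam = sqrt(cousin_corr / sib_corr)
    k = sqrt(sib_corr) / lam
    if not (0 < lam <= 1 and 0 < k <= 1):
        raise ValueError("this model cannot produce these correlations")
    return lam, k

def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate(lam, k, n_families, seed=0):
    """Return (sibling pairs, cousin pairs) of incomes drawn from the model."""
    rng = random.Random(seed)

    def child(a):
        # Child's latent trait depends only on the parent's latent trait.
        return lam * a + sqrt(1 - lam ** 2) * rng.gauss(0, 1)

    def income(a):
        # Income is a noisy reading of one's own latent trait.
        return k * a + sqrt(1 - k ** 2) * rng.gauss(0, 1)

    sib_a, sib_b, cous_a, cous_b = [], [], [], []
    for _ in range(n_families):
        grandparent = rng.gauss(0, 1)
        parent1, parent2 = child(grandparent), child(grandparent)  # siblings
        kid1, kid2, kid3 = child(parent1), child(parent1), child(parent2)
        y1, y2, y3 = income(kid1), income(kid2), income(kid3)
        sib_a.append(y1)   # kid1 and kid2 share parents
        sib_b.append(y2)
        cous_a.append(y1)  # kid1 and kid3 are cousins
        cous_b.append(y3)
    return (sib_a, sib_b), (cous_a, cous_b)

lam, k = latent_params(sib_corr=0.4, cousin_corr=0.2)  # hypothetical targets
sibs, cousins = simulate(lam, k, n_families=40_000)
print(f"sibling corr: {pearson(*sibs):.2f}, cousin corr: {pearson(*cousins):.2f}")
```

Here grandparents matter only through the latent trait they pass to parents, yet cousins' incomes are more similar than a simple income-to-income chain would imply; a model with a direct grandparent effect could be tuned to the same two numbers.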