Early Career Faculty Spotlight

AUGUST 2021: LAURA K. NELSON

Dr. Laura K. Nelson (she/her) is an assistant professor of sociology at the University of British Columbia. She uses computational methods – principally text analysis, natural language processing, machine learning, and network analysis techniques – to study social movements, culture, gender, and organizations and institutions. Substantively, her research has examined processes around the formation of collective identities and social movement strategy in feminist and environmental movements, continuities between cycles of activism, intersectionality, and gender inequality in entrepreneurship and STEM fields. Methodologically, she has proposed frameworks to combine computational methods and machine learning with qualitative methods, including the computational grounded theory framework and leveraging the alignment between machine learning and intersectionality research paradigms. She has developed and taught courses introducing undergraduate and graduate social science and humanities students to computational methods, undergraduate data science courses, and graduate-level sociological theory. She is currently co-PI on a large grant through the National Science Foundation to study the spread of gender equity ideas related to STEM fields through higher education networks. She has published in the American Journal of Sociology, Gender & Society, Poetics, Mobilization: An International Quarterly, and Sociological Methods & Research, among other outlets. Previously, she was an assistant professor of sociology at Northeastern University where she was affiliated with NULab for Texts, Maps, and Networks, the Network Science Institute, and the Women's, Gender, and Sexuality Studies program. She received postdoctoral training at the Berkeley Institute for Data Science and Digital Humanities @ Berkeley, and at the Kellogg School of Management at Northwestern University, where she was an affiliate of the Northwestern Institute on Complex Systems. She received her Ph.D. and M.A. in sociology from the University of California, Berkeley, and her B.A. in sociology from the University of Wisconsin, Madison.


1. Please describe your areas of methodological expertise and how you were trained in these areas.

I specialize in combining qualitative methods with computational techniques, including machine learning, natural language processing, and some network analysis techniques. I was a Ph.D. student when these techniques were just beginning to make inroads into social science fields, most prominently, at the time, linguistics. Most social science departments were not teaching computational methods, or the programming skills needed to employ them. Like many of my peers starting off learning computational methods during these early years, I cobbled together the best methods training I could through a variety of resources.

In addition to my B.A. from the University of Wisconsin, Madison, I received a Concentration in Analysis and Research (CAR). Madison's CAR program was phenomenal, setting me off on my current path. In addition to three semesters of courses in social statistics, the CAR program also required a course in computing in the social sciences, where I was introduced to basic programming skills. While at UW, I also worked for the Wisconsin Longitudinal Study (WLS), where I wrote SAS programs to translate their raw data into user-facing variables. In addition to a strong foundation in statistics, the CAR program and working for WLS helped me get comfortable using computers in research - enough of a foundation to explore computational methods on my own during my graduate training.

During graduate school at the University of California, Berkeley, I received deep and broad training in sociological theory and a wide variety of methods, including qualitative methods. The department's general philosophy of training PhDs, which encourages independent and ambitious dissertation projects, also gave me the space I needed to branch out of sociology and get training from more computationally minded fields. While I was away from California doing archival work, for example, I took Christopher Manning and Dan Jurafsky's online course Natural Language Processing (seriously one of the best courses you can take if you want to get into this field), which introduced me to using computers to analyze language. Back at Berkeley, I took a Digital Humanities course in the I-School with Marti Hearst, an expert in corpus-based computational linguistics. That course helped me integrate my statistics background, my theory and qualitative methods training, and my brief training in Natural Language Processing (NLP), and apply these different skills to the types of questions around social movements and gender I was interested in researching. This formed the basis for my dissertation.

I feel lucky that my education began with years of more traditional statistics and theory training, all rooted in sociology. I came to computational methods only relatively late in my education. I ventured into these methods not because they were trendy (they were looked at with a huge dose of suspicion in sociology at the time), but because of what I perceived as a genuine methodological need. I was gathering thousands of pages of texts produced by women's movement organizations over the longue durée: ~1840 to ~1980. Because language, in particular the meaning of words, shifts so much over time, traditional content analysis techniques are ill-equipped to chart changes and continuities over time, particularly around tricky-to-define sociological concepts such as collective identity. With my background in using computers in social science research, I was able to independently dip into the developing fields of computational linguistics and corpus analysis as an alternative to the time-consuming and notoriously unreliable method of human-coded content analysis. As I adapted corpus analysis/NLP to my own work, I mixed it with my training in theory and qualitative methods to track changes and continuities in feminist collective identity over time and across cities, in ways I'm confident I would not have been able to do with traditional content analysis methods alone.

Even though my computational training was much more patchwork than the training available today (through programs such as the Summer Institute for Computational Social Science - a wonderful program and a fantastic entry into the field!), I think I benefitted from my years of traditional training prior to my computational training. For young scholars: don't give your disciplinary training short shrift. The computational stuff should complement, not supplant, deep disciplinary training in your area of sociology, whatever it might be.

2. Can you tell me about a recent methodological project of yours? What was most exciting to you in that project?

I've long been interested in studying how social movements have impacted discourse over long time periods. With recent, very public and politicized debates about how we should teach history in the U.S. and Canada, I became interested as well in what parts of social movements make it into historical accounts, what aspects are left out, and how social categories around race, class, and gender are implicated in historical omissions. These types of broad questions, questions that require large amounts of data over long time periods, are exactly the types of questions I think computational methods can best help us answer. Using Wikipedia as a summary of popular history, and primary source material from different sections of women's movements, I am exploring how computational methods can help us evaluate the recall of history: of all the possible information that could be included in historical accounts of a social movement, what gets included and what is omitted? And what can we learn from patterns in these omissions? I have found phrase mining methods - methods to identify high-quality, relevant, or important phrases in a body of text - to be best suited to identifying information from primary historical sources that could be included in secondary histories. As computational text analysis techniques get more and more sophisticated, with the introduction of deep learning, neural networks, and transformer models greatly improving the accuracy of many machine learning methods applied to text, this project is reminding me that sociology should not forget the power of simpler - and more straightforward - methods such as phrase mining, which are easier to interpret and can give us a surprising amount of insight into a large body of text.
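To make the idea of phrase mining concrete, here is a minimal, dependency-free sketch of the general approach: rank frequent word pairs by a simple association score. This illustrates the technique in general, not Dr. Nelson's actual pipeline; the sample documents and thresholds are hypothetical.

```python
# Minimal sketch of phrase mining: rank frequent word pairs by a PMI-style
# association score. Sample documents and thresholds are hypothetical.
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def mine_phrases(documents, min_count=3, top_k=20):
    """Return the top_k two-word phrases, ranked by association strength."""
    unigrams, bigrams = Counter(), Counter()
    for doc in documents:
        tokens = tokenize(doc)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total = sum(unigrams.values())
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # drop rare candidates
        # PMI-style score: how much more often the pair occurs together
        # than expected if the two words appeared independently.
        score = math.log((count / total) / ((unigrams[w1] / total) * (unigrams[w2] / total)))
        scored.append((score, count, f"{w1} {w2}"))
    return sorted(scored, reverse=True)[:top_k]

# Hypothetical usage: documents could be digitized movement pamphlets.
docs = ["equal pay for equal work is a demand of the movement",
        "the movement organized around equal pay and suffrage"] * 5
for score, count, phrase in mine_phrases(docs):
    print(f"{phrase:20s} count={count:3d} score={score:.2f}")
```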

3. What type of methodological work do you hope to see or expect to see in the future of sociology?

I would love to see the continued and increasing integration of computational and qualitative methods. Many of the new studies coming out using digital trace data, for example, would benefit immensely from incorporating qualitative methodology into their work. While digital trace data can provide a lot of data (millions of tweets, comments, clicks, links, etc.), often the case is actually just one. If you're studying Twitter, that's an N of 1, Reddit is an N of 1, Facebook is an N of 1, and so on. Qualitative methodologists have done a lot of work thinking carefully about how to draw insightful, potentially generalizable, conclusions from a small number of cases, as well as how to build theory and insights from unusual cases (like Twitter) that don't represent the average or typical people in a population. On the flip side, qualitative scholars use a wide array of approaches, and each way of doing qualitative research could benefit from incorporating computational methods, but in different ways. I would love to see more work developing frameworks that can assist qualitative research from all varieties and perspectives (ethnographies, interviews, participant observation) to take advantage of new sources of data, and to leverage computational methods to improve already existing approaches.

4. How do you think graduate students or early career scholars can improve their methodological work?

While this may sound counterintuitive, prioritize your substantive and theory training early in your education. The best methodological advances, in my experience, are those that result from true methodological dilemmas that arise when trying to answer difficult substantive, sociological questions. Knowing how to ask the right type of question comes from that deep disciplinary training. For your methods training, start broad: first understand the full range of methods available to you as a sociologist before going deep into any one method. By starting broad, you avoid the "when you only have a hammer everything looks like a nail" problem.

Then, as you're pursuing your own research and chasing after the answers to the questions that truly motivate and excite you, pay attention to moments when you're frustrated by the inadequacy of the existing methods or tools. If there's a moment where you think, "I wish there was a way to do X," lean into that. Is there a way to do X? If not, what adjustments to existing methods would you need to get there? What other tools, perhaps from outside sociology, could help you answer your question more effectively? Those are the exciting moments where methods can be pushed in new, productive directions.

5. What's next? What sorts of projects are you hoping to work on in the future?

I would love love love to see different types of research output formally published in top ASA publications (looking at you, Sociological Methodology and ASR). I'm super excited - and have been for some time - about executable notebooks (or computational essays) as an alternative to static papers. It would be amazing to see at least some research published in top journals in executable form, where I, as a reader, can immediately both reproduce the authors' analysis and play around with it, trying different parameters or approaches to the authors' questions and data. Or, at the very least, the online version of publications could include interactive visualizations that engage the reader more. Obviously, this doesn't work for all research, and definitely not for all data (privacy and ethics concerns are real!). But mixed in with the traditional paper - which admittedly is most appropriate for many types of research - I would really love to see more creative and engaging approaches to scholarly publishing.



JULY 2021: ALYASAH ALI SEWELL

Dr. Alyasah Ali Sewell (they/them/their) is Associate Professor of Sociology at Emory University and Founder and Director of The Race and Policing Project. A widely-published medical sociologist and social science research methodologist, they assess the political economy of race, neighborhoods, and health. Their research has garnered support and recognition from the National Institutes of Health, the Ford Foundation, the National Science Foundation, and the Baden-Württemberg Foundation, among others. In 2016, Planned Parenthood designated them “The Future: Innovator and Visionary Who Will Transform Black Communities.” They received postdoctoral training in Demography from the Population Studies Center at the University of Pennsylvania, their Ph.D. and M.A. in Sociology from Indiana University, and their B.A. summa cum laude in Sociology from the University of Florida with a minor in Women’s Studies.

1. Please describe your areas of methodological expertise and how you were trained in these areas.

I specialize in developing quantitative methods for studying systemic racism and, thus, fine-tuning social science measurements of race, racism, and racial inequity. I focus primarily on exploiting nested research designs that were developed to improve the causal identification of neighborhood effects on psychosocial development in urban areas. Primarily, I pull on the capacities of mixed-effects models, complex sample surveys, and synthetic cohort analysis to evaluate ethnoracial health disparities – that is, inequalities in life and death by race, ethnicity, nation, and religion.

I was really hand-raised by Indiana methodologists. My first teacher was David Heise, who specialized in causal analysis and developed the mathematical tools behind Affect Control Theory. We published a paper where I identified intracultural (within-culture) differences in the emotional sentiments that Black and White youth attached to words, which was the first of its kind. I also studied under Quincy Thomas Stewart, with whom I laid out the fundamental methods for studying racial inequality. Those, I would say, were the bookends to my pursuit of a Ph.D. minor in Social Science Research Methods at Indiana, where I took the normal set of upper-level courses in quantitative analysis.

While doing so, I also started taking courses at the ICPSR Summer Program in Quantitative Methods of Social Research, and would eventually become an instructor there. I took two short introductory courses on innovative data collection efforts that are the root of my research agenda – one on the Project on Human Development in Chicago Neighborhoods and the other on the Collaborative Psychiatric Epidemiology Surveys. Through those courses, respectively, I received formal training in a wide range of mixed-effects models for disentangling interdependent causal mechanisms, and in how to analyze and build cross-cultural social surveys that could both precisely and efficiently identify ethnoracial inequalities in health.

Shortly thereafter, I was fortunate to be invited to teach at the Mannheim Centre for European Social Research (Universität Mannheim) and took the opportunity to develop a methodological course on the science of studying race and racism – there multilevel models were also strongly favored, so I became even more appreciative of them. For the next four summers, I served as part of the teaching core for ICPSR's course, Methodological Issues in Quantitative Research on Race and Ethnicity, which was spearheaded by leading social survey methodologists from the Program for Research on Black Americans. I was a teaching assistant to the foremost researcher on Latino politics -- John Garcia, who was the Director of the Resource Center for Minority Data at ICPSR -- and then I taught the course as its sole instructor during my last two summers of graduate school. I accepted a postdoc in the Department of Sociology at the University of Pennsylvania, where I further honed my study of demography and social change.


2. Can you tell me about a recent methodological project of yours? What was most exciting to you in that project?

My most recent methodological project was laying out a sampling strategy for studying LGBTQ+ women in the U.S. I partnered with the think tank Justice Work to field and administer the National LGBTQ+ Women*s Community Survey, which launched at the end of June 2021. We are conducting a comprehensive national study designed for people who have identified as women at any point on their journeys and want to share their experiences of centering women in their sexual, emotional, familial, and social lives. The most exciting aspect of this project is that we have developed an intersectional research design that will serve as a model for crafting nationally representative sampling frames for the study of hypermarginalized, vulnerable, and hard-to-reach populations. Because there is no demographic guide to precisely identify the location of LGBTQ+ women, we are creating it by drawing on a multistage sampling frame, which nests economic and gender strata inside age and ethnoracial strata. We may not be able to, at this moment, achieve nationally representative results about LGBTQ+ women, but we will be able to reliably identify differences within the population by wealth, income, and gender expression, and to efficiently capture the full generational, racial, and ethnic nuances of an underserved population. Without precedent, we invite into community lesbian, bi, pansexual, trans, intersex, asexual, and queer women who partner with women; trans men who want to report on their experience of partnering with women when they identified as or were perceived to be girls or women; and non-binary people who partner with or have partnered with women.
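As a rough illustration of what such a nested frame can look like in practice, the sketch below crosses hypothetical outer strata (age by ethnoracial group) with hypothetical inner strata (economic position by gender expression) and allocates targets to each cell. The categories and equal-allocation rule are invented for illustration and are not the design of the actual survey.

```python
# Illustrative sketch of a nested stratification scheme: economic and
# gender-expression strata nested inside age and ethnoracial strata.
# All categories and the equal-allocation rule are hypothetical.
from itertools import product

age_strata = ["18-29", "30-44", "45-64", "65+"]
ethnoracial_strata = ["Black", "Latina/x", "Asian", "Indigenous", "White", "Multiracial"]
economic_strata = ["low income", "middle income", "high income"]
gender_expression_strata = ["feminine", "masculine", "nonbinary/fluid"]

def build_frame(total_target):
    """Cross outer (age x ethnoracial) and inner (economic x gender expression)
    strata and allocate an equal target number of completed interviews per cell."""
    cells = [(age, eth, econ, gender)
             for age, eth in product(age_strata, ethnoracial_strata)
             for econ, gender in product(economic_strata, gender_expression_strata)]
    per_cell = total_target // len(cells)
    return {cell: per_cell for cell in cells}

frame = build_frame(total_target=10_000)
print(f"{len(frame)} strata cells, e.g. {list(frame.items())[0]}")
```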


3. What type of methodological work do you hope to see or expect to see in the future of sociology?

I would like to see more innovative use of comparative and historical data by this next generation of sociologists. There is so much information to be gleaned from primary documents, from a historicist view of causality, and from cross-cultural approaches to measuring group difference. There is data everywhere. There are tools from every sector of science. Yes, some should be retired, or elevated. But mostly it is a matter of fit and appropriateness. We also need to look more closely at the interconnections of institutions themselves -- it is impossible to forward research that can transform the systems of meaning that people attach to race and ethnicity without interrogating institutional gatekeepers, full stop. We also have to study the social fabrics in which those institutions are embedded.

4. How do you think graduate students or early career scholars can improve their methodological work?

At the earliest stages of your career, flexibility is crucial. Change your dataset and your data sources, not just your concepts. Don't just do a data project "just like" the one you already know. Try to prove yourself wrong. Go from there.


5. What's next? What sorts of projects are you hoping to work on in the future?

I am building on work with The Race and Policing Project and the Du Boisian Scholars Network to create a data platform for the analysis of the health effects of redlining in Atlanta. In concert, we will overlay comparative historical databases that document urban housing and neighborhood development initiatives onto a composite of health outcomes and biophysiological mechanisms. This study links the sociocultural and political economic histories of the sprawling communities of Atlanta to the racial and economic divides that emerged in Atlanta after the Civil War.


JUNE 2021: NATE BREZNAU

Nate Breznau is a postdoctoral researcher at the SOCIUM Center for Research on Inequality and Social Policy and a principal investigator in the German Research Foundation (DFG)-funded project "The Reciprocal Relationship of Public Opinion and Social Policy" at the University of Bremen. He is also a researcher at the university's Collaborative Research Center investigating the "Global Dynamics of Social Policy". His research focuses on social policy and public opinion in a macro-comparative perspective. He is also a supporter of open science, acting as a BITSS Catalyst and an alumnus of the Wikimedia Open Science Fellow Program. His research spans social science disciplines, with topics related in particular to social inequality, race and immigration, the role of government, and methods for improving public opinion research and science in general. He is an occasional blogger and regular preprint poster.

1. Please describe your areas of methodological expertise and how you were trained in these areas.

With a bit of irony intended, you might call me something like a pantologist. I have now conducted research, published or forthcoming, using factor analysis, path modeling, causal inference, multilevel and longitudinal regression, event history analysis, simultaneous feedback models, qualitative comparative analysis, and qualitative content analysis. I've analyzed data on 186 countries and I've analyzed data on counties in the single state of Michigan. I use the method that seems best suited to answer my research questions. The problem is that I am constantly chasing interesting new research questions, and as a result have not developed mastery in any one method. I was mostly a C student in BA methods courses and did not enjoy them. It wasn't until I met Jonathan Kelley and M.D.R. Evans during my MA studies that I really became fascinated with sampling and populations. They got me working with survey data, and the rest is history. After learning basic statistics and structural equation modeling practices with them, I continued my learning mostly by playing around with data, replicating others' work, reading books and articles, and occasionally taking courses. But I really am a learning-by-doing kind of student.


2. Can you tell me about a recent methodological project of yours? What was most exciting to you in that project?

For the past 3 years, I worked on a project that we called the "Crowdsourced Replication Initiative". With my co-principal investigators Eike Mark Rinke and Alexander Wuttke, we brought together 86 teams of researchers from around the world to replicate a study published in the American Sociological Review by David Brady and Ryan Finnigan (2014). There were two experimental conditions in this study. The first was that half the teams were given all the original replication materials and the other half were given only a methods and results description. Both groups had the same starting data. Not surprisingly, the group with less transparent materials had a much harder time coming to the same results as the original study. Next, we asked them to design their own ideal tests of the hypothesis in the Brady and Finnigan paper that 'immigration undermines public support for social policy'. We introduced a second experimental treatment where one group engaged in an online deliberation forum. It was an exciting and terrifying project all at once. We did not plan for the amount of work it would take, and I spent many nights, weekends, and holidays qualitatively investigating the code submitted by the teams (in 5 different software languages) to identify 107 different decisions in their workflows, make sure the code was reproducible, and somehow put together the insane amount of information produced by this project into coherent findings. All this time I was fielding hundreds of emails from the teams, for example when their code didn't work or they were late submitting something. Once I spent an entire week going back and forth with one team only to discover that different default settings in Mplus v7 and v8 led the same code to produce different results! The most exciting part about the project honestly is that it is basically finished now and we are just looking to get two papers published, and I can go back to my 'normal' areas of research in public opinion and social policy. At the beginning, we did a pre-analysis plan for the study and knew that we needed at least 40 teams to participate. It was really exhilarating, then, to have over 100 teams register to take part at first and 86 complete the replication.


3. What type of methodological work do you hope to see or expect to see in the future of sociology?

All of the meta-science and specification analysis we had to do in the Crowdsourced Replication Initiative was completely new to me and I think it will become a really important part of sociology in the future. At least I hope. I also think it is time for sociologists and the American Sociological Association to start making their work reproducible and transparent. This is not as simple as it sounds. Having a reproducible workflow requires understanding new methods and approaches to social science work. I am no guru in this area. I only recently started using GitHub and RStudio, for example. But these tools and many others will really revolutionize the future of sociology. Again, I hope.

4. How do you think graduate students or early career scholars can improve their methodological work?

I think that the best way to improve methodological work depends on learning style. For me, just playing with data and trying to test hypotheses forced me to learn new methods. Others may be much more comfortable taking a well-structured course and following others' workflows and instruction. I guess graduate programs could really serve the students well by helping them identify their learning styles and offering ways to excel using the different learning styles. One of my favorite ways of learning and teaching is in small groups sitting at a computer together (or 'together' virtually) and cleaning and analyzing data. Jonathan Kelley and M.D.R. Evans started a "Stats Club" at the University of Nevada, Reno where we would meet and just basically play with data. A student might come to the club with a problem or question, or Professor Kelley would give us some challenge, and we would all work on it together. It was great. I also had a lot more free time then.


5. What's next? What sorts of projects are you hoping to work on in the future?

The more time I spend playing around learning methods, the more I am convinced that theory is the most important method. If you look at the small causal inference revolution that is starting to take hold in sociology, especially now that Felix Elwert is the new editor of SMR, you will realize that the only way we can truly know if something is a confounder or collider, or free from such bias, is theory. If we develop a causal model that says: if the lawn is wet and the roof is wet, it rained, but if the lawn is wet and the roof is dry, then it did not rain and instead the sprinkler was running, we need to be 100% sure that Michael Moore didn't show up and spray Flint drinking water on the lawn. Without strong theory, we can't make causal inferences, and without causal inferences, we are limited in our ability to truly understand problems. That doesn't mean predicted likelihoods are not useful to science and society. But being able to isolate causes is something like sociology 2.0, and I look forward to it. In addition, using technology like preprints (here's our crowdsource replication study https://osf.io/j7qta/) and social media (@BreznauNate and https://crowdid.hypotheses.org/author/crowdid) can really accelerate our progress in social science.
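The lawn-and-sprinkler logic can be made concrete with a tiny simulation. The sketch below, with made-up probabilities, shows the textbook collider pattern: rain and the sprinkler are independent, but once you condition on the wet lawn (the collider), they become spuriously associated, which is exactly the kind of structure only theory can tell you to watch for.

```python
# Tiny simulation of the lawn/sprinkler logic: rain and the sprinkler are
# independent causes of a wet lawn, but conditioning on the wet lawn (a
# collider) makes them look associated. Probabilities are made up.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

rain = rng.random(n) < 0.3          # it rains on 30% of days
sprinkler = rng.random(n) < 0.4     # sprinkler runs on 40% of days, independent of rain
lawn_wet = rain | sprinkler         # the lawn (the collider) is wet if either occurred

def corr(a, b):
    return np.corrcoef(a.astype(float), b.astype(float))[0, 1]

print(f"corr(rain, sprinkler), all days:           {corr(rain, sprinkler):+.3f}")  # near zero
print(f"corr(rain, sprinkler), wet-lawn days only: {corr(rain[lawn_wet], sprinkler[lawn_wet]):+.3f}")  # clearly negative
```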

MAY 2021: ETHAN FOSSE


Ethan Fosse is an Assistant Professor of Sociology at the University of Toronto. Prior to coming to Toronto, he received his Ph.D. from Harvard University and worked as a Postdoctoral Research Associate at Princeton University in the Department of Sociology and Department of Politics, where he designed and implemented a series of open-source statistical programming workshops. Professor Fosse’s research focuses on using novel quantitative methods to understand social change. He is currently working on three interrelated projects: first, creating a new set of techniques for analyzing age-period-cohort data, with wide applicability in sociology and related fields; second, explaining social and cultural change, focusing on the religious and political views of recent birth cohorts in the United States; and finally, building off his methodological work on age-period-cohort analysis, examining the individual-level consequences of downward social mobility. His research has been published in a number of volumes and journals, with recent work appearing in Demography, Sociological Science, and the Annual Review of Sociology. He is also co-authoring (with Christopher Winship) a forthcoming book on age-period-cohort analysis to be published by Cambridge University Press. He currently teaches courses on quantitative methods, social change, and computational social science.

1. Please describe your areas of methodological expertise and how you were trained in these areas.

My main area of expertise is in age-period-cohort (APC) analysis. As it is conventionally understood, the goal of an APC analysis is to use time-series cross-sectional data to identify the distinct contributions of age, period (or survey year), and cohort (or birth year) on some outcome of interest. At first glance APC analysis would seem to be uncomplicated, but it in fact throws up a host of thorny conceptual and methodological issues. The most obvious problem is that one cannot put age, period, and cohort in a regression model and obtain a unique set of point estimates, because age is the difference between period and cohort. There is simply not enough information to estimate three separate coefficients for age, period, and cohort.
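A minimal way to see the problem in notation (this is the standard linear APC setup, not something taken from the interview): because A = P - C, a linear trend can be shifted among the three sets of effects without changing the model's fit.

```latex
\[
  \mu = \beta_0 + \beta_A A + \beta_P P + \beta_C C, \qquad A = P - C .
\]
\[
  (\beta_A + \delta)A + (\beta_P - \delta)P + (\beta_C + \delta)C
  = \beta_A A + \beta_P P + \beta_C C + \delta\,(A - P + C)
  = \beta_A A + \beta_P P + \beta_C C .
\]
```

For any value of δ the shifted coefficients produce exactly the same fitted values, so the linear components of the age, period, and cohort effects cannot be separately identified without additional constraints.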

However, in many respects the so-called APC identification problem is the least interesting – and least problematic – issue with analyzing APC data. What I found fascinating in my own work is that, in developing methods to analyze APC data, I have had to grapple with “first-order” questions of relevance to a much wider set of problems. Most of these questions initially seem simple but upon reflection have answers that are not at all straightforward. For example, what does it mean for sociological variables such as age and cohort (or, similarly, race, social class, and gender) to have an “effect”? What forms of uncertainty should we be quantifying other than that arising from sampling variability, which is arguably the least of our concerns in many real-world examples? What does it mean for an estimator to be “unbiased” or “estimable” if unique estimates are impossible? How can we come to terms with the fact that, as a general rule, the more realistic a model, the more likely it is to be underidentified? To what extent do our models reflect social reality rather than impose particular, possibly unrealistic, understandings of social reality? Should we use different models for prediction than description or explanation and, if so, why? When are complicated models preferable to simpler, informal approaches -- or, to put it another way, can the statisticians save us from ourselves?

In short, part of the difficulty with APC analysis is that virtually all of the methodological issues, despite being superficially mathematical, are fundamentally interpretive, with few clear-cut answers. I suspect this is generally true for many methodological problems in sociology, which is why I like to say, modifying the aphorism about war and politics, that “sociological methodology is social theory by other means.” An additional complication is that being an “expert” in APC analysis is inherently oxymoronic, because it requires knowledge of a wide range of methods. It’s no exaggeration to state that APC analysts have deployed nearly every technique in the sociological armory, including ANOVA-style models, Moore-Penrose estimators, Bayesian techniques, multilevel models, time-series analyses, ridge and Lasso regressions, structural equation models, and so on.

While in graduate school I took the typical sequence of methods courses offered at Harvard and audited several others. However, I noticed people were calling me a “methodologist” so I jumped at the opportunity to teach a few courses in the statistics department, first as a teaching fellow and then as an instructor. Smart students asking tough questions were a gift, and prompted me to delve deeper into some methods than I would have otherwise. After graduate school, I took a position as a Postdoctoral Research Associate at Princeton, with the aim of furthering my knowledge of statistical methods, especially demographic techniques given my interest in APC analysis and the excellent demography program at Princeton.

However, most of my expertise is a byproduct of my long-term goal of understanding social change using time-series cross-sectional data, in particular the U.S. General Social Survey. Simply put, I’ve learned a great deal just by working with data and writing programming scripts, replicating other people’s analyses when I wanted to explore the messy details of a particular technique. There’s a lot to be said for this kind of pragmatic approach to learning methods and there’s a strong case to be made that it’s impossible to become a methodologist in sociology or demography without some experience with substantive applications.

One of my favorite quotes on methods is by the statistician M.G. Kendall (1968), who observed that, “although musicians are often precocious, poets never are.” He continued: “One can draw the same kind of distinction between mathematicians, who are usually precocious, and statisticians who, as statisticians, are not. There is a certain apprenticeship in handling real-life situations to be served before an individual is mature enough to tackle important statistical problems.” The same could be said for a methodologist in sociology or demography.


2. Can you tell me about a recent methodological project of yours? What was most exciting to you in that project?

Recently I published an article in Demography (“Bounding Analyses of Age-Period-Cohort Effects”) with Christopher Winship that developed a bounding approach to APC analysis. Drawing on the work of the economist Charles Manski, we show how one can use constraints implied by the data along with explicit theoretical claims to bound one or more of the APC effects. Furthermore, using data on prostate cancer and homicide rates, we demonstrate that bounds on these parameters may be nearly as informative as point estimates, even under relatively weak theoretical assumptions. In a related paper (“Bayesian Age-Period-Cohort Models”) I illustrate, using political party identification as an example, how the bounding approach can be extended to a Bayesian framework.

Intellectually, I found my work on bounding analyses exciting for two main reasons. First, the bounding approach is a generalization of an extraordinarily broad class of APC methods dating back to early work by Bill Mason and Steve Fienberg in the 1970s and 1980s. In fact, virtually any method that attempts to separate out distinct age, period, and cohort effects can be viewed as a special case of our bounding approach. This includes older techniques such as the equality constraints approach as well as newer methods such as the intrinsic estimator and hierarchical age-period-cohort model. Second, unlike most previous APC methods, our bounding approach incorporates a wider, more flexible class of constraints, including those based on assumptions about the overall shape of the age, period, and cohort effects. In principle this will enable researchers to adopt the weakest possible assumptions for a particular substantive application.


3. What type of methodological work do you hope to see or expect to see in the future of sociology?

We are in the midst of two ongoing “revolutions”: first, the rapid accumulation of new kinds of data, including temporal, relational, and textual, as well as the development of algorithmic techniques to extract meaning from such data; and second, the creation and application of methods for identifying causal effects from observational data. I believe sociologists are well-suited to take advantage of both of these revolutions, inasmuch as the former requires in-depth knowledge of data structures and the latter relies on substantive expertise and sociological theory. We are certainly at a comparative advantage in these respects over, say, computer scientists and statisticians.

In my own research on APC models, I've gained a lot of insight from recent developments in causal inference, particularly work on stochastic counterfactuals and mediation analysis. In fact, some of the most innovative methodological research on causal inference is coming out of biostatistics and epidemiology. I don't think it's an overstatement to claim that recent advancements in mediation analysis have laid the groundwork for a 21st-century version of path analysis. Sociologists would do well to capitalize on these developments.

4. How do you think graduate students or early career scholars can improve their methodological work?

The most important practical suggestion I have for graduate students or early career scholars is to learn a statistical programming language, such as R or Python. It’s more difficult to do “easy” tasks in R or Python, such as recoding a variable, but the difficult tasks are much, much easier. Furthermore, there are now very welcoming online communities that can help any budding scholar with just about any data wrangling or debugging problem.
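As a small illustration of the kind of recoding task mentioned above (the variable names and categories here are hypothetical, not drawn from any particular survey):

```python
# Hypothetical example of recoding a variable with pandas: collapse a 7-point
# party identification scale into three categories.
import pandas as pd

df = pd.DataFrame({"party_id": [1, 2, 3, 4, 5, 6, 7]})

recode_map = {1: "Democrat", 2: "Democrat", 3: "Democrat",
              4: "Independent",
              5: "Republican", 6: "Republican", 7: "Republican"}
df["party_3cat"] = df["party_id"].map(recode_map)
print(df)
```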

The most important general advice I have for new scholars is to clearly distinguish, in your own work, whether the analysis is descriptive or causal. To borrow the language of Judea Pearl, the former entails “seeing” while the latter entails “doing.” As an example, consider the practice of foot binding, which was a widespread Chinese custom of breaking and wrapping the feet of young women. Let F denote foot size and H hand size. Suppose we were to fix (or set) foot size so that it were smaller through some intervention, such as foot binding. Using Pearl’s do-operator, we can write such an intervention as do(F = f). In general, we would have no reason to believe that physically modifying foot size would also alter anyone’s hand size. That is, the distribution (and hence the expected value) of hand size under the intervention, P(H | do(F = f)), would remain unchanged after intervening to modify foot size. However, people with smaller feet will in general have smaller hands. Thus, we would expect the distribution (and hence the expected value) of hand size among those observed to have a given foot size, P(H | F = f), to vary, perhaps considerably, across different subpopulations defined by foot size (F = f). In short, we would expect “seeing” to be very different from “doing.”

It is readily apparent from this example that “seeing” and “doing” are two fundamentally different operations entailing distinct processes and purposes, one descriptive and the other causal. Unfortunately, this distinction is lost on most applied researchers because the workhorse tools of quantitative modeling, such as regression analysis, make no distinction between these two interpretations. The fact that regression coefficients – or any number of other parameters from quantitative models – are agnostic in this regard is among the least-understood but most important insights in all of the social sciences. If we observe a set of individuals, fit a model, and obtain a parameter, the choice is ours whether we interpret the parameter in terms of “seeing” or “doing.” In any observational dataset, the “doing” interpretation necessarily relies on theoretical assumptions external to the data, such as assumptions about no unobserved confounding. By contrast, the “seeing” interpretation requires fewer assumptions, because it entails descriptive comparisons of distributions across subpopulations.
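The distinction can also be seen in a small simulation of the foot-binding example. In the sketch below, body size is treated as a common cause of foot size and hand size; all numbers are invented purely for illustration, and the point is only that conditioning on small feet ("seeing") shifts expected hand size while intervening on feet ("doing") does not.

```python
# Simulation of "seeing" vs. "doing" in the foot-binding example. Body size is
# a common cause of foot size and hand size; all numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

body = rng.normal(0, 1, n)                   # common cause
foot = 0.8 * body + rng.normal(0, 0.6, n)    # foot size
hand = 0.8 * body + rng.normal(0, 0.6, n)    # hand size

# "Seeing": among people observed to have small feet, hands are smaller too,
# because small feet signal a smaller body.
observed_small_feet = foot < -1
print("E[hand]                  :", round(hand.mean(), 3))
print("E[hand | see small feet] :", round(hand[observed_small_feet].mean(), 3))

# "Doing": intervening to set foot size (e.g., binding) does not touch hands,
# so the distribution of hand size is unchanged by do(foot = small).
foot_after_intervention = np.full(n, -1.5)   # do(foot = -1.5) for everyone
print("E[hand | do(small feet)] :", round(hand.mean(), 3))
```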


5. What's next? What sorts of projects are you hoping to work on in the future?

My current work entails developing new techniques for APC analysis that are based, as closely as possible, on theoretical ideas from Norman Ryder’s writings. The modern origins of APC analysis can be traced to Ryder’s 1965 article on cohort analysis (“The Cohort as a Concept in the Study of Social Change”). Although Ryder’s ideas have been incredibly influential, he never actually explained how to conduct cohort analysis! In fact, his most famous article is notable in that it contained essentially no technical details, unlike most of his other work. As a result, he left generations of sociologists to guess what he meant, and naturally they came up with their own interpretations, for better or worse.

In light of this gap in the APC literature, I’m currently working on two papers that attempt to revisit and resurrect Ryder’s vision of cohort analysis. In the first article, drawing on an archive of unpublished papers, memos, and letters by Ryder and others, I outline a number of broad principles for APC analysis. What emerges from Ryder’s corpus is a quite different approach to APC analysis than what has prevailed over the past fifty years. In the second paper, drawing on the insights derived from my archival research, I outline how one can achieve Ryder’s vision for cohort analysis using various quantitative models.

Ultimately, my hope is that these papers will provide some practical guidance for sociologists wanting to examine social change in a rigorous, transparent way. It is undeniable that APC analysis has frustrated generations of sociologists. As Otis Dudley Duncan, arguably the greatest quantitative sociologist of the 20th century, remarked: “Such data throw up a host of tantalizing clues, but I am never clever enough to formalize the interpretations they seem to suggest” (unpublished letter to Norman Ryder, 1981). Yet, I’m optimistic that in the coming years we will finally have some degree of consensus on what can and cannot be known from APC data, as well as agreement on what methods will enable applied researchers to derive meaningful, substantively important results on any number of outcomes.

APRIL 2021: ELIZABETH WRIGLEY-FIELD


Elizabeth Wrigley-Field is Assistant Professor of Sociology at the University of Minnesota. She is a formal demographer and a sociologist. Her work integrates demographic methods, designed to shift perspectives between population-level patterns and individual-level transitions between social statuses, with a sociological approach to the study of inequality, in which multiple dimensions of stratification interact in specific settings. She uses formal, statistical, and simulation analysis to reveal new challenges in recovering information about inequality in the presence of mortality selection. Her work explores how demographic theory can be revised to incorporate more substantively realistic models of heterogeneity and inequality within populations. Her work brings formal demographic techniques to the sociology of inequality and sociologically informed concerns about hidden dimensions of racial inequality to formal demography.

1. Please describe your areas of methodological expertise.

I’m really fascinated by two kinds of methods problems. First, questions that involve shifting back and forth between different levels of aggregation and understanding how complex causal processes play out across several levels simultaneously.

One example is understanding how racism affects health over the whole life course. Something that makes this a hard problem is that racism makes people die younger than they should have, which changes who’s in the population. Put simply, to live to an old age as a Black person in the conditions that this country puts Black people in requires you to be extraordinary in a way that simply is not required of white people to live so long. So then when you compare Black and white people at an old age, it’s an intrinsically biased comparison: our measures of disadvantage inherently understate the consequences of racism because of the way that racism matters not just at one moment, but repeatedly across the entire lifespan. A lot of my work has been exploring ways of trying to understand that macro-level population selection process and see the micro-level consequences of racism without this distortion.

This kind of problem is a good match for me methodologically because I’m a formal demographer and the formal demography toolkit really excels at iteratively shifting back and forth between individual and population perspectives—although sometimes our methods also impose assumptions that simplify the relationship between those perspectives too much, and some of my work explores ways to break out of those assumptions.
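The selection problem described above can be illustrated with a stylized simulation (all numbers invented): two groups start life with identical distributions of a latent "frailty," one group faces higher mortality at every frailty level, and by old age the disadvantaged group's survivors are a more strongly selected, lower-frailty group, so naive old-age comparisons understate the cumulative disadvantage.

```python
# Stylized simulation of mortality selection: two groups start with identical
# "frailty" distributions; one faces higher mortality at every frailty level.
# By old age, the disadvantaged group's survivors are more strongly selected
# (lower average frailty), biasing naive old-age comparisons. Numbers invented.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def old_age_survivors(rate):
    """Frailty of those surviving 60 yearly rounds of frailty-proportional mortality."""
    frailty = rng.gamma(shape=2.0, scale=0.5, size=n)  # same distribution in both groups at birth
    alive = np.ones(n, dtype=bool)
    for _ in range(60):
        alive &= rng.random(n) > rate * frailty        # yearly risk proportional to frailty
    return frailty[alive]

advantaged = old_age_survivors(rate=0.002)
disadvantaged = old_age_survivors(rate=0.006)          # higher mortality at every frailty level

print("share surviving to old age:  ", round(len(advantaged) / n, 3), round(len(disadvantaged) / n, 3))
print("mean frailty among survivors:", round(advantaged.mean(), 3), round(disadvantaged.mean(), 3))
# The disadvantaged group's survivors look *less* frail on average, even though
# the group faced worse conditions at every age: old-age comparisons are biased.
```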

The other kind of problem that fascinates me is when there’s a severe gap between what we want to know and what we can measure, so making progress involves a lot of creative triangulation between different kinds of data and analyses.

As an example of that kind of problem, I’m wrapping up a project (with a big research team, co-led by an incredibly talented Berkeley grad student named Martin Eiermann and me) analyzing racial disparities in the 1918 flu. Some of the hypotheses we explore that I’m especially excited about are in the realm of “social immunology,” involving things like long-term consequences of early-life exposure to an 1890 flu pandemic, which would’ve had really distinctive racial patterning because of the very particular social histories and migration patterns that white and non-white cohorts had at that time.

For a whole slew of reasons, it’s hard to get direct data on what we want: we certainly don’t have much direct virology on the 1890 flu; we don’t even have the 1890 Census because it was destroyed in a fire; the 1920 Census reflects a lot of population distortion from the flu itself; it’s hard to link women over time because of gendered patterns of name change… all kinds of things. So we had to think hard about what would need to be true if this hypothesis were true, across lots of different domains—migration patterns, historical flu attack rates, mortality rates conditional on unobserved exposures—and find ways to “suggestively test” each one, mixing empirical data with simulations.

This kind of work is really, really fun for me. It’s a giant logic puzzle, but it’s a lot more open-ended than most of the logic puzzles I ever had in school (and I took a lot of logic classes!).

2. Can you tell me about a recent methodological project of yours? What was most exciting to you in that project?

Last year I published an article (“Why Individual Dimensions of Frailty Don’t Act Like Frailty: Multidimensional Mortality Selection and the Black-White Mortality Crossover”) that argued that widely used demographic models of unobserved heterogeneity can’t speak to the most common empirical situation for mortality researchers: the situation where some important population heterogeneity is observed and some is not. I show that, in this situation, neither observed nor unobserved heterogeneity will necessarily behave like classic “frailty” models predict.

The upshot is that if we want to understand how, say, racism and poverty interact to raise the risk of dying, we have to model the intersectional risks directly; we can’t infer them from models developed for simpler contexts.

This paper was fun because the question is very narrow—do these models behave the way we assume they do?—but what I got to do in answering it is very broad. The main analysis is just math, but to make sure that the math was answering the right question, I had to think a lot about what kind of counterfactual population histories make sense, how to decide what’s a “realistic” parameter space for parameters that are intrinsically unobservable, all sorts of stuff like that.

3. What type of methodological work do you hope to see or expect to see in the future of sociology?

I want sociology to take descriptive work seriously. Some of the research I’m proudest of is purely descriptive. For example, last summer I published a paper (“US Racial Inequality May be as Deadly as COVID-19”) estimating that, for U.S. white mortality in 2020 to rise to the level of the best ever recorded U.S. Black mortality, there would need to be 400,000 excess white deaths in 2020. In other words, white mortality during the COVID pandemic is probably still less than Black mortality has always been, every single year.

The argument that I made around this work is ethical, not descriptive. I argued that, in the same way that we rethought the foundations of how our workplaces, our economy, our family and personal lives, our movement patterns, our schooling, and everything else work in order to try to limit the pandemic, we should be just as willing to embrace disruptive, transformative change to combat racism.

But the empirical piece is simply giving a new metric for describing the size of the Black-white mortality gap that exists every year in the United States. I think we often give short shrift to the value of simply describing the world in ways that help us to really see it.

It’s always fun to have a flashy new method, but often I think we make the most progress by using really basic methods well. Right now, I’m studying COVID mortality in Minnesota. There’s a strong association between COVID and Alzheimer’s on Minnesota death certificates. How much of that is because Alzheimer’s is a medical or behavioral risk factor for COVID, and how much is because the way we care for Alzheimer’s patients (i.e., in big institutional settings whose staff members are paid so little that they work multiple jobs) puts people with Alzheimer’s at particular risk?

Clinical researchers will be able to offer one kind of evidence to answer that. Sociologists and demographers who are really attuned to asking, “Which population’s risk are we trying to describe?” will bring something else to that question. I don’t think anything we need to do to understand this is methodologically fancy. It’s just difficult, and we need to think hard to make sure that we set up the problem right, so that the question we answered was the question we were actually intending to ask.

4. How did you get trained in your methodological expertise? How do you think graduate students or early career scholars can improve their methodological work?

I took a ton of quantitative classes in grad school, but once I had that foundation, a lot of my “training” comes from noodling around with stuff.

My first publication in grad school came about because, in pursuit of another idea (which I realized later on wasn’t a very good idea), I decided I needed to understand some articles really well and so I started writing equations to reflect what I thought was happening in them, and eventually I realized that things I’d assumed about how the models worked were wrong. Then I realized that everyone was making those assumptions, not just me! But that was something I stumbled onto when I was just trying to understand someone else’s work in depth.

Some work I’m really excited about now came from an insight I had way back when I was studying for my prelims, and the reason I had that insight was that I studied in a very self-indulgent way where I took the prelim as a license to spend a lot of time mucking around with things that seemed interesting, far in excess of what was likely to help me on the exam. I worked through the Preston, Heuveline, and Guillot demography textbook in a lot of depth, and because it was so present in my mind, I was able to recognize an equation from one context when it came up in a very different form somewhere else, which got me thinking. So probably the lesson there is to make time to learn stuff, even when the payoff isn’t totally obvious (and maybe even when you’re using it as an excuse to put off studying for a less interesting part of your prelim). It takes a really, really long time!

Relative to other formal demographers, I don’t have a traditional background—I have a lot less math training than almost anyone else in this field (where many people worked in math or engineering before making their way to demography). But I bring something to the table because I ask interesting questions, I try not to take my own answers at face value, and I’ve worked really hard to develop enough math to be able to pursue my own goals.

So it’s really important to me to tell students that, even if you don’t have the traditional background of a methodologist, there is a place for you in this field. In fact, we need you, because we need people with different perspectives to make progress on the hardest problems.

5. What's next? What sorts of projects are you hoping to work on in the future?

Right now I’m on a crusade to convince people that a concept called “length-biased sampling” should be considered a foundational methodological concept in the social sciences. Length-biased sampling is when we see units in proportion to their size, and it comes up all over the place in epidemiology, demography, criminology, and network science, to name just a few.
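A minimal simulation (with invented household sizes) shows the basic phenomenon: if you sample people and ask about their households, each household is seen in proportion to its size, the naive mean is inflated, and reweighting by 1/size recovers the truth.

```python
# Minimal simulation of length-biased sampling: sampling people (rather than
# households) sees each household in proportion to its size, inflating the
# naive mean; reweighting by 1/size recovers the truth. Numbers invented.
import numpy as np

rng = np.random.default_rng(7)

household_sizes = rng.choice([1, 2, 3, 4, 5, 6], size=100_000,
                             p=[0.28, 0.35, 0.15, 0.13, 0.06, 0.03])

# Sampling households directly recovers the true mean size.
household_sample = rng.choice(household_sizes, size=5_000)

# Sampling people sees households in proportion to their size (length bias).
weights = household_sizes / household_sizes.sum()
person_sample = rng.choice(household_sizes, size=5_000, p=weights)

print("true mean household size:       ", round(household_sizes.mean(), 3))
print("household-sampled estimate:     ", round(household_sample.mean(), 3))
print("person-sampled (length-biased): ", round(person_sample.mean(), 3))
print("reweighted (1/size) correction: ", round(1 / np.mean(1 / person_sample), 3))
```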

This is the insight I mentioned that I had the core of long ago, when studying for my prelims: “Lifespans cluster time the way that families cluster children.” I taught a workshop and wrote a popular-press treatment to start thinking about it, and then a few years later, Dennis Feehan and I started working on it in earnest. (Dennis is a perfect coauthor for this because he has thought incredibly hard about sampling, as well as being an easy-to-work-with mensch.)

We just published an initial piece and have more on the way, and the bit I’m especially excited about is that I realized that you can use length-bias math to find some very cool new ways of describing the population’s present as a function of its past and future. These measures relate core demographic concepts to one another in ways that really surprised me.

In methods research, that’s what I’m looking for most of all: ways to shift our perspective, to see something that we thought we already understood in a strange new light, to understand how we could see something familiar as a case of something else.


MARCH 2021: MARIA ABASCAL


Maria Abascal is an Assistant Professor of Sociology at New York University (NYU). She received her PhD in Sociology and Social Policy from Princeton University in 2016. Broadly, Maria’s work deals with intergroup relations and boundary processes, especially as they pertain to race, ethnicity and nationalism. Most of her research explores the impact of demographic diversification––real and perceived––on intergroup relations in the United States. This work draws on a range of quantitative methods and data sources, primarily lab and survey experiments.


I am currently working on a series of papers about how people evaluate diversity. “Diversity” is a term that people commonly, perhaps increasingly, use. This term, however, can refer to different things. For example, diversity can refer to different attributes: race, gender, class, etc. In addition, even when diversity refers to one attribute, like race, it can refer either to heterogeneity, i.e., mixture, or to the representation of disadvantaged groups, e.g., percent Black. In a working paper with Flavien Ganter (Columbia University), we examine the census tract attributes––racial and economic––that predict Chicago area residents’ decisions to describe their neighborhoods as diverse. For a forthcoming Science Advances paper with Janet Xu (Princeton University) and Delia Baldassarri (NYU), we fielded a conjoint experiment to study which dimensions of racial composition––heterogeneity and/or non-White representation––US Americans use to decide how diverse a neighborhood is. By linking assessed diversity to objective group properties, both studies depart from the approach of prior interview research, which typically asks people what diversity means in the abstract. The goal is to shed light on the multiple meanings of the term “diversity,” which is used in contradictory ways, even in academic work, where the effect has been to promote premature and harmful conclusions about the alleged negative effects of “diversity.”
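The heterogeneity-versus-representation distinction is easy to see with two simple, standard measures computed for hypothetical neighborhoods (the compositions below are invented for illustration, not taken from either paper):

```python
# Two senses of "diversity" for hypothetical neighborhoods: heterogeneity
# (a Blau/Herfindahl-style mixture index) versus representation (percent
# non-White). The compositions below are invented for illustration.
def heterogeneity(shares):
    """1 minus the sum of squared group shares (higher = more mixed)."""
    return 1 - sum(s ** 2 for s in shares.values())

def pct_non_white(shares):
    return 1 - shares.get("white", 0.0)

neighborhoods = {
    "A (evenly mixed)":  {"white": 0.25, "black": 0.25, "latino": 0.25, "asian": 0.25},
    "B (mostly Black)":  {"white": 0.05, "black": 0.90, "latino": 0.03, "asian": 0.02},
    "C (mostly white)":  {"white": 0.90, "black": 0.04, "latino": 0.04, "asian": 0.02},
}

for name, shares in neighborhoods.items():
    print(f"{name:18s} heterogeneity={heterogeneity(shares):.2f}  "
          f"pct non-White={pct_non_white(shares):.2f}")
# B and C have nearly identical heterogeneity but opposite levels of non-White
# representation, so the two senses of "diversity" can come apart entirely.
```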

These projects are part of my ongoing research on the relationship between racial diversity, on the one hand, and trust and cooperation, on the other. With generous funding from an NSF CAREER award, the next phase of this research will use incentivized experiments to capture prosocial behavior in NYC neighborhoods characterized by both racial diversity and high levels of collective action. In this and other work, I aim to use innovative methods that minimize the gap between theory and empirics. As I strive to teach the students in my research and experimental methods courses, methodological choices should be guided by the substantive questions we want to answer.

FEBRUARY 2021: RAVARIS MOORE


Ravaris Moore is Assistant Professor of Sociology at Loyola Marymount University in Los Angeles, California. Professor Moore is a quantitative sociologist with training in the fields of Social Stratification and Social Demography. His work employs quantitative methods with large-scale microdata to explore questions of inequality at the intersection of race and ethnicity, education, and health. His present research studies the effects of gun-violence exposure on the educational outcomes of students attending California public schools. In work with co-authors, he studies heterogeneous effects of parental divorce on the educational attainment of children.


Professor Moore earned his B.A. at Morehouse College with a double major in Mathematics and Economics. He completed his doctoral studies in Sociology, as well as M.A. degrees in Sociology and Economics at UCLA. Prior to matriculating at UCLA, he contributed to several national evaluations as a Research Programmer at Mathematica Policy Research, Inc. in the areas of education, health, and child and family well-being.

1. Please describe your areas of methodological expertise and how you were trained in these areas.

My methodological toolbox developed over many years of technical coursework and applied research experience. Undergraduate training in mathematics provided a foundation for understanding the language of quantitative reasoning. Graduate coursework in statistics, econometrics, demographic methods, and event history analysis provided familiarity with the workhorse models of empirical work. This training, coupled with a great deal of experience working with senior colleagues on both policy research and academic research, has left me prepared to work through the methodological challenges encountered in empirical research.

One of my main areas of methodological interest is heterogeneous effects and the general question of how the same event affects different people in different ways. I am also interested in mediation models as a means of understanding why we observe effect heterogeneity.

2. Can you tell me about a recent methodological project of yours? What was most exciting to you in that project?

Thus far, all of my projects have had significant methodological components. Recently, I’ve been working on a Stata package that executes the iterative propensity score specification search procedure described in Imbens and Rubin (2015). On another project, I am comparing my approach to estimating unobservable thresholds in observational data to an alternative approach that is common in the economics literature. In other work, I recently obtained new results after modifying how I operationalize the main independent variable, so that the empirical approach better aligns with the paper’s theoretical model. Something interesting tends to emerge in every project.
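
This is not Professor Moore’s Stata package; purely as a rough illustration of the kind of iterative specification search described in Imbens and Rubin (2015), here is a minimal Python sketch that greedily adds candidate covariates to a logistic propensity score model whenever the likelihood-ratio improvement clears a threshold (the data, variable names, and threshold value are all hypothetical).

```python
import numpy as np
import statsmodels.api as sm

def stepwise_pscore(D, X, candidates, threshold=1.0):
    """Greedily add candidate covariate columns to a logistic propensity model
    whenever the likelihood-ratio statistic for the addition exceeds `threshold`.
    D: (n,) treatment indicator; X: (n, k) columns always included;
    candidates: dict name -> (n,) array of candidate covariates."""
    included = sm.add_constant(X)
    base_fit = sm.Logit(D, included).fit(disp=0)
    remaining = dict(candidates)
    selected = []
    while remaining:
        # Likelihood-ratio statistic for adding each remaining candidate.
        lr_stats = {}
        for name, col in remaining.items():
            trial = np.column_stack([included, col])
            lr_stats[name] = 2 * (sm.Logit(D, trial).fit(disp=0).llf - base_fit.llf)
        best = max(lr_stats, key=lr_stats.get)
        if lr_stats[best] < threshold:
            break
        included = np.column_stack([included, remaining.pop(best)])
        base_fit = sm.Logit(D, included).fit(disp=0)
        selected.append(best)
    return selected, base_fit

# Hypothetical example data
rng = np.random.default_rng(1)
n = 2_000
x1, x2, x3 = rng.normal(size=(3, n))
D = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * x1 + 0.5 * x2))))
selected, fit = stepwise_pscore(D, x1.reshape(-1, 1),
                                {"x2": x2, "x3": x3, "x2_sq": x2**2})
print("added terms:", selected)
```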

I find all of my projects exciting for similar reasons. They all address important questions, and they all challenge me first to understand the data’s relevant dynamics and then to present the most well-supported and comprehensive narrative that the data permit. While this process differs on every project, it is always an exciting puzzle, and it is one of the reasons that I love research.

3. What type of methodological work do you hope or expect to see in the future of sociology?

Methodologists are placing a greater emphasis on extending widely used methods and developing models that better reflect assumptions that are appropriate for sociological research. I expect this work to become more valued as it becomes increasingly clear that the benefits of an expanded technical toolbox are worth the costs of learning and adopting new methods.

I also expect graduate programs to continue enhancing their methodological course sequences to ensure that new PhDs have the technical training required to continue to develop new tools.

4. How do you think graduate students or early career scholars can improve their methodological work?

For your work in general, do not put the paper before the result. In other words, before prioritizing the completion of a paper, take the time to investigate your question thoroughly and gain a clear understanding of the dynamics in your data. When you find a result that is truly worth communicating, a good paper will almost write itself.

Useful methodological advancements tend to grow from the needs of good empirical work. Start with whatever interests you and allow your methodological innovations to develop from there.

If you are still in graduate school and you have an interest in methodology, seek coursework that provides experience with linear algebra and asymptotic theory. Matrix algebra is the language of estimation and asymptotic arguments are fundamental for any new estimator.

When you work with new packages, read the documentation and educate yourself on the unfamiliar details. If possible, work through a semi-manual example in a program built for matrix operations to confirm that you understand the estimation process.
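
For instance, a minimal numpy sketch of the kind of semi-manual check described here (the simulated data are purely illustrative): confirm that you can reproduce a canned OLS fit from the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): y = 1 + 2*x1 - 0.5*x2 + noise
n = 500
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.7, size=n)
X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept

# "Semi-manual" estimate from the normal equations: (X'X) beta = X'y
beta_manual = np.linalg.solve(X.T @ X, X.T @ y)

# Packaged estimate for comparison
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_manual, beta_lstsq))   # True: the two agree
print(beta_manual)
```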

Work consistently. Work with colleagues who have complementary skill sets. Welcome constructive criticism at every stage of your work. Give your work time to mature.

5. What's next? What sorts of projects are you hoping to work on in the future?

I am presently in my first year as a Presidential Postdoctoral Research Fellow in the Princeton University Department of Sociology. I am taking this opportunity to fast-track several interesting projects. I also look forward to developing projects with colleagues at Princeton and throughout the Northeast. Eventually, I hope to gain access to Internal Revenue Service tax data to assess the role of tax law in the varied rates of intragenerational wealth mobility among different subgroups of tax filers.

JANUARY 2021: ZACK ALMQUIST


Zack W. Almquist is currently an Assistant Professor of Sociology and Senior Data Science Fellow in the eScience Institute at the University of Washington, where he holds affiliations with the Center for Studies in Demography and Ecology (CSDE), the Center for Statistics and the Social Sciences (CSSS), Urban@UW, and the Center for Environmental Politics. From 2018 to 2020 he was a Research Scientist on the Demography and Survey Science team at Facebook, Inc.; from 2017 to 2018 he was a Visiting Scholar in the Graduate School of Education at Stanford University; and from 2013 to 2018 he was an Assistant Professor of Sociology and Statistics at the University of Minnesota.

Prof. Almquist currently serves on CSDE’s Executive Committee, as CSDE’s Core Training Director, and as co-chair of CSDE’s Primary Research Area on Demographic Measurements and Methods (with Adrian Raftery). He leads the Computational Demography Working Group, which meets virtually on Mondays from 4:30 to 5:30 PM. He is also the Training Director for the OBSSR T32 in Data Science Training in Demography and Population Health at the University of Washington. He serves on the editorial boards of the journals Social Networks, Sociological Perspectives, Population and Environment, and Sociological Methodology, and as Secretary-Treasurer of the American Sociological Association’s Section on Mathematical Sociology. Almquist’s dissertation won the Outstanding Dissertation Award from the American Sociological Association’s Section on Mathematical Sociology, and he is a recipient of the Army Research Office’s Young Investigator Award. He has received other awards, including the Best Methodological Poster award from the Political Networks Conference and UCI’s A. Kimball Romney Award for Outstanding Graduate Paper. Prof. Almquist’s research has been funded by the National Science Foundation, the Army Research Office, the National Institutes of Health, the University of Minnesota, and the University of Washington.

His research centers on the development and application of mathematical, computational, and statistical methodology to problems and theory in social networks, demography, education, homelessness, and environmental action and governance. Currently, his research program is focused on understanding, modeling, and predicting the effects that space (geography) and time have on human interaction (e.g., communication or needle sharing) and social processes (e.g., information passing or disease transmission). Dr. Almquist’s research has been published in highly regarded peer-reviewed journals such as the Proceedings of the National Academy of Sciences, Journal of Computational and Graphical Statistics, Sociological Methodology, Mathematical Population Studies, American Journal of Human Biology, and Political Analysis.

Research Statement

My research builds on concepts and methods developed for the study of social networks. Networks offer a powerful and compelling framework for understanding fundamental social relations, a central theme of sociological inquiry, whether these are the relations driving an individual’s life and success (e.g., advice, friendship) or the relations governing large organizations (e.g., information passing between FEMA and local governments). While these tools are powerful in their combination of generalizability and precision in measuring our social world, too often they are limited in the contextual information they include. For example, network researchers have historically modeled network data using only formal features of the network, such as the size of an individual’s friend group, while ignoring other important characteristics such as how often the friends interact (time) and how closely they live to one another (space). This limited use of contextual information points to a problem with much of the social network literature: the field often overlooks the enormous impact of our environment on our behavior (e.g., how space limits whom we meet, which in turn shapes whom we end up spending our lives with). By addressing contextual mechanisms through the powerful social network lens, we can improve our understanding of both social processes (e.g., information passing) and social action.

Methodology Questions

1. Please describe your areas of methodological expertise and how you were trained in these areas.

My primary areas of methodological expertise are measurement, survey methods, statistics, computational methods (big data, data science, machine learning), and demography.

My formal background, in terms of degrees, is in mathematics, statistics, demography, social network analysis, and sociology. Over the course of my education I had the great pleasure of interning at the National Opinion Research Center at the University of Chicago and participating in the NSF/NAS-funded Young Scientists Summer Program with the Global Health Initiative and World Population Groups at the International Institute for Applied Systems Analysis in Vienna, Austria. I have continued to develop my expertise throughout my professional career: as a graduate student researcher at UCI with Carter Butts, as a Research Scientist at Facebook, as a faculty member in Sociology and Statistics at the University of Minnesota, and as a faculty member in Sociology and Senior Data Science Fellow at the University of Washington. As I have progressed through my career, I have had the chance to leverage my formal training into novel developments and to grow my own understanding through research projects and teaching.

2. Can you tell me about a recent methodological project of yours? What was most exciting to you in that project?

Most recently I have been working on understanding and modeling the spread of COVID-19 in human populations. This work has involved two methodological hats: (1) improving and developing network models for understanding the spread of COVID-19 (see https://doi.org/10.1073/pnas.2011656117 and https://doi.org/10.1002/ajhb.23512 for the first couple of papers in this line of work); and (2) longitudinal survey data collection on changes in behavioral dynamics in response to a global pandemic (see this NSF award: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2028160). I view these two activities as the core of my future methodological work on understanding disease spread and behavioral change in human populations. This basic framing has underpinned my work and my view of progress in methodology: one should first be driven by the sociological problem (how does COVID-19 spread through human populations, and how does a major pandemic change our behavior and our answers to classic questions about prosocial and antisocial behavior?), followed by plausible models and hypotheses, and last by data to test these ideas and help inform future research agendas.
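
Not the published models linked above, but as a minimal illustration of what a network model of disease spread looks like, here is a discrete-time SIR simulation on a small-world graph using networkx (the graph and all parameter values are made up for the sketch).

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical contact network: a small-world graph of 1,000 people.
G = nx.watts_strogatz_graph(n=1000, k=8, p=0.05, seed=0)

beta, gamma = 0.05, 0.1          # per-contact infection prob., recovery prob. (made up)
state = {v: "S" for v in G}       # S = susceptible, I = infectious, R = recovered
for v in rng.choice(G.number_of_nodes(), size=5, replace=False):
    state[v] = "I"                # seed a few initial infections

history = []
for t in range(100):
    new_state = dict(state)
    for v in G:
        if state[v] == "S":
            # Each infectious neighbor independently transmits with probability beta.
            n_inf = sum(state[u] == "I" for u in G.neighbors(v))
            if rng.random() < 1 - (1 - beta) ** n_inf:
                new_state[v] = "I"
        elif state[v] == "I" and rng.random() < gamma:
            new_state[v] = "R"
    state = new_state
    history.append(sum(s == "I" for s in state.values()))

print("peak number infectious:", max(history))
```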

Another project I have been working on seeks to understand the social structure and support of homeless populations and their impact on homelessness. This work began by establishing what data are publicly available (see https://doi.org/10.1080/08898480.2019.1636574) and how far I can get with network modeling and simulation analysis (see https://doi.org/10.1177/2399808318785375). I am currently in the process of applying for grants and developing a survey to ask the more basic question I started with: how do the social support and structure of homeless communities shape the lived experience of people experiencing homelessness?

3. What type of methodological work do you hope to see or expect to see in the future of sociology?

I am really excited about the intersection of data collection (e.g., surveys and interviews), administrative data, and trace data to answer basic sociological questions about inequality, groups, etc. This is a really exciting area methodologically, because combining these data in a meaningful and impactful way requires pulling in skills, training, and knowledge from classic survey methods, computer science, statistics, and the social sciences.


4. How do you think graduate students or early career scholars can improve their methodological work?

Looking back at my own experiences, I think the first step is making sure you have solid training in the basics, such as data collection and measurement, mathematics, statistics, and programming (almost all methods development now requires some level of programming). The next is finding a good mentor and a good project, in order to develop the skills not only to produce a new methodological development but also to communicate it to both the methods community and the substantive community it is targeting.


5. What's next? What sorts of projects are you hoping to work on in the future?

I am looking to develop my work around the impact of climate change on organizational and individual behavior, using survey and text methods. I am also looking to expand my methodological work on network measurement, with applications to understanding and preventing homelessness, and to develop new methods for using big data to understand how social care workers are feeling during this pandemic and its impact on their quality of life. I also plan to continue my work in understanding and modeling spatial and temporal networks. Last, I am very excited to be developing and running a graduate program meant to train future researchers in data science, demography, and health.

DECEMBER 2020: EMMA ZANG


Emma Zang is Assistant Professor in the Department of Sociology at Yale University. She received her PhD in Public Policy in 2019 and MA in Economics in 2017, both from Duke University. As a demographer, her research interests lie at the intersection of health, family, and inequality. Her work aims to improve the understanding of 1) how early-life conditions affect later-life health outcomes; 2) social stratification and health; and 3) spillover effects within households, identified by exploiting policy changes. She is also interested in developing and evaluating methods to model trajectories and life transitions in order to better understand how demographic and socioeconomic inequalities shape the health and well-being of individuals from a life course perspective. Her work has appeared in journals such as the American Journal of Sociology, Demography, Social Science & Medicine, Journal of Marriage and Family, and International Journal of Epidemiology. Her research has been reported by major media outlets in the United States and in China, such as CNN, the Boston Globe, The Economist, and ThePaper.cn.

Research Description

I develop quantitative methodological tools that can be employed to understand a wide range of sociological questions. I am particularly interested in developing and applying methods to model trajectories and life transitions in order to better understand how demographic and socioeconomic inequalities shape the health and well-being of individuals from a life course perspective. I have been working on two methods so far: the Bayesian multistate life table method and the Bayesian group-based trajectory model.

My work on Bayesian multistate life tables (MSLT) with Scott Lynch (Professor of Sociology at Duke University) extends the Bayesian approach introduced in Lynch and Brown (2005) to high-dimensional state spaces with partially absorbing states. MSLTs are an important tool for producing easily understood measures of population health. The Bayesian approach developed by Lynch and Brown (2005) offers several key advantages over other extant approaches, including the ability to incorporate prior information, direct and probabilistic interpretations of estimates, and the flexibility to incorporate model changes to handle idiosyncratic data. However, this approach has been limited to only two states, such as “healthy” vs. “unhealthy,” and cannot handle partially absorbing states. A methodology paper from this project has received an R&R from Demography. We are finalizing our code and developing an R package to promote wider usage of our method. To deal with the high-dimensionality problem computationally, we take advantage of a recent methodological advancement in the statistics literature: the estimation of Bayesian multinomial logit models using Polya-Gamma latent variables. This technique makes the MCMC algorithm in our R package more computationally efficient than alternative MSLT packages. Applying this method, we have demonstrated geographical variations in the impact of diabetes on population health in the U.S. Using data from the Health and Retirement Study, we examined disparities in life expectancies at age 50 between populations with and without diabetes by birth region and current residence (Zang et al. 2020). We also have a couple of other applications of this method under review.
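
As a point of reference, and much simplified relative to the multistate, multinomial setting described above, here is a rough sketch of a Polya-Gamma-augmented Gibbs sampler for a Bayesian binary logit in the spirit of Polson, Scott, and Windle (2013). The Polya-Gamma draws are assumed to come from a library such as pypolyagamma; this is an illustration of the augmentation idea, not the authors' R package.

```python
import numpy as np
from pypolyagamma import PyPolyaGamma   # assumed available; provides PG(b, c) draws

def gibbs_bayes_logit(y, X, n_draws=2000, b0=None, B0=None, seed=0):
    """Gibbs sampler sketch for y_i ~ Bernoulli(logit^{-1}(x_i' beta)) with a
    normal prior beta ~ N(b0, B0), using Polya-Gamma data augmentation."""
    rng = np.random.default_rng(seed)
    pg = PyPolyaGamma()
    n, p = X.shape
    b0 = np.zeros(p) if b0 is None else b0
    B0 = 100.0 * np.eye(p) if B0 is None else B0
    B0_inv = np.linalg.inv(B0)
    kappa = y - 0.5                      # "centered" outcomes in the PG scheme
    beta = np.zeros(p)
    omega = np.empty(n)
    draws = np.empty((n_draws, p))
    for s in range(n_draws):
        # 1. Draw latent Polya-Gamma variables omega_i | beta ~ PG(1, x_i' beta).
        psi = X @ beta
        for i in range(n):
            omega[i] = pg.pgdraw(1.0, psi[i])
        # 2. Draw beta | omega, y from its conjugate normal full conditional.
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + B0_inv)
        m = V @ (X.T @ kappa + B0_inv @ b0)
        beta = rng.multivariate_normal(m, V)
        draws[s] = beta
    return draws
```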

With my collaborator Justin Max (Lead Data Scientist at Consumer Edge Research, LLC), I also developed a Bayesian approach to estimating group-based trajectory models. We draw on recent advancements in the statistics literature on Bayesian model averaging in finite mixtures of regressions to provide an efficient variable selection method for group-based trajectory models (GBTMs) through model averaging. The GBTM (Nagin, 2005) is one of the most widely used models among social scientists for modeling life trajectories. Using a finite mixture of regressions approach, the model captures group-level heterogeneity in the trajectories of outcomes over a period of time. Our model averaging technique yields better predictions than selecting a single optimal model and saves researchers a considerable amount of time by relieving them of the need to fit dozens, or even thousands, of models and compare their fits. From a computational perspective, our customized Gibbs sampling code uses a series of data augmentation steps to achieve faster convergence than would be available in more general Bayesian estimation packages such as Stan or JAGS. A manuscript for this research is forthcoming in Psychological Methods, and a working paper version can be found on SSRN (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3315029). Our R package is available on GitHub at https://github.com/jtm508/bayestraj.
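
The Bayesian, model-averaged estimator described here (implemented in bayestraj) is considerably more involved. Purely as an illustration of the underlying model class, here is a minimal EM sketch of a plain, non-Bayesian group-based trajectory model, i.e., a finite mixture of polynomial trajectories observed on a common time grid (all names and settings are illustrative).

```python
import numpy as np

def fit_gbtm(y, t, K=3, degree=2, n_iter=200, seed=0):
    """EM sketch for a simple group-based trajectory model:
    y[i, j] = polynomial_k(t[j]) + noise, with a latent group k for each person."""
    rng = np.random.default_rng(seed)
    n, T = y.shape
    X = np.vander(t, degree + 1, increasing=True)           # T x (degree+1) time design
    pi = np.full(K, 1.0 / K)                                 # group shares
    beta = rng.normal(size=(K, degree + 1))                  # group trajectory coefficients
    sigma2 = np.var(y)
    for _ in range(n_iter):
        # E-step: posterior probability that person i belongs to group k.
        log_lik = np.zeros((n, K))
        for k in range(K):
            resid = y - X @ beta[k]                          # n x T residuals under group k
            log_lik[:, k] = (-0.5 * np.sum(resid**2, axis=1) / sigma2
                             - 0.5 * T * np.log(2 * np.pi * sigma2) + np.log(pi[k]))
        log_lik -= log_lik.max(axis=1, keepdims=True)
        w = np.exp(log_lik)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted least squares for each group's trajectory.
        Xl, yl = np.tile(X, (n, 1)), y.ravel()
        for k in range(K):
            W = np.repeat(w[:, k], T)                        # person weights, one per observation
            beta[k] = np.linalg.solve(Xl.T @ (W[:, None] * Xl), Xl.T @ (W * yl))
        resid_all = np.stack([y - X @ beta[k] for k in range(K)])   # K x n x T
        sigma2 = np.sum(w.T[:, :, None] * resid_all**2) / (n * T)
        pi = w.mean(axis=0)
    return pi, beta, sigma2, w

# Illustrative use: y is an (n_people x n_timepoints) outcome array observed at
# common times t; pi are estimated group shares, beta the group trajectories.
# pi, beta, sigma2, w = fit_gbtm(y, t, K=3, degree=2)
```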

Currently, my research team is working on a couple of COVID-19-related projects that apply computational methods to handle big data. In one project, using micro data from millions of anonymous mobile devices in the U.S., we constructed a social distancing score capturing the percentage of devices staying at home in each county, averaged over January through September 2020. To reduce bias related to sampling and small samples, we applied a stratified reweighting procedure and a Bayesian hierarchical model to produce more reliable estimates. In another project with Daniel Karell (Assistant Professor of Sociology at Yale) and Caglar Koylu (Assistant Professor of Geography at the University of Iowa), we aim to examine the role of social media in how the Black Lives Matter protests in summer 2020 affected social distancing behaviors. We have constructed a continuous time-varying measure of social media usage, indicating how widely a BLM protest was discussed on Twitter, by conducting text analysis on a geolocated dataset including all tweets about BLM protests from May 25 to July 25, 2020.
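
As a toy illustration of what a stratified reweighting step can look like (the column names, strata, and numbers below are hypothetical, not the project's actual pipeline): devices are weighted so that each stratum's share in the device sample matches its share in the county population, and county scores are then computed as weighted averages.

```python
import pandas as pd

# Hypothetical inputs:
#   devices: one row per device, with county, demographic stratum, and a stay-home indicator
#   pop:     county-by-stratum population counts from, e.g., census estimates
devices = pd.DataFrame({
    "county": ["A", "A", "A", "B", "B", "B", "B"],
    "stratum": ["young", "young", "old", "young", "old", "old", "old"],
    "stayed_home": [1, 0, 1, 0, 1, 1, 0],
})
pop = pd.DataFrame({
    "county": ["A", "A", "B", "B"],
    "stratum": ["young", "old", "young", "old"],
    "population": [6000, 4000, 3000, 7000],
})

# Population and sample shares of each stratum within its county.
pop["pop_share"] = pop["population"] / pop.groupby("county")["population"].transform("sum")
sample = devices.groupby(["county", "stratum"]).size().rename("n").reset_index()
sample["sample_share"] = sample["n"] / sample.groupby("county")["n"].transform("sum")

# Post-stratification weight = population share / sample share.
weights = pop.merge(sample, on=["county", "stratum"])
weights["weight"] = weights["pop_share"] / weights["sample_share"]

# Weighted county-level share of devices staying home.
d = devices.merge(weights[["county", "stratum", "weight"]], on=["county", "stratum"])
score = (d.assign(w_stay=d["stayed_home"] * d["weight"])
           .groupby("county")
           .apply(lambda g: g["w_stay"].sum() / g["weight"].sum()))
print(score)
```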

References

Lynch, S. M., & Brown, J. S. (2005). A new approach to estimating life tables with covariates and constructing interval estimates of life table quantities. Sociological Methodology, 35(1), 177-225.

Nagin, D. S. (2005). Group-based modeling of development. Harvard University Press.

Zang, E., Lynch, S. M., & West, J. (2020). Regional differences in the impact of diabetes on population health in the USA. Journal of Epidemiology & Community Health, online first. http://dx.doi.org/10.1136/jech-2020-214267.

Zang, E., & Max, J. (2020). Bayesian Estimation and Model Selection in Group-Based Trajectory Models. Psychological Methods, forthcoming.

NOVEMBER 2020: JULIA BURDICK-WILL

Julia Burdick-Will is Assistant Professor in the Department of Sociology and the School of Education at Johns Hopkins. She arrived at Hopkins in 2014. She received both her BA and PhD in Sociology from the University of Chicago and spent two years at Brown as a Postdoctoral Research Associate in the Population Studies and Training Center. Her article on school violence won the 2014 James Coleman Award for the best article in the Sociology of Education from the American Sociological Association. Her research combines the sociology of education and urban sociology to study the roots of educational inequality and examine the dynamic connections between communities and schools that shape opportunities to learn both in and out of the classroom. She has studied the effects of concentrated neighborhood poverty on cognitive development; the geography of elementary school openings and closings; the impact of neighborhood and school violence on student test scores; the national distribution of school quality across urban, suburban, and rural areas; and the degree to which the increased availability of school choice may lead to the fragmentation of social life in poor neighborhoods.


1. Please describe your areas of methodological expertise and how you were trained in these areas.

My methodological work has focused on causal inference, spatial data analysis, network clustering, and the use of administrative data. I was fortunate to be able to take some really excellent courses in these areas in graduate school, both in the Sociology department and in Biostatistics. In fact, even when I didn’t have any more required coursework, I continued to audit methods and programming courses until I graduated. I have also had to learn a lot on my own through trial and error, reading, and learning from example. Most of my formal methods training was with traditional survey data, but my dissertation ended up using population-level enrollment records from the Chicago Public Schools and incident-level crime data from the Chicago Police Department. These data are really different from sample survey data. They include a full population, but only the measures of behavior that were recorded by that agency. They rarely have the kind of background measures that we are used to being able to control for with extensive surveys. Figuring out how to get the most out of these data has been an exciting challenge in all of my work and has forced me to constantly learn new methods.

2. Can you tell me about a recent methodological project of yours? What was most exciting to you in that project?

My most recent project is an example of how I have tried to exploit the benefits of population-level administrative data. In an article recently published in ASR, we created a network of school ties based on student mobility and used clustering algorithms to identify sets of schools that are likely to be sharing the same students. We then used what we knew about the geography and demographics of the schools and region to predict what was driving those divisions and how students moved within them. This project was really exciting because it allowed us to look at and think about the spatial connections between people and places without having to rely on set definitions of space, such as Euclidean distance or tract boundaries. The clustering was done without any spatial measures, but when you mapped the network you could clearly see structural boundaries between communities. This project also took a long time to get from a kernel of an idea to the point where it was ready to be published. I had to teach myself new network methods and gravity models that I hadn’t been trained to use in grad school. I wouldn’t have been able to do that without the generous help of colleagues, seminar participants, and reviewers, who even when rejecting the paper pointed me in better and more rigorous directions with the methods. Being able to take a quick mention of an author or method and read until I thought I could actually answer the questions I wanted to answer is invigorating and exciting.
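
Not the ASR paper's actual pipeline, but a minimal sketch of the general approach: build a weighted school-to-school network from student moves and let a community-detection algorithm partition it, with no spatial information supplied (the edge list below is hypothetical).

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical edge list: (school_from, school_to, number of students who moved)
moves = [
    ("School 1", "School 2", 40), ("School 2", "School 1", 35),
    ("School 1", "School 3", 5),  ("School 3", "School 4", 50),
    ("School 4", "School 3", 45), ("School 2", "School 4", 3),
]

G = nx.Graph()
for a, b, n in moves:
    # Collapse directed moves into a weighted undirected tie.
    if G.has_edge(a, b):
        G[a][b]["weight"] += n
    else:
        G.add_edge(a, b, weight=n)

# Modularity-based clustering: schools that exchange many students end up together.
clusters = greedy_modularity_communities(G, weight="weight")
for i, c in enumerate(clusters):
    print(f"cluster {i}: {sorted(c)}")

# In the application described above, one would then map the clusters and compare
# them to the geographic and demographic divisions of the region.
```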

3. What type of methodological work do you hope to see or expect to see in the future of sociology?

This feels like a really big question because I think we need a lot of different kinds of methods to answer different kinds of questions. That said, I think that I would love to see more Sociologists working with administrative data sources. Much of the existing work with these data has been really narrow policy analysis, but I think there is room for using these data to provide rich description of social facts that we just couldn’t see otherwise. Specifically, these data let you look at heterogeneity and distributions, not just mean behavior. They also allow you to look at social and spatial connections that you might not otherwise see. As Sociologists we think context and structure matter, but I think our methods are often too focused on the individual, or simple aggregations of individuals. Methods that really lean into the importance of social structure either with network analysis or simulations are really exciting to me.

4. How do you think graduate students or early career scholars can improve their methodological work?

I think the most important thing for graduate students to do is take as many methods courses as possible and to look beyond your own department. Sometimes bringing methods that are standard in other fields can help you think of and answer new and interesting questions that are relevant to Sociology. Reading is important and can be done any time, but the opportunity to ask specific questions in a classroom environment is invaluable. (I really miss it!) I would also recommend that everyone invest in programming skills. Being able to manipulate and merge large datasets is a skill that needs to be practiced, but can open up all sorts of new project ideas and methodological techniques.

5. What's next? What sorts of projects are you hoping to work on in the future?

My next projects are an extension of the school mobility project. I will be exploring patterns of out-of-zone enrollment (elementary school students who attend a traditional public school, but don’t live in that catchment area) and open enrollment high school choice in Baltimore. I’m hoping to learn more about bipartite (two-mode) network analysis, since it is important to take into account both where kids live and where they go to school, as well as about choice modeling. In both projects I’m interested in combining network and spatial analysis to explore school and residential segregation dynamics, but also to understand how connections are formed and maintained by students in disparate parts of the city.

OCTOBER 2020: JOSCHA LEGEWIE


Joscha Legewie (Ph.D. 2013, Columbia University) is the John L. Loeb Associate Professor of the Social Sciences in the Department of Sociology at Harvard University. His research focuses on social inequality/stratification, education, race/ethnicity, quantitative methods, urban sociology, and computational social science. It is motivated by a theoretical interest in the social, spatial, and temporal processes that lead to inequality. It examines how peer groups, schools, neighborhoods, and the sequencing of events produce macro patterns of social inequality and influence the relations between social groups. His work builds on rigorous causal inference based on innovative, natural or quasi-experimental research designs with a keen interest in "big data" as a promising source for future social science research.

Can you tell me about a recent methodological project of yours? What was most exciting to you in that project?

I have been working on various projects around the social costs of policing for the education and health of minority youth, using large-scale administrative data from New York City. The key methodological challenges concern causal identification, mechanisms, and the processing of administrative data from the NYC Department of Education, NYS Medicaid data, and the New York City Police Department. The most exciting aspect of this work is the collaboration with a group of smart and dedicated graduate students in sociology and economics: Nino Cricco, Kalisha Dessources Figures, Roland Neil, Nefara Riesch, John Tebes, and Michael Zanger-Tishler. We try to follow recent methodological developments and adapt them to our own work. They are all brilliant and keep me on my toes.

How do you think graduate students or early career scholars can improve their methodological work?

I am teaching a class on computational sociology, which tries to fill some of what I perceive as gaps in current methods training. The course provides an applied introduction to computational methods and data science for sociologists. It focuses on programming skills for social scientists, machine learning and applied causal analysis with concrete exercises that force students to apply the programming skills they learned in the first part of the semester. I think the importance of programming skills is ignored in many statistics classes. So I would encourage graduate students to invest in learning how to program. It makes you both more effective at what you are already doing and enables you to innovate methodologically.

SEPTEMBER 2020: XI SONG

Xi Song is an Associate Professor of Sociology and an affiliate of the Population Studies Center at the University of Pennsylvania. She previously taught at the University of Chicago. Song’s major area of research centers on the origin of social inequality from a multigenerational perspective. Her research uses demographic, statistical, and computational tools to study the rise and fall of families in human populations across time and place. She has investigated long-term family and population changes by exploring the values of genealogical microdata. These data sources include historical data compiled from family pedigrees, population registers, administrative certificates, church records, and surname data; and modern longitudinal and cross-sectional data that follow a sample of respondents, their offspring, and descendants prospectively or ask respondents to report information about their family members and relatives retrospectively.

1. Please describe your areas of methodological expertise and how you were trained in these areas.

I have applied mathematical, statistical, and computational models to study the rise and fall of families in human populations across time and place. My work demonstrates the values of genealogical microdata for studying long-term family and population changes. These data sources include historical data compiled from family pedigrees, population registers, administrative certificates, church records, and surname data; and modern longitudinal and cross-sectional data that follow a sample of respondents, their offspring, and descendants prospectively or ask respondents to report information about their family members and relatives retrospectively.

As a quantitative methodologist, I have developed Markov chain demography models for genealogical processes, population estimation methods for overlapping lifespan between generations, multivariate mixed-effects location-scale models for inter- and multigenerational data, and weighting methods for reconciling prospective and retrospective mobility estimates.

I earned a Ph.D. in Sociology and an M.S. in Statistics from UCLA. I completed coursework in sociology, demography, statistics, biomathematics, and epidemiology, as well as training through the multidisciplinary California Center for Population Research. This diverse course of study exposed me to many different perspectives in applied mathematics and statistics.

2. Can you tell me about a recent methodological project of yours? What was most exciting to you in that project?

I recently finished a paper that shows how to incorporate demographic thinking into traditional social mobility analyses. From a demographic perspective, generations within families are linked not only by their socioeconomic statuses but also by their fertility, mortality, and marriage, among other demographic behaviors. These demographic outcomes, often stratified by social class, lead to variations between families in resource allocation, household formation, and changes in kinship structure, which, in turn, limit and condition the amount of family capital that can be inherited by subsequent generations. However, most previous research on mobility has either not addressed the role of these demographic processes or has restricted its analysis to only two generations. My paper introduces a joint demography-mobility model that combines demographic mechanisms of births, deaths, and mating with statistical models of social mobility. It shows not only the trend in social mobility but also the process through which demography shapes long-run family and population dynamics.

3. What type of methodological work do you hope to see or expect to see in the future of sociology?

Important theoretical and empirical advances in sociology have always accompanied the development of new methods of data collection, measurement, and analysis. Although new methods developed in causal inference, machine learning, and computational statistics have provided new tools for researchers to tackle new problems, I believe that we also need to reflect on what brilliant ideas have been lost in the evolution of sociology. For example, I recently read two papers by Leo Goodman published in the 1950s on the statistical inference about Markov chains and the mover-stayer model. Now it is rare to see such stochastic models used in sociology partly because graduate students are no longer exposed to such training in sociology programs.
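
For readers who have not encountered it, here is a minimal numpy sketch of the mover-stayer idea Goodman analyzed: a fraction of people in each class never move, the rest follow a Markov chain, and ignoring the stayers leads a plain Markov model to overstate long-run mobility. The two-class setting and all numbers are illustrative only.

```python
import numpy as np

# Illustrative two-class mobility setting (classes: low, high).
s = np.array([0.4, 0.5])              # share of "stayers" in each origin class
M = np.array([[0.6, 0.4],             # movers' one-step transition matrix
              [0.3, 0.7]])

def mover_stayer_k_step(s, M, k):
    """Aggregate k-step transition matrix under the mover-stayer model:
    stayers never move; movers follow M for k steps."""
    S = np.diag(s)
    return S + (np.eye(len(s)) - S) @ np.linalg.matrix_power(M, k)

P1 = mover_stayer_k_step(s, M, 1)               # what a one-step mobility table shows
naive_5 = np.linalg.matrix_power(P1, 5)         # plain Markov extrapolation to 5 steps
true_5 = mover_stayer_k_step(s, M, 5)           # actual 5-step mobility

print("plain Markov, 5 steps:\n", naive_5.round(3))
print("mover-stayer, 5 steps:\n", true_5.round(3))
# The plain Markov chain predicts more movement off the diagonal than actually
# occurs, because it ignores the stayers.
```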

4. How do you think graduate students or early career scholars can improve their methodological work?

Most quantitative training in sociology has increasingly emphasized programming skills and the application of statistical software. However, ambitious theoretical questions often pose methodological challenges without ready solutions. Most students do not ask questions that they do not know how to answer—they fit their research questions to methods rather than vice versa. For example, I am teaching a panel data analysis course this semester. Most students in my class have used mixed-effects models before in their own research, using either Stata or R. But if I asked them to write out their models in equations or to propose alternative models by revising the assumptions in their current models, many of them would have difficulty doing so, especially if such alternative models cannot be estimated with existing statistical packages. I hope sociology students can learn not only how to disseminate new methods but also how to invent their own methods motivated by sociological theories. I encourage students to take more fundamental courses on probability and mathematical statistics, based on multivariate calculus and matrix algebra, as well as research design and formal modeling, before they take more applied statistical courses.
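
As a concrete instance of the exercise described here (a generic textbook example, not a model from the course): a random-intercept panel model written out explicitly, together with one "alternative model" obtained by revising a single assumption, namely letting the within-person variance depend on covariates, which yields a mixed-effects location-scale model.

```latex
y_{it} = x_{it}'\beta + u_i + \varepsilon_{it},
\qquad u_i \sim N(0, \tau^2), \quad \varepsilon_{it} \sim N(0, \sigma^2).

% Alternative obtained by revising the constant-variance assumption:
\varepsilon_{it} \sim N(0, \sigma^{2}_{it}),
\qquad \log \sigma^{2}_{it} = z_{it}'\gamma .
```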

5. What's next? What sorts of projects are you hoping to work on in the future?

I am currently working on a book-length project on social mobility methods, which aims to provide a comprehensive review of past theories and techniques in the study of social mobility, ranging from classic social mobility indices, Markov chain models, and loglinear analysis to autoregressive models, heritability measures, copula and other dependence measures, and parametric models of intra- and intergenerational trajectories.

AUGUST 2020: JACOB FOSTER

Jacob Foster is a computational sociologist interested in the social production of collective intelligence, the evolutionary dynamics of ideas, and the co-construction of culture and cognition. In his empirical work, he blends computational methods with qualitative insights from science studies to probe the strategies, dispositions, and social processes that shape the production and persistence of scientific and technological ideas. He uses machine learning to mine and model the cultural meanings buried in text, and computational methods from macro-evolution to understand the dynamics of cultural populations. He also develops formal models of the structure and dynamics of ideas and institutions, with an emerging theoretical and empirical focus on the rich nexus of cognition, culture, and computation. He is currently writing a book on knowledge as an emergent feature of complex adaptive systems. He is founding co-Director of the Diverse Intelligences Summer Institute, a program that aims to build community, collaboration, and creative thinking among early-career scholars and storytellers interested in the study of mind, cognition, and intelligence of diverse forms and formats—from ants and apes to humans and AI.

1 & 2. Can you tell me about a recent project of yours where you did a lot of data work, modeling work, or other sort of methodological heavy lifting? Why were you proud of the methods in that project?

These questions have been a real gift. Reflecting on them has helped me understand my distinctive methodological style!

I really have an applied mathematics or mathematical modeling sensibility; this is perhaps unsurprising given my background in statistical physics. I am attracted to situations where there’s an interesting but elusive social phenomenon. The fun (for me) comes from finding the right mathematical or computational objects to capture important features of the phenomenon. This also means that I am always learning new things; always getting to play with my “shiny toys,” as my collaborator Susan Cochran likes to joke.

A lot of my recent work has focused on cultural phenomena. What drives the evolutionary dynamics of cultural populations? How does cultural learning occur? How do we identify the building-blocks of meaning in a particular discourse?

Let me mention two projects that will be presented at the 2020 Virtual ASA Meeting. Like most of my work, they’re both collaborative projects; each has been led by one of my fantastic graduate students. The first project tackles the question of population dynamics for cultural lineages. It’s joint work with my student Bernie Koch and a computational evolutionary biologist named Daniele Silvestro. Bernie was able to scrape an amazing dataset of Metal bands, richly annotated by members of the community. It’s about as close to a complete population as one could hope for. The paper develops a “populational” approach to culture and cultural change, synthesizing theory from cultural sociology, cognitive science, and cultural evolution. The exciting methods (Bayesian models of birth-death processes) come from computational evolutionary biology. We use these models to test different evolutionary mechanisms—from exogenously-driven mass extinction to competition for limited resources—as potential explanations for the changing population of bands and sub-genres. Bernie will be presenting this paper in a Culture Section panel, “Mind and Matter: Synthesizing Cognition and Materiality.” There’s also a preprint on SocArXiv and a tutorial on the methods, if folks are interested in learning more.
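
For readers unfamiliar with the model class: a minimal sketch (not the paper's estimation code, which is Bayesian and far richer) of the kind of birth-death process being modeled, in which each lineage independently spawns new lineages and goes extinct at given rates. The rates and initial population are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_birth_death(lam=0.3, mu=0.25, t_max=50.0, n0=5):
    """Gillespie-style simulation of a linear birth-death process:
    each lineage spawns a new lineage at rate lam and dies at rate mu."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while t < t_max and n > 0:
        total_rate = n * (lam + mu)
        t += rng.exponential(1.0 / total_rate)          # time to the next event
        n += 1 if rng.random() < lam / (lam + mu) else -1
        times.append(t)
        counts.append(n)
    return np.array(times), np.array(counts)

times, counts = simulate_birth_death()
print("lineages alive at the end:", counts[-1])
```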

The other paper builds on a line of work with my student Alina Arseniev-Koehler, in which we have been using word embedding methods to study different facets of culture. We have a preprint on SocArXiv that makes the case that the neural version of these methods provides a reasonable first-cut at a cognitively plausible model of cultural learning; we also develop a range of techniques for assessing the robustness of findings derived from embeddings. Alina is presenting the next step in this series at the “Computational Sociology” session, a paper with the evocative title “Discourses of Death.” We should have a preprint on SocArXiv in the next couple of days. It struck me that this paper and the Metal paper both have rather grim titles involving the word “death.” Don’t read too much into that; I’m generally a cheerful person!

I’m really proud of this paper. It comes out of a very deep reading of the theoretical machine learning literature on embeddings, which tries to explain some of their fascinating and exciting properties (e.g., the correspondence between semantic and algebraic properties). Alina and I embarked on this deep dive because of a methodological itch; we wanted to really understand how these methods work. The work of Sanjeev Arora, a theoretical computer scientist at Princeton, was our primary guide. Without getting too much into the weeds, we realized that we could put together a bunch of the methods in this theoretical literature to provide another solution to a long-standing problem in sociological text analysis: how do you identify the building-blocks of a discourse?

The classic approach to this problem uses some variant of topic modeling, where topics are probability distributions over words and documents are mixtures of topics. Building on Arora’s work with various collaborators, we instead model documents as trajectories through a sort of “meaning space.” This meaning space is derived in the usual fashion by learning word embeddings. Where’s the magic? It turns out that there is a beautiful generative model interpretation for embeddings. In the simplest version (we use something slightly more complicated in the paper) you can imagine a “discourse vector” making a slow random walk through the meaning space. Wherever it currently points, that defines a sort of instantaneous topic model—a probability distribution over words. Under this generative model, given a sequence of words in the text, you can obtain the MAP estimate of the discourse vector’s position by averaging the corresponding word vectors.

So now we have a way of mapping from the text back to the embedding space. To make things more interpretable, we need some way of discretizing the embedding space. So we use a trick that Arora et al. originally developed to deal with polysemy—the fact that words can have multiple meanings. We use sparse-coding methods to find a (relatively) small number of vectors such that each word vector can be represented (with some error) as a sparse linear combination of these vectors. We call these vectors “discourse atoms,” following the theoretical machine learning terminology. Of course each discourse atom also defines a topic model—and these models are beautiful! Incredibly crisp little bundles of meaning. At one stroke, we’ve simplified this huge meaning space into nice, interpretable building blocks.

When we work with actual text, instead of looking at where the inferred discourse vector points in a 200- or 300-dimensional space, we just map it into the nearest discourse atom. You can reduce a document down to a sequence of little elementary units of meaning, where those elementary units are specifically adapted to the corpus you’re analyzing. Just like a molecule can be decomposed into its constituent atoms, a document can be decomposed into its constituent discourse atoms.
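
A schematic sketch of this pipeline, not the authors' code: it assumes word embeddings are already available as a word-to-vector dict, uses simple averaging for the discourse vector, and borrows scikit-learn's dictionary learning for the sparse-coding step.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Assumed inputs: an embedding dict {word: vector} and tokenized passages of text.

def doc_vector(tokens, embeddings):
    """Estimate of the discourse vector for a passage: the average of its word
    vectors (the MAP estimate under the random-walk generative model above)."""
    vecs = [embeddings[w] for w in tokens if w in embeddings]
    return np.mean(vecs, axis=0)

def learn_discourse_atoms(embeddings, n_atoms=100, alpha=1.0):
    """Sparse-code the vocabulary's word vectors: each word vector is
    approximated as a sparse linear combination of n_atoms basis vectors."""
    V = np.vstack(list(embeddings.values()))      # vocabulary x dimensions
    dl = DictionaryLearning(n_components=n_atoms, alpha=alpha, max_iter=500)
    dl.fit(V)
    atoms = dl.components_                        # n_atoms x dimensions
    return atoms / np.linalg.norm(atoms, axis=1, keepdims=True)

def nearest_atom(vec, atoms):
    """Map a discourse vector to its closest discourse atom (cosine similarity)."""
    v = vec / np.linalg.norm(vec)
    return int(np.argmax(atoms @ v))

# Sketch of use: reduce a document to a sequence of atom ids, one per passage.
# The words whose vectors lie closest to an atom give its interpretable "topic."
# atoms = learn_discourse_atoms(embeddings)
# atom_sequence = [nearest_atom(doc_vector(passage, embeddings), atoms)
#                  for passage in passages]
```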

We’ve applied this method to a corpus of narratives from the CDC’s National Violent Death Reporting System, in collaboration with Vickie Mays, Susan Cochran, and Kai-Wei Chang from UCLA. I still remember the first time I saw some of the topics derived from the discourse atoms. They were so easy to interpret! It was exhilarating. We have a whole suite of follow up projects using these methods, but for me it really drove home the importance of deeply understanding a basic technique. If we hadn’t taken the time to dig into this seemingly esoteric theoretical machine learning literature, we never would have come up with this new approach to extracting meaning from text.

3. What type of methodological work do you hope to see or expect to see in the future of sociology?

Since I went on so much about current work, I’ll be more brief here! As you can probably tell, I think there’s huge potential in leveraging contemporary advances in machine learning and artificial intelligence. But as will probably please the readers of a Methodology newsletter, I think those advances will only reach their full potential if we approach them not as blackboxed tricks to be taken for granted and deployed unreflectively, but as powerful and somewhat mysterious new tools that need to be put through their paces and really understood.

In the 2019-2020 academic year, I helped organize two workshops at the Institute for Advanced Study, working with Didier Fassin (IAS), Marion Fourcade (UC Berkeley/IAS), and Sanjeev Arora (Princeton/IAS). These were joint events between the School of Mathematics and the School of Social Science. As far as we know, this is the first time that these two Schools at that storied institution have ever held an event together. The second workshop was on “Machine Learning, Theory, and Method in the Social Sciences.” It was an incredible workshop, and I came away from it convinced of two things. First, the social sciences are an incredibly rich source of inspiring problems for machine learning researchers. Second—precisely because of the characteristics of social science data (often small, often missing, usually complex) and the demands that we make on our methods (interpretability and transparency are highly valued)—some of our best methodological partnerships may be with researchers in theoretical machine learning. They are concerned with developing methods whose inner workings are well understood and which often possess some provable guarantees. Of course, this is a partnership that appeals both to my strengths (as someone with a very strong mathematics background) and my other interests (in the nature of “intelligent entities,” broadly construed).

4. How do you think graduate students or early career people can improve their methodological work?

Building your mathematical skills is an investment that will pay off handsomely. If you can read the primary literature in, say, machine learning, you’ll have a much deeper understanding of how various methods work and why they may be appropriate or inappropriate for your particular problem.

It will also “level up” your ability to collaborate with folks in mathematics, statistics, or computer science. You’ll have a more extensive and expressive common language. By working with folks from these other disciplines, you can help to create the tools you really need, rather than making do with those that are available “off the shelf.”

Finally, when I teach graduate courses in network science or machine learning, I always try to emphasize the “modeling” aspect of what we’re doing. Whenever you are using a particular mathematical object to represent some aspect of the social world, you’re doing modeling. And when the technique in question has an underlying generative model, then you are really doing modeling. In many cases, it’s easier to understand both the mechanics and the interpretation of a method from the generative perspective. So I think it would benefit graduate students and early career folks to expose themselves to modeling in multiple contexts. When I was a graduate student, I took an amazing course on mathematical and computational models in biology. It had a great name, too: “Bytes of Life.” And of course physics is all about modeling! This has been one of the most valuable “meta” skills for me as a methodologist. Along with being an inveterate perfectionist, of course!

5. What's next? What sorts of projects are you hoping to work on in the future?

I am fundamentally a restless person, so I don’t like to till one field for too long. Of course there is much more to do developing and deploying the methods I discussed above, and I expect to be spending more time exploring the frontier between (theoretical) machine learning and sociology. I’ll just mention one other thing we’re working on currently in my lab; I hope we’ll have something ready to present at ASA 2021. As part of my broader interest in using ideas from machine learning and AI to clarify our models of human social behavior, I’ve long wanted to play around with the many approaches to deep reinforcement learning. In the past few months, working with a wonderful team of current (Pablo Geraldo) and former students (Chase McDonald, Prateek Malhotra), we’ve been building up the codebase to do multi-agent (deep) reinforcement learning. We’re just starting to run experiments in this setting, and I cannot wait to see what new light this throws on some of my core interests: the social production of collective intelligence and the co-construction of culture and cognition. It is an exciting time to be a computational sociologist!