Duke Computer Science
CompSci 590.06:
Causal Inference, Fairness, and Explanations in Data Analysis
Spring 2025
Welcome to CompSci 590.06: Causal Inference, Fairness, and Explanations in Data Analysis, a graduate seminar course for the Spring 2025 semester!
The first class is on Wednesday, January 8.
The website is under construction; please check frequently for updates.
In this class, we will learn techniques for formal and rigorous causal analysis based on observational (collected) data, and see their applications to fairness and explainability in data analysis. As the common saying "Correlation is not Causation" suggests, the problem of causal inference goes far beyond simple correlation, association, or model-based prediction analysis, and is practically indispensable in health, medicine, social sciences, and other domains. For example, a medical researcher may want to find out whether a new drug is effective in curing a certain type of cancer, or an economist may want to understand whether a job-training program helps improve employment prospects. Causal inference lays the foundation of sound and robust policy making by providing a means to estimate the impact of a certain "intervention" on the world.
While the gold standard of causal inference is the randomized controlled experiment, such experiments are often not possible for ethical or cost reasons. Hence, for practical applications, "observational studies", that is, causal inference based on observational data, are used. A dataset can tell very different stories depending on how we look at it (e.g., Simpson's Paradox), so it is important to understand the right way to look at a given dataset, in particular which variables to condition on, before drawing any conclusions, especially causal ones.
In this class we will discuss two models for observational causal inference: the Probabilistic Graphical Causal Model (Pearl), more prevalent in Artificial Intelligence (AI) research, and the Potential Outcome Framework (Rubin), more prevalent in Statistics research, along with related concepts and techniques. The other topics we will discuss are (1) Fairness and (2) eXplainable Artificial Intelligence (XAI), and how they relate to causal inference.
The growing concerns about the complexity and opacity of data-driven decision-making systems, deployed for making consequential decisions in healthcare, criminal justice, and finance, have led to a surge of research interest in these topics. These topics broadly fall under "Responsible Data Science", which we will also discuss more generally in this class.
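To make the remark about Simpson's Paradox and the choice of conditioning variables concrete, here is a minimal sketch in Python. The counts are invented purely for illustration (they are not course data), and the function names `recovery_rate` and `adjusted_rate` are hypothetical. The sketch assumes that disease severity is the only variable influencing both who gets treated and who recovers; under that assumption, averaging the per-severity rates over the overall severity distribution gives an adjusted comparison.

```python
# Hypothetical counts, invented for illustration only:
# (severity, treated) -> (number recovered, number of patients)
counts = {
    ("mild", True): (90, 100),
    ("mild", False): (800, 1000),
    ("severe", True): (300, 1000),
    ("severe", False): (20, 100),
}


def recovery_rate(treated, severity=None):
    """Recovery rate for a treatment arm, optionally within one severity group."""
    recovered = total = 0
    for (sev, trt), (r, n) in counts.items():
        if trt == treated and (severity is None or sev == severity):
            recovered += r
            total += n
    return recovered / total


def adjusted_rate(treated):
    """Average the per-severity rates, weighted by the overall share of each
    severity group (assumes severity is the only confounder)."""
    grand_total = sum(n for _, n in counts.values())
    rate = 0.0
    for sev in ("mild", "severe"):
        n_sev = sum(n for (s, _), (_, n) in counts.items() if s == sev)
        rate += (n_sev / grand_total) * recovery_rate(treated, sev)
    return rate


if __name__ == "__main__":
    print("aggregate:", recovery_rate(True), "vs", recovery_rate(False))
    for sev in ("mild", "severe"):
        print(sev + ":", recovery_rate(True, sev), "vs", recovery_rate(False, sev))
    print("adjusted:", adjusted_rate(True), "vs", adjusted_rate(False))
```

With these made-up numbers, the treated group does worse than the untreated group in aggregate yet better within each severity group, because treated patients are mostly severe cases. Which comparison is the right one depends on the assumed causal structure, which is the kind of question the two frameworks above are meant to address.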
Prerequisites:
There are no hard prerequisites, but this is an advanced graduate-level seminar, and basic knowledge of CS (e.g., graphs, probability theory, algorithms, machine learning, and databases, equivalent to CompSci 201, 230, 330, 371, and 316) will be assumed. Students without this background should be willing to learn the preliminary concepts on their own as needed. Students should also be willing to read a number of research papers, book chapters, and other materials. We will review some (not all) of the concepts used in this class as needed.
Lectures: Mondays and Wednesdays, 10:05 am - 11:20 am, LSRC D106