PFW LDS - Academic Programs

Academic Programs

LDS delivers lectures and holds discussions on data science topics of interest to PFW students and faculty and to practitioners working in the theory and applications of data science.

Data Science Forums 2025/2026

LDS holds discussion groups about data science topics for the general audience with the goal of disseminating knowledge about data science of interest to the community.

2025-2026 Series - Allen Downey. (2024). Probably Overthinking It (blog) on WebEx @ https://purdue.webex.com/meet/aselvite

First Meeting: Tuesday 7th October 2025 @ 1pm-2pm | A future of data science

During the first meeting of this edition of the Data Science Forums, we talked about:

Personalized and predictive analytics.
Industry impact of data science.
How is AI shaping our present and future?

Second Meeting: Tuesday 21st October 2025 @ 1pm-2pm | Chapter 1: "Are you normal? Hint: No."

During today's meeting, the discussion of Chapter 1, “Are You Normal? Hint: No,” centered on:

Why the concept of a “normal” person is statistically misleading and rarely applies when examining multiple traits simultaneously.
How the combination of otherwise “normal” characteristics across multiple dimensions reveals that nearly every observation is unusual.
How multidimensional data analysis (using traits like the Big Five personality factors) demonstrates that only a tiny fraction of people fall within average ranges on all measures.
What this means for understanding human diversity: variation, not normality, is the true statistical reality.

Third Meeting: Tuesday 4th November 2025 @ 1pm-2pm | The inspection paradox

The question of interest for today's meeting was: What is the Inspection Paradox?

The Inspection Paradox is a statistical phenomenon that affects our perception of real-world scenarios, from classroom sizes to relay races and even broader contexts like criminal justice and infectious disease tracking.

The inspection paradox occurs when the probability of "inspecting" or encountering an event or a group is not representative of its average size or frequency, leading to misleading conclusions. For example, most students tend to find themselves in larger classes compared to the average class size, simply because large classes have more students, meaning the experiences sampled by students are skewed compared to the average experienced by an outside observer.
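The class-size example can be checked with a few lines of arithmetic. The sketch below is only an illustration: the enrolment numbers are hypothetical, and the two function names are introduced here for the example. It compares the average class size seen by an outside observer (each class counts once) with the average seen by students (each class counts once per enrolled student).

```python
def average_class_size(sizes):
    """Mean class size for an outside observer: each class counts once."""
    return sum(sizes) / len(sizes)


def average_size_experienced(sizes):
    """Mean class size for students: a class of size s is reported by s students."""
    return sum(s * s for s in sizes) / sum(sizes)


# Hypothetical enrolments for three classes (not data from the forum).
class_sizes = [10, 20, 90]

print(average_class_size(class_sizes))                    # 40.0
print(round(average_size_experienced(class_sizes), 1))    # 71.7
```

Most of the 120 students sit in the class of 90, so the size a typical student reports is well above the observer's average of 40.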

Fourth Meeting: Tuesday 24th February 2026 @ 1pm–2pm | Chapter 8: “The Long Tail of Disaster”

During this meeting, the discussion of Chapter 8, “The Long Tail of Disaster,” focused on:

How disasters (natural, technological, financial, etc.) often follow heavy‑tailed distributions rather than “normal” (bell‑curve) patterns.
How hard it is to plan, design, and make policy in a world where “once‑in‑a‑century” events may be more common than we intuitively think.

Core takeaway: Risk is not just about the “average” event; the long tail of rare, extreme outcomes can dominate the overall consequences and demands a different way of thinking about uncertainty, prevention, and resilience.
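One way to see the difference between a bell curve and a heavy tail is to ask how much rarer an event becomes when its size doubles. The sketch below is a minimal illustration under stated assumptions: it uses a standard normal distribution and a Pareto distribution with tail index 2 as stand-ins (neither choice comes from the chapter), and the function names are introduced here for the example.

```python
import math


def normal_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def pareto_tail(x, alpha, x_min=1.0):
    """P(X > x) for a Pareto variable with tail index alpha and minimum x_min."""
    return 1.0 if x < x_min else (x_min / x) ** alpha


def doubling_ratio(tail, x):
    """How much less likely an event of size 2x is than an event of size x."""
    return tail(2 * x) / tail(x)


# alpha = 2.0 is an assumed, illustrative tail index.
for x in (1, 2, 3):
    normal = doubling_ratio(normal_tail, x)
    pareto = doubling_ratio(lambda v: pareto_tail(v, alpha=2.0), x)
    print(f"x={x}: normal {normal:.2e}, Pareto {pareto:.2f}")
```

For the normal distribution the ratio collapses as x grows, so extreme events quickly become negligible. For the Pareto distribution the ratio stays at 0.25 at every scale, which is why very large events remain a realistic possibility.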

Fifth Meeting: Tuesday 17th March 2026 @ 1pm–2pm | Chapter 9: “Fairness and Fallacy”

Today, the discussion of Chapter 9, “Fairness and Fallacy,” focused on:

What the base rate fallacy is and why ignoring background frequencies leads to systematically wrong conclusions.
How misinterpreting medical test results (for example, confusing “the probability of a positive test if you have the disease” with “the probability you have the disease if the test is positive”) can cause unnecessary alarm or false reassurance.
How failing to account properly for base rates distorted public understanding of COVID infection, testing, and positivity statistics during the pandemic.

To reason fairly with data, we must combine test performance with base rates; without them, even “accurate” models and tests can mislead us and amplify inequities.
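The effect of the base rate on a test result follows directly from Bayes' theorem. The sketch below is a minimal illustration: the prevalence, sensitivity, and specificity values are hypothetical, and the function name is introduced here for the example.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test), computed with Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: 99% sensitivity, 95% specificity.
rare = positive_predictive_value(prevalence=0.01, sensitivity=0.99, specificity=0.95)
common = positive_predictive_value(prevalence=0.20, sensitivity=0.99, specificity=0.95)

print(f"{rare:.3f}")    # 0.167
print(f"{common:.3f}")  # 0.832
```

With a 1% base rate, only about one positive result in six corresponds to a true case, even though the test detects 99% of cases. The same test becomes far more informative when the base rate is 20%.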

Sixth Meeting: Thursday 26th March 2026 @ 1pm–2pm | Chapter 10: “Penguins, Pessimists, and Paradoxes”

During this meeting of the Data Science Forums, we talked about:

How optimism in the General Social Survey produces a Simpson’s paradox.
Penguins, dogs, and levels of analysis: in the Palmer penguins data, beak length and depth are positively correlated within each species but negatively correlated across species.
Why aggregation can mislead in social and medical data.
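The sign reversal between within-group and pooled correlations can be reproduced with a tiny data set. The sketch below uses made-up points, not the Palmer penguins measurements, and the group labels and function name are introduced here for the example.

```python
import math


def pearson(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / math.sqrt(var_x * var_y)


# Synthetic groups: y rises with x inside each group,
# but group B has larger x and smaller y than group A.
groups = {
    "A": [(1, 10), (2, 12), (3, 11), (4, 13)],
    "B": [(6, 3), (7, 5), (8, 4), (9, 6)],
}

for name, points in groups.items():
    print(name, round(pearson(points), 2))          # 0.8 for each group

pooled = [p for points in groups.values() for p in points]
print("pooled", round(pearson(pooled), 2))          # -0.77
```

Each group shows a positive correlation, yet the pooled data show a negative one, because the difference between the groups dominates the pattern inside them.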

Data Science Forums 2024/2025

LDS holds discussion groups about data science topics for the general audience with the goal of disseminating knowledge about data science of interest to the community.

2024-2025 Series - Nate Silver. (2012). The signal and the noise: why so many predictions fail - but some don't on WebEx @ https://purdue.webex.com/meet/aselvite

First Meeting: Tuesday 8th October 2024 1pm-2pm | Chapter 4 "For years you have been telling us that rain is green" [Weather Forecasting]

The first meeting of the year was dedicated to the problem of weather forecasting & prediction of hurricanes. In particular, we talked about:

How can Digital Twins help predictions of extreme climatic events?
What's the role of chaos in weather forecasting?
Inaccuracies in the data can cause large errors in computer models.

Second Meeting: Tuesday 15th October 2024 1pm-2pm | Chapter 5 "Desperately seeking signal" [Earthquake Predictions]

Today, we discussed:

Is it harder to forecast hurricanes or earthquakes?
The Gutenberg-Richter Law relates the magnitude and the frequency of earthquakes.
Earthquakes are systems with noisy data and underdeveloped theory.
How can complexity theory and chaos theory help in predicting earthquakes?
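The Gutenberg-Richter law mentioned above is usually written as log10(N) = a - b * M, where N is the number of earthquakes of magnitude at least M in a given region and period. The sketch below is a minimal illustration: the value of a is hypothetical, b = 1 is a commonly quoted approximate value rather than a fitted one, and the function name is introduced here for the example.

```python
def expected_count(magnitude, a, b=1.0):
    """Expected number of earthquakes with magnitude >= `magnitude`,
    from the Gutenberg-Richter relation log10(N) = a - b * M."""
    return 10 ** (a - b * magnitude)


# a = 5.0 is an assumed regional constant, chosen only for illustration.
for m in (4, 5, 6, 7):
    print(m, expected_count(m, a=5.0))
```

With b = 1, each additional unit of magnitude makes earthquakes about ten times rarer. The law describes long-run frequencies; it does not say when or where the next earthquake will occur.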

Third Meeting: Tuesday 29th October 2024 1pm-2pm | Chapter 7 "Role models" [Infectious Disease Dynamics]

Today's discussion included topics such as:

How hard is it to extrapolate, especially on an exponential scale?
What is the basic reproduction number and what does it measure exactly?
How does spatial information help with predictions?
How long does it take to create a vaccine?
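The difficulty of extrapolating on an exponential scale can be illustrated with the basic reproduction number R0, the average number of secondary infections caused by one case in a fully susceptible population. The sketch below is a deliberately crude early-outbreak model (it ignores the depletion of susceptible people and any interventions); the R0 values and the function name are hypothetical.

```python
def new_cases_in_generation(r0, generation, initial_cases=1):
    """New cases in a given generation if every case infects r0 others.
    Only a rough approximation, valid early in an outbreak."""
    return initial_cases * r0 ** generation


# Two hypothetical estimates of R0 that differ by 10%.
low = new_cases_in_generation(2.0, generation=10)
high = new_cases_in_generation(2.2, generation=10)

print(low)                    # 1024.0
print(round(high / low, 2))   # 2.59
```

A 10% difference in the estimate of R0 becomes a difference of roughly 2.6 times in projected cases after ten generations, which is why small measurement errors make long-range epidemic forecasts so uncertain.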

Fourth Meeting: Tuesday 18th February 2025 1pm-2pm | Chapter 8 "Less and less wrong" [Sports Analytics]

Interesting questions came up in today's meeting:

How can we use data science to improve the performance of a sport team?
How do we formulate probabilistic beliefs about the world when we encounter new data? Bayes' Theorem!
Finding patterns is easy in data-rich environments.

The key is determining whether the patterns represent signal or noise.
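Updating a belief as new data arrive can be sketched with the simplest conjugate model. The example below assumes a Beta prior on a team's win probability and treats games as independent; this is one standard textbook setup rather than the method discussed in the chapter, and the records and function names are hypothetical.

```python
def update_beta(alpha, beta, wins, losses):
    """Posterior Beta parameters after observing wins and losses
    (Beta prior combined with a binomial likelihood)."""
    return alpha + wins, beta + losses


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (1, 1)  # uniform prior: no opinion about the team's strength
after_10_games = update_beta(*prior, wins=7, losses=3)
after_20_games = update_beta(*after_10_games, wins=4, losses=6)

print(round(beta_mean(*prior), 3))            # 0.5
print(round(beta_mean(*after_10_games), 3))   # 0.667
print(round(beta_mean(*after_20_games), 3))   # 0.545
```

The estimate moves toward the observed record but does not jump to it: a strong first ten games is partly signal and partly noise, and the next ten games pull the belief back toward the middle.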

Fifth Meeting: Tuesday 25th February 2025 1pm-2pm | Chapter 2 "Are you smarter than a television pundit?" [Election Polls]

Today's conversation touched on interesting points in the use of data science methods in election polls:

Are political scientists better than pundits?
How hard is it to predict better than random chance? Are high-dimensional models always better than low-dimensional ones?
Who's better at prediction: foxes or hedgehogs?

Why do political predictions tend to fail?

Sixth Meeting: Tuesday 4th March 2025 1pm-2pm | Chapter 12 "A climate of healthy skepticism" [Climate Science]

The sixth and last meeting of the year was on Climate Forecasts! Relevant matters/questions included:

Noise can obscure the signal, even when the signal exists.
Are people skeptical about climate predictions, about the cause-effect relationship motivating those predictions, or both?
Are computer models reliable to forecast climate?

It is hard to solve problems where there is no well-defined cause-and-effect mechanism!

Data Science Forums 2023/2024

LDS holds discussion groups about data science topics for the general audience with the goal of disseminating knowledge about data science of interest to the community.

2023-2024 Series - Perspectives on Data and Data Science - Harvard Data Science Review on WebEx @ https://purdue.webex.com/meet/aselvite

First Meeting - Tuesday 10th October 2023 1pm-2pm: What Are the Values of Data, Data Science, or Data Scientists? by Xiao-Li Meng

Today, we discussed questions such as:

Do we currently have a lack of high quality data? How should we sample?
What is the value of providing students with a classroom that interacts with an industry-like environment?
How does using data to support an agenda devalue data science as a trustworthy evidence builder?

And also... Is it true that most published research findings are false?

Reference

Ioannidis, J.P.A. (2005). Why Most Published Research Findings Are False. PLoS Med 2(8): e124. https://doi.org/10.1371/journal.pmed.0020124

Second Meeting - Tuesday 24th October 2023 1pm-2pm: What Is Your List of 10 Challenges in Data Science? by Xiao-Li Meng

Great conversation today on important Challenges in Data Science:

What are the biggest unsolved problems/interesting questions in data science? Here is a non-exhaustive list: data privacy, noisy/incomplete data, interpretable learning, heterogeneous data sources, selection bias & causal inference, generalizability vs specificity, confounding factors in observational studies, complex and unstructured data, post-selection inference, study design, reproducibility and replicability, education and communication, ...
Why are we using black box models when we don’t need to? Hope or Hype?
"Current education system teaches deterministic mathematical manipulation as students’ native language for quantitative reasoning, with probabilistic and statistical thinking as a second or even third language." by Xiao-Li Meng. Can we introduce computational, statistical, algorithmic thinking in elementary schools? Here is a nice set of lectures on Computational and Inferential Thinking: The Foundations of Data Science by Ani Adhikari, John DeNero, David Wagner.

Third Meeting - Thursday 9th November 2023 1pm-2pm: Information and Uncertainty: Two Sides of the Same Coin by Xiao-Li Meng

Today, we discussed:

Information vs Uncertainty & Signal vs Noise
What's the difference between equality and fairness in algorithms?
"Only when the predictive distribution is invariant to sub-populations can we achieve simultaneous equalization in length and coverage at any level"
How important are team work and domain knowledge?

The key question is:

Do the variations revealed in our sample capture the variations in our target population?

Fourth Meeting - Thursday 16th November 2023 1pm-2pm: Building Data Science Infrastructures and Infrastructural Data Science by Xiao-Li Meng

Today's discussion revolved around key aspects of building data science infrastructures:

We need internationally renowned data scientists and highly skilled research and computing support staff.
Does every topic in statistics belong to data science? What about the other way around?
Is data science a new pillar of scientific research after theory, experiment, and computing?

Slow Data = Meaningful Data?

Fifth Meeting - Tuesday 6th February 2024 1pm-2pm: Data Science: An Artificial Ecosystem by Xiao-Li Meng

Today we talked about:

What data science is not: not just machine learning or just statistics; not all about prediction; not about data analysis; not only STEM; not a single discipline.
What does an educated citizen need to know about DS?
The term data scientist comes with great expectations!

The evolution of data science.

Sixth Meeting - Tuesday 20th February 2024 1pm-2pm: The Lives and After Lives of Data by Christine L. Borgman

Today, we had fun chatting on questions such as:

Are there implicit assumptions about data?
What is the typical data life cycle? How important is it to keep data alive for long periods of time?
What types of data can be re-created and what data are to be reused?

And also... The science is in the data!

Seventh Meeting - Tuesday 27th February 2024 1pm-2pm: Five Immersive 3D Surroundings of Data Science by Xiao-Li Meng

A great discussion took place today on the following topics:

Is data science an ecosystem?
How do scientists find, reuse, and interpret data which they did not collect themselves?
What is Data Linkage?

And also... Why are we using black box models in AI when we don’t need to?

Data Science for Biology Program

LDS delivers workshops on theoretical and computational aspects of data science that serve the needs of students in biology and those researchers interested in the applications of statistical and mathematical methods in the biological sciences.

Data Science for the Biological Sciences - Workshop (1st edition, held during Data Science Week 2022)

LAB 1 - Tuesday 29th November 2022 - Kathleen Lois Foster & Alessandro Maria Selvitella

Background on how to use R, the use of variables to store information, how to call particular elements of a matrix, how to install and use packages.

LAB 2 - Thursday 1st December 2022 - Kathleen Lois Foster & Alessandro Maria Selvitella

Basic statistical methods and visualization. Hypothesis tests, such as t-test, ANOVAs, ANCOVAs, simple linear regression, and multiple linear regression.

Slides

R code

Data Science Forums 2022/2023

LDS holds discussion groups about data science topics for the general audience with the goal of disseminating knowledge about data science of interest to the community.

2022-2023 Series - Weapons of Math Destruction: how big data increases inequality and threatens democracy on WebEx @ https://purdue.webex.com/meet/aselvite

Monday 7th November 2022 noon - 1pm: CHAPTER 1 - BOMB PARTS: What Is a Model?

Today, we discussed questions such as:

What models can be weapons of math destruction?
What can we do to prevent people from being treated unfairly because of the wrong implementation of data-driven algorithms?
What decisions have you witnessed that harmed people in the most disadvantaged situations as a consequence of the misuse of statistics?

And also... Pay attention to Spurious Correlations!

Spurious correlations

Monday 14th November 2022 noon - 1pm: CHAPTER 2 SHELL SHOCKED: My Journey of Disillusionment

What emerged in today's discussion...

People cannot be used as "data trails" and models cannot be "separated from people".
It might be scary for a student entering the data science world to think about how many things can go wrong with the wrong model...
Honesty, knowledge, and transparency are very important values.

And also... How can a model based on past observations be good to predict the future if the future is different from the past?

Monday 21st November 2022 1pm - 2pm: CHAPTER 3 ARMS RACE: Going to College

This session concentrated on...

How can we measure educational excellence?
Often, the most relevant data are inaccessible. At the same time, proxies are easy to manipulate!
What is the objective of the modeler?

Is it a good idea to decide which college to attend using rankings?

Monday 28th November 2022 noon - 1pm: CHAPTER 4 PROPAGANDA MACHINE: Online Advertising

Today's discussion centered around social and technical aspects of data science:

How does Bayesian Analysis work?
Machine learning is not as efficient as the human brain: What can we do to improve machine learning algorithms?
How is data science misused for predatory advertising?

Predatory journals: no definition, no defence. Leading scholars and publishers from ten countries have agreed a definition of predatory publishing that can protect scholarship. It took 12 hours of discussion, 18 questions and 3 rounds to reach.

Tuesday 6th December 2022 noon - 1pm: CHAPTER 5 CIVILIAN CASUALTIES: Justice in the Age of Big Data

In the last meeting of the term, we discussed questions such as:

What consequences can using geography as a proxy for race have?
Is it always true that bigger data is better data?
How can data scientists balance the fairness and efficacy of their algorithms in questions of public interest such as crime prevention?

Mathematicians urge colleagues to boycott police work in wake of killings. More than 1,400 researchers have signed a letter calling on the discipline to stop working on predictive-policing algorithms and other models.

Tuesday 14th February 2023 noon - 1pm: CHAPTER 6 INELIGIBLE TO SERVE: Getting a Job

During the first meeting of the new term, we enjoyed talking about:

How do automatic systems judge us when we seek jobs?
Machines do not discriminate, humans do!
There is No Free Lunch: optimization algorithms perform similarly when their performance is averaged across all possible learning tasks.

Tuesday 21st February 2023 noon - 1pm: CHAPTER 7 SWEATING BULLETS: On the Job

In today's meeting, we went through many interesting topics:

What are Applied Mathematics and Operations Research, the science of logistics?
What are the consequences of Clopening and why do companies opt for such a scheduling process?
Simpson's Paradox is the phenomenon, seen in some datasets, where subgroups that share a common trend (say, all negative) show the reverse trend (say, positive) when they are aggregated.

Tuesday 28th February 2023 noon - 1pm: CHAPTER 8 COLLATERAL DAMAGE: Landing Credit

The discussion of this meeting revolved around the following questions:

What are the consequences of using the race of a person or a ZIP code as input in an e-score model?
What are the differences between group analysis and individual analysis?
Can humans and AI collaborate to defuse weapons of math destruction?

Tuesday 14th March 2023 noon - 1pm: CHAPTER 9 NO SAFE ZONE: Getting Insurance

Today we talked about:

What's the difference between Correlation and Causation?
How does stratification help in understanding variability?
Is it true that those who act alike will take on similar levels of risk?

Tuesday 21st March 2023 noon - 1pm: CHAPTER 10 THE TARGET CITIZEN: Civic Life

Today's meeting involved a conversation on the following topics:

How can social media campaigns influence elections?
What is the relationship between Data Science and Confirmation Bias?
What is Computational Social Science?

Data Science for Complex Systems Program

Prof. Drake Olejniczak will deliver a series of lectures on graph theory tailored towards students and researchers interested in data science and its application to social and natural phenomena.

Mini-Course (upcoming!)
