LDS aims to deliver lectures and hold discussions about data science topics of interest to PFW students and faculty and of interest to those practitioners working in the theory and applications of data science.
LDS holds discussion groups about data science topics for the general audience with the goal of disseminating knowledge about data science of interest to the community.
2024-2025 Series - Nate Silver. (2012). The signal and the noise: why so many predictions fail - but some don't on WebEx @ https://purdue.webex.com/meet/aselvite
First Meeting: Tuesday 8th October 2024 1pm-2pm | Chapter 4 "For years you have been telling us that rain is green" [Weather Forecasting]
The first meeting of the year was dedicated to the problem of weather forecasting & prediction of hurricanes. In particular, we talked about:
How can Digital Twins help predictions of extreme climatic events?
What's the role of chaos in weather forecasting?
Inaccuracies in the data might cause large flaws in computer models.
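The points on chaos and data inaccuracies can be illustrated with a toy example. The sketch below is not a weather model: it uses the logistic map (an assumption chosen purely for illustration) to show how a tiny error in the initial condition can grow until two forecasts disagree completely. The function name and parameter values are hypothetical.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two "forecasts" whose starting points differ by one part in ten billion.
exact = logistic_trajectory(0.2)
perturbed = logistic_trajectory(0.2 + 1e-10)

# Absolute gap between the two trajectories at each step.
gaps = [abs(x - y) for x, y in zip(exact, perturbed)]

for step in (0, 10, 20, 30, 40):
    print(step, gaps[step])
```

In this toy system the gap starts out negligible and eventually becomes as large as the values themselves, which is one way to see why small measurement errors limit how far ahead a chaotic system can be forecast.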
Second Meeting: Tuesday 15th October 2024 1pm-2pm | Chapter 5 "Desperately seeking signal" [Earthquake Predictions]
Today, we discussed:
Is it harder to forecast hurricanes or earthquakes?
The Gutenberg-Richter Law relates the magnitude and the frequency of earthquakes.
Earthquakes are systems with noisy data and underdeveloped theory.
How can complexity theory and chaos theory help in predicting earthquakes?
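The Gutenberg-Richter Law mentioned above is usually written as log10(N) = a - b*M, where N is the number of earthquakes of magnitude at least M in a given region and time period. The sketch below evaluates that relation. The values of a and b are hypothetical placeholders, not fitted to any catalog (b is often reported to be close to 1).

```python
def expected_count(magnitude, a=5.0, b=1.0):
    """Number of earthquakes with magnitude >= `magnitude`.

    Implements log10(N) = a - b * M. The defaults for a and b are
    illustrative assumptions, not estimates from real data.
    """
    return 10 ** (a - b * magnitude)


for m in (4.0, 5.0, 6.0, 7.0):
    print(m, expected_count(m))

# With b = 1, each extra unit of magnitude makes events ten times rarer.
ratio = expected_count(5.0) / expected_count(6.0)
```

The relation describes how often earthquakes of a given size occur on average. It does not say when or where the next one will happen, which is part of why forecasting is easier than prediction here.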
Third Meeting: Tuesday 29th October 2024 1pm-2pm | Chapter 7 "Role models" [Infectious Disease Dynamics]
Today's discussion included topics such as:
How hard is it to extrapolate, especially on an exponential scale?
What is the basic reproduction number and what does it measure exactly?
How does spatial information help with predictions?
How long does it take to create a vaccine?
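The questions on extrapolation and the basic reproduction number can be made concrete with a deliberately simplified sketch. The basic reproduction number R0 is the average number of secondary infections caused by one case in a fully susceptible population. The code below assumes pure geometric growth by generation (cases multiply by R0 each generation) and ignores the depletion of susceptible individuals, so it only illustrates sensitivity and is not an epidemic model. The function name and the R0 values are hypothetical.

```python
def cases_after(generations, r0, initial_cases=1):
    """Cases in a given generation under pure geometric growth (a simplification)."""
    return initial_cases * r0 ** generations


# Two estimates of R0 that look close to each other...
low = cases_after(10, 2.0)
high = cases_after(10, 2.5)

# ...lead to very different extrapolations after ten generations.
print(low, high, high / low)
```

A modest uncertainty in R0 compounds at every generation, which is why extrapolating on an exponential scale is so fragile.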
Fourth Meeting: Tuesday 18th February 2025 1pm-2pm | Chapter 8 "Less and less wrong" [Sports Analytics]
Interesting questions came up in today's meeting:
How can we use data science to improve the performance of a sports team?
How do we formulate probabilistic beliefs about the world when we encounter new data? Bayes' Theorem!
Finding patterns is easy in data-rich environments.
The key is determining whether the patterns represent signal or noise.
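Bayes' Theorem, mentioned above, can be sketched in a few lines. The example assumes a hypothetical hypothesis ("this team is above average") and made-up probabilities of winning a game under each case. None of the numbers come from the book or from real data.

```python
def posterior(prior, p_data_if_true, p_data_if_false):
    """Bayes' theorem for a yes/no hypothesis.

    prior            -- belief in the hypothesis before seeing the data
    p_data_if_true   -- probability of the data if the hypothesis holds
    p_data_if_false  -- probability of the data if it does not
    """
    evidence = prior * p_data_if_true + (1 - prior) * p_data_if_false
    return prior * p_data_if_true / evidence


# Start undecided, then update the belief after each of three wins.
belief = 0.5
for game in range(3):
    belief = posterior(belief, p_data_if_true=0.8, p_data_if_false=0.4)
    print(game + 1, belief)
```

Each new observation moves the belief a little further, so yesterday's posterior becomes today's prior.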
Fifth Meeting: Tuesday 25th February 2025 1pm-2pm | Chapter 2 "Are you smarter than a television pundit?" [Election Polls]
Today's conversation touched on interesting points in the use of data science methods in election polls:
Are political scientists better than pundits?
How hard is it to predict better than random chance? Are high-dimensional models always better than low-dimensional ones?
Who's better at prediction: foxes or hedgehogs?
Why do political predictions tend to fail?
Sixth Meeting: Tuesday 4th March 2025 1pm-2pm | Chapter 12 "A climate of healthy skepticism" [Climate Science]
The sixth and last meeting of the year was on Climate Forecasts! Relevant matters/questions included:
Noise can obscure the signal, even when the signal exists.
Are people skeptical about climate predictions, about the cause-effect relationship motivating those predictions, or both?
Are computer models reliable for forecasting climate?
It is hard to solve problems where there is not a well-defined cause-and-effect mechanism!
2023-2024 Series - Perspectives on Data and Data Science - Harvard Data Science Review on WebEx @ https://purdue.webex.com/meet/aselvite
First Meeting - Tuesday 10th October 2023 1pm-2pm: What Are the Values of Data, Data Science, or Data Scientists? by Xiao-Li Meng
Today, we discussed questions such as:
Do we currently have a lack of high quality data? How should we sample?
What is the value of providing to students an educational classroom that interacts with an industry-like environment?
How does using data to support an agenda devalue data science as a trustworthy evidence builder?
And also... Is it true that most published research findings are false?
Reference
Ioannidis, J.P.A. (2005). Why Most Published Research Findings Are False. PLoS Med 2(8): e124. https://doi.org/10.1371/journal.pmed.0020124
Second Meeting - Tuesday 24th October 2023 1pm-2pm: What Is Your List of 10 Challenges in Data Science? by Xiao-Li Meng
Great conversation today on important Challenges in Data Science:
What are the biggest unsolved problems/interesting questions in data science? Here is a non-exhaustive list: data privacy, noisy/incomplete data, interpretable learning, heterogeneous data sources, selection bias & causal inference, generalizability vs specificity, confounding factors in observational studies, complex and unstructured data, post-selection inference, study design, reproducibility and replicability, education and communication, ...
Why are we using black box models when we don’t need to? Hope or Hype?
"Current education system teaches deterministic mathematical manipulation as students’ native language for quantitative reasoning, with probabilistic and statistical thinking as a second or even third language." by Xiao-Li Meng. Can we introduce computational, statistical, algorithmic thinking in elementary schools? Here is a nice set of lectures on Computational and Inferential Thinking: The Foundations of Data Science by Ani Adhikari, John DeNero, David Wagner.
Third Meeting - Thursday 9th November 2023 1pm-2pm: Information and Uncertainty: Two Sides of the Same Coin by Xiao-Li Meng
Today, we discussed:
Information vs Uncertainty & Signal vs Noise
What's the difference between equality and fairness in algorithms?
"Only when the predictive distribution is invariant to sub-populations can we achieve simultaneous equalization in length and coverage at any level"
How important are team work and domain knowledge?
The key question is:
Do the variations revealed in our sample capture the variations in our target population?
Fourth Meeting - Thursday 16th November 2023 1pm-2pm: Building Data Science Infrastructures and Infrastructural Data Science by Xiao-Li Meng
Today's discussion revolved around key aspects of how to build data science infrastructures:
We need internationally renowned data scientists and highly skilled research and computing support staff.
Does every topic in statistics belong to data science? What about the other way around?
Is data science a new pillar of scientific research after theory, experiment, and computing?
Slow Data = Meaningful Data?
Fifth Meeting - Tuesday 6th February 2024 1pm-2pm: Data Science: An Artificial Ecosystem by Xiao-Li Meng
Today we talked about:
What data science is not: not just machine learning or just statistics; not all about prediction; not about data analysis; not only STEM; not a single discipline.
What does an educated citizen need to know about DS?
The term data scientist comes with great expectations!
The evolution of data science.
Sixth Meeting - Tuesday 20th February 2024 1pm-2pm: The Lives and After Lives of Data by Christine L. Borgman
Today, we had fun chatting on questions such as:
Are there implicit assumptions about data?
What is the typical data life cycle? How important is it to keep data alive for long periods of time?
What types of data can be re-created and what data are to be reused?
And also... The science is in the data!
Seventh Meeting - Tuesday 27th February 2024 1pm-2pm: Five Immersive 3D Surroundings of Data Science by Xiao-Li Meng
A great discussion took place today on the following topics:
Is data science an ecosystem?
How do scientists find, reuse, and interpret data which they did not collect themselves?
What is Data Linkage?
And also... Why are we using black box models in AI when we don’t need to?
LDS delivers workshops on theoretical and computational aspects of data science that serve the needs of students in biology and those researchers interested in the applications of statistical and mathematical methods in the biological sciences.
Data Science for the Biological Sciences - Workshop (1st edition, held during Data Science Week 2022)
LAB 1 - Tuesday 29th November 2022 - Kathleen Lois Foster & Alessandro Maria Selvitella
Background on how to use R, the use of variables to store information, how to call particular elements of a matrix, how to install and use packages.
LAB 2 - Thursday 1st December 2022 - Kathleen Lois Foster & Alessandro Maria Selvitella
Basic statistical methods and visualization. Hypothesis tests, such as t-test, ANOVAs, ANCOVAs, simple linear regression, and multiple linear regression.
2022-2023 Series - Weapons of Math Destruction: how big data increases inequality and threatens democracy on WebEx @ https://purdue.webex.com/meet/aselvite
Monday 7th November 2022 noon - 1pm: CHAPTER 1 - BOMB PARTS: What Is a Model?
Today, we discussed questions such as:
What models can be weapons of math destruction?
What can we do to prevent people from being treated unfairly because of poorly implemented data-driven algorithms?
What decisions have you witnessed that harmed the most disadvantaged people as a consequence of the misuse of statistics?
And also... Pay attention to Spurious Correlations!
Monday 14th November 2022 noon - 1pm: CHAPTER 2 SHELL SHOCKED: My Journey of Disillusionment
What emerged in today's discussion...
People cannot be used as "data trails" and models cannot be "separated from people".
It might be scary for a student entering the data science world to think about how many things can go wrong with the wrong model...
Honesty, knowledge, and transparency are important values, too.
And also... How can a model based on past observations be good to predict the future if the future is different from the past?
Monday 21st November 2022 1pm - 2pm: CHAPTER 3 ARMS RACE: Going to College
This session concentrated on...
How can we measure educational excellence?
Often, the most relevant data is inaccessible. At the same time, proxies are easy to manipulate!
What is the objective of the modeler?
Is it a good idea to decide which college to attend using rankings?
Monday 28th November 2022 noon - 1pm: CHAPTER 4 PROPAGANDA MACHINE: Online Advertising
Today's discussion centered around social and technical aspects of data science:
How does Bayesian Analysis work?
Machine learning is not as efficient as the human brain: What can we do to improve machine learning algorithms?
How is data science misused for predatory advertising?
Tuesday 6th December 2022 noon - 1pm: CHAPTER 5 CIVILIAN CASUALTIES: Justice in the Age of Big Data
In the last meeting of the term, we discussed questions such as:
What consequences can using geography as a proxy for race have?
Is it always true that bigger data is better data?
How can data scientists balance the fairness and efficacy of their algorithms in questions of public interest such as crime prevention?
Tuesday 14th February 2023 noon - 1pm: CHAPTER 6 INELIGIBLE TO SERVE: Getting a Job
During the first meeting of the new term, we enjoyed talking about:
How do automatic systems judge us when we seek jobs?
Machines do not discriminate, humans do!
There is No Free Lunch: optimization algorithms perform similarly when their performance is averaged across all possible learning tasks.
Tuesday 21st February 2023 noon - 1pm: CHAPTER 7 SWEATING BULLETS: On the Job
In today's meeting, we went through many interesting topics:
What are Applied Mathematics and Operations Research, the science of logistics?
What are the consequences of Clopening and why do companies opt for such a scheduling process?
Simpson's Paradox is the phenomenon, seen in some datasets, in which subgroups that share a common trend (say, all negative) show the reverse trend (say, positive) when they are aggregated.
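Simpson's Paradox, as described above, can be reproduced with a few synthetic points. The data below are invented for illustration only: within each group the trend is negative, but the second group sits higher on both axes, so the pooled trend turns positive. The helper name `slope` is hypothetical.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


# Synthetic subgroups: y falls by one unit for each unit of x in both.
group_a = [(1, 5), (2, 4), (3, 3)]
group_b = [(6, 10), (7, 9), (8, 8)]

print(slope(group_a))            # negative within group A
print(slope(group_b))            # negative within group B
print(slope(group_a + group_b))  # positive once the groups are pooled
```

The reversal comes from the grouping variable acting as a confounder, which is why it matters to check trends within subgroups before trusting an aggregate trend.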
Tuesday 28th February 2023 noon - 1pm: CHAPTER 8 COLLATERAL DAMAGE: Landing Credit
The discussion of this meeting revolved around the following questions:
What are the consequences of using the race of a person or a ZIP code as input in an e-score model?
What are the differences between group analysis and individual analysis?
Can humans and AI collaborate to defuse weapons of math destruction?
Tuesday 14th March 2023 noon - 1pm: CHAPTER 9 NO SAFE ZONE: Getting Insurance
Today we talked about:
What's the difference between Correlation and Causation?
How does stratification help in understanding variability?
Is it true that those who act alike will take on similar levels of risk?
Tuesday 21st March 2023 noon - 1pm: CHAPTER 10 THE TARGET CITIZEN: Civic Life
Today's meeting involved a conversation on the following topics:
How can social media campaigns influence elections?
What is the relationship between Data Science and Confirmation Bias?
What is Computational Social Science?
Prof. Drake Olejniczak will deliver a series of lectures on graph theory tailored towards students and researchers interested in data science and its application to social and natural phenomena.
Mini-Course (upcoming!)