Academic Programs
LDS aims to deliver lectures and hold discussions about data science topics of interest to PFW students and faculty and of interest to those practitioners working in the theory and applications of data science.
Data Science Forums 2023/2024
LDS holds discussion groups about data science topics for the general audience with the goal of disseminating knowledge about data science of interest to the community.
2023-2024 Series - Perspectives on Data and Data Science - Harvard Data Science Review on WebEx @ https://purdue.webex.com/meet/aselvite
First Meeting - Tuesday 10th October 2023 1pm-2pm: What Are the Values of Data, Data Science, or Data Scientists? by Xiao-Li Meng
Today, we discussed questions such as:
Do we currently have a lack of high quality data? How should we sample?
What is the value of providing to students an educational classroom that interacts with an industry-like environment?
How does using data to support an agenda devalue data science as a trustworthy evidence builder?
And also... Is it true that most published research findings are false?
Reference
Ioannidis, J.P.A. (2005). Why Most Published Research Findings Are False. PLoS Med 2(8): e124. https://doi.org/10.1371/journal.pmed.0020124
Second Meeting - Tuesday 24th October 2023 1pm-2pm: What Is Your List of 10 Challenges in Data Science? by Xiao-Li Meng
Great conversation today on important Challenges in Data Science:
What are the biggest unsolved problems/interesting questions in data science? Here is a non-exhaustive list: data privacy, noisy/incomplete data, interpretable learning, heterogeneous data sources, selection bias & causal inference, generalizability vs specificity, confounding factors in observational studies, complex and unstructured data, post-selection inference, study design, reproducibility and replicability, education and communication, ...
Why are we using black box models when we don’t need to? Hope or Hype?
"Current education system teaches deterministic mathematical manipulation as students’ native language for quantitative reasoning, with probabilistic and statistical thinking as a second or even third language." by Xiao-Li Meng. Can we introduce computational, statistical, algorithmic thinking in elementary schools? Here is a nice set of lectures on Computational and Inferential Thinking: The Foundations of Data Science by Ani Adhikari, John DeNero, David Wagner.
Third Meeting - Thursday 9th November 2023 1pm-2pm: Information and Uncertainty: Two Sides of the Same Coin by Xiao-Li Meng
Today, we discussed:
Information vs Uncertainty & Signal vs Noise
What's the difference between equality and fairness in algorithms?
"Only when the predictive distribution is invariant to sub-populations can we achieve simultaneous equalization in length and coverage at any level"
How important are team work and domain knowledge?
The key question is:
Do the variations revealed in our sample capture the variations in our target population?
Fourth Meeting - Thursday 16th November 2023 1pm-2pm: Building Data Science Infrastructures and Infrastructural Data Science by Xiao-Li Meng
Today's discussion revolved around key aspects of building data science infrastructures:
We need internationally renowned data scientists and highly skilled research and computing support staff
Does every topic in statistics belong to data science? What about the other way around?
Is data science a new pillar of scientific research after theory, experiment, and computing?
Slow Data = Meaningful Data?
Fifth Meeting - Tuesday 6th February 2024 1pm-2pm: Data Science: An Artificial Ecosystem by Xiao-Li Meng
Today we talked about:
What data science is not: not just machine learning or just statistics; not all about prediction; not about data analysis; not only STEM; not a single discipline.
What does an educated citizen need to know about DS?
The term data scientist comes with great expectations!
The evolution of data science.
Sixth Meeting - Tuesday 20th February 2024 1pm-2pm: The Lives and After Lives of Data by Christine L. Borgman
Today, we had fun chatting on questions such as:
Are there implicit assumptions about data?
What is the typical data life cycle? How important is it to keep data alive for long periods of time?
What types of data can be re-created and what data are to be reused?
And also... The science is in the data!
Seventh Meeting - Tuesday 27th February 2024 1pm-2pm: Five Immersive 3D Surroundings of Data Science by Xiao-Li Meng
A great discussion took place today on the following topics:
Is data science an ecosystem?
How do scientists find, reuse, and interpret data which they did not collect themselves?
What is Data Linkage?
And also... Why are we using black box models in AI when we don’t need to?
Data Science for Biology Program
LDS delivers workshops on theoretical and computational aspects of data science that serve the needs of students in biology and those researchers interested in the applications of statistical and mathematical methods in the biological sciences.
Data Science for the Biological Sciences - Workshop (1st edition, held during Data Science Week 2022)
LAB 1 - Tuesday 29th November 2022 - Kathleen Lois Foster & Alessandro Maria Selvitella
Background on how to use R, the use of variables to store information, how to call particular elements of a matrix, how to install and use packages.
LAB 2 - Thursday 1st December 2022 - Kathleen Lois Foster & Alessandro Maria Selvitella
Basic statistical methods and visualization. Hypothesis tests and models such as t-tests, ANOVA, ANCOVA, simple linear regression, and multiple linear regression.
Data Science Forums 2022/2023
LDS holds discussion groups about data science topics for the general audience with the goal of disseminating knowledge about data science of interest to the community.
2022-2023 Series - Weapons of Math Destruction: how big data increases inequality and threatens democracy on WebEx @ https://purdue.webex.com/meet/aselvite
Monday 7th November 2022 noon - 1pm: CHAPTER 1 - BOMB PARTS: What Is a Model?
Today, we discussed questions such as:
What models can be weapons of math destruction?
What can we do to prevent people from being treated unfairly because of flawed implementations of data-driven algorithms?
What decisions have you witnessed that harmed people in the most disadvantaged situations as a consequence of the misuse of statistics?
And also... Pay attention to Spurious Correlations!
Monday 14th November 2022 noon - 1pm: CHAPTER 2 SHELL SHOCKED: My Journey of Disillusionment
What emerged in today's discussion...
People cannot be used as "data trails" and models cannot be "separated from people".
It might be scary for a student entering the data science world thinking about how many things can go bad with the wrong model...
Honesty, knowledge, and transparency are essential values.
And also... How can a model based on past observations be good at predicting the future if the future is different from the past?
Monday 21st November 2022 1pm - 2pm: CHAPTER 3 ARMS RACE: Going to College
This session concentrated on...
How can we measure educational excellence?
Often, the most relevant data is inaccessible. At the same time, proxies are easy to manipulate!
What is the objective of the modeler?
Is it a good idea to decide which college to attend using rankings?
Monday 28th November 2022 noon - 1pm: CHAPTER 4 PROPAGANDA MACHINE: Online Advertising
Today's discussion centered around social and technical aspects of data science:
How does Bayesian Analysis work?
Machine learning is not as efficient as the human brain: What can we do to improve machine learning algorithms?
How is data science misused for predatory advertising?
Tuesday 6th December 2022 noon - 1pm: CHAPTER 5 CIVILIAN CASUALTIES: Justice in the Age of Big Data
In the last meeting of the term, we discussed questions such as:
What consequences can using geography as a proxy for race have?
Is it always true that bigger data is better data?
How can data scientists balance the fairness and efficacy of their algorithms in questions of public interest such as crime prevention?
Tuesday 14th February 2023 noon - 1pm: CHAPTER 6 INELIGIBLE TO SERVE: Getting a Job
During the first meeting of the new term, we enjoyed talking about:
How do automatic systems judge us when we seek jobs?
Machines do not discriminate, humans do!
There is No Free Lunch: optimization algorithms perform equally well when their performance is averaged across all possible learning tasks.
Tuesday 21st February 2023 noon - 1pm: CHAPTER 7 SWEATING BULLETS: On the Job
In today's meeting, we went through many interesting topics:
What are Applied Mathematics and Operations Research, the science of logistics?
What are the consequences of Clopening and why do companies opt for such a scheduling process?
Simpson's Paradox is a phenomenon that appears in some datasets: subgroups that share a common trend (say, all negative) show the reverse trend (say, positive) when they are aggregated.
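As a small illustration of Simpson's Paradox, the sketch below uses made-up numbers (not data from the meeting or the book) in which two subgroups each have a negative least-squares slope while the pooled data have a positive one. The function name `ols_slope` and the two toy groups are assumptions introduced only for this example.

```python
# Toy illustration of Simpson's Paradox with invented data.
# Each subgroup shows a negative trend; the pooled data show a positive one.

def ols_slope(points):
    """Return the least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var

# Hypothetical subgroups: within each one, y decreases as x increases.
group_a = [(1, 5), (2, 4), (3, 3)]
group_b = [(6, 10), (7, 9), (8, 8)]

slope_a = ols_slope(group_a)
slope_b = ols_slope(group_b)
slope_pooled = ols_slope(group_a + group_b)

print("slope in group A:", slope_a)
print("slope in group B:", slope_b)
print("slope in pooled data:", slope_pooled)
```

The reversal happens because group B sits higher on both axes than group A, so the difference between the groups dominates the pooled trend.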
Tuesday 28th February 2023 noon - 1pm: CHAPTER 8 COLLATERAL DAMAGE: Landing Credit
The discussion of this meeting revolved around the following questions:
What are the consequences of using the race of a person or a ZIP code as input in an e-score model?
What are the differences between group analysis and individual analysis?
Can humans and AI collaborate to defuse weapons of math destruction?
Tuesday 14th March 2023 noon - 1pm: CHAPTER 9 NO SAFE ZONE: Getting Insurance
Today we talked about:
What's the difference between Correlation and Causation?
How does stratification help us understand variability?
Is it true that those who act alike will take on similar levels of risk?
Tuesday 21st March 2023 noon - 1pm: CHAPTER 10 THE TARGET CITIZEN: Civic Life
Today's meeting involved a conversation on the following topics:
How can social media campaigns influence elections?
What is the relationship between Data Science and Confirmation Bias?
What is Computational Social Science?
Data Science for Complex Systems Program
Prof. Drake Olejniczak will deliver a series of lectures on graph theory tailored towards students and researchers interested in data science and its application to social and natural phenomena.
Mini-Course (upcoming!)