Reading
Readings for the course
The required book for this class is “Confident Data Skills” by Kirill Eremenko. The book is written for a wide audience, was published recently, and is inexpensive. We will only read chapters 1-5; after that, the book goes into techniques that will be covered in more detail in your other classes.
Many of the readings for this course are taken from popular publications that do a good job of communicating arguments using data.
Week 1: Readings for Getting Started
"Confident Data Skills" Chapter 1: Defining Data
and
French election results: Macron’s victory in charts
Week 2: Data is messy
"Confident Data Skills" Chapter 2: "How data fulfills our needs" and Chapter 3: "The data science mindset"
and
I also love this collection of "falsehoods programmers believe in". It isn't assigned reading, but I recommend you poke around. A good example is "Falsehoods programmers believe about time", which starts with "There are always 24 hours in a day."
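If you want to see that first falsehood break for yourself, here is a minimal Python sketch. It assumes Python 3.9 or later with a time zone database available (on Windows you may need to install the `tzdata` package). The helper name `hours_in_day` is made up for this illustration. It measures how many real hours pass between one local midnight and the next.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def hours_in_day(year, month, day, tz_name):
    """Return the elapsed hours between local midnight and the next local midnight.

    Hypothetical helper written for this example, not part of any library.
    """
    tz = ZoneInfo(tz_name)
    start = datetime(year, month, day, tzinfo=tz)
    # Adding a timedelta to an aware datetime moves the wall clock,
    # so this lands on the next local midnight.
    end = start + timedelta(days=1)
    # Convert both to UTC before subtracting; subtracting two datetimes that
    # share the same tzinfo would only compare their wall-clock readings.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return elapsed.total_seconds() / 3600


if __name__ == "__main__":
    # Dates chosen around the 2024 daylight saving changes in the US.
    for date in [(2024, 3, 10), (2024, 6, 1), (2024, 11, 3)]:
        print(date, hours_in_day(*date, "America/New_York"))
```

In New York, March 10, 2024 comes out as 23 hours long and November 3, 2024 as 25 hours, because of the daylight saving time changes on those dates. An ordinary day like June 1 is 24 hours.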
Week 3: Constructing arguments out of our measurements
"Confident Data Skills" Chapter 4: "Identify the question" and Chapter 5: "Data Preparation"
and
We’re Measuring the Economy All Wrong
Week 4: Fairness and Bias in unstructured text
The following articles explore a popular method of processing English text (the word2vec algorithm) and how it encodes the societal biases of its training data. They are pretty tough reads, so you may need to google some terms and take some time to work through them.
And a fun article:
Week 5: Summarizing with visualizations
and
Murder rates don't tell us everything about gun violence
Week 6: You are willingly wiretapping yourself?
How does Spotify know you so well?
and
YouTube, the Great Radicalizer
Week 7: Digital images and their (mis)uses
1. Scientific Racism's new face:
https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a
2. An exploration into predicting someone's gender from their face.
http://gendershades.org/index.html
3. Don't trust your data: how to spot photoshopped images. This is a long read, and I don't expect you to get through all of it. But we are using techniques from it in class, and I think it is super interesting.
http://blackhat.com/presentations/bh-dc-08/Krawetz/Whitepaper/bh-dc-08-krawetz-WP.pdf
Week 8: Fairness and Bias
Ethics and Data Science by Mike Loukides, Hilary Mason, DJ Patil. Two ways to get this:
- As a Free Ebook (needs Amazon account)
- As blog posts (read in reverse of presented order, starting with "Doing good data science")
This reading is pretty long. It is OK to skip around and pick some interesting parts, but I think it is really worth the time.
Week 9: Inference is hard
Scientists rise up against statistical significance
and
We Experiment on Human Beings!
Bonus (optional, hard, deep):
Why most published research findings are false
Week 10: Data and the world
The media has a probability problem
Look, I know you are more worried about finals than doing the reading. So why don't you save this one for over break? This, to me, gets at the core of data science: