Reading

Readings for the course

The required book for this class is “Confident Data Skills” by Kirill Eremenko. The book is meant for a wide audience, was published very recently, and is inexpensive. We will read only chapters 1-5; after that, the book goes into techniques that will be covered in more detail in your other classes.

Many of the readings for this course are taken from popular publications that do a good job of communicating arguments using data.

Week 1: Readings for Getting Started

"Confident Data Skills" Chapter 1: Defining Data

and

French election results: Macron’s victory in charts

Week 2: Data is messy

"Confident Data Skills" Chapter 2: "How data fulfills our needs" and Chapter 3: "The data science mindset"

and

I also love this collection of "falsehoods programmers believe in". It isn't assigned reading, but I recommend you poke around. A good example is "Falsehoods programmers believe about time", which starts with "There are always 24 hours in a day."
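To see why "There are always 24 hours in a day" is a falsehood, here is a minimal sketch that measures the length of a calendar day in a time zone that observes daylight saving time. The helper name `hours_in_day`, the choice of the "America/New_York" zone, and the 2021 dates are my own illustrative assumptions; they do not come from the linked article. It also assumes Python 3.9+ with time zone data available on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+


def hours_in_day(year, month, day, tz_name):
    """Return the real elapsed hours between local midnight and the next local midnight.

    Hypothetical helper for illustration only.
    """
    tz = ZoneInfo(tz_name)
    start = datetime(year, month, day, tzinfo=tz)
    # Adding a timedelta moves the wall clock forward, so this is the next local midnight.
    next_midnight = start + timedelta(days=1)
    # Subtracting two datetimes that share a tzinfo ignores UTC offset changes,
    # so convert both to UTC first to measure the time that actually elapsed.
    elapsed = next_midnight.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return elapsed.total_seconds() / 3600


print(hours_in_day(2021, 3, 14, "America/New_York"))  # clocks spring forward: 23.0
print(hours_in_day(2021, 11, 7, "America/New_York"))  # clocks fall back: 25.0
print(hours_in_day(2021, 6, 1, "America/New_York"))   # an ordinary day: 24.0
```

Any data set that counts events "per day" across one of these dates quietly compares days of different lengths, which is the kind of messiness this week is about.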

Week 3: Constructing arguments out of our measurements

"Confident Data Skills" Chapter 4: "Identify the question" and Chapter 5: "Data Preparation"

and

We’re Measuring the Economy All Wrong

Week 4: Fairness and Bias in unstructured text

The following articles explore a popular method of processing English text (the word2vec algorithm) and how it encodes the societal biases of its training data. They make for a pretty tough read, so you may need to google some terms and take some time to work through them.

And a fun article:

Week 5: Summarizing with visualizations

Women in Congress

and

Murder rates don't tell us everything about gun violence

Week 6: You are willingly wiretapping yourself?

How does Spotify know you so well?

and

YouTube, the Great Radicalizer

Week 7: Digital images and their (mis)uses

1. Scientific Racism's new face:

https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a

2. An exploration into predicting someone's gender from their face.

http://gendershades.org/index.html

3. Don't trust your data: how to spot photoshopped images. This is a long read, and I don't expect you to get through all of it. But we are using techniques from it in class, and I think it is super interesting.

http://blackhat.com/presentations/bh-dc-08/Krawetz/Whitepaper/bh-dc-08-krawetz-WP.pdf

Week 8: Fairness and Bias

"Ethics and Data Science" by Mike Loukides, Hilary Mason, and DJ Patil. Two ways to get this:

This reading is pretty long. It is OK to skip around and pick out some interesting parts, but I think it is really worth the time.

Week 9: Inference is hard

Scientists rise up against statistical significance

and

We Experiment on Human Beings!

Bonus (optional, hard, deep):

Why most published research findings are false

Week 10: Data and the world

The media has a probability problem

Look, I know you are more worried about finals than doing the reading. So why don't you save this one for over break? This, to me, gets at the core of data science:

The Truth Continuum