Lecture 21

For today you should:

1) Prepare for a quiz

2) Read "A few useful things to know about machine learning"

Today:

1) ML

2) Quiz

Optional reading: "Data science done well looks easy"

Preliminary reports

After you get feedback from us, please revise your reports and send them to your sponsor on Monday 13 April (reports sent on a Friday are less likely to get read).

Consider working with the writing tutors on another revision before you send it.

Also, write a cover letter that frames the report and communicates "the ask":

1) Give a very brief overview of the content, with the goal of making them want to read it.  Accentuate the positive; resist the temptation to load it with apologies.

2) Communicate the ask and connect the response to a future event: if you have a meeting coming up, that's the deadline; otherwise, try something like "It would be ideal if we could hear back from you by...", and give them at least 3 days.

To get credit, you must include Paul and me on the outgoing email.  But other than that, you are "on your own recognizance". 

Things I guess I should have told you

1) Put a title on your report.

2) The title should make sense to the reader.  It should indicate what the report is about.  It should not contain the words "data", "science", "preliminary" or "report".

However, there should be a subtitle that identifies the class.

3) Put the date on the report.

4) Put your names on the report.

5) Write a first sentence that makes sense to the reader.  For example, not:

"The data comes from the data tracked from the [application]"

"My data science project is on ..."

"For our project, we are working with ..."

"In this paper, we are reporting..."

6) The first time you mention a person, place, or thing, you should provide all the details.  After that you can use shorthand.

"The mayor of Boston, Marty Walsh, recently released a statement..."

"Dimagi is a privately held social enterprise founded in 2002 with its headquarters in Cambridge, Massachusetts, USA."

Some additional things I have not said yet

1) You are not responsible for the outcome of an experiment.  If the goal is to check a set of variables to see whether any of them have predictive value, it's OK if the answer turns out to be no.  That is an informative and useful result.  (Sadly, it is also the most common result.)

2) Suggestion for the M->R transition: draw a data cartoon that shows what you expect the results to look like.

3) Despite all the red ink Paul and I have been spilling, we feel pretty good about how things are going.  We have more structure this year than last, and I think it's paying off in the outcomes.  Hang in there!
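On point 1 (null results): one common way to ask whether a set of variables has predictive value is to compare a model's cross-validated score against a trivial baseline that ignores the variables. This is a hypothetical sketch using scikit-learn, not a prescribed method for your project. The data are synthetic, with features that are random noise by construction, so the honest answer here is "no". Your own project will need its own choice of model, metric, and baseline.

```python
# Hypothetical sketch: do these candidate variables have predictive value?
# Compare a model against a baseline that ignores the variables, using
# cross-validation. The data below are synthetic noise BY CONSTRUCTION.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
n_samples, n_features = 1000, 5
X = rng.normal(size=(n_samples, n_features))  # candidate variables (pure noise)
y = rng.randint(0, 2, size=n_samples)          # outcome unrelated to X

# Baseline: always predict the most frequent class seen in training.
baseline = DummyClassifier(strategy="most_frequent")
baseline_score = cross_val_score(baseline, X, y, cv=5).mean()

# Candidate model that actually uses the variables.
model = LogisticRegression()
model_score = cross_val_score(model, X, y, cv=5).mean()

print("baseline accuracy:", round(baseline_score, 3))
print("model accuracy:   ", round(model_score, 3))
```

In this synthetic setup the model should land at or near the baseline. If your real variables behave the same way, that is the informative "no" described in point 1.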

Intro to Machine Learning

First, clone the MLTutorials GitHub repo:

git clone https://github.com/paulruvolo/MLTutorials.git

Download additional datasets:

cd MLTutorials

./download_datasets.sh

Install scikit-learn:

Using Anaconda:

conda install scikit-learn

Using apt (Debian/Ubuntu):

sudo apt-get install python-sklearn

Using pip:

sudo pip install -U scikit-learn
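A quick way to confirm that scikit-learn installed correctly is to import it and fit a small model. This check is our own illustration, not something taken from the MLTutorials repo. It is a minimal sketch that assumes a Python 3 environment and a reasonably recent scikit-learn (very old versions lack the return_X_y parameter). It uses only the iris dataset bundled with scikit-learn, so it needs no download.

```python
# Sanity check after installing scikit-learn: import it, print the version,
# and fit a small model on a dataset that ships with the library.
import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

print("scikit-learn version:", sklearn.__version__)

# load_iris is bundled with scikit-learn, so no network access is needed.
X, y = load_iris(return_X_y=True)

# max_iter is raised so the default solver converges without warnings.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

train_accuracy = model.score(X, y)
print("training accuracy on iris:", round(train_accuracy, 3))
```

If the import fails, the install did not land in the Python environment you are running; check which python and pip (or conda) you used.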

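Start the IPython notebook server in the tutorial_01 directory (ipython notebook; on newer installations the command is jupyter notebook) and open the tutorial.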