Lecture 26

For today you should:

1) Work on your report

Today:

1) Next steps

2) Parting thoughts

For next time:

1) Draft report due on Thursday 30 April (remember my definition of "draft").  Give a hard copy to either Paul or me, whoever did not read your preliminary report.

2) Final version goes to the sponsor on 7 May.  Cc Paul and me to stop the clock.

3) "Final exam" period is May 7, officially 12-3 pm.  Let's meet at 1 pm for celebratory snacks and informal presentations (no preparation, or a few slides if you want).  

4) Public archival version goes to Paul and me on 7 May, or possibly later.

The public archival version of your report will go on the class web page:

1) If your project is not secret, the final report and the archival report are the same.

2) If there are secrets, ask your liaison which parts of the final report to redact; then remove them, smooth over the holes, and send me a version we can publish.

How to turn it in:

When you have a public version of the report:

1) Check it in to a public repository and 

2) use this survey to submit a link.

Do not submit a non-public report using this survey.

Final report

1) Start with a copy of your preliminary report (keep the original)

2) Fix any issues Paul and I indicated

3) Update anything that has changed 

4) Remove anything that is no longer correct (but it's ok to keep some things that are not in a straight line to the primary results)

5) Add new analysis since the preliminary report

6) Apply the QMRI checklist

7) Apply the style guide

Start with a copy of the preliminary report.  The audience of the final report is the same.  The goal is only slightly different.

Remember: accentuate the positive.  Talk about what you did, and minimize or omit apologies for what you didn't do.  It's fine to have a "Next steps" section, especially if you think the sponsor will continue work in this area.

Next steps

I believe this class is a good one-semester introduction to Data Science, but Data Science is a huge, interdisciplinary mess of topics.

Drew Conway's famous Data Science Venn diagram:

Steven Geringer's version 2.0:

How to become a unicorn?  Learn more about:

1) Machine learning, including Bayesian methods

2) Software engineering, maybe add R to the mix

3) Databases, including distributed systems.  Seriously.

4) Visualization (read some books and blogs like Junk Charts)

5) Applied statistics (notice "applied" and also where on the list I put this)

What about "domain knowledge" or "subject matter expertise"?  That's up to you.

Final thoughts

The tools and processes you learned this semester are powerful, rare, and in great demand!

With more learning and practice, you will have enormous capability and opportunity.  So much that you might feel overwhelmed: so much to learn, so many things you could do.  What should you do?

My unsolicited advice:

1) Choose projects you are excited about, working with people who know more than you, in places you want to be, with a good commute.

2) Consider jobs that are consulting-like in the sense that you work with a series of clients in a series of domains.

3) Consider extracurricular opportunities like data competitions, hackathons and volunteer organizations like DataKind.

4) If you want to be a Data Scientist, most graduate programs in statistics and computer science are not worth the opportunity cost.  There are a few Data Science programs now, but I don't know enough to endorse them.

As I said in my "TED talk":

1) The first data revolution made factual information ubiquitous: it changed our environment, experience, and expectations about information.

2) The second data revolution is making data and the tools for working with it ubiquitous (including cognitive tools).  It has changed our environment, is changing our experience, and will change our expectations.

Present:

1) Lies, damned lies, and statistics (ha, so witty!)

2) Uninformed debate and polarization

Soon, we will share the expectation that "Data combined with practical methods can answer questions and guide decisions under uncertainty" (Think Stats, Chapter 1).