

This page contains information for collaborators interested in working with students in Data Science at Olin College.

Do you have data?  Do you have questions?  Would you like a student team to help?

Starting in January 2014, students in my Data Science class will be looking for external groups to collaborate with.  You might be interested in working with us if:
  1. You have real-world data you can share with a small group of students (or there is freely-available data you would like them to explore).
  2. You would like help with exploratory data analysis and visualization.
  3. You are willing to interact with the team at least a few times over the course of the semester to help them get started with the data, to tell them about questions you are interested in, and to review their preliminary findings and provide feedback and suggestions.
This semester we are particularly interested in projects related to health, biology, and medicine, but we are open to projects in all areas.

What kind of work can the students do?

Students in this class are engineering undergraduates with Python programming skills.  They will be learning a practical set of tools they can use to turn data into useful knowledge.

They will work in teams of 2-3 students.  Projects that would be a good match for this class might include:
  1. Datasets that require custom cleaning and transformation, or merging data from several sources.
  2. Open-ended questions and exploratory analysis.
  3. Custom visualizations, and possibly interactive visualization tools.
  4. Simulation of systems that change over time, and predictive analysis.
For more specifics, see this Outline of Topics.
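
As a rough illustration of the first two kinds of project, here is a minimal Python sketch of cleaning one source, merging it with a second, and computing a simple summary. The data, column names, and function names are all hypothetical and invented for this example; a real project would start from the collaborator's own files and questions.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical data, invented for illustration only.
VISITS_CSV = """patient_id,clinic_id,systolic_bp
1,A,120
2,A, 135
3,B,N/A
4,B,110
5,C,
"""

CLINICS_CSV = """clinic_id,region
A,North
B,South
"""


def parse_bp(raw):
    """Return the reading as a float, or None if it is missing or malformed."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def mean_bp_by_region(visits_text, clinics_text):
    """Clean the visit readings, merge them with clinic regions, and average."""
    region_of = {row["clinic_id"]: row["region"] for row in load_rows(clinics_text)}
    readings = defaultdict(list)
    for row in load_rows(visits_text):
        bp = parse_bp(row["systolic_bp"])
        region = region_of.get(row["clinic_id"])
        if bp is None or region is None:
            continue  # skip rows that cannot be cleaned or matched to a clinic
        readings[region].append(bp)
    return {region: mean(values) for region, values in readings.items()}


summary = mean_bp_by_region(VISITS_CSV, CLINICS_CSV)
print(summary)
```

In this sketch, rows with unusable readings or unknown clinics are silently dropped. In a real project the team would normally report how many rows were excluded and why, since that choice can affect the findings.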

Next steps

If you are interested in working with a student team, here are the steps:
  1. Please send email to Allen Downey and tell me about the data and the kinds of questions you would like the student team to address.
  2. I will work with you to craft a project description, which I will post here (providing a level of detail you are comfortable with).
  3. At the beginning of the semester, students will choose topics and assemble teams of 2-3 students.
  4. The students will contact you (or a designated liaison) to discuss your questions and goals, and to arrange access to the data.
  5. At least once during the semester, the team will send preliminary findings, discuss them with you, and then refine or expand their analysis.
  6. At the end of the semester, they will give you a final report with their findings.  If practical, they will also make a presentation to you and your team.  And if appropriate, they will deliver software they developed.

Some frequently asked questions

1) What programming languages will the students use?

I expect most projects to be in Python, but some students know other languages.  If your project requires another language or a particular tool set, we should talk.

2) Will the students deliver their code along with the final report?

I will encourage students to make their code public, if appropriate, or at least deliver it to the sponsor, in order to make their results replicable.  The details might be different for different projects.

Students in this class are familiar with version control systems like Git and Subversion.

3) How much background do the students have in statistics?

Some of them have taken AP Statistics, but for most this will be their first college-level statistics class.  They will learn basic methods of exploratory data analysis, estimation, regression, and hypothesis testing.  We will also cover some machine-learning tools.

If a project requires students to learn additional methods, they can do that.  But projects that require advanced statistical analysis might not be appropriate.

The students have programming skills that might allow them to develop novel analyses and visualizations.

4) Is there any cost for collaborating with this class?

There is no cash cost.  But you should consider the cost of your time.  We will need your help to get the project started and get the students up to speed with the data.  Depending on the project, we might need you to provide some domain expertise, or at least pointers to resources.  And we hope you will be able to provide guidance and feedback as the project proceeds.  Of course, we hope every project yields benefits for you that justify this investment.

A little about Olin

Olin College of Engineering is a new undergraduate college founded with the mission to fix engineering education.  We have professors who are passionate about teaching, and students who are engaged and excited about learning.

Our curriculum is hands-on and, like this Data Science class, includes many projects where students engage with real-world problems, working with external collaborators.

Olin is in Needham, MA, about 10 miles from Boston.