Policies
Coursework
Your work in this course will include
A 1.5-week, highly scaffolded machine learning project.
A 2.5-week machine learning project where you can choose from several options.
An open-ended project in using open data to "save the world". This project offers a substantial amount of autonomy.
An open-ended 6.5-week project on a data science topic of your choice.
Out-of-class readings and in-class exercises.
Participation and professionalism
We will do lots of fun and useful things during class time. You should plan to be there!
As students, you have some responsibility for creating and maintaining a classroom atmosphere that is conducive to everyone's learning and enjoyment. I hope you will think about how your participation contributes to the learning environment.
Some things you can do to help:
Come to class on time! I will do my best to use class time effectively. Late arrivals are disruptive.
Come to class prepared.
If you find yourself falling behind, be sure to speak up rather than struggle in silence.
Be professional.
Be respectful toward the instructors, your fellow students, and any external collaborators.
Be generous with your ideas and your time. Help each other.
Be reflective. Think about what's working and what's not, and take responsibility for making the class work for you.
Have fun!
Grading
Your grade in this class will be determined as a weighted average of the following assignments.
The Value of Data: 2.89%
Warmup Project: 8.65%
Choose your own adventure: 14.42%
Save the world: 17.31%
Final project: 31.73%
Reading assignments: 15%
Participation and professionalism: 10%
Each of these assignments will be graded using the course rubrics (defined below).
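As a rough illustration of how the weighted average works, here is a minimal Python sketch. The weights are copied from the list above; the function name `course_grade`, the 0-100 score scale, and the choice to count a missing assignment as 0 are assumptions made for this example, not official policy.

```python
# Weights copied from the Grading list (percent of the final grade).
WEIGHTS = {
    "The Value of Data": 2.89,
    "Warmup Project": 8.65,
    "Choose your own adventure": 14.42,
    "Save the world": 17.31,
    "Final project": 31.73,
    "Reading assignments": 15.0,
    "Participation and professionalism": 10.0,
}


def course_grade(scores):
    """Return the weighted average of per-assignment scores (each 0-100).

    Assumption for this sketch: an assignment missing from `scores`
    counts as 0.
    """
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)
    return weighted / total_weight
```

Because the weights sum to 100, a student who earns the same rubric score on every assignment ends up with that score as the course grade.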
Late Work
Late projects will be penalized 20% per day late. I will forgive one late project per term as long as it is turned in no more than 5 days late.
Reading assignments will not be accepted late unless you communicate with me ahead of time and explain why you are unable to complete a particular assignment on time.
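The project late penalty can be sketched in a few lines of Python. This is only one possible reading of the policy: it assumes that "20% per day" means 20 percentage points subtracted from a 0-100 score for each whole day late, and that a score cannot drop below 0. The function name `late_penalized_score` and the `forgiven` flag are hypothetical.

```python
PENALTY_PER_DAY = 20.0   # percentage points per day late (assumed interpretation)
MAX_FORGIVABLE_DAYS = 5  # the one forgiven project must be at most this late


def late_penalized_score(score, days_late, forgiven=False):
    """Return a project score (0-100) after applying the late penalty.

    Assumptions for this sketch: the penalty is subtracted in percentage
    points, days_late is a whole number of days, and the result is
    floored at 0. `forgiven` marks the one project per term whose
    lateness is forgiven.
    """
    if days_late <= 0:
        return score
    if forgiven and days_late <= MAX_FORGIVABLE_DAYS:
        return score
    return max(0.0, score - PENALTY_PER_DAY * days_late)
```

Under this reading, a project scored at 90 and turned in two days late would receive 50, unless it is the one forgiven project for the term.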
Rubrics
Code
100%:
Functionality: For assignments that have well-specified behavior, the code should be able to pass (or very nearly pass, e.g. there may be minor output formatting issues) automated unit testing of all required features. For open-ended assignments the code must be easy to run without modification and implement all of the required functionality.
Documentation: all functions are commented with appropriate doc strings. For open-ended assignments there is a README file discussing how to run the program and what it is supposed to do.
Style: the program exhibits effective modular design. The code does not have unnecessary cut and paste code or magic numbers. Variable and function names are sensibly chosen.
80%:
Functionality: For assignments that have well-specified behavior, the code should implement all of the required functionality, although 10-20% of it may be broken. For open-ended assignments it will be possible to get the code running with modest effort (i.e. it will not be as well documented as at the 100% level, but it isn’t too hard to intuit how the code works). For these types of assignments all required features must be present; however, some (10-20%) may not function properly or may be otherwise poorly implemented.
Documentation: some functions are missing doc strings. Comments are fairly minimal.
Style: some aspects of the design of the program could be improved to reduce cut and paste code. Variable and function names are for the most part well-chosen.
60%:
Functionality: The code should implement almost all of the required features (it is okay if roughly 20% are not implemented). A significant portion, 30-50%, of the code may not work as it is supposed to.
Documentation: Docstrings are mostly absent. For well-defined assignments the code does not run as it should based on the assignment spec. For open-ended assignments there may not be any indication of how to run the program, and it is not easy for a NINJA to figure out how the code works (a good test is if you have to e-mail someone to ask them how their code runs, they are probably at this level).
Style: the program design needs improvement. The code would be a lot cleaner if the author had done a better job thinking through the appropriate functional decomposition. The code has lots of cut and paste and magic numbers. Variable and function names are hard to interpret.
40%:
Functionality: the assignment is incomplete (~50% of the functionality is not implemented). The functionality that is implemented is not 100% correct.
Documentation: mostly absent.
Style: design is poor. Very little attention has been paid to choosing a sensible functional decomposition. Variable and function names are chosen almost arbitrarily.
20%:
Functionality: only minimal functionality is present.
Documentation: little or none.
Style: Code is not “readable”. Poor choice of variable and function names.
0%: nothing turned in.
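To make the documentation and style criteria at the 100% level more concrete, here is a small Python sketch of a function with a doc string, a named constant in place of a magic number, and descriptive names. The function `split_index` and the constant `TEST_FRACTION` are invented for illustration; they do not come from any course assignment.

```python
TEST_FRACTION = 0.2  # named constant instead of a bare 0.2 buried in the code


def split_index(n_samples, test_fraction=TEST_FRACTION):
    """Return the number of rows to use for training.

    Args:
        n_samples: total number of rows in the dataset.
        test_fraction: share of rows held out for testing (0 to 1).

    Returns:
        The index separating training rows from test rows.

    Raises:
        ValueError: if test_fraction is outside the range 0 to 1.
    """
    if not 0 <= test_fraction <= 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = int(round(n_samples * test_fraction))
    return n_samples - n_test
```

A reader can tell what the function does, what it expects, and where the 0.2 comes from without running the code or asking the author.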
Notebook Assignments
100%: thorough engagement with each section of the reading. Each exercise has either been completed, or you have articulated a well-thought-out set of questions for each of the problems you didn't answer.
Points will be deducted from 100% for the following reasons:
Exercises are not filled out (either with a solution or a question)
The answer to the exercise does not demonstrate a good faith attempt to complete it.
Writing
Writing will be evaluated for clarity, concision, and comprehensiveness. The particular style that you should use will depend on the assignment and your audience. The bottom line is don't cut corners with writing in this class. Your ability to communicate what you have done is just as important as what you have done. Don't make the mistake of spending all of your time writing code and then rushing through the writing parts of the assignment. See below for some stylistic guidance.
Since the dawn of time, college students have been writing introductions that are too grandiose (and often false). Just get into it; you don't have to zoom in from space (and make stuff up). Write the abstract last and summarize the most important result.
The passive voice is a hoax. Please write in the first person (singular or plural).
Write in the present tense whenever possible. Some actions happen in the past, but all research exists in an eternal present: "Using data from BRFSS, they plot weight versus height and find that tall respondents are heavier, on average, than shorter respondents. Our results are consistent with this conclusion."
When you edit, try to remove words.
Avoid using "significant" unless you mean "statistically significant;" a good alternative is "substantial." Avoid using "correlation" unless you mean a coefficient of correlation. In general, there might be a "relationship" between variables and you might characterize it with a correlation.
"Trend" usually means something is changing in time; "pattern" is more general.
Provide indicators of logical flow. If I ask you to answer a set of questions in your report, don't just paste the questions as your section headings. You should create your own document structure based on what makes sense for your project.
Explain your motivation. Avoid "We were interested" and "We wanted" as pseudo-motivation. And don't look at the camera; that is, don't use "because we were told to" as motivation, and avoid "for this project".
When you refer to figures and tables, capitalize "Figure" and "Table." Formal figures have a number and a caption, and they can float. Informal figures are part of the text flow. If a figure doesn't contain much information, summarize it in text.
Find an authorial voice that is casual enough to be engaging (without overdoing it).