Welcome to DSC 10 at UC San Diego! This course aims to teach you the basics of how to explore, make predictions from, and draw conclusions about data. We will learn some of the core techniques of data science, and we will practice applying them to real data sets from a variety of different disciplines. Programming is an essential tool to help us analyze and manage large data sets, and so we will learn how to program in Python towards this goal. We will also learn some basic probability and statistics.
Prerequisites: None. This course is an introduction to data science with no prior background assumed beyond high school algebra. If you are not planning on entering the DSC major/minor and have already taken a programming class and a statistics class, you may wish to take a more advanced course. If you are a DSC major/minor, DSC 10 is absolutely required, as later courses heavily reference its specific content.
Instructor: Dr. Mikio Aoi
Teaching Assistants: Sanjay Damani
Tutors: Dylan Lee, Karthikeya Manchala, Ayush More, Trinity Pham, Yash Potdar,
Contact Information: Please contact us through the course discussion board on Campuswire.
Office Hours: Please see the course calendar on Canvas for office hours and Zoom links.
Lecture (Remote, Synchronous):
MWF 1-1:50pm PDT through Zoom (link available on Canvas)
Discussion (Remote, Synchronous):
Monday 4-4:50pm through Zoom (link available on Canvas)
Midterm Exam: due Wednesday, May 5 at 11:59pm PDT
The midterm will be available starting Tuesday, May 4 at 11:59pm PDT and must be completed in 1.5 hours.
Final Exam: due Friday, June 11 at 11:59pm PDT
The final will be available starting Thursday, June 10 at 11:30am PDT and must be completed in 3 hours.
Lectures and discussion section will be given synchronously through Zoom. You can find the Zoom links in our Canvas course. If you are able, please turn your video on during lecture and discussion.
Discussion section focuses on solving concrete problems using the techniques introduced in lecture, and is excellent preparation for the week's assignments, as well as exams. We expect that students in this class will have a wide range of backgrounds and relevant experience. If you are new to programming, you will especially benefit from taking advantage of the opportunity to attend discussion section and review the material from lecture. Attendance in discussion is highly recommended but not required.
Lecture and discussion sections will be recorded and posted on Canvas for students in other time zones who cannot attend synchronously. If you are able, you are encouraged to attend synchronously to ask questions and to participate in concept-check polls and discussion.
Note that the schedule of courses also includes a designated Laboratory time for this class (Wednesday 4-4:50pm). We will not be utilizing this time for anything this quarter, so you can ignore this Laboratory time and schedule other things during this time.
Since this is a large class, we will break up students into smaller teams to simulate the environment of a small class within the big one. This will be an opportunity for you to get to know the people on your team and the tutor assigned to mentor your team. For data science majors (minors), it will be valuable to form connections with people in your major (minor) who have common interests, and perhaps down the road you'll pursue a project together or work as teammates in another data science course.
Each discussion section will be split into two teams, and each team will be assigned a tutor as a mentor to oversee the team discussion and help answer questions. Your team assignment is available here, sorted by PID.
Within our Campuswire discussion board, each team will have a private chatroom. We ask that you first post questions to your team's chatroom before posting on the main discussion board. If your question gets resolved within the chatroom, great! If not, then feel free to post to the main discussion board. Research shows that discussion boards work best with moderately sized discussion groups, and we encourage you to be an active participant on Campuswire!
For some assignments in this class (homeworks and projects) you will be able to work with a partner, but that partner must come from your same team. You can use your team's Campuswire chatroom to connect with potential partners. You are not committed to the same partner for the whole quarter. This is to encourage everyone to be a good partner and pull their weight. If it's not working out, just look for a new partner within your team.
As you'll learn soon, in this class we'll be heavily using a Python library called babypandas. Going with the theme of baby animals, the teams will have the following names:
Joeys (baby kangaroos)
Fawns (baby deer)
Piglets (baby pigs)
Tadpoles (baby toads)
Ducklings (baby ducks)
Lambs (baby sheep)
Bunnies (baby rabbits)
Porcupettes (baby porcupines)
We encourage friendly competition between the teams! If you would like to request to switch teams you can fill out this form.
In order to get started in this class, you'll need to set up a few things.
1) Campuswire.
We'll be using Campuswire as our course message and discussion board. Please make a post on Campuswire if you have any questions about course content or logistics, or if you need to get in touch with the staff. If you didn't already get an invitation to our Campuswire course, you can sign up here. Make sure to join the chatroom for your assigned team as well as the #autograder chatroom.
2) Gradescope.
You'll submit labs, homeworks, and projects to Gradescope. Most questions will be autograded, meaning that a computer will run your code and check that it passes certain tests to verify that it works properly. If you didn't already get an invitation to our Gradescope course, contact your instructor.
3) Datahub.
Most assignments in this course will involve programming in Python. Datahub is UCSD's online data science and machine learning platform, where you will work on assignments. Everything you need for this course is already loaded into Datahub, so you can get started on assignments quickly without having to download anything. You should be able to log in to Datahub and see DSC 10 listed among the course environments, starting on the first day of classes. If you find that you are not able to, first check the FAQs for some common login problems, and if you're still having trouble, please contact us on Campuswire. To access assignments and course materials, first log into Datahub, then click this link.
4) Forms.
All students must complete an Integrity of Scholarship Agreement through this required form.
Last but not least, you should make sure to read this syllabus carefully and explore the class website. All times listed here are in San Diego time, which is Pacific Daylight Time (GMT-7).
The primary textbook for this class is Dive Into Data Science, a free online textbook that is currently under development specifically for DSC 10.
As the primary textbook is still a work in progress and doesn't yet cover everything we cover in this course, we'll also use a supplementary textbook, Inferential Thinking, which is again a free online book. This book comes from Berkeley's version of this course and uses slightly different Python commands, but the underlying concepts are the same.
You will also need access to a computer and a stable internet connection to participate in this course. UCSD has a Laptop Loaner program which may be helpful, but you should also contact me if you have any concerns about access to technology. You will not need a webcam for this course.
Your mastery of class material will be assessed in the following ways, and final grades will be computed as follows:
15% Lab Assignments (best 7 out of 8)
35% Homework Assignments (best 6 out of 7)
10% Midterm Project
10% Midterm Exam
15% Final Project
15% Final Exam
Each lab and each homework assignment will be worth the same amount, regardless of the number of points it is graded out of.
You must score at least 55% on the final exam to pass the course. If you score lower than 55% on the final, you will receive an F in the course, regardless of your overall average.
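As a rough illustration of the arithmetic above, here is a minimal Python sketch of how an overall score could be computed from the listed weights. It is not the official grade calculation: the function names are hypothetical, every score is assumed to be a fraction between 0 and 1, and "best 7 out of 8" is modeled simply as dropping the single lowest score.

```python
# Weights copied from the grade breakdown above; each score is a fraction in [0, 1].
WEIGHTS = {
    "labs": 0.15,
    "homeworks": 0.35,
    "midterm_project": 0.10,
    "midterm_exam": 0.10,
    "final_project": 0.15,
    "final_exam": 0.15,
}
FINAL_EXAM_CUTOFF = 0.55  # below this on the final, the course grade is an F


def average_after_dropping_lowest(scores):
    """Equally weighted average of the scores once the single lowest is dropped."""
    kept = sorted(scores)[1:]
    return sum(kept) / len(kept)


def overall_score(labs, homeworks, midterm_project, midterm_exam,
                  final_project, final_exam):
    """Return (weighted overall score, whether the final exam cutoff is met)."""
    components = {
        "labs": average_after_dropping_lowest(labs),
        "homeworks": average_after_dropping_lowest(homeworks),
        "midterm_project": midterm_project,
        "midterm_exam": midterm_exam,
        "final_project": final_project,
        "final_exam": final_exam,
    }
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return total, final_exam >= FINAL_EXAM_CUTOFF


# Hypothetical student: one missed lab (scored 0), which becomes the dropped lab.
total, meets_cutoff = overall_score(
    labs=[1.0] * 7 + [0.0],
    homeworks=[0.9] * 7,
    midterm_project=0.8,
    midterm_exam=0.7,
    final_project=0.9,
    final_exam=0.6,
)
print(round(total, 4), meets_cutoff)
```

For this made-up student the overall score comes out to roughly 0.84, and the final exam cutoff is met. Note that the cutoff is checked separately from the weighted total, since a final exam score below 55% overrides the overall average.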
I will use a standard scale for assigning letter grades.
If you'd like to request a regrade on any assignment, you must do so within one week of the assignment being graded. If you think there is a problem with the autograder, please post in the #autograder chatroom on Campuswire. If you think there is a problem with how your written question was graded, submit a regrade request through Gradescope.
If you are taking the course P/NP, you will receive a grade of P if you meet the criteria for a C- grade; otherwise, you will receive a grade of NP.
If you have extenuating circumstances that prohibit your completion of coursework, you may be eligible for an Incomplete grade. If you are considering using this option, the best thing you can do is let me know right away, and I can help you decide if this is an appropriate course of action. If you have any doubt about your ability to perform satisfactorily in this course due to something outside of your control, please contact me as soon as possible so we can figure out a plan.
Weekly lab assignments are a required part of the course and will help you develop fluency in Python and working with data. The labs are designed to help you build the skills you need to complete homework assignments and projects, in a low-stress setting.
As you complete the lab, you'll be able to run a sequence of tests, which check to make sure that your answers are correct. If you complete the assignment and all the tests pass, you'll get a perfect score!
Each person must complete and submit the lab on their own, but you are welcome to discuss the lab with others. You cannot copy or share answers, however.
To submit a lab, follow the instructions in the assignment to upload your notebook to Gradescope, which will run automated tests and assign your score. Lab assignments will be due on Tuesdays at 11:59pm PDT. The lowest lab score is dropped from the grade calculation.
Weekly homework assignments build off of the skills you have developed in labs. Homeworks will reinforce concepts from class, explore new ideas, and provide hands-on experience working with data.
An important difference between labs and homeworks lies in the way tests are run. Unlike the tests in the labs, the tests in the homework cannot be used to guarantee that you have the correct answers. The tests in the homework only check to make sure that your answer is reasonable, not that it is correct. For example, if a homework question asks you to calculate a percent, the test in the homework might check that the answer you provide is a number between 0 and 100. You should make sure that all the tests pass before submitting your homework, but this will not guarantee a perfect score.
After you submit your homework to Gradescope, and after the deadline for submissions has passed, a new set of hidden tests will be run to make sure that you have the correct answers. In the percent example above, the hidden test might check that your answer equals 56, for example. Your score for the assignment will be based on the results of the hidden tests, which won't be available immediately after submission. So if you see a perfect score upon submission, this only means that you've passed the formatting tests, not the hidden correctness tests that determine your score.
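To make the distinction concrete, here is a small Python sketch of the percent example. The function names are hypothetical and this is not the actual autograder code; it only shows how an answer can pass a visible sanity test while failing a hidden correctness test.

```python
def public_test(answer):
    """Visible test: only checks that the answer is a number between 0 and 100."""
    return isinstance(answer, (int, float)) and 0 <= answer <= 100


def hidden_test(answer, expected=56):
    """Hidden test, run after the deadline: checks the answer against the expected value."""
    return public_test(answer) and answer == expected


my_answer = 40  # a reasonable-looking percent, but not the correct one
print(public_test(my_answer))  # True: the visible test passes
print(hidden_test(my_answer))  # False: the hidden test fails
print(hidden_test(56))         # True: the correct answer passes both
```

An answer of 40 passes the visible test because it is a valid percent, but it would earn no credit for that question once the hidden test runs.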
You may work on homework assignments either alone or with a partner from your same team, using pair programming. This means that you should work on the assignment synchronously, discussing each problem together and writing each answer together, taking turns of who is in control of the coding. Please read the section of this website dedicated to pair programming to learn more about how this works, and some of the benefits of working in pairs. If working with a partner, only one of you should submit the assignment.
To submit a homework, follow the instructions in the assignment to upload your notebook to Gradescope. Check back after the deadline to see your score, based on the hidden correctness tests. Homeworks will be due on Fridays at 11:59pm PDT. The lowest homework score is dropped from the grade calculation.
This class has two projects, a midterm project and a final project. Projects are like more challenging homeworks. They are longer than a typical homework, and they require you to pull together ideas from previous weeks, rather than just the last week. Projects also give you a chance to explore a data set in-depth, which can be a lot of fun!
Project tests are like homework tests: the provided tests only check if your answer has the correct format, not if it is correct. You'll only be able to see your score on the project after the deadline, once all projects are submitted and the hidden correctness tests have been run.
As with homeworks, you can work alone or with a partner from your team, using pair programming. This means that you should work on the assignment synchronously, discussing each problem together and writing each answer together, taking turns of who is in control of the coding. Please read the section of this website dedicated to pair programming to learn more about how this works, and some of the benefits of working in pairs. If working with a partner, only one of you should submit the assignment.
Labs, homeworks, and projects must be submitted by the 11:59pm PDT deadline to be considered on time. You may turn them in as many times as you like before the deadline, and only the most recent submission will be graded, so it's a good habit to submit early and often. If you make a submission after the deadline, your assignment will be counted as late.
You have six "slip days" to use throughout the quarter. A slip day extends the deadline of any one homework, lab, or project by 24 hours. If you need to extend the deadline by 48 hours, you can, but this costs you two slip days. You cannot turn in any assignments more than 48 hours late.
Slip days are applied automatically at the end of the quarter, and you don't need to ask in order to use one. It's your responsibility to keep track of how many you have left. If you run out of slip days and submit an assignment late, it may still be graded so that you'll see what questions you missed, but the grade will be changed to a zero at the end of the quarter. If you use more than six slip days, we will count the first six as slip days and late assignments after that will get zero scores.
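If you want to keep track of your own slip days, the bookkeeping could look something like the Python sketch below. This is only an illustration, not the staff's actual procedure: the function name and the example assignments are hypothetical, the number of slip days each late submission costs is supplied by you, and the sketch assumes that a late submission needing more slip days than you have left receives a zero without using up the remainder.

```python
TOTAL_SLIP_DAYS = 6


def apply_slip_days(late_submissions, total=TOTAL_SLIP_DAYS):
    """Walk through late submissions in deadline order.

    late_submissions: list of (assignment name, slip days needed) pairs.
    Returns (assignments covered by slip days, assignments zeroed, slip days left).
    """
    remaining = total
    covered, zeroed = [], []
    for name, needed in late_submissions:
        if needed <= remaining:
            remaining -= needed
            covered.append(name)
        else:
            # Assumption: not enough slip days left, so this assignment gets a zero.
            zeroed.append(name)
    return covered, zeroed, remaining


# Hypothetical quarter with five late submissions.
covered, zeroed, remaining = apply_slip_days([
    ("Lab 2", 1),
    ("Homework 3", 2),
    ("Midterm Project", 2),
    ("Homework 6", 2),
    ("Lab 8", 1),
])
print(covered)    # ['Lab 2', 'Homework 3', 'Midterm Project', 'Lab 8']
print(zeroed)     # ['Homework 6']
print(remaining)  # 0
```

In this made-up example, only one slip day is left by the time Homework 6 is submitted late, so that assignment is zeroed, while the last slip day still covers Lab 8.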
Slip days are designed to be a transparent and predictable source of leniency in deadlines. You can use a slip day if you are too busy to complete an assignment on its original due date (or if you forgot about it). But slip days are also meant for things like the internet (or the Datahub server) going down at 11:58 PM just as you go to submit your homework. Slip days are meant to be used in exceptional circumstances, so you probably should not need to use all six, but if you have something going on in your life that is impeding your ability to do your classwork on time, please reach out to me as soon as possible so we can work something out.
This class has one midterm exam and one final exam. Exams are cumulative, though the final exam will emphasize material after the midterm.
Exams will be administered through Canvas, Gradescope, or a similar online platform. Exams will be open-notes, open-book, open-internet, with the limitation that you must take the exam alone without communicating with any other person.
Each exam will be available for at least 24 hours, for you to take whenever is convenient for you, but once you start the exam, you will have a limited amount of time in which to finish it.
Midterm Exam: due Wednesday, May 5 at 11:59pm PDT
The midterm will be available starting Tuesday, May 4 at 11:59pm PDT and must be completed in 1.5 hours.
Final Exam: due Friday, June 11 at 11:59pm PDT
The final will be available starting Thursday, June 10 at 11:30am PDT and must be completed in 3 hours.
I am committed to an inclusive learning environment that respects our diversity of perspectives, experiences, and identities. My goal is to create a diverse and inclusive learning environment where all students feel comfortable and can thrive. If you have any suggestions as to how I could create a more inclusive setting, please let me know. We also expect that you, as a student in this course, will honor and respect your classmates, abiding by the UCSD Principles of Community. Please understand that others’ backgrounds, perspectives, and experiences may be different from your own, and help us to build an environment where everyone is respected and feels comfortable.
Students requesting accommodations for this course due to a disability or current functional limitation must provide a current Authorization for Accommodation (AFA) letter issued by the Office for Students with Disabilities (OSD), which is located in University Center 202 behind Center Hall. If you have an AFA letter, please make arrangements to meet with the instructor and with the Data Science OSD Liaison by the end of Week 2 to ensure that reasonable accommodations for the quarter can be arranged. The Data Science OSD Liaison can be reached at dscstudent@ucsd.edu.
The basic rule for DSC 10 is: Work hard. Make use of the expertise of the staff to learn what you need to know to really do well in the course. Act with integrity, and don't cheat.
If you do cheat, we will enforce the UCSD Policy on Integrity of Scholarship. This means you will likely fail the course and the Dean of your college will put you on probation or suspend or dismiss you from UCSD.
Students agree that by taking this course, their assignments will be submitted to third party software to help detect plagiarism.
Why is academic integrity important?
Academic integrity is an issue that is pertinent to all students on campus. When students act unethically by copying someone’s work, taking an exam for someone else, plagiarizing, etc., these students are misrepresenting their academic abilities. This makes it impossible for instructors to give grades (and for the University to give degrees) that reflect student knowledge. This devalues the worth of a UCSD degree for all students, making it imperative for the campus as a whole to enforce that all members of this community are honest and ethical. We want your degree to be meaningful and we want you to be proud to call yourself a graduate of UCSD!
The UCSD Policy on Integrity of Scholarship and this syllabus list some of the standards by which you are expected to complete your academic work, but your good ethical judgment (or asking us for advice) is also expected as we cannot list every behavior that is unethical or not in the spirit of academic integrity. Ignorance of the rules will not excuse you from any violations.
What counts as cheating?
In DSC 10, you can read books, surf the web, talk to your friends and the DSC 10 staff to get help understanding the concepts you need to know to complete your assignments. However, all code must be written by you, together with your partner if you choose to have one, where allowed.
The following activities are considered cheating and are not allowed in DSC 10 (not an exhaustive list):
Using or submitting code acquired from other students (except your partner, where allowed), the web, or any other resource not officially sanctioned by this course
Posting your code online, including on Campuswire, unless privately to instructors only
Having any other person complete any part of your assignment on your behalf
Completing an assignment on behalf of someone else
Providing code, exam questions, or solutions to any other student in the course
Splitting up homework questions or project questions with your partner and each working on different questions
Collaborating with others on exams
The following activities are examples of appropriate collaboration and are allowed in DSC 10 (not an exhaustive list):
Discussing the general approach to solving homework problems or projects
Talking about problem-solving strategies or issues you ran into and how you solved them
Discussing the answers to exams with other students who have already taken the exam after the exam is complete
Using code provided in class, by the textbook or any other assigned reading or video, with attribution
Google searching for documentation on Python or babypandas
Working together with other students on lab assignments without copying or sharing answers
Posting a question about your approach to a problem in a class discussion forum, without sharing your code
How can I be sure that my actions are NOT considered cheating?
The best way to avoid problems is by using your best judgement and remembering to act with Honesty, Trust, Fairness, Respect, Responsibility and Courage. Here are some suggestions for completing your work:
Don't look at or discuss the details of another student's code for an assignment you are working on, and don't let another student look at your code.
Don't start with someone else's code and make changes to it, or in any way share code with other students.
If you are talking to another student about an assignment, don't take notes, and wait an hour afterward before you write any code.
Note: in the discussion above, we are talking about other students that are not your pair programming partner. See the pair programming guidelines for information on working with a partner.