CS620 -- Introduction to Data Science and Analytics

Fall 2023

Course Overview

Welcome to CS620/DASC600 Data Science.  

Course content will be delivered via https://canvas.odu.edu/ for all registered students. Recitation sections will be held each Tuesday by Dr. Yi He. We will use the recitation time to discuss the weekly course content, assignments, class activities, and project work.

Data science is an interdisciplinary blend of the analytical, computational, and statistical skills necessary to extract knowledge from large and complex sets of data. The proliferation of such data has led to an acute shortage of students with data science skills in the local, national, and global economies.

This course introduces students to the rapidly growing field of data science and equips them with some of its basic principles and tools as well as its general mindset. Students will learn the concepts, techniques, and tools they need to deal with various facets of data science practice. Cross-listed with DASC 600.


Course Objectives

Students completing this course should be able to:

Basic Information

Instructor: Yi He

Office: E&CS 3108

Email: yihe@cs.odu.edu

Office Hours: Tuesday, 10 AM – 11 AM, or by appointment

Classroom: DRGS 1117

Meeting Time: 11 AM -- 12:15 PM Tuesday

Reading & Project Write-ups: 11 AM -- 12:15 PM Thursday (No in-person meetings)

Grading

Final course grades are based on the overall average. The grade windows for the class as a whole (not for individual students) may be widened if the instructor finds it appropriate. The final score in % will be rounded to the nearest whole number. Plus and minus grades may be assigned at the instructor's discretion.

A: 94-100, A-: 90-93, B+: 87-89, B: 84-86, B-: 80-83, C+: 78-79, C: 74-77, C-: 70-73, Fail (Grade F): 0-69
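
As a rough illustration of how the scale above maps a final percentage to a letter grade, here is a minimal Python sketch. The function names are hypothetical, and it assumes that a score ending in exactly .5 rounds up, which the syllabus does not specify. It also ignores any widening of the grade windows and any discretionary plus or minus adjustments by the instructor.

```python
from decimal import Decimal, ROUND_HALF_UP

# Lower bound of each grade window, copied from the scale above.
GRADE_FLOORS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (78, "C+"), (74, "C"), (70, "C-"), (0, "F"),
]


def round_score(percent):
    """Round a percentage to the nearest whole number.

    Assumption: ties (x.5) round up. Python's built-in round() rounds
    ties to the nearest even number, so Decimal is used instead.
    """
    return int(Decimal(str(percent)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def letter_grade(percent):
    """Return the letter grade for a final percentage between 0 and 100."""
    score = round_score(percent)
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {percent}")
    # The floors are sorted from highest to lowest, so the first match wins.
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    raise AssertionError("unreachable: the last floor is 0")


if __name__ == "__main__":
    for sample in (93.4, 93.5, 79.2, 69.5):
        print(sample, "->", letter_grade(sample))
```

Under these assumptions a 93.5 rounds to 94 and falls in the A window, while a 93.4 rounds to 93 and falls in the A- window.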

Grading correction: If you find a mistake in grading, please let the instructor know. Requests to correct an assignment or exam grade should be sent to the instructor within 1 week of receiving the grade or before the end of the semester, whichever comes first. After that, your grade will not be adjusted. Your grade will not be lowered.

There is no separate grading scale for PhD students, but PhD students will typically be held to a higher standard.


The scores you receive on the various graded tasks in the class will be weighted as follows:


We will have five homework assignments, in total worth 25% of your overall grade.


Class activities and participation in the discussion are both important to your success in the course. As one measure of your participation and course preparation, we will have class activities related to lecture topics to supplement the learning. 


The final examination will be a comprehensive, closed-book exam covering all the modules, and it will be scheduled during the last week of the class. In the week before the final exam, I will post a study guide to help students prepare for the written examination. For the final exam, you may bring one standard 8.5" by 11" piece of paper with any notes you deem appropriate or significant (front and back).


The data project is an opportunity to tackle a more challenging data science activity. Details, requirements, and submission information will be on the project section of the course web page. For the project, you will work individually or in a team of 2-3 students on a problem of your choosing that is interesting, significant, and relevant to data science. The ultimate goal of your course project is to tackle an interesting real-world problem by applying the techniques learned in each week of the class to your dataset (exploration, wrangling, machine learning, visualization).

All members of a group will receive the same grade on group work. Therefore, it is in your interest to choose other group members (ideally in the first week of the class) who have the same goal in the class as you do. It is also in your interest to work together and ensure that all tasks are completed effectively. However, your score on group work may be adjusted based on your contribution.

We are going to use Google Colab (Colaboratory) (https://colab.research.google.com/), a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Textbook

No textbook is required for this course. Below are some recommended materials that can strengthen your background knowledge and help you understand the topics covered in this course.

Communication

If you send me an email (for any urgent matter, such as a health issue), please be sure to include your name and the course number in the body of the email. You should also use an appropriate subject line, such as “CS620-Health”. Failure to follow these guidelines may result in a delayed response.

Course Schedule (Tentative)

Please check the course website periodically for updates. Substantial changes will also be announced by email.

Homework Assignments

Homework is to be done individually. There will be five homework assignments.


Data Project

Milestone Due Dates: 

Introduction

The data project is an opportunity to tackle a more challenging data science activity. For the project, you will work individually or in a team of 2-3 students on a problem of your choosing that is interesting, significant, and relevant to data science. The more members your team has (2 or 3), the higher my expectations for the project will be compared to an individual project, so choose carefully. The ultimate goal of your course project is to tackle an interesting real-world problem by applying the techniques learned in each week of the class to your dataset (exploration, wrangling, machine learning, visualization).

All members of a group will receive the same grade on group work. Therefore, it is in your interest to choose other group members (ideally in the first week of the class) who have the same goal in the class as you do. It is also in your interest to work together and ensure that all tasks are completed effectively. However, your score on group work may be adjusted based on your contribution.

You can utilize any resources for this project, but I highly recommend using Google Colab (Colaboratory) (https://colab.research.google.com/), a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

The assignment is flexible: choose a topic of interest to you and your group and carry out a cohesive, complete project based around it. The range of possible topics is broad. However, the project you pick should incorporate a dataset and a wide range of data science techniques. I think the most interesting problems will be ones in which you identify and work with some "client" to develop a solution to one of their problems. Such clients can include organizations with which you are involved, your work site, etc. You can propose to carry out a project as part of a larger effort. In that case, however, you will need to be able to separate the contribution made by this class's project from the rest.

You will need to prepare a written project abstract in Google Colab and get it approved by the instructor, submit project progress reports, prepare a written final report, and give a demo.

Peer assessment: Each student's project grade will be influenced by their teamwork as evaluated by their project group members. This will be applied as an overall weight to the term project grade.


Project Abstract

The abstract (in Google Colab) should include the following information: 

Each member's name, email, and web portfolio link in the very first lines of the Colab.

○     Data source (if any)

○     Your end goal with this dataset (build a recommender system, prediction model/classifier, evaluation of models, visualizing something, infer something, or something else)

○     Any secondary datasets you are planning to utilize to augment your primary dataset (should be clearly specified that this is a secondary dataset)

Project plan / Gantt chart. Team member contribution plan (if a team project).

●     You need to have an acceptable abstract submitted by the deadline. Without an abstract, you will receive zero for your project grade.

●     Submit your Colab link to the Piazza thread.

Project Progress Checks I and II (continue your report in Colab)

In these progress checks, you should assess the progress you are making on your project and update the work plan as necessary. Continue your earlier Colab document, documenting your progress on the project. Start with your proposal or previous progress report (if any) and add the following content to your progress report.

The progress report should cover:

Project Presentation (10 Minutes Video)

Use Zoom (you have access to Zoom Pro via ODU: https://www.odu.edu/ts/collaboration-tools/zoom) or any other video recording tool to record a video of your project work that is 10 minutes or less, and upload it to YouTube. A penalty of 2-3 points will be applied if the video is longer than 10 minutes. You can show your implementation/demo (use the screen share option) and also your presentation slides or Google Colab. Your presentation/demo should succinctly tell us *why* we should care and *what* interesting insight you have about the chosen data project. Give us some insight into the tough, cool, or interesting aspects of your project. This is your time to shine, so carefully prepare exactly what you want to show off to impress us in this summary. View the audience as potential upper management in your company: convince us that your problem is important and that you have the appropriate insight about the dataset.


Follow the guidelines when preparing the Summary section for your talk. (This section should be at the very end of your Colab.)

Project Final Report

The final report is a comprehensive description of the project. It should be a "complete" document, so it should include front matter (title page, abstract, table of contents, chapters) or a sidebar index that links to your report elements. These elements should include the problem statement, an explanation of your design and implementation, and your results and evaluation. This report should stand by itself as the archival description of the project.


Data sources for projects