Syllabus

Location and Time - CSI 388, 11:20-12:35 TR

Professor - Dr. Mark Lewis, Office: CSI 270H, Phone: 999-7022, e-mail: mlewis@trinity.edu. E-mail is typically the best way to reach me. I check it frequently and try to respond promptly.

Office Hours - See my T-mail calendar, or make an appointment. I'm in my office a lot, so you should feel free to drop by. If you are coming from lower campus, you can always call or send a short e-mail to check whether I'm in and available at that time. I can also hold virtual office hours using Google+ and Hangouts.

Text - "Introduction to Machine Learning" by Ethem Alpaydin. I also plan to post videos that students can use for review and learning. Here are some other books that you might consider buying if you are interested.

Course Description - This course is an introduction to data science that focuses largely on the computational aspects of the field. In particular, we will learn the Spark framework for big data analytics. Apache Spark is a distributed data-processing framework designed to run on large clusters and analyze data sets that are too large to be processed on a single machine. We will also discuss the nature of data science and introduce enough statistics for you to perform the required analytics work. The course will be very hands-on, and the workload will focus largely on writing code to analyze various data sets.

Work/Time Expectations - You signed up for an upper-division Lewis course, and this is the first time I have ever taught it. I own you. You are the guinea pigs for this course.

Coding Practices - Much of your work for this course will be code that you turn in to me. Code should be well formatted and reasonably documented. Any code written outside of class that you turn in to me should be your own work. All code you turn in is pledged.

Grades - The grade for this course will be composed of five components. These components and what they entail are discussed below. This table summarizes how each component contributes to your grade in the course. All items turned in for a grade in this course are to be pledged. For code, the pledge statement should be put in a comment at the top of the code.

Data Sets - At the beginning of the semester, I am going to ask you to find three data sets that you find interesting. You will write a short paper describing each of these data sets, why you find it interesting, and what questions you would want to answer with it. At the end of the semester, we will revisit this paper, and you will analyze what you originally wrote, taking into account what you learned during the semester.

Coding Problems - Roughly every week, you will have a problem set to complete on your own outside of class.

Quizzes - There will be six quizzes during the semester. These quizzes will cover readings about data science as well as the aspects of Spark that we have discussed in class.

Final Project - At the end of the semester, each of you will complete a larger project and present it to the class during the regularly scheduled final exam time.