SM217 Spring 2021-2022

Welcome to SM217, Introduction to Data Science at the U.S. Naval Academy. Our theme for the course is Data Science for Decision Makers. We'll be learning how statistics can be used to support decision making. This is in line with recommendations made by the 2016 GAISE College Report, endorsed by the American Statistical Association. It is also a direct response to remarks by our Superintendent, Vice Admiral Sean Buck, who encouraged us to develop a data science curriculum to make midshipmen more effective officers. The material in this course is based on pioneering work in the Data8 program at U.C. Berkeley, but we've modified their curriculum to meet the needs of the Naval Academy, the Navy and the Marine Corps. Our curriculum development efforts have been supported by two generous grants from the Office of Naval Research.

Data science is a modern approach to statistics that blends computation with statistical theory. We'll use Python, the industry-leading programming language for data science. Although Python has broad capabilities, our course will focus on using it for data manipulation, visualization, and statistical computation. Students wishing further instruction in computer programming are encouraged to take SI286: Programming for Everyone. Course content includes data organization and manipulation, data visualization with an emphasis on briefing senior leadership, probabilities and Bayes' Rule for updating probabilities in light of new information, hypothesis testing, confidence intervals via bootstrapping, applications of the Central Limit Theorem and an introduction to distributions, regression and inference for regression, predictive modeling and an introduction to machine learning, an overview of ethics in machine learning, and classes devoted to critical thinking in the context of decision making with data science. This course is mainly taken by midshipmen with majors in the School of Humanities and Social Science, and many examples have been chosen from these areas, as well as from applications of interest to the Navy and the Marine Corps.

In the schedule below, CIT refers to U.C. Berkeley's freely available online textbook, Computational and Inferential Thinking: The Foundations of Data Science by Ani Adhikari and John DeNero with contributions from David Wagner and Henry Miller. Other readings and lecture notes can be found on the webpage for the corresponding class day. There is also a collection of all the non-textbook class readings and lecture notes. This is a large document, more suited to computer search than everyday use. This page is maintained by Prof. Will Traves (traves@usna.edu); please contact him with questions or comments.

SM217 Spring AYE 2022 Schedule