Instructor:
Naseef Mansoor, Ph.D. (Pronouns: He/Him/His)
Office: WH 235
TA: TBD
Class Technology:
1. Laptop or desktop with microphone and web camera.
2. High-speed Internet access capable of streaming both audio and video.
3. Software/Apps: D2L Brightspace, Zoom (for online classes only), zyBooks, LockDown Browser
For IT issues, please contact the IT Solutions Center.
Course Description:
This course introduces data science, discusses opportunities and challenges associated with data science projects, and develops competencies related to data collection, data cleaning, data analysis, and model evaluation. The course focuses on hands-on exercises using data analytics tools.
Learning Outcomes:
After successfully completing this course, students will be able to:
Define data science and common terminology associated with data science.
Discuss examples of how data science and data analytics are being used to improve decision making and solve complex problems for organizations and society.
Explain challenges related to data science, especially data privacy, ethics, legal issues, and organizational barriers. Identify social, legal, organizational, and ethical issues related to data science projects and recommend solutions and strategies for mitigating such risks.
Describe and design the data science process (framing the problem, solving the problem, and communicating and acting on results).
Explain and apply common data science libraries in Python and other tools.
Explain and differentiate various data analytics models including descriptive, predictive, prescriptive, and optimization and their use cases.
Perform data collection using Python, including reading and writing data in various formats and from a variety of sources (such as enterprise systems, web, social media, and sensors).
Clean and format data for analysis; perform common data cleaning tasks (such as merging, reshaping, pivoting, handling duplicates, replacing missing values, binning, grouping, manipulating strings, detecting and handling outliers, scaling, and sampling) using Python.
Create and use Python data structures to organize both numerical and text-based data.
Perform exploratory data analysis and visual analytics in Python and other tools.
Explain basic statistical concepts, discuss appropriate statistical procedures for various use cases; create and test statistical models in appropriate Python libraries.
Evaluate models using various metrics for performance; discuss model deployment, testing, and documentation.
Construct applications using additional Python libraries related to big data.
Prerequisite:
IT 310/CIS 223 and IT 340/CIS 340
Course Materials:
There is no dedicated textbook for this class. The following resources are helpful and are available online.
1. “Data Science from Scratch: First Principles with Python (2nd edition)” by Joel Grus, Publisher: O'Reilly, 2019 (Available Online)
2. “Practical Statistics for Data Scientists” by Peter Bruce and Andrew Bruce, Publisher: O'Reilly, 2017 (Available Online)
3. “Introduction to Machine Learning with Python: A Guide for Data Scientists” by Andreas C. Müller and Sarah Guido, Publisher: O'Reilly, 2017 (Available Online)
4. “Introduction to Probability and Statistics” by Jeremy Orloff and Jonathan Bloom (Available Online)
5. “Linear Algebra” by Gilbert Strang (Available Online)
6. “Pattern Recognition and Machine Learning” by Christopher Bishop, Publisher: Springer, 2006 (Available Online)
7. Websites:
https://towardsdatascience.com/
Programming Language:
We will be using Python and Python libraries/frameworks in this course.
Note: You will need Google Colab to view and work with the tutorials.
Course Website:
Log in at https://mnsu.ims.mnscu.edu/d2l/home and find the ### CIS 418/518-xx Foundation of Data Science. This website will be used for:
Lecture slides, assignments, project documents, and any related reading materials (if needed)
Uploading assignments, presentations, and/or homework (email submissions will not be graded)
All class-related announcements will be posted on the D2L page.
Students should check the course homepage one hour before the class for new announcements.
Course Topics:
Module 1: Introduction
Introduction to Data Science; Terminology: Big Data, Structured and Unstructured Data, Data Sources; Applications of data science and data analytics in the modern world; Data privacy; Ethical conundrums and legal implications associated with data science; Data science process; Python basics; Web data scraping.
Module 2: Data Wrangling, Exploration and Visualization
Data wrangling using pandas; Data visualization and exploration using seaborn and Matplotlib; Data reporting using dashboards.
Module 3: Statistics and Probability
Descriptive Statistics: Central Tendency and Dispersion (Mean, Median, Variance), Sample and Population, Sampling Techniques, Central Limit Theorem, Correlation and Covariance; Inferential Statistics: Confidence Interval, Hypothesis Testing: Parametric; Probability: Random Variable, Conditional Probability, Joint Probability, Marginal Probability, Bayes' Theorem, Probability Distributions: Bernoulli, Binomial, Multinomial, Gaussian, Exponential, Poisson, etc.
Module 4: Linear Algebra
Matrices, Scalars, and Vectors; Matrix Multiplication; Matrix Operations; Determinant; Matrix Inverse; Matrix Decomposition (e.g., SVD); Eigenvalues and Eigenvectors
Module 5: Machine Learning
Introduction to Supervised, Unsupervised, and Reinforcement Learning; Linear Regression: OLS, Polynomial; Classification: Logistic regression, decision trees, etc.; Generative Models; Model Evaluation: Metrics, cross-validation; Hyperparameter tuning: Grid search.
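To give a flavor of the Module 5 topics, the following is a minimal sketch of ordinary least squares (OLS) regression evaluated with k-fold cross-validation. It is an illustration only, not course material: the function names fit_ols and k_fold_mse, the interleaved fold assignment, and the noise-free toy data are all assumptions made for this example, and it uses only the Python standard library rather than the libraries covered in class.

```python
from statistics import mean


def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def k_fold_mse(xs, ys, k=5):
    """Return the mean squared error on held-out data, averaged over k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Hypothetical fold assignment: every k-th point, offset by the fold index.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        a, b = fit_ols(train_x, train_y)
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test_idx]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


# Toy data (assumed for illustration): points lying exactly on y = 2x + 1.
xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]

a, b = fit_ols(xs, ys)
cv_mse = k_fold_mse(xs, ys, k=5)
print(f"intercept={a:.3f}, slope={b:.3f}, cross-validated MSE={cv_mse:.6f}")
```

Because the toy data are noise-free, the held-out error should be essentially zero; with real data, the cross-validated error estimates how well the model generalizes to unseen points.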
Assessments:
Quizzes:
There are four quizzes during the semester. The quizzes will be conducted in person during class meeting time. Questions may include multiple-choice questions, short-answer questions, computations, and coding.
Group Assignment:
There are four group assignments in this course. Students will form groups for these assignments. Each group can have at most 3 members. Students in the 418 section should team up with classmates from the 418 section. Similarly, students in the 518 section should team up with classmates from the 518 section.
Note:
The use of generative AI tools (such as ChatGPT, DALL-E, etc.) is not permitted in this class; therefore, any use of AI tools for work in this class may be considered a violation of Minnesota State University, Mankato’s Academic Honesty policy, since the work is not your own. The use of unauthorized AI tools will result in an F grade. Also, check the academic honesty policy section.
Grading:
Your course grade will be based on the quality of work submitted for the assignments and quizzes. All work will be graded numerically and weighted based on the following table for final grade calculation. Grades for each item will be posted on D2L.
Bonus: Based on your attendance and participation, you will receive bonus points in the course. Below is the distribution of the bonus points.
Grading Scale:
Letter grades will be assigned at the end of the semester according to the following scale. Students must have a percentage at or above the value listed to earn the corresponding letter grade. The percentage is calculated by weighting each grade item based on the table above.
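The weighted calculation described above can be sketched as follows. This is an illustration only: the weights, the cutoffs, and the names weighted_percentage and letter_grade are hypothetical placeholders and are NOT the official values for this course; always refer to the grading table and grading scale in this syllabus.

```python
def weighted_percentage(scores, weights):
    """Combine per-item percentages (0-100) using weights that sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[item] * weight for item, weight in weights.items())


def letter_grade(percentage, scale):
    """Return the first letter whose cutoff the percentage meets or exceeds.

    `scale` is a list of (minimum percentage, letter) pairs sorted from the
    highest cutoff to the lowest.
    """
    for cutoff, letter in scale:
        if percentage >= cutoff:
            return letter
    return scale[-1][1]


# HYPOTHETICAL weights and cutoffs, for illustration only (not the course's values).
example_weights = {"quizzes": 0.5, "group_assignments": 0.5}
example_scale = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]

# Example student: average percentage earned in each grade item.
example_scores = {"quizzes": 80.0, "group_assignments": 90.0}

total = weighted_percentage(example_scores, example_weights)
print(total, letter_grade(total, example_scale))
```

A percentage exactly at a cutoff earns the corresponding letter, matching the "at or above" rule stated above.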
Note:
Grades are based on the quality of your work and on how well you are prepared for class. If you believe something has been graded incorrectly, or if a grade has been recorded incorrectly, you must email the instructor stating your concern no later than one week after the grade has been posted. Grade queries submitted more than one week after the grade was posted will not be considered.