Instructors: Balagopal Komarath, Mayank Singh (office: AB-13/402B, email: singh.mayank@iitgn.ac.in)
Location: Jasubhai Auditorium
Class Timings: Monday 10:00-11:30
Lecture Slides: The link to the lecture slides will be updated before each class.
Q&A sessions: Email me to book an appointment to discuss doubts or get clarification on concepts.
Data Analysis: Use of spreadsheets for data analysis; Python libraries for data manipulation, e.g., Pandas; introduction to SQL. Answering queries over tabular data: map-filter-reduce, joins, and unions. Using vectorized operations for efficiency. Techniques: data structuring (tidy data) and cleaning (dealing with multiple labels for the same data, missing data). Summarizing data (averages, variance, moments); functions of tables (loading, cleaning, normalizing). Introduction to data visualization using matplotlib: plotting various statistics, categorical distributions, numerical distributions, and overlaid graphs.
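As a rough illustration of answering a query over tabular data with a join followed by map-filter-reduce, here is a minimal sketch in plain Python. The tables, names, and scores are invented for this example and are not course data; in class the same steps would more likely be written with Pandas or SQL.

```python
from functools import reduce

# Hypothetical sample tables, invented purely for illustration.
students = [
    {"id": 1, "name": "Asha", "dept": "CSE"},
    {"id": 2, "name": "Ravi", "dept": "EE"},
    {"id": 3, "name": "Meera", "dept": "CSE"},
]
marks = [
    {"id": 1, "score": 82},
    {"id": 2, "score": 35},
    {"id": 3, "score": 64},
]

# Join: attach each student's score by matching on the shared "id" key.
score_by_id = {row["id"]: row["score"] for row in marks}
joined = [
    {**s, "score": score_by_id[s["id"]]}
    for s in students
    if s["id"] in score_by_id
]

# Filter: keep only the rows for the CSE department.
cse_rows = list(filter(lambda r: r["dept"] == "CSE", joined))

# Map: project each remaining row down to its score.
cse_scores = list(map(lambda r: r["score"], cse_rows))

# Reduce: fold the scores into a single total, then take the average.
total = reduce(lambda acc, x: acc + x, cse_scores, 0)
mean_cse = total / len(cse_scores)

print(mean_cse)
```

The query answered here is "what is the average score of CSE students?". The join, filter, map, and reduce steps correspond roughly to a merge, a row selection, a column selection, and an aggregation in a table library.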
Making Predictions: Linear regression; basics of classification (train, test, validation); creating features; naive Bayes classification, logistic regression and its interpretation; introduction to clustering (k-means). Confidence in predictions: prediction intervals, confidence intervals.
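To make the idea of fitting on training data and checking on held-out data concrete, here is a minimal sketch of simple linear regression using the closed-form least-squares formulas. The data points are made up for illustration (they lie exactly on y = 2x + 1), and the helper name fit_line is hypothetical; a library such as scikit-learn would normally be used instead.

```python
# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 11]

# Split: fit on the first four points, hold out the last two for testing.
train_x, test_x = xs[:4], xs[4:]
train_y, test_y = ys[:4], ys[4:]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(train_x, train_y)

# Predict on the held-out points and measure the mean squared error.
predictions = [slope * x + intercept for x in test_x]
mse = sum((p - t) ** 2 for p, t in zip(predictions, test_y)) / len(test_y)

print(slope, intercept, mse)
```

Because the made-up data has no noise, the test error here is zero; with real data the held-out error gives an estimate of how well the fitted line generalizes.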
Introduction to Pandas [Colab Notebook]
Introduction to SQL [Slides]
Machine Learning [Colab Notebook]
Matplotlib [Colab Notebook]
Quiz - 1 [Link]