Course code: TD2223
Course Title: Data Analysis
Credits: 3
Course Coordinators: Anindya Goswami and Amit Apte
Pre-requisites:
Objectives (goals, type of students for whom useful, outcome, etc): The course will introduce students to basic techniques of data analysis in order to equip them with tools to understand and interpret data in the context of experiments, population statistics, and real-world numbers in general. The students will gain the following skills: design of experimental or observational studies, basic explorations, inference, and modelling.
Course contents:
First Half: experimental and observational studies, data summaries, visualization (4 lectures, 1 tutorial); sampling distribution, foundations of inference (3 lectures, 1 tutorial); confidence intervals (3 lectures, 1 tutorial); hypothesis testing, Bayesian hypothesis testing (4 lectures, 1 tutorial).
Second Half: linear, logistic, and nonlinear regression (5 lectures, 1 tutorial); analysis of variance (3 lectures, 1 tutorial); measurement errors, error propagation, experimental design (2 lectures, 1 tutorial). Optional topics: statistical graphics, causal inference, scientific ethics.
Number of lectures: 24 (excluding tutorials), 6 tutorials
Evaluation:
End-sem examination 35%
Mid-sem examination 35%
Quiz 30%
Text-book:
- Introduction to Modern Statistics by Mine Çetinkaya-Rundel and Johanna Hardin, https://openintroims.netlify.app/ or https://leanpub.com/imstat or in the library
Additional (optional) reference:
- Introduction to Probability and Statistics for Engineers and Scientists by Sheldon M. Ross, Academic Press Inc, 6th edition (2021)
- OpenIntro Statistics by David Diez, Christopher Barr, and Mine Çetinkaya-Rundel, Fourth Edition, free PDF at openintro.org/os
Class Schedule: Tuesday 11:00 AM and Thursday 10:00 AM @ NLH
Tutorial Schedule: Every Monday at 11:00 AM @ LHC 101, 103, 105, 106, 107, 108
Office hours: Friday 16:00-17:00 for all batches
Anindya Goswami: A404, Main Academic Building
Amit Apte: Main Academic Building, 4th floor
Teaching Assistants:
Amruta Lambe <amruta.lambe@acads.iiserpune.ac.in> Student Office 525
Dheeraj KumarGehlot <dheeraj.kumargehlot@students.iiserpune.ac.in> Student Office 524
Nachiket Dravid <nachiket.dravid@students.iiserpune.ac.in> Student Office 526
Pinak Mandal Student Office 524
Ramjan Ali <ramjan.ali@students.iiserpune.ac.in> Student Office 526
Udit NarayanSahu <udit.narayansahu@students.iiserpune.ac.in> Student Office 524
Lectures
05/01 Survey vs experiment, types of variables, sample vs observation, descriptive statistics.
06/01 When a small collection of statistics is not enough, illustrations of the empirical distribution are used to summarise the data.
08/01 Beyond the histogram or frequency plot: scatter plot for revealing the dependence structure of a multivariate sample, box plot for rough comparison of the distributions of multiple samples, Q-Q plot for detailed comparison of the empirical distributions of two samples.
12/01 Tutorial and Quiz 1 [Max=10, Min=2, Mean=7.3, Median=7.5, STD=1.5]
13/01 Sample statistics take different values as the sample changes: Quantification of randomness in sample statistics
15/01 CLT gives an approximate distribution of the sample statistics, which is easy to compute.
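A minimal Python sketch of the 13/01 and 15/01 ideas, not taken from the lectures: it draws many samples from an Exponential(1) population (an assumed example, chosen because it is clearly non-normal), records the sample mean of each, and compares the spread of those means with the CLT approximation Normal(mu, sigma^2/n). The sample size, number of repetitions, and seed are arbitrary illustrative choices.

```python
import math
import random
import statistics


def sample_means(n, reps, rng):
    """Return `reps` sample means, each computed from n Exponential(1) draws."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


rng = random.Random(0)   # fixed seed so the run is reproducible
n, reps = 50, 5000       # illustrative sample size and number of repeated samples
means = sample_means(n, reps, rng)

# Exponential(1) has mean 1 and standard deviation 1, so the CLT suggests
# that the sample mean is approximately Normal(1, 1/n).
clt_mean = 1.0
clt_sd = 1.0 / math.sqrt(n)

emp_mean = statistics.fmean(means)
emp_sd = statistics.stdev(means)

# Fraction of sample means within 1.96 CLT standard deviations of the
# population mean; under the normal approximation this is about 0.95.
coverage = sum(abs(m - clt_mean) <= 1.96 * clt_sd for m in means) / reps

print(f"mean of sample means: {emp_mean:.4f} (CLT: {clt_mean:.4f})")
print(f"sd of sample means:   {emp_sd:.4f} (CLT: {clt_sd:.4f})")
print(f"fraction within 1.96 sd: {coverage:.3f}")
```

Increasing n should bring the empirical values closer to the CLT values; for a skewed population like this one, a small n makes the approximation visibly rougher.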
19/01 Tutorial on HW-02
20/01 Monte Carlo simulation: using pseudorandom numbers to generate synthetic data from a probabilistic model. This makes it possible to draw inferences about the population without a real sample.
22/01 Antithetic variates method for variance reduction in Monte Carlo Simulation.
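A minimal Python sketch of the 20/01 and 22/01 topics, under assumptions of our own: the target is E[exp(U)] with U uniform on (0, 1), whose exact value is e - 1, and both estimators use the same total number of function evaluations. The integrand, sample sizes, and seed are illustrative and not from the lectures. The antithetic estimator pairs each draw U with 1 - U; because exp is monotone, the two values are negatively correlated and their average has a smaller variance.

```python
import math
import random
import statistics


def f(u):
    """Integrand: we estimate E[f(U)] for U ~ Uniform(0, 1)."""
    return math.exp(u)


def plain_mc(n_pairs, rng):
    """Plain Monte Carlo with 2 * n_pairs independent draws.

    Returns the estimate and the estimated variance of that estimate.
    """
    draws = [f(rng.random()) for _ in range(2 * n_pairs)]
    return statistics.fmean(draws), statistics.variance(draws) / (2 * n_pairs)


def antithetic_mc(n_pairs, rng):
    """Antithetic variates: average f(U) and f(1 - U) within each pair.

    Returns the estimate and the estimated variance of that estimate.
    """
    pair_avgs = []
    for _ in range(n_pairs):
        u = rng.random()
        pair_avgs.append(0.5 * (f(u) + f(1.0 - u)))
    return statistics.fmean(pair_avgs), statistics.variance(pair_avgs) / n_pairs


rng = random.Random(0)       # fixed seed for reproducibility
n_pairs = 5000               # both methods use 2 * n_pairs evaluations of f
exact = math.e - 1.0

plain_est, plain_var = plain_mc(n_pairs, rng)
anti_est, anti_var = antithetic_mc(n_pairs, rng)

print(f"exact value:         {exact:.5f}")
print(f"plain Monte Carlo:   {plain_est:.5f} (est. variance {plain_var:.2e})")
print(f"antithetic variates: {anti_est:.5f} (est. variance {anti_var:.2e})")
```

The variance reduction depends on the integrand: it is large here because exp is monotone on (0, 1), and it can vanish or even reverse for functions that are symmetric about 1/2.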
26/01 Holiday
27/01 Parametric bootstrap to estimate the bias of a point estimator and to construct a confidence interval for it.
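A minimal Python sketch of a parametric bootstrap, under assumptions that are ours rather than the lecture's: the model is Exponential(rate), the point estimator is the maximum-likelihood estimate 1 / (sample mean), the "observed" data are synthetic, and the interval shown is a simple percentile interval (other constructions exist). The steps are: fit the model to the data, simulate many new samples from the fitted model, re-estimate on each, and read off the bias and the interval from the resulting estimates.

```python
import random
import statistics


def rate_mle(sample):
    """Maximum-likelihood estimate of the exponential rate: 1 / sample mean."""
    return 1.0 / statistics.fmean(sample)


rng = random.Random(1)   # fixed seed for reproducibility

# Synthetic "observed" sample (an assumption for illustration only).
true_rate = 2.0
n = 30
data = [rng.expovariate(true_rate) for _ in range(n)]

# Step 1: fit the parametric model to the observed data.
rate_hat = rate_mle(data)

# Step 2: simulate B samples of size n from the fitted model and
# re-compute the estimator on each one.
B = 4000
boot = sorted(
    rate_mle([rng.expovariate(rate_hat) for _ in range(n)])
    for _ in range(B)
)

# Step 3: bootstrap bias estimate = mean of bootstrap estimates minus the
# parameter value that generated them (rate_hat).
bias_hat = statistics.fmean(boot) - rate_hat

# Step 4: 95% percentile interval from the sorted bootstrap estimates.
ci = (boot[int(0.025 * B)], boot[int(0.975 * B) - 1])

print(f"estimate: {rate_hat:.3f}")
print(f"bootstrap bias estimate: {bias_hat:.3f}")
print(f"bias-corrected estimate: {rate_hat - bias_hat:.3f}")
print(f"95% percentile interval: ({ci[0]:.3f}, {ci[1]:.3f})")
```

For this particular model the bias of the estimator is known to be rate / (n - 1), so the bootstrap bias estimate can be checked against rate_hat / (n - 1).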
Exams
Result