CO1: Demonstrate proficiency with statistical analysis of data.
CO2: Use inferential statistics for decision making.
CO3: Apply supervised learning for classification and regression problems.
CO4: Apply unsupervised learning for clustering.
Course Content
Introduction to Programming and Applications: Introduction to data analytics, Python fundamentals. Data Quality and Pre-processing: Distance measures, dimensionality reduction, principal component analysis (PCA). Descriptive Statistics: Graphical approach - Frequency tables, relative frequency tables, grouped data, pie charts, bar charts, histograms, ogives, stem-and-leaf plots, box plots, dot diagrams, scatter plots, Pareto diagrams. Measures of Central Tendency and Dispersion - Arithmetic mean, median and mode, variance, standard deviation, quartiles, range, mean absolute deviation, coefficient of variation, Z-scores, normal distribution, confidence interval estimation.
Mathematical Basics and Analysis: Probability Distributions and Inferential Statistics: Random variables, probability distributions, hypothesis testing, single-sample tests, two-sample tests, Type I error, Type II error, Analysis of Variance (ANOVA).
Advanced Algorithms to Process System Applications: Supervised learning: Linear regression, ridge regression, lasso regression, logistic regression, multiple linear regression, goodness of fit, bias–variance trade-off, k-nearest neighbors algorithm, linear discriminant analysis, classification and regression trees and pruning, support vector machines, random forest, naive Bayes, introduction to deep learning. Unsupervised learning: Cluster analysis – k-means, hierarchical, DBSCAN. Applications to different engineering systems.
References:
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd Edn., Springer, 2009.
E. Alpaydın, Introduction to Machine Learning, 3rd Edn., MIT Press, 2014.
D. C. Montgomery, G. C. Runger, Applied Statistics and Probability for Engineers, 6th Edn., John Wiley & Sons Inc., 2016.
P. N. Tan, M. Steinbach, A. Karpatne, V. Kumar, Introduction to Data Mining, 2nd Edn., Pearson, 2019.
J. M. Moreira, A. C. P. L. F. de Carvalho, T. Horváth, A General Introduction to Data Analytics, Wiley, 2019.
A. K. Tangirala, Principles of System Identification: Theory and Practice, 2nd Edn., CRC Press, 2020.
R. Rengaswamy, R. Suresh, Data Science for Engineers, CRC Press, 2022.