CS307 Machine Learning
Class Timing: Wednesday 9:00-10:30 AM and Friday 9:00-10:30 AM at C303
Lab Timing: Monday 11:00 AM-1:00 PM at L106 and L107
Theoretical Assignment 1: 5 February, 2025
Mid Sem Examination: 17 February, 2025, 10:00 AM-12:00 PM
Theoretical Assignment 2: 30 March, 2025
End Sem Examination: 17 April, 2025 (BTech) / 21 April, 2025 (MTech & PhD)
Credit Structure: 3-0-2-4-4
Understanding regression, classification and clustering techniques and algorithms,
Processing univariate and multivariate data,
Analyzing the performance of various machine learning techniques,
Selecting appropriate features for training machine learning algorithms,
Creating new machine learning techniques.
Unit 1:
Introduction and Concept Learning:
Overview of Machine Learning – The Landscape. The Learning Problem.
Feasibility of Learning. General-to-Specific Hypotheses Ordering. Find-S and Candidate Elimination Algorithm. Version Space and Inductive Bias.
Bayesian Learning:
Probability Overview, Bayes Theorem, MLE, and MAP Estimates. Naive Bayes Classifier.
Unit 2:
Instance-based Learning
k-Nearest Neighbour (kNN) Classifier. Voronoi Diagram and Distance-Weighted kNN, Distance Metrics and Curse of Dimensionality.
Computational Complexity: Condensing and High Dimensional Search.
Classifier/Hypothesis Evaluation
Accuracy, Precision, Recall and F-Measures. Scores, Sampling, Bootstrapping and ROC, Hypotheses Testing and Cross-validation.
Unit 3:
Linear Models and Regression
Linear Classification. Linear Regression. Non-linear Transformation. Logistic Regression.
Computational Learning Theory
Error and Noise Formalisms. Training vs. Testing. Theory of Generalization. PAC Learnability and VC Dimensions. Bias-Variance Trade-offs. Overfitting, Regularization and Validation.
Unit 4:
Artificial Neural Network
Perceptron Learning Algorithm: Delta Rule and Gradient Descent. Multi-layer Perceptron Learning: Backpropagation and Stochastic Gradient Descent. Hypotheses Space, Inductive Bias and Convergence.
Support Vector Machines
Decision Boundary and Support Vector: Optimization and Primal-Dual Problem. Extension to SVM: Soft Margin and Non-linear Decision Boundary. Kernel Functions and Radial Basis Functions.
Unit 5:
Decision Tree Learning
Decision Tree Representation and Learning Algorithm (ID3). Attribute Selection using Entropy Measures and Gains. Hypotheses Space and Inductive Bias. Overfitting, Generalization, and Occam's Razor.
Ensemble Learning
Bagging and Boosting. AdaBoost and Random Forest.
Unit 6:
Unsupervised Learning
Gaussian Mixture Model (GMM), E-M Algorithm, Partitional Clustering and Hierarchical Clustering. Cluster Types, Attributes, and Salient Features. k-Means, Hierarchical and Density-based Clustering Algorithms. Inter and Intra Clustering Similarity, Cohesion and Separation. MST and DBSCAN Clustering Algorithms. Dimensionality Reduction and Principal Component Analysis (PCA).
Other Learning Paradigms
Deep Learning. Reinforcement Learning. Transfer Learning. Semi-supervised Learning. Active Learning. Explainability in Learning.
Week 1:
Lecture 1: Decision tree learning; Source: Slide 1, Mitchell
Lecture 2: Overfitting, Probabilities; Source: Slide 2, Mitchell
Lab 1: Lab intro
Week 2:
Lecture 3: Bayes rule, MLE, MAP; Source: Slide 3, Mitchell
Lecture 4: Conditional independence, Multinomial Naive Bayes; Source: Slide 4, Mitchell
Lab 2/PA1: Implement a decision tree classifier for a given set of training data stored in a .CSV file, test it on test data, and demonstrate its performance on classification tasks.
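A minimal sketch of one way to approach this lab, assuming scikit-learn is available; the file names train.csv/test.csv and the convention that the last column holds the class label are illustrative assumptions, not part of the assignment text.

```python
# Hypothetical sketch: decision tree classification on CSV data with scikit-learn.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

train = pd.read_csv("train.csv")   # assumed: feature columns, label in last column
test = pd.read_csv("test.csv")

X_train, y_train = train.iloc[:, :-1], train.iloc[:, -1]
X_test, y_test = test.iloc[:, :-1], test.iloc[:, -1]

clf = DecisionTreeClassifier(criterion="entropy")   # ID3-style entropy splits
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```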
Week 3:
Lecture 5: Gaussian Bayes classifiers, Document classification; Source: Slide 5, Mitchell
Lecture 6: Logistic Regression, Gradient ascent; Source: Slide 6, Mitchell
Lab 3/PA2: Classify a set of documents using the Naïve Bayesian Classifier and evaluate the model's performance.
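A minimal sketch of the intended workflow, assuming scikit-learn; the toy documents and labels below are placeholders for the actual corpus used in the lab.

```python
# Hypothetical sketch: multinomial Naive Bayes document classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

train_docs = ["free money offer", "meeting schedule today",
              "cheap loans now", "project status update"]   # placeholder corpus
train_labels = ["spam", "ham", "spam", "ham"]
test_docs = ["free loans offer", "status meeting today"]
test_labels = ["spam", "ham"]

vec = CountVectorizer()                  # bag-of-words counts
X_train = vec.fit_transform(train_docs)
X_test = vec.transform(test_docs)

nb = MultinomialNB().fit(X_train, train_labels)
pred = nb.predict(test := X_test)

print("Accuracy:", accuracy_score(test_labels, pred))
print(precision_recall_fscore_support(test_labels, pred, average="macro"))
```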
Week 4:
Lecture 7: Generative/Discriminative models, bias-variance decomposition; Source: Slide 7, Mitchell
Lecture 8: Find-S and Candidate Elimination Algorithm, PAC and Statistical Learning Theory; Source: Chapter 1,2,&7, Mitchell and Slide 8, Mitchell
Lab 4/PA3: Implement the Gaussian Naïve Bayesian classifier for training data stored in a .CSV file and evaluate its accuracy using test datasets.
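A minimal sketch, again assuming scikit-learn and the hypothetical train.csv/test.csv layout with the label in the last column.

```python
# Hypothetical sketch: Gaussian Naive Bayes on CSV data; file names are assumptions.
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
X_train, y_train = train.iloc[:, :-1], train.iloc[:, -1]
X_test, y_test = test.iloc[:, :-1], test.iloc[:, -1]

gnb = GaussianNB().fit(X_train, y_train)          # per-class Gaussian likelihoods
print("Test accuracy:", accuracy_score(y_test, gnb.predict(X_test)))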
Week 5:
Lecture 9: Sample Complexity, Shattering and VC Dimension; Source: Slide 9, Mitchell
Lecture 10: Overfitting and Regularization; Source: Slide 10, Mitchell
Lab 5/PA4: Construct a Logistic Regression classifier.
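A minimal from-scratch sketch matching the gradient-ascent view covered in Lecture 6; the synthetic two-class data, learning rate, and iteration count are illustrative choices.

```python
# Hypothetical sketch: binary logistic regression trained by gradient ascent
# on the log-likelihood, using a toy synthetic dataset.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.r_[np.zeros(50), np.ones(50)]
X = np.hstack([np.ones((100, 1)), X])            # prepend bias column

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(X.shape[1])
lr = 0.1
for _ in range(1000):
    grad = X.T @ (y - sigmoid(X @ w))            # gradient of the log-likelihood
    w += lr * grad / len(y)                      # ascent step

pred = (sigmoid(X @ w) >= 0.5).astype(int)
print("Training accuracy:", (pred == y).mean())
```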
Week 6:
Lecture 11: Bayes Nets; Source: Slide 11, Mitchell
Lecture 12: Learning from fully and partially observed data; Source: Slide 12, Mitchell
Lab 6/PA5: Implement Linear Regression module and determine its performance.
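A minimal sketch using the least-squares solution directly; the synthetic one-feature data set and the mean-squared-error evaluation are illustrative assumptions.

```python
# Hypothetical sketch: least-squares linear regression on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, (100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, 100)   # true line plus noise

A = np.hstack([np.ones((100, 1)), X])             # bias column + feature
w, *_ = np.linalg.lstsq(A, y, rcond=None)         # solves min ||Aw - y||^2

y_hat = A @ w
print("Estimated intercept and slope:", w)
print("MSE:", np.mean((y - y_hat) ** 2))
```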
Week 7:
Lecture 13: EM algorithms for supervised learning; Source: Slide 13, Mitchell
Lecture 14: EM algorithms for Gaussian clustering; Source: Slide 14, Mitchell
Lab 7/PA6: Implement the Find-S and Candidate Elimination algorithms.
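A minimal sketch of the Find-S half of the lab; the tiny EnjoySport-style examples are illustrative, and Candidate Elimination would follow the same representation with separate general and specific boundaries.

```python
# Hypothetical sketch of Find-S over discrete attributes; '?' is the most general value.
def find_s(examples):
    hypothesis = None
    for attrs, label in examples:
        if label != "yes":
            continue                        # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attrs)        # initialize from first positive example
        else:
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attrs)]
    return hypothesis

data = [
    (("sunny", "warm", "normal", "strong"), "yes"),
    (("sunny", "warm", "high", "strong"), "yes"),
    (("rainy", "cold", "high", "strong"), "no"),
]
print(find_s(data))    # ['sunny', 'warm', '?', 'strong']
```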
Week 8:
Midsem
Week 9:
Lecture 15: Ensemble Learning, Bagging & Boosting, Random Forest, AdaBoost; Source: Slide 15, Mitchell
Lecture 16: Geometric Margins and Perceptron; Source: Slide 16, Mitchell
Lab 8/PA7: Write a program to construct a Bayesian network from medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You can use Java/Python ML library classes/APIs.
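A minimal sketch assuming the pgmpy library; the file name heart.csv, the column names, and the network structure are illustrative assumptions about a discretized version of the data set, and the class name may differ slightly across pgmpy versions.

```python
# Hypothetical sketch: a small Bayesian network over assumed heart-disease columns.
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

data = pd.read_csv("heart.csv")            # assumed file with discretized columns

# Assumed edges: risk factors -> disease -> observed symptoms/tests.
model = BayesianNetwork([("age", "target"), ("sex", "target"),
                         ("target", "cp"), ("target", "thalach")])
model.fit(data, estimator=MaximumLikelihoodEstimator)

infer = VariableElimination(model)
posterior = infer.query(variables=["target"], evidence={"cp": 3, "sex": 1})
print(posterior)
```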
Week 10:
Lecture 17: Geometric Margins, Kernels: Kernelizing a Learning Algorithm, Kernelized Perceptron; Source: Slide 17, Mitchell
Lecture 18: Geometric Margins, SVM: Primal and Dual Forms, Kernelizing SVM, Semi-supervised Learning, Semi-supervised SVM; Source: Slide 18, Mitchell
Lab 9/PA8: Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering with the k-Means algorithm. Compare the results of the two algorithms and comment on the quality of clustering. You can use Java/Python ML library classes/APIs in the program.
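A minimal sketch assuming scikit-learn; the file name data.csv, the choice of three clusters, and the use of silhouette scores as the comparison metric are illustrative assumptions.

```python
# Hypothetical sketch: EM (Gaussian mixture) vs. k-Means on the same CSV data.
import pandas as pd
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = pd.read_csv("data.csv").values                 # assumed all-numeric features

gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("EM/GMM silhouette :", silhouette_score(X, gmm_labels))
print("k-Means silhouette:", silhouette_score(X, km_labels))
```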
Week 11:
Lecture 19: Transductive SVM, Co-training and Multi-view Learning, Graph-based Methods; Source: Slide 19, Mitchell
Lecture 20: Batch Active Learning, Selective Sampling and Active Learning, Sampling Bias; Source: Slide 20, Mitchell
Lab 10/PA9: Write a program to implement different ensemble methods (Bagging, Boosting, Random Forest, AdaBoost) to classify the Iris or any other data set. The base classifier can be of your choice.
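A minimal sketch assuming scikit-learn, with a decision tree as the base classifier where one is required; note that older scikit-learn versions name the Bagging parameter base_estimator rather than estimator.

```python
# Hypothetical sketch: four ensemble methods compared on the Iris data set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              RandomForestClassifier, GradientBoostingClassifier)
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Bagging": BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50),
    "AdaBoost": AdaBoostClassifier(n_estimators=50),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```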
Week 12:
Lecture 21: Clustering, k-means, DBSCAN, Hierarchical Clustering; Source: Slide 21, Mitchell
Lecture 22: Principal Component Analysis; Source: Slide 22, Mitchell
Lab 11/PA10: Build an Artificial Neural Network by implementing the Backpropagation algorithm and test it on appropriate data sets.
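A minimal from-scratch sketch of backpropagation on the XOR problem; the 2-4-1 architecture, sigmoid activations, learning rate, and epoch count are illustrative choices rather than requirements of the lab.

```python
# Hypothetical sketch: a small 2-4-1 network trained with plain backpropagation on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass (squared-error loss, sigmoid derivatives)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 3))    # should approach [[0], [1], [1], [0]]
```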
Week 13:
Lecture 23: Never Ending Learning; Source: Slide 23, Mitchell
Lecture 24: Neural Networks, Deep Learning; Source: Slide 24, Mitchell
Lab 12/PA11: Build an SVM classifier and test it on appropriate data sets.
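A minimal sketch assuming scikit-learn; the Iris data set, the RBF kernel, and the hyperparameters are illustrative choices.

```python
# Hypothetical sketch: an RBF-kernel SVM evaluated on the Iris data set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
pred = svm.predict(X_te)
print("Accuracy:", accuracy_score(y_te, pred))
print(classification_report(y_te, pred))
```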
Week 14:
Lecture 25: Markov Decision Processes, Value Iteration, Q-learning; Source: Slide 25, Mitchell
Lecture 26: Deep Learning, Differential Privacy, Discussion on the Future of ML; Source: Slide 26, Mitchell
Lab 13/PA12: Final Assessment
Week 15:
Lecture 27: Extra class
Lecture 28: Extra class
Lab 14: Extra lab
Week 16:
Endsem
Tom Mitchell, Machine Learning, 1st Edition, McGraw Hill Education, 1997. ISBN 0070428077.
Ethem Alpaydin, Introduction to Machine Learning, 4th Edition, The MIT Press, 2020. ISBN 978-0262028189.
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006. ISBN 978-0387310732.
Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, 2nd Edition, Wiley-Interscience, 2000. ISBN 978-0471056690.
Sebastian Raschka and Vahid Mirjalili, Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2, 3rd Edition, Packt Publishing, 2019. ISBN 978-1789955750.
2 Theoretical Assignments (20%); Mid Sem (20%); End Sem (30%); 12 Practical Assignments (30%)