SYLLABUS
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-101-T: PAPER- I: MATHEMATICAL FOUNDATIONS FOR DATA SCIENCE
UNIT – I
Linear Algebra: Vector spaces, Subspaces, Basis and dimension of a vector space, linear dependence and independence, spanning set. Linear transformation, kernel, range, Matrix Representation of a linear transformation, Matrices: Trace and Rank of a Matrix and their properties, Determinants, Inverse, symmetric, orthogonal and idempotent matrices and their properties, Gauss elimination, row canonical form, diagonal form, triangular form and their applications, Characteristic roots and vectors, Statement of the Cayley-Hamilton theorem and its applications, Orthogonal and Spectral decomposition of a real symmetric matrix, Singular Value Decomposition.
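As an illustration of Gauss elimination and rank, the following is a minimal pure-Python sketch (the function name `rank` is our own, not taken from the prescribed texts). It reduces the matrix to row echelon form using exact `fractions.Fraction` arithmetic, so the number of pivot rows gives the rank without floating-point round-off.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix via Gauss elimination to row echelon form."""
    a = [[Fraction(x) for x in row] for row in matrix]  # exact rational arithmetic
    rows = len(a)
    cols = len(a[0]) if a else 0
    r = 0                                   # index of the next pivot row
    for c in range(cols):
        # look for a nonzero entry in column c at or below row r
        pivot = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        a[r], a[pivot] = a[pivot], a[r]     # row interchange
        for i in range(r + 1, rows):
            factor = a[i][c] / a[r][c]
            # R_i <- R_i - factor * R_r clears the entry below the pivot
            a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        r += 1
        if r == rows:
            break                           # every row already holds a pivot
    return r
```

For example, `rank([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` returns 2, because the third row equals twice the second row minus the first.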
UNIT – II
Combinatorics: Basic counting principle, inclusion-exclusion for two sets, pigeonhole principle, permutations and combinations, binomial coefficients and identities, generalized permutations and combinations, principle of inclusion-exclusion, applications of inclusion-exclusion. Recurrence Relations: introduction, solving linear recurrence relations, generating functions. Real-world applications of combinatorial concepts.
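Inclusion-exclusion and recurrence relations can be checked against each other on derangements (permutations of n objects that leave no object in its original place). The sketch below, with hypothetical function names, counts them once by the inclusion-exclusion formula D(n) = Σ_k (-1)^k C(n, k) (n - k)! and once by the recurrence D(n) = (n - 1)(D(n - 1) + D(n - 2)) with D(0) = 1 and D(1) = 0.

```python
from math import comb, factorial

def derangements_ie(n):
    """Inclusion-exclusion: alternately subtract/add permutations fixing at least k points."""
    return sum((-1) ** k * comb(n, k) * factorial(n - k) for k in range(n + 1))

def derangements_rec(n):
    """Recurrence D(n) = (n - 1) * (D(n - 1) + D(n - 2)), with D(0) = 1, D(1) = 0."""
    if n == 0:
        return 1
    prev2, prev1 = 1, 0                     # D(0), D(1)
    for m in range(2, n + 1):
        prev2, prev1 = prev1, (m - 1) * (prev1 + prev2)
    return prev1
```

Both functions give D(4) = 9 and D(5) = 44, which is a quick consistency check between the two methods.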
UNIT-III
Graph Theory: Basic concepts of graphs; Isomorphic graphs and simple problems; Trees: definitions, properties, and simple problems, tree traversals, spanning tree construction: breadth-first and depth-first search, minimum spanning tree construction: Kruskal’s algorithm. Planar and Hamiltonian graphs: simple problems and applications; Graph coloring and its applications. Real-world applications of graph-theoretic concepts.
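A minimal sketch of Kruskal’s algorithm, assuming (our own convention) that the graph is given as a list of `(weight, u, v)` edges on vertices numbered 0 to n - 1. Edges are taken in increasing order of weight and accepted only if they join two different components; a disjoint-set (union-find) structure with path halving is one common way to perform that cycle check.

```python
def kruskal(num_vertices, edges):
    """Minimum spanning tree; edges are (weight, u, v) tuples, vertices 0..num_vertices-1."""
    parent = list(range(num_vertices))      # each vertex starts in its own set

    def find(x):
        # follow parent links to the set representative, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, total = [], 0
    for w, u, v in sorted(edges):           # cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                        # different components, so no cycle is formed
            parent[ru] = rv                 # union the two components
            tree.append((u, v, w))
            total += w
    return tree, total
```

With edges 0-1 (1), 0-2 (3), 1-2 (2), 2-3 (4) and 1-3 (5), the sketch keeps 0-1, 1-2 and 2-3 for a total weight of 7.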
REFERENCE BOOKS
1. Gilbert Strang (2016): Introduction to Linear Algebra, 5/e, Wellesley-Cambridge Press.
2. David C. Lay (2019): Linear Algebra and Its Applications, Pearson, 5/e.
3. Joe L. Mott, Abraham Kandel, Theodore P. Baker, Discrete Mathematics for Computer Scientists and Mathematicians
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-102-T: PAPER- II: DESIGN AND ANALYSIS OF ALGORITHMS
UNIT I
Introduction to Algorithms: Algorithm Specification, Performance Analysis, Randomized Algorithms. Elementary Data Structures: Stacks and Queues, Trees, Dictionaries, Priority Queues, Sets and Disjoint Set Union, Graphs.
Divide and Conquer: Binary Search, Finding the Maximum and Minimum, Merge Sort, Quick Sort, Selection Sort, Strassen's Matrix Multiplication, Convex Hull.
Greedy Method: Knapsack Problem, Job Sequencing with Deadlines, Minimum-Cost Spanning Trees (Kruskal’s & Prim’s), Single Source Shortest Paths (Dijkstra’s).
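The greedy single-source shortest-paths method (Dijkstra’s algorithm) can be sketched with Python's standard `heapq` module serving as the priority queue. This is a minimal sketch under assumptions of our own: the graph is a dict that maps each vertex to a list of `(neighbour, weight)` pairs, all weights are non-negative, and outdated heap entries are skipped when popped instead of being decreased in place.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps vertex -> list of (neighbour, weight)."""
    dist = {source: 0}
    heap = [(0, source)]                    # (tentative distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)          # closest vertex not yet finalised
        if d > dist.get(u, float("inf")):
            continue                        # stale entry: a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd                # relax edge (u, v)
                heapq.heappush(heap, (nd, v))
    return dist
```

For instance, with edges A→B (4), A→C (1), C→B (2), C→D (5) and B→D (1), `dijkstra(g, "A")` returns distances A: 0, B: 3, C: 1, D: 4.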
UNIT-II
Dynamic Programming: General Method, Multistage Graphs, All-Pairs Shortest Paths, Single-Source Shortest Paths, Optimal Binary Search Trees, 0/1 Knapsack, Traveling Salesperson Problem.
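All-pairs shortest paths is a standard dynamic-programming example: once vertices 0..k are allowed as intermediate points, d[i][j] = min(d[i][j], d[i][k] + d[k][j]). The following Floyd-Warshall sketch assumes an adjacency matrix with 0 on the diagonal, `INF` for missing edges, and no negative cycles; the names are illustrative.

```python
INF = float("inf")

def floyd_warshall(w):
    """All-pairs shortest path lengths from an n x n weight matrix (INF = no edge)."""
    n = len(w)
    d = [row[:] for row in w]               # copy so the input matrix is not modified
    for k in range(n):                      # now allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

Each of the n stages costs O(n²), so the whole table is filled in O(n³) time.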
Backtracking Technique: General Method, 8-Queens Problem, Sum of Subsets, Graph Colouring, Hamiltonian Cycles, Knapsack Problem.
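The backtracking method on the n-queens problem: queens are placed row by row, any column or diagonal already under attack is pruned, and the last placement is undone before the next column is tried. The sketch below is illustrative (the function name and the tuple representation are our own); it records every complete placement.

```python
def n_queens(n):
    """Return every solution as a tuple whose r-th entry is the column of the queen in row r."""
    solutions = []
    cols = []                 # columns chosen so far, one per placed row
    used_cols = set()
    diag1 = set()             # occupied "row - col" diagonals
    diag2 = set()             # occupied "row + col" anti-diagonals

    def place(row):
        if row == n:
            solutions.append(tuple(cols))
            return
        for c in range(n):
            if c in used_cols or (row - c) in diag1 or (row + c) in diag2:
                continue      # square is attacked: prune this branch
            cols.append(c)
            used_cols.add(c)
            diag1.add(row - c)
            diag2.add(row + c)
            place(row + 1)
            # backtrack: undo the placement before trying the next column
            cols.pop()
            used_cols.remove(c)
            diag1.remove(row - c)
            diag2.remove(row + c)

    place(0)
    return solutions
```

For n = 8 the sketch finds the well-known 92 solutions; for n = 4 it finds exactly two, (1, 3, 0, 2) and (2, 0, 3, 1).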
UNIT -III
Branch-and-Bound Technique: General Method, 0/1 Knapsack Problem, Traveling Salesperson Problem.
NP-Hard and NP-Complete Problems: Basic Concepts, Cook's Theorem, NP-Hard and NP-Complete Problems, NP-Hard Graph Problems, NP-Hard Scheduling Problems, NP-Hard Code Generation Problems, Some Simplified NP-Hard Problems.
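Branch and bound for the 0/1 knapsack problem explores include/exclude decisions item by item and discards any partial solution whose optimistic bound cannot beat the best value found so far. The sketch below is one common formulation, assumed here rather than taken from the prescribed texts: items are sorted by value per unit weight, the bound is the greedy fractional-knapsack value of the remaining capacity, and live nodes are expanded best-first from a heap. Weights are assumed positive, and `knapsack_bb` is a hypothetical name.

```python
import heapq

def knapsack_bb(values, weights, capacity):
    """Best-first branch and bound for 0/1 knapsack; returns the optimal total value."""
    # sort by value density so the greedy fractional fill is a valid upper bound
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)

    def bound(level, value, weight):
        # optimistic value: fill the remaining capacity greedily, allowing one fractional item
        b, w = value, weight
        for v, wt in items[level:]:
            if w + wt <= capacity:
                w += wt
                b += v
            else:
                b += v * (capacity - w) / wt
                break
        return b

    best = 0
    heap = [(-bound(0, 0, 0), 0, 0, 0)]     # (negated bound, level, value, weight): max-heap
    while heap:
        neg_b, level, value, weight = heapq.heappop(heap)
        if -neg_b <= best or level == n:
            continue                        # prune: this node cannot beat the incumbent
        v, wt = items[level]
        if weight + wt <= capacity:         # branch 1: include item `level`
            nv, nw = value + v, weight + wt
            best = max(best, nv)
            heapq.heappush(heap, (-bound(level + 1, nv, nw), level + 1, nv, nw))
        # branch 2: exclude item `level`
        heapq.heappush(heap, (-bound(level + 1, value, weight), level + 1, value, weight))
    return best
```

For the classic instance with values (60, 100, 120), weights (10, 20, 30) and capacity 50, the sketch returns 220.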
REFERENCE BOOKS
1. E. Horowitz, S. Sahni, S. Rajasekaran (2007): Fundamentals of Computer Algorithms, 2/e, Universities Press.
2. T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein (2010): Introduction to Algorithms, 3/e, PHI.
3. R. Panneerselvam (2007): Design and Analysis of Algorithms, PHI.
4. Hari Mohan Pandey (2009): Design, Analysis and Algorithm, University Science.
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-103-T: PAPER- III: JAVA PROGRAMMING
UNIT – I
Java Programming Fundamentals: Introduction, Overview of Java, Data Types, Variables and Arrays, Operators, Control Statements, Classes, Methods, Inheritance, Packages and Interfaces. I/O Basics: Byte and Character Streams, Reading Console Input and Writing Console Output, Scanner Class, Console Class, PrintWriter Class, String Handling, Exception Handling, Multithreaded Programming. Overview of Networking: Working with URLs, Connecting to a Server, Implementing Servers, Serving Multiple Clients, Sending E-Mail, Socket Programming, Internet Addresses, and URL Connections. AWT: Introduction, AWT Class Hierarchy, Creating a Container, Adding Components, Layouts, Using Panels, TextField, TextArea, List, Checkbox, CheckboxGroup, Choice, Event Handling, Dialog Boxes, Scrollbar, Menu.
UNIT – II
Swing: Containment Hierarchy, Adding Components, JTextField, JPasswordField, JTable, JComboBox, JProgressBar, JList, JTree, JColorChooser, Dialogs. Java Database Connectivity (JDBC): Introduction, JDBC Drivers, JDBC Architecture, JDBC Classes and Interfaces, loading a Driver, making a Connection, Execute SQL Statement, SQL Statements, Retrieving Result, Getting Database Information, Scrollable and Updatable Result set, Result Set Metadata.
UNIT – III
Servlets: Introduction to Servlets, Servlet Life Cycle, Advantages, Sharing Information, Initializing a Servlet, Writing Service Methods, Filtering Requests and Responses, Invoking Other Web Resources, Accessing the Web Context, Maintaining Client State, Finalizing a Servlet. JavaServer Pages: Introduction to JSP, JSP Engine, Anatomy of a JSP Page, JSP Syntax, Life Cycle of a JSP Page, Creating Static Content, Creating Dynamic Content.
REFERENCE BOOKS
1. Uttam K. Roy, Advanced Java Programming
2. Herbert Schildt, Java: The Complete Reference
3. Cay S. Horstmann, Gary Cornell, Core Java Vol. II – Advanced Features
4. Sharanam Shah, Vaishali Shah, Java EE 7 for Beginners
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-104-T: PAPER- IV: STATISTICAL INFERENCE
UNIT-I
Estimation Theory: Basic concepts of estimation; concepts, examples, applications and simple problems on the criteria for a good estimator: unbiasedness, consistency, efficiency and sufficiency; Cramér-Rao inequality, Rao-Blackwell theorem, Fisher information, Lehmann-Scheffé theorem. Simple problems on UMVUE.
Methods of Estimation: Method of Moments, Least Squares and Maximum Likelihood: properties and simple problems. Resampling methods: Jackknife and Bootstrap; estimation of the bias and standard deviation of point estimators by the Jackknife and Bootstrap methods, with examples; U-statistics, kernels and examples. Interval estimation: confidence intervals (CIs) using pivots, shortest-length CIs, and example problems.
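Jackknife and bootstrap estimates of bias and standard error can be written directly from their definitions. For the jackknife, with leave-one-out estimates θ̂_(i) and their mean θ̄, bias = (n - 1)(θ̄ - θ̂) and SE = sqrt(((n - 1)/n) Σ (θ̂_(i) - θ̄)²); for the bootstrap, bias is the mean of the resampled estimates minus θ̂ and SE is their standard deviation. The sketch below uses hypothetical function names, expects the sample as a Python list, and fixes a seed for reproducibility (both are our own choices). The plug-in (divide-by-n) variance serves as a convenient test statistic, because its jackknife bias correction reproduces exactly the unbiased sample variance.

```python
import random
import statistics

def plugin_var(xs):
    """Plug-in (divide-by-n) variance: a biased estimator of the population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def jackknife(xs, stat):
    """Jackknife estimates of the bias and standard error of stat(xs)."""
    n = len(xs)
    theta = stat(xs)
    loo = [stat(xs[:i] + xs[i + 1:]) for i in range(n)]   # leave-one-out estimates
    loo_mean = sum(loo) / n
    bias = (n - 1) * (loo_mean - theta)
    se = ((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)) ** 0.5
    return bias, se

def bootstrap(xs, stat, b=2000, seed=0):
    """Bootstrap estimates of the bias and standard error of stat(xs) from b resamples."""
    rng = random.Random(seed)
    theta = stat(xs)
    reps = [stat(rng.choices(xs, k=len(xs))) for _ in range(b)]  # sample with replacement
    bias = sum(reps) / b - theta
    se = statistics.stdev(reps)
    return bias, se
```

With a made-up sample such as `xs = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]`, `plugin_var(xs) - jackknife(xs, plugin_var)[0]` agrees with `statistics.variance(xs)` up to rounding error.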
UNIT-II
Testing of Hypotheses: Neyman-Pearson Lemma, Most Powerful tests, Uniformly Most Powerful tests, Likelihood ratio tests, Sequential Probability Ratio Tests.
Non-parametric tests: One- and two-sample tests (Kolmogorov-Smirnov, Kruskal-Wallis and Friedman tests, Kendall’s tau, Ansari-Bradley test).
UNIT-III
Non-parametric Density Estimation: Rosenblatt’s naïve density estimator, its bias and variance. Consistency of kernel density estimators and their MSE.
Simulation: Introduction, generation of random numbers from the Uniform, Normal, Exponential, Cauchy and Poisson distributions. Estimating the reliability of the random numbers. Prior and posterior distributions, conjugate families, Bayesian estimation of parameters, MCMC algorithms: Metropolis-Hastings and Gibbs Sampler.
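Rosenblatt’s naïve estimator is f̂(x) = (F_n(x + h) - F_n(x - h)) / (2h), where F_n is the empirical distribution function; it is a kernel estimator with a uniform kernel. The general kernel estimator is f̂(x) = (1/(nh)) Σ K((x - X_i)/h). The sketch below implements both; the standard normal kernel and the function names are our own illustrative choices.

```python
import math

def naive_density(x, data, h):
    """Rosenblatt's naive estimator: (F_n(x + h) - F_n(x - h)) / (2h)."""
    count = sum(1 for xi in data if x - h < xi <= x + h)   # points in (x - h, x + h]
    return count / (2 * h * len(data))

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kernel_density(x, data, h, kernel=gaussian_kernel):
    """Kernel density estimate (1 / (n h)) * sum of K((x - X_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)
```

The bandwidth h controls the bias-variance trade-off: a smaller h lowers the bias but raises the variance of the estimate.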
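A random-walk Metropolis-Hastings sampler needs only the log of the target density up to an additive constant. With a symmetric proposal, the Hastings acceptance ratio reduces to target(proposal)/target(current). The sketch below assumes a one-dimensional target, a normal proposal with hypothetical step size `step`, and a fixed seed; these are illustrative choices, not a prescribed implementation.

```python
import math
import random

def metropolis_hastings(log_target, n_samples, x0=0.0, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target density.

    log_target only needs to be known up to an additive constant.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)        # symmetric normal proposal
        # symmetric proposal: the Hastings ratio is target(proposal) / target(x)
        log_alpha = log_target(proposal) - log_target(x)
        u = 1.0 - rng.random()                      # uniform on (0, 1], avoids log(0)
        if math.log(u) < log_alpha:
            x = proposal
            accepted += 1
        samples.append(x)                           # on rejection the current x is repeated
    return samples, accepted / n_samples
```

Targeting a standard normal with `log_target = lambda x: -0.5 * x * x` and discarding an initial burn-in, the retained draws should have a sample mean near 0 and a sample variance near 1.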
REFERENCES
1. Rohatgi, V.K.: An Introduction to Probability Theory and Mathematical Statistics (Wiley)
2. Gibbons: Non-Parametric Statistical Inference (TMH)
3. Lehmann, E. L.: Testing Statistical Hypotheses (John Wiley)
4. Goon, Gupta and Das Gupta: Outlines of Statistics, Vol. II (World Press)
5. Rao, C.R.: Linear Statistical Inference (John Wiley)
Instructions for Practicals of M.Sc. Data Science
1. Semesters I and II each have four practical papers; each paper carries two credits, with 4 hours of lab and a weightage of 50 marks.
2. Each student must spend a minimum of 60 hours in the lab for each practical paper and must practice on various data sets available from web sources.
3. Each practical record should contain all the practicals listed in the syllabus. Maintaining practical records is mandatory (they are to be submitted at the time of the examination), irrespective of whether or not marks are allotted for the practical record.
4. Depending on the practical, the statistical analysis report should follow the steps of data analysis: the problem; the data set considered (data source); data description; data objectives; hypotheses framed on the population; applicable statistical techniques (as per the syllabus of the paper); base program code written by the student, to become familiar with the computational procedure rather than relying on packages; outputs and results; data interpretation; and conclusions on the data set.
5. Each practical record should be written in the student’s own handwriting (not computer printouts) and should carry the signature of the concerned faculty member, with the date on which the practical was done.
6. The semester-end practical examination is of 2 hours’ duration (including implementation); students answer any two of the three questions given.
7. The practical examination question paper is common to all students of all colleges; the examination is scheduled and conducted by the Head, Department of Statistics, O.U., Hyderabad, with external examiners appointed.
8. All answers must be written in the practical answer booklet during the practical examination, including the executed program, output, interpretation, and conclusions; for data analysis questions, all the steps listed in item 4 must be written.
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-105-P: PAPER- V (PRACTICAL-I)
Statistical Process for Data Science Using R
List of Practicals:
1. Basics of R-programming and R-studio for data handling.
2. Data understanding, data description, measurement scales, data objectives, formation of hypotheses. Sequential steps for writing a data analysis report.
3. Evaluation of data pre-processing steps. Data transformations (standardizing, normalizing, converting data from one measurement scale to another, etc.).
4. Computation of descriptive statistics for data sets on various measurement scales.
5. Data Visualization using R: drawing one-dimensional diagrams (Pictogram, Pie Chart, Bar Chart), two-dimensional diagrams (Histogram, Line Plot, Frequency Curves and Polygons, Ogive Curves, Scatter Plot), and other diagrammatic/graphical representations such as Gantt Chart, Heat Map, Box-Whisker Plot, Area Chart, and Correlation Matrices.
6. Correlation Analysis (parametric and nonparametric), Simple and Multiple linear Regression model fitting and its analysis.
7. Testing of Hypothesis-I: Parametric tests (z-, χ²-, t-, F-tests, ANOVA).
8. Testing of Hypothesis-II: Non-Parametric tests (Sign test, Median test, Wilcoxon signed-rank, Mann-Whitney U, Run test).
9. Statistical analysis for qualitative data.
10. Applying the modeling process: model evaluation, overfitting, underfitting, cross-validation concepts, and model performance (train/test, K-fold and leave-one-out approaches) for qualitative and quantitative data.
Note: The practicals listed above are to be implemented on sample data sets available from various web sources and should be practiced by each student. For example, www.kaggle.com contains thousands of data sets on different measurement scales; a few are: Fisher’s Iris Dataset; Online Food Dataset; Wine Quality Data Set; Water Potability Dataset; Heart Data Set; Protease Data Set; Mortgage Data Set; Flights Dataset; Sustainable Development Data; Credit Card Fraud Detection; Employee Dataset; Heart Attack Analysis & Prediction Dataset; Dataset for Facial Recognition; Covid_w/wo_Pneumonia Chest Xray Dataset; Groceries Dataset; Financial Fraud and Non-Fraud News Classification; IBM Transactions for Anti Money Laundering.
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-106-P: PAPER- VI (PRACTICAL-II):
DESIGN & ANALYSIS OF ALGORITHMS LAB USING PYTHON
List of Practicals using Python programming (without using packages)
1. Divide and Conquer method: implementation of sorting/searching of data/data sets using:
(i) Selection Sort.
(ii) Merge Sort.
(iii) Quick Sort.
(iv) Heap construction, heap maintenance, and Heap Sort.
(v) Binary Search.
(vi) Strassen's Matrix Multiplication.
2. Greedy method implementation for:
(i) Fractional Knapsack problem.
(ii) Job Sequencing with Deadlines.
(iii) Minimum-Cost Spanning Trees (Kruskal’s & Prim’s).
(iv) Single Source Shortest Paths (Dijkstra’s).
3. Dynamic programming technique implementation for:
(i) Travelling Salesperson problem.
(ii) Multistage Graph problem.
(iii) All-Pairs Shortest Paths (Floyd-Warshall).
(iv) Single-Source Shortest Paths (Bellman-Ford).
(v) Optimal Binary Search Trees.
4. Backtracking technique implementation for:
(i) 0-1 Knapsack problem.
(ii) 8-Queens problem.
(iii) Hamiltonian Cycle problem.
5. Branch and Bound implementation for:
(i) 0-1 Knapsack problem.
(ii) Travelling Salesperson problem.
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-107-P: PAPER- VII (PRACTICAL-III): JAVA PROGRAMMING LAB
List of Practicals using Java programming
1. Java programs for mathematical/statistical applications that demonstrate core Java and OOP concepts (working with functions, classes, abstract classes, interfaces, string handling with the String and StringBuffer classes, user-defined packages, event handling, exception handling, inheritance, polymorphism, multithreading, etc.; computation of measures of central tendency and dispersion, moments, skewness, kurtosis, distributions, correlation and regression, and matrices).
2. Developing servlet applications on databases (e.g., accepting the H.T. No. of a student from the client and displaying the memorandum of marks from the server; a Question-Answer application using the HttpServletRequest and HttpServletResponse interfaces; etc.).
3. Create JSP pages that print (a) a temperature conversion chart (Celsius to Fahrenheit); (b) the current date and time; (c) the number of times the page has been accessed since it was loaded. Develop a simple JSP application that demonstrates the use of implicit objects (at least 5); a JSP application that accepts registration details from the user and stores them in a database table; a JSP application that authenticates user login against the registration details, forwarding the user to the index page if login succeeds and otherwise showing a login-failure message; and a web application to add items to an inventory using JSP.
4. Create GUIs to present a set of choices from which a user selects stationery products and to display the price of the selected product; a typical editable table describing the employees of a software company; and a student registration form using Swing components.
5. Create a remote object for simple arithmetic operations, using AWT/Swing to create the user interface.
6. Develop a Hibernate application to store feedback from website visitors in a MySQL database.
7. Write EJB applications using stateless session beans and stateful session beans.
8. Develop a Room Reservation System application using Entity Beans.
9. Create a three-tier application using Servlets, JSP, and EJB.
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-108-P: PAPER- VIII (PRACTICAL-IV): STATISTICAL INFERENCE USING PYTHON
List of Practicals using Python programming
1. Data visualization: diagrammatic/graphical representation of data sets on different measurement scales (Pictorial representation, Bar (simple, multiple, component, percent) and Pie Charts, Histogram, Line plot, frequency curves and polygons, ogive curves, Scatter Plot, Gantt Chart, Heat Map, Box-Whisker Plot, Waterfall Chart, Area Chart, Density Plot, Bullet Graph, Choropleth Map, Treemap, Path diagram, Network Diagram, Correlation Matrices).
2. Correlation and regression analysis (including simple (Pearson’s and Spearman’s), partial and multiple correlation; simple and multiple linear regression; and logistic regression).
3. Parametric tests (z-, χ²-, t-, F-tests, ANOVA).
4. Non-Parametric tests (Sign test, Median test, Wilcoxon signed-rank, Mann-Whitney U, Run test, U-test, K-S test, Kruskal-Wallis and Friedman tests, tests of independence and goodness of fit, Kendall’s tau, Ansari-Bradley test).
5. Generation of Jackknife and Bootstrap samples, estimation of parameters, and computation of bias.
6. Confidence Interval estimation for Binomial, Poisson, Normal and Exponential parameters.
7. Simulation: Generation of random numbers from various probability distributions (Uniform, Binomial, Poisson, Normal, Exponential, Gamma, Cauchy, Lognormal, and Weibull Distributions).
8. Bayesian estimation of parameters (using Metropolis-Hastings and Gibbs Sampler).
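A Gibbs sampler draws each coordinate in turn from its full conditional distribution. For a standard bivariate normal with correlation ρ, both conditionals are normal: x | y ~ N(ρy, 1 - ρ²) and y | x ~ N(ρx, 1 - ρ²). The sketch below uses this textbook target purely as an illustration; the burn-in length, seed, and function name are our own assumptions.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5           # conditional standard deviation
    x = y = 0.0                             # arbitrary starting point
    draws = []
    for i in range(n_samples + burn_in):
        x = rng.gauss(rho * y, sd)          # draw x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, sd)          # draw y | x ~ N(rho * x, 1 - rho^2)
        if i >= burn_in:                    # keep draws only after the burn-in period
            draws.append((x, y))
    return draws
```

Because both marginals have mean 0 and variance 1, the average of x·y over the retained draws estimates ρ, which gives a simple check on the chain.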