SYLLABUS
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-101-T: PAPER- I: MATHEMATICAL FOUNDATIONS FOR DATA SCIENCE
UNIT – I
Linear Algebra: Vector spaces, Subspaces, Basis and dimension of a vector space, Linear dependence and independence, Spanning set. Linear transformation, Kernel, Range, Matrix representation of a linear transformation. Matrices: Trace and rank of a matrix and their properties, Determinants, Inverse matrices and their properties.
UNIT-II
Symmetric, orthogonal and idempotent matrices and their properties. Gauss elimination, row canonical form, diagonal form, triangular form and their applications. Characteristic roots and vectors, Statement of the Cayley-Hamilton theorem and its applications, Orthogonal and spectral decomposition of a real symmetric matrix, Singular value decomposition.
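As a small illustration of the Cayley-Hamilton theorem listed in this unit, the sketch below checks that a 2x2 matrix A satisfies its own characteristic equation, A^2 - tr(A) A + det(A) I = 0. The helper names and the sample matrix are illustrative assumptions, not part of the syllabus.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cayley_hamilton_residual(a):
    """Return A^2 - tr(A)*A + det(A)*I for a 2x2 matrix A (hypothetical helper)."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    a_squared = mat_mul(a, a)
    identity = [[1, 0], [0, 1]]
    return [[a_squared[i][j] - trace * a[i][j] + det * identity[i][j]
             for j in range(2)] for i in range(2)]


# Sample symmetric matrix chosen only for illustration.
A = [[2, 1], [1, 3]]
print(cayley_hamilton_residual(A))  # [[0, 0], [0, 0]]
```

A zero residual is what the theorem predicts for every square matrix; the same identity rearranges to give the inverse of a non-singular 2x2 matrix as (tr(A) I - A) / det(A).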
UNIT – III
Combinatorics: Basic counting principle, inclusion-exclusion for two sets, pigeonhole principle, permutations and combinations, binomial coefficients and identities, generalized permutations and combinations, principle of inclusion-exclusion, applications of inclusion-exclusion. Recurrence Relations: introduction, solving linear recurrence relations, generating functions.
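One way to see inclusion-exclusion and recurrence relations side by side is to count derangements (permutations with no fixed point) in both ways. The sketch below is illustrative only; the function names are assumptions, and derangements are chosen as a standard textbook application rather than a topic named in the syllabus.

```python
from math import comb, factorial


def derangements_inclusion_exclusion(n):
    """D(n) = sum over k of (-1)^k * C(n, k) * (n - k)!."""
    return sum((-1) ** k * comb(n, k) * factorial(n - k) for k in range(n + 1))


def derangements_recurrence(n):
    """D(n) = (n - 1) * (D(n - 1) + D(n - 2)), with D(0) = 1 and D(1) = 0."""
    if n == 0:
        return 1
    if n == 1:
        return 0
    prev2, prev1 = 1, 0  # D(0), D(1)
    for m in range(2, n + 1):
        prev2, prev1 = prev1, (m - 1) * (prev1 + prev2)
    return prev1


for n in range(6):
    print(n, derangements_inclusion_exclusion(n), derangements_recurrence(n))
```

Both functions return the same values (1, 0, 1, 2, 9, 44 for n = 0 to 5), which is a quick consistency check between the counting argument and the recurrence.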
UNIT-IV
Graph Theory: Basic concepts of graphs; Isomorphic graphs and simple problems; Trees: definitions, properties, and simple problems, tree traversals, spanning tree constructions: breadth-first and depth-first search, minimal spanning tree constructions: Kruskal’s algorithm. Planar and Hamiltonian graphs and simple problems and applications; Graph coloring and its applications.
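The spanning tree construction by breadth-first search mentioned in this unit can be sketched as follows. The adjacency-list representation, the function name and the sample graph are assumptions made for illustration.

```python
from collections import deque


def bfs_spanning_tree(graph, root):
    """Return the tree edges found by breadth-first search from root.

    graph is a dict mapping each vertex to a list of its neighbours
    (an assumed representation for an undirected, connected graph).
    """
    visited = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in visited:
                # First time v is reached: the edge (u, v) joins the tree.
                visited.add(v)
                tree_edges.append((u, v))
                queue.append(v)
    return tree_edges


# Small illustrative graph on four vertices.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c"],
}
print(bfs_spanning_tree(graph, "a"))  # [('a', 'b'), ('a', 'c'), ('b', 'd')]
```

A connected graph on n vertices always yields n - 1 tree edges; replacing the queue with a stack (or recursion) gives the depth-first variant.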
Suggested Readings
1. Gilbert Strang (2016): Introduction to linear algebra, 5/e., Wellesley-Cambridge.
2. David C. Lay (2019): Linear Algebra and Its Applications, Pearson, 5/e.
3. Joe L. Mott, Abraham Kandel, Theodore P. Baker, Discrete Mathematics for Computer Scientists and Mathematicians
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-102-T: PAPER- II: DESIGN AND ANALYSIS OF ALGORITHMS
UNIT I
Introduction to Algorithms: Algorithm Specification, Performance Analysis, Randomized Algorithms. Elementary Data Structures: Stacks and Queues, Trees, Dictionaries, Priority Queues, Sets and Disjoint Set Union, Graphs. Divide and Conquer: Binary Search, Finding the Maximum and Minimum, Merge Sort; Quick Sort, Selection sort, Strassen's Matrix Multiplication, Convex Hull.
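As an illustration of the divide-and-conquer strategy in this unit, here is a minimal merge sort sketch. It returns a new sorted list rather than sorting in place, which is a design choice made for clarity and not something the syllabus prescribes.

```python
def merge_sort(items):
    """Sort a list by splitting it, sorting each half, and merging."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves; "<=" keeps equal items in original order.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```

The recurrence T(n) = 2T(n/2) + O(n) for this procedure gives the O(n log n) running time.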
UNIT-II
Greedy Method: Knapsack Problem, Job Sequencing with Deadlines, Minimum-Cost Spanning Trees (Kruskal’s & Prim’s), Single Source Shortest Paths (Dijkstra’s). Dynamic Programming: General Method, Multistage Graphs, All-Pairs Shortest Paths, Single-Source Shortest Paths, Optimal Binary Search Trees, 0/1 Knapsack, Traveling Salesperson Problem.
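The dynamic programming approach to the 0/1 knapsack problem listed above can be sketched with a one-dimensional table. The function name and the sample weights and values are illustrative assumptions; the sketch assumes integer weights and capacity.

```python
def knapsack_01(weights, values, capacity):
    """Return the maximum total value of items fitting within capacity.

    best[c] holds the best value achievable with capacity c using the
    items processed so far. Capacities are scanned downwards so that each
    item is used at most once.
    """
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


# Illustrative instance: the items of weight 3 and 4 give the optimum.
print(knapsack_01([1, 3, 4, 5], [1, 4, 5, 7], 7))  # 9
```

The table has capacity + 1 entries and each item updates it once, so the running time is O(n x capacity).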
UNIT-III
Backtracking technique: General Method, 8-Queens Problem, Sum of Subsets, Graph Colouring, Hamiltonian Cycles, Knapsack Problem. Branch and Bound technique: General Method, 0/1 Knapsack Problem, Traveling Salesperson Problem.
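The general backtracking method can be illustrated on the 8-Queens problem from this unit. The sketch below places one queen per row and abandons a partial placement as soon as two queens attack each other; the function and variable names are assumptions made for illustration.

```python
def solve_n_queens(n):
    """Return all solutions as tuples; entry r is the queen's column in row r."""
    solutions = []
    cols = []  # cols[r] = column of the queen already placed in row r

    def safe(row, col):
        # A new queen is safe if it shares no column or diagonal with earlier ones.
        for r, c in enumerate(cols):
            if c == col or abs(c - col) == row - r:
                return False
        return True

    def place(row):
        if row == n:
            solutions.append(tuple(cols))
            return
        for col in range(n):
            if safe(row, col):
                cols.append(col)   # choose
                place(row + 1)     # explore
                cols.pop()         # undo the choice (backtrack)

    place(0)
    return solutions


print(len(solve_n_queens(8)))  # 92
```

The same choose, explore, undo pattern carries over to sum of subsets, graph colouring and Hamiltonian cycles by changing the feasibility test.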
UNIT -IV
NP-Hard and NP-Complete Problems: Basic Concepts, Cook's Theorem, NP-Hard and NP-Complete problems. Graph Problems, NP-Hard Scheduling Problems, NP-Hard Code Generation, Some Simplified NP-Hard Problems.
Suggested Readings
1. E Horowitz, S Sahni, S Rajasekaran, (2007): Fundamentals of Computer Algorithms, 2/e, Universities Press.
2. T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, (2010): Introduction to Algorithms, 3/e, PHI.
3. R. Panneerselvam (2007): Design and Analysis of Algorithms, PHI.
4. Hari Mohan Pandey, (2009): Design, Analysis and Algorithm, University Science Press.
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-103-T: PAPER- III: JAVA PROGRAMMING
UNIT – I
Java Programming Fundamentals: Introduction, Overview of Java, Data Types, Variables and Arrays, Operators, Control statements, Classes, Methods, Inheritance, Packages and Interfaces. I/O basics: Byte & Character Streams, Reading Console Input and Writing Console Output, Scanner Class, Console Class, PrintWriter Class, String Handling, Exception Handling, Multithreaded Programming. Overview of Networking: Working with URLs, Connecting to a Server, Implementing Servers, Serving Multiple Clients, Sending E-mail, Socket Programming, Internet Addresses, URL Connections.
UNIT – II
AWT: Introduction, AWT Class Hierarchy, Creating Container, Adding Components, Layout, Using Panel, TextField, TextArea, List, Checkbox, CheckboxGroup, Choice, Event Handling, Dialog Boxes, Scrollbar, Menu. Swing: Containment Hierarchy, Adding Components, JTextField, JPasswordField, JTable, JComboBox, JProgressBar, JList, JTree, JColorChooser, Dialogs.
UNIT-III
Java Database Connectivity (JDBC): Introduction, JDBC Drivers, JDBC Architecture, JDBC Classes and Interfaces, Loading a Driver, Making a Connection, Executing SQL Statements, SQL Statements, Retrieving Results, Getting Database Information, Scrollable and Updatable ResultSet, ResultSetMetaData.
UNIT – IV
Servlet: Introduction to Servlet, Servlet Life Cycle, advantages, Sharing Information, initializing a Servlet, Writing Service Methods, Filtering Requests and Responses, Invoking Other Web Resources, Accessing the Web Context, Maintaining Client State, Finalizing a Servlet. Java Server Pages: Introduction to JSP, JSP Engine, Anatomy of a JSP Page, JSP Syntax, Life Cycle of a JSP Page, Creating Static Content, Creating Dynamic Content.
Suggested Readings
1. Uttam K. Roy, Advanced Java programming
2. C, C++, Java, Python & R The Complete Programmers Reference – By Dr. Mohd. Abdul Hameed
3. Herbert Schildt, Java Complete Reference
4. Cay S. Horstmann, Gary Cornell, Core Java Vol. II – Advanced Features
5. Sharanam Shah, Vaishali Shah, Java EE 7 for Beginners.
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-104-T: PAPER- IV: STATISTICAL INFERENCE
UNIT-I
Estimation Theory: Basic concepts of estimation; Concepts, examples, applications and simple problems on criteria for a good estimator: Unbiasedness, consistency, efficiency and sufficiency, Cramer-Rao inequality, Rao-Blackwell theorem, Fisher Information, Lehmann-Scheffe theorem. Simple problems on UMVUE.
UNIT-II
Methods of Estimation: Method of Moments, Least Squares and Maximum Likelihood, properties and simple problems. Resampling methods: Jackknife, Bootstrap, estimation of bias and standard deviation of point estimators by the Jackknife and Bootstrap methods with examples, U-statistics, kernel and examples. Interval estimation: confidence level, CI using pivots, shortest-length CI and example problems.
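The jackknife and bootstrap estimates of bias and standard error described in this unit can be sketched in plain Python. The function names, the toy data, the number of resamples and the seed are all illustrative assumptions; the jackknife formulas used are the standard ones, bias = (n - 1)(mean of leave-one-out estimates - full-sample estimate) and SE = sqrt((n - 1)/n x sum of squared deviations of the leave-one-out estimates).

```python
import random
from math import sqrt
from statistics import mean, stdev


def jackknife(data, estimator):
    """Return (bias estimate, standard error estimate) by the jackknife."""
    n = len(data)
    theta_hat = estimator(data)
    # Recompute the estimator with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_dot = mean(leave_one_out)
    bias = (n - 1) * (theta_dot - theta_hat)
    se = sqrt((n - 1) / n * sum((t - theta_dot) ** 2 for t in leave_one_out))
    return bias, se


def bootstrap_se(data, estimator, n_resamples=2000, seed=0):
    """Return the bootstrap standard error (n_resamples and seed are assumed defaults)."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate is the estimator applied to a resample drawn with replacement.
    replicates = [estimator([rng.choice(data) for _ in range(n)])
                  for _ in range(n_resamples)]
    return stdev(replicates)


# Toy data, made up for illustration only.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
bias, se = jackknife(data, mean)
print("jackknife bias:", bias)
print("jackknife SE:", se)
print("bootstrap SE:", bootstrap_se(data, mean))
```

For the sample mean, the jackknife bias is zero and the jackknife standard error coincides with s / sqrt(n), which makes the mean a convenient check before applying the same functions to other estimators.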
UNIT-III
Testing of Hypotheses: Neyman-Pearson Lemma, Most Powerful tests, Uniformly Most Powerful tests, Likelihood ratio tests, Sequential Probability Ratio Tests.
Non-parametric tests: One- and two-sample tests (Kolmogorov-Smirnov, Kruskal-Wallis and Friedman tests, Kendall’s tau, Ansari-Bradley test).
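As an illustration of the two-sample Kolmogorov-Smirnov test named above, the sketch below computes only the test statistic D, the largest absolute difference between the two empirical distribution functions. It does not compute a p-value; the function name and sample values are assumptions.

```python
from bisect import bisect_right


def ks_two_sample_statistic(x, y):
    """Return D = max |F_x(t) - F_y(t)| over all observed values t."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:
        # Empirical CDFs: proportion of each sample that is <= t.
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Completely separated samples give the largest possible statistic.
print(ks_two_sample_statistic([1.2, 2.5, 3.1], [4.0, 5.3, 6.8]))  # 1.0
```

The observed D would then be compared with the critical value (or asymptotic distribution) for the two sample sizes to decide whether to reject the hypothesis of a common distribution.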
UNIT-IV
Non-parametric Density Estimation: Rosenblatt’s naïve density estimator, its bias and variance. Consistency of Kernel density estimators and its MSE.
Simulation: Introduction, generation of random numbers for Uniform, Normal, Exponential, Cauchy and Poisson distributions. Estimating the reliability of the random numbers. Prior and posterior distributions, conjugate families, Bayesian estimation of parameters, MCMC algorithms: Metropolis-Hastings and Gibbs sampler.
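Generation of Exponential and Cauchy random numbers from Uniform(0, 1) numbers can be sketched with the inverse-transform method, one of several standard techniques and not necessarily the one intended by the syllabus. The function names, parameter values and seed below are illustrative assumptions.

```python
import math
import random
from statistics import mean, median


def exponential_sample(rate, rng):
    """Invert F(x) = 1 - exp(-rate * x): x = -ln(1 - u) / rate."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    return -math.log(1.0 - u) / rate


def cauchy_sample(location, scale, rng):
    """Invert the Cauchy CDF: x = location + scale * tan(pi * (u - 0.5))."""
    u = rng.random()
    return location + scale * math.tan(math.pi * (u - 0.5))


rng = random.Random(42)  # fixed seed so the run is reproducible
exp_draws = [exponential_sample(2.0, rng) for _ in range(20000)]
cauchy_draws = [cauchy_sample(0.0, 1.0, rng) for _ in range(20000)]

# The Exponential(rate=2) mean is 0.5; the Cauchy has no mean, so use the median.
print("exponential sample mean:", mean(exp_draws))
print("cauchy sample median:", median(cauchy_draws))
```

Comparing the sample mean (or median) with its theoretical value is a first, informal check on the generated numbers; a goodness-of-fit test gives a more formal assessment.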
Suggested Readings
1. Rohatgi, V.K.: An Introduction to Probability Theory and Mathematical Statistics (Wiley)
2. Gibbons: Non-Parametric Statistical Inference, (TMH)
3. Lehmann, E. L.: Testing Statistical Hypotheses, John Wiley
4. Goon, Gupta and Das Gupta: Outlines of Statistics, Vol. II, World Press.
5. C.R. Rao – Linear Statistical Inference (John Wiley)
1. Semesters I and II each have four practical papers; each paper carries two credits, 4 hours of lab work and a weightage of 50 marks.
2. Each student has to spend a minimum of 60 hours in the lab for each practical paper and has to practice on data sets available from various web sources.
3. Each practical record should contain all the practicals mentioned in the syllabus. Maintaining practical records is mandatory (submission at the time of examination), whether or not marks are allocated for the record.
4. The statistical analysis report should follow the steps of data analysis appropriate to the practical: problem, data set considered (data source), data description, data objectives, hypotheses framed on the population, statistical techniques applicable (as per the syllabus of the paper), base program code (written to become familiar with the computational procedure rather than relying on packages), outputs and results, data interpretation, and conclusions on the data set.
5. Each practical record should be handwritten by the student (not computer printouts) and should carry the signature of the concerned faculty member with the date on which the practical was done.
6. The semester-end practical examination question paper asks students to answer any two of the three questions given, within a duration of 2 hours (including implementation).
7. The practical examination question paper is common to all students of all colleges; the examination is conducted and scheduled by the Head, Department of Statistics, O.U., Hyderabad, with the appointment of external examiners.
8. All answers should be written in the practical answer booklets at the time of the practical examination, including the executed program, output, interpretation and conclusions; in the case of data analysis, all the steps mentioned above should be written.
Instructions for Practicing the List of Practicals
Objective of the practicals:
The students have to concentrate and practice on:
(i) Proficiency in one programming language, with certification.
(ii) Proficiency in handling large-scale data sets: storing, securing, retrieving and handling data (with certification).
(iii) Ability to understand the methods/algorithms with theoretical knowledge.
(iv) Implementation of any algorithm/method by writing program code without the use of packages/software.
(v) Ability to write a complete data science/analysis report as per the norms.
(vi) Expertise in at least three different related domains, gained by practicing on at least 10-15 data sets chosen from domains such as: (a) Finance data sets (b) Business data sets (c) Medical data sets (d) Health data sets (e) Clinical trials data sets (f) Nutritional data sets (g) Biological data sets (h) Economic data sets, etc.
1. Each student has to spend a minimum of 60 hours practicing on the system with data sets from different domains available from web sources, so that the student is able to apply the relevant statistical tools, implement them and write a report.
2. The semester-end practical examination question paper asks students to answer any two of the three questions, with implementation. The examination duration is 2 hours.
3. The detailed data analysis report should be written, as per the question, in the practical answer booklets.
4. The answer script will be evaluated on the code, the outputs/results, and the interpretation and conclusions presented in the script.
5. Submission of practical records is mandatory at the time of the practical examination, whether or not marks are allocated for the record.
6. The practical record should be handwritten by the student.
7. The practical record should contain all the practicals as per the syllabus.
8. The data analysis report should contain: problem statement, data set, data description, data objectives, hypothesis, statistical tools and techniques, program code, output/results, data interpretation, conclusions.
Text Book:
Python for Data Science, Dr. Mohd Abdul Hameed, Wiley publisher
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-105-P: PAPER- V (PRACTICAL-I)
Statistical Process for Data Science Using R
List of Practicals:
1. Data understanding, data description, scales of measurement, data objectives, formulation of hypotheses.
2. Evaluation of data pre-processing steps. Data transformations (standardizing, normalizing, converting data from one scale to another, etc.).
3. Evaluation of descriptive statistics for data sets on various scales of measurement.
4. Data visualization using R: Drawing one-dimensional diagrams (Pictogram, Pie Chart, Bar Chart), two-dimensional diagrams (Histogram, Line plot, frequency curves & polygons, ogive curves, Scatter Plot), and other diagrammatic/graphical representations such as Gantt Chart, Heat Map, Box-Whisker Plot, Area Chart, Correlation Matrices.
5. Correlation Analysis (parametric and nonparametric), Simple and Multiple linear Regression model fitting and its analysis.
6. Testing of Hypothesis-I: Parametric tests (z-, χ²-, t-, F-tests, ANOVA).
7. Testing of Hypothesis-II: Non-parametric tests (Sign test, Median test, Wilcoxon signed-rank, Mann-Whitney U, Runs test).
8. Statistical analysis for qualitative data. Data interpretation and Statistical Report writing.
9. Applying the modelling process, model evaluation, overfitting, underfitting, cross-validation concepts, model performance (train/test, K-fold and leave-one-out approaches) for qualitative and quantitative data.
Note: The above list of practicals is to be implemented on sample data sets available from web sources, which can be used for practice. Suggested source: www.kaggle.com. A few suggested data sets are: 1. Fisher's Iris Dataset; 2. Online food dataset; 3. Wine quality data set; 4. Water potability dataset; 5. Heart data set; 6. Protease data set; 7. Mortgage data set; 8. Flights dataset; 9. Sustainable Development Data; 10. Credit Card Fraud Detection; 11. Employee dataset; 12. Heart Attack Dataset; 13. Dataset for Facial recognition; 14. Covid W/wo_Pneumonia Chest Xray Dataset; 15. Financial Fraud and Non-Fraud News Classification; 16. IBM Transactions for Anti Money Laundering.
Text Book: Python for Data Science, Dr. Mohd Abdul Hameed, Wiley publisher
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-106-P: PAPER- VI (PRACTICAL-II):
DESIGN & ANALYSIS OF ALGORITHMS LAB USING PYTHON
List of Practicals:
1. Implementation of sorting algorithms.
2. Implementation of Sequential Search, Binary Search.
3. Implementation of Tree Traversal Algorithms
4. Greedy implementation for Knapsack problem.
5. Greedy construction of a minimal spanning tree using
a. Kruskal’s and
b. Prim’s algorithms.
6. Construction of the shortest path in a weighted graph using Dijkstra’s Algorithm.
7. Implementation of the dynamic programming technique for
a) Travelling salesman problem,
b) Multistage graph problem,
c) All-Pairs Shortest Paths (Floyd-Warshall),
d) Single-Source Shortest Paths (Bellman-Ford),
e) Optimal Binary Search Trees.
8. Implementation of the backtracking technique for the knapsack problem.
9. Implementation of branch and bound for the knapsack problem.
10. Implementation of branch and bound for the travelling salesman problem.
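As a starting point for the minimal spanning tree practical, Kruskal's algorithm can be sketched with a simple disjoint-set (union-find) structure. The edge format (weight, u, v), the vertex numbering 0..n-1 and the sample graph are assumptions made for illustration.

```python
def kruskal(num_vertices, edges):
    """Return (tree_edges, total_weight) of a minimum spanning tree.

    edges is a list of (weight, u, v) tuples with vertices numbered
    from 0 to num_vertices - 1 (an assumed input format).
    """
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree_edges, total_weight = [], 0
    for weight, u, v in sorted(edges):  # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # the edge joins two components: keep it
            parent[root_u] = root_v
            tree_edges.append((u, v, weight))
            total_weight += weight
    return tree_edges, total_weight


# Illustrative weighted graph on four vertices.
edges = [(1, 0, 1), (3, 0, 2), (2, 1, 2), (4, 1, 3), (5, 2, 3)]
tree, total = kruskal(4, edges)
print(tree)   # [(0, 1, 1), (1, 2, 2), (1, 3, 4)]
print(total)  # 7
```

Sorting dominates the cost, giving O(E log E); Prim's algorithm reaches the same total weight by growing a single tree from a start vertex instead.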
Text Book: Python for Data Science, Dr.Mohd Abdul Hameed, Wiley publisher
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-107-P: PAPER- VII (PRACTICAL-III): JAVA PROGRAMMING LAB
List of Practicals:
1. Create a GUI that presents a set of choices for a user to select stationery products and displays the price of the product after selection from the list.
2. Create a GUI to demonstrate a typical editable table describing the employees of a software company.
3. Write an RMI application using the callback mechanism.
4. Develop a Servlet question-answer application using the HttpServletRequest and HttpServletResponse interfaces.
5. Develop a Servlet application to accept the HTNO of a student from the client and display the memorandum of marks from the server.
6. JSP programs:
a. Create a JSP page that prints a temperature conversion chart (from Celsius to Fahrenheit).
b. Create a JSP page to print the current date and time.
c. Create a JSP page to print the number of times the page is referred to after it is loaded.
7. Write a simple JSP application to demonstrate the use of implicit objects (at least 5).
8. Develop a Hibernate application to store feedback from website visitors in a MySQL database.
9. Develop a JSP application to accept registration details from the user and store them in a database table.
10. Develop a JSP application to authenticate user login as per the registration details. If login succeeds, forward the user to the index page; otherwise show a login failure message.
11. Develop a web application to add items to the inventory using JSF.
12. Write EJB applications using stateless session beans and stateful session beans.
13. Develop a Room Reservation System application using Entity Beans.
14. Create a three-tier application using Servlets, JSP and EJB.
M.SC. (DATA SCIENCE) I-YEAR, I-SEMESTER
MDS-108-P: PAPER- VIII (PRACTICAL-IV): STATISTICAL INFERENCE USING PYTHON
List of Practicals:
1. Drawing one-dimensional diagrams (Pictogram, Pie Chart, Bar Chart).
2. Drawing two-dimensional diagrams (Histogram, Line plot, frequency curves & polygons, ogive curves, Scatter Plot).
3. Drawing 3D and other data visualization techniques.
4. Drawing Gantt Chart, Heat Map, Box-Whisker Plot, Waterfall Chart, Area Chart, Stacked Bar Charts.
5. Drawing Density Plot, Bullet Graph, Choropleth Map, Tree Map, Path Diagram, Network Diagram, Correlation Matrices.
6. Generation of Jackknife and Bootstrap samples and parameter estimation from them.
7. Computation of confidence interval estimates of parameters.
8. Small and Large sample tests (for Mean(s), Standard deviation/variance(s), Proportion(s)).
9. Non-parametric tests: One- and two-sample tests (Kolmogorov-Smirnov, Kruskal-Wallis and Friedman tests, Kendall’s tau, Ansari-Bradley test).
10. Generation of random numbers for Uniform, Normal, Exponential, Cauchy and Poisson Distributions.
11. Bayesian estimation of parameters (using Metropolis-Hastings and Gibbs sampler).
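For the last practical, a random-walk Metropolis sampler (the special case of Metropolis-Hastings with a symmetric proposal) can be sketched for a binomial proportion under a Uniform(0, 1) prior. The model, data (7 successes in 10 trials), step size, chain length, burn-in and seed are all illustrative assumptions. With this prior the exact posterior is Beta(8, 4), so the sampled mean can be compared with 8/12.

```python
import math
import random
from statistics import mean


def log_posterior(p, successes, trials):
    """Log of the unnormalised posterior for a binomial proportion, flat prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def metropolis_hastings(successes, trials, n_samples=20000, step=0.2, seed=1):
    """Random-walk Metropolis chain for p (defaults are assumed tuning values)."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, successes, trials)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, successes, trials)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


chain = metropolis_hastings(successes=7, trials=10)
kept = chain[2000:]  # discard an assumed burn-in period
print("posterior mean estimate:", mean(kept))
print("exact posterior mean:", 8 / 12)
```

The posterior mean of the retained draws serves as the Bayes estimate under squared-error loss; a Gibbs sampler would instead draw each parameter in turn from its full conditional distribution.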