M.Sc. (DATA SCIENCE) IV-SEMESTER SYLLABUS
MDS-401: PAPER- I: CRYPTOGRAPHY
UNIT–I
Overview of Network Security: OSI Security Architecture, Security Attacks, Security Services, Security Mechanisms, a Model for Network Security. Classical Encryption Techniques: Symmetric Cipher Model, Substitution Techniques, Transposition Techniques, Rotor Machines, Steganography. Block Ciphers: Structure, Data Encryption Standard (DES), Strength of DES. Block Cipher Operation: Double and Triple DES, Electronic Code Book, Cipher Block Chaining Mode, Cipher Feedback Mode, Output Feedback Mode, Counter Mode.
UNIT–II
Advanced Encryption Standard (AES): Origins, Structure, Round Functions, AES Key Expansion. Pseudorandom Number Generation and Stream Ciphers: Principles, Block Cipher based PRNG, RC4. Public-Key Cryptography: Principles of Public-Key Cryptosystems, RSA Algorithm. Key Management and Distribution: Symmetric and Asymmetric Key Distribution, Public Key Distribution, X.509 Certificates, Diffie-Hellman Key Exchange.
UNIT–III
Cryptographic Hash Functions: Applications, SHA and MD5 Algorithms. Message Authentication Codes (MAC): Requirements, HMAC, CMAC. Digital Signatures: Concepts, NIST Digital Signature Algorithm. Transport-Level Security: SSL, TLS, HTTPS, SSH. Email Security: Pretty Good Privacy, S/MIME. IP Security: Overview, Architecture, Encapsulating Security Payload, Internet Key Exchange. System Security: Intruders, Intrusion Detection, Password Management, Viruses and Countermeasures, Firewall Design Principles and Types.
Suggested Readings:
1. William Stallings, Cryptography and Network Security – Principles and Practice (6e)
2. Zhenfu Cao, New Directions of Modern Cryptography
3. Douglas R. Stinson, Cryptography: Theory and Practice
4. Tom St Denis, Simon Johnson, Cryptography for Developers
5. Joseph Migga Kizza, A Guide to Computer Network Security
6. A. Menezes, P. Van Oorschot, S. Vanstone, Handbook of Applied Cryptography
7. Henk C.A. van Tilborg, Sushil Jajodia, Encyclopedia of Cryptography and Security
8. Keith M. Martin, Everyday Cryptography–Fundamental Principles and Applications
MDS-402: PAPER- II: DATA MINING
UNIT-I
Introduction to Data Mining and Data Understanding: Data Mining Concepts – Definition, Need for Data Mining, Data Mining Scope – Types of Data to be Mined, Types of Patterns to be Mined, Technologies for Data Mining – Supporting Tools and Techniques, Applications of Data Mining – Targeted Domains and Use Cases, Major Issues in Data Mining – Challenges and Research Directions, Getting to Know Your Data- Data Objects and Attribute Types, Basic Statistical Descriptions of Data, Data Visualization Techniques, Measuring Data Similarity and Dissimilarity.
UNIT-II
Frequent Pattern Mining and Classification: Frequent Pattern Mining & Association – Basic Concepts and Methods, Frequent Itemset Mining Techniques, Interestingness of Patterns, Pattern Evaluation Methods. Classification: Basic Methods – Concepts of Classification, Decision Tree Induction, Bayes Classification Methods. Classification: Advanced Methods – Bayesian Belief Networks, Classification by Backpropagation, Support Vector Machines (SVM).
UNIT-III
Cluster Analysis and Data Mining Trends: Cluster Analysis: Concepts and Methods – Introduction to Cluster Analysis, Partitioning Methods, Hierarchical Methods, Density-Based Methods, Grid-Based Methods, Evaluation of Clustering. Data Mining Trends and Research Frontiers – Mining Complex Data Types, Alternative Methodologies in Data Mining, Applications of Data Mining, Data Mining and Society, Emerging Trends in Data Mining.
Suggested Readings:
1. Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann, 2011.
2. Vikram Pudi, P. Radha Krishna, Data Mining, Oxford University Press, 1st Edition, 2009.
3. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Pearson Education, 2008.
MDS-403 A: PAPER- III (A) : SENTIMENT ANALYSIS
UNIT-I
Basics & Applications: Introduction, Applications, Research Scope, Sentiment Analysis as Mini NLP. The Problem of Sentiment Analysis: Definition & Opinion Summary - Affect, Emotion, and Mood. Different Types of Opinions, Author vs. Reader Standpoint. Document-Level Sentiment Classification: Supervised and Unsupervised Sentiment Classification, Sentiment Rating Prediction, Cross-Domain and Cross-Language Sentiment Classification, Emotion Classification of Documents.
UNIT-II
Subjectivity, Sentence-Level Analysis & Lexicons: Subjectivity & Sentence Sentiment Classification – Sentence Subjectivity, Sentiment Classification, Handling Conditional & Sarcastic Sentences, Cross-Language Classification, Discourse-Based Sentiment, Emotion Classification of Sentences. Sentiment Lexicon Generation – Dictionary-Based Approach, Corpus-Based Approach, Desirable vs. Undesirable Facts.
UNIT-III
Comparative Opinions, Summarization & Opinion Quality: Analysis of Comparative Opinions – Problem Definition, Identifying Comparative Sentences, Preferred Entity Set, Types of Comparison, Entity & Aspect Extraction. Opinion Summarization & Search – Aspect-Based Summarization, Contrastive View, Traditional Summarization, Summarization of Comparative Opinions, Opinion Search & Retrieval Techniques. Mining Intentions – Intention Mining Problem, Intention Classification, Fine-Grained Mining. Fake & Low-Quality Opinions – Fake/Deceptive Opinion Detection (Spam Types, Supervised Detection, Behavioral Analysis, Group Spam, Multiple IDs, Business Exploitation), Quality of Reviews (Regression Approach & Other Methods).
Suggested Readings:
1. Bing Liu, Sentiment Analysis: Mining Opinions, Sentiments, and Emotions, Cambridge University Press, 2015.
MDS-404 B: PAPER- IV (B) : WEB MINING
UNIT–I
Web Data Mining & Data Mining Foundations: Introduction to WWW, Web Mining, and Data Mining, Association Rule Mining – Apriori Algorithm, Frequent Itemset & Rule Generation, Multiple Minimum Supports, Class Association Rules. Sequential Pattern Mining – GSP, PrefixSpan, Rule Generation from Patterns.
UNIT–II
Machine Learning for Web Mining: Supervised Learning – Decision Trees, Rule Induction, Classification based on Associations, Naïve Bayes & Text Classification. Unsupervised Learning – K-means Clustering, Hierarchical Clustering (Single Link, Complete Link, Average Link), Strengths & Weaknesses
UNIT–III
Information Retrieval, Link Analysis & Web Crawling: Information Retrieval – Boolean Model, Vector Space Model, Statistical Language Model, Relevance Feedback, Evaluation Measures. Text & Web Page Preprocessing – Stopword Removal, Stemming, Duplicate Detection, Inverted Index & Compression, Latent Semantic Indexing. Web Search & Issues – Web Search, Meta Search, Web Spamming. Link Analysis – PageRank, HITS, Community Discovery. Web Crawling – Crawler Algorithms (BFS, Focused, Topical), Implementation Issues, Ethics. Sentiment Classification – Sentiment Phrases, Text Classification Methods.
Suggested Readings:
1. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data by Bing Liu (Springer Publications)
2. Data Mining: Concepts and Techniques, Second Edition, by Jiawei Han, Micheline Kamber (Elsevier Publications)
3. Web Mining: Applications and Techniques by Anthony Scime
4. Mining the Web: Discovering Knowledge from Hypertext Data by Soumen Chakrabarti
MDS-405 : PAPER- V: PRACTICAL PAPER: CRYPTOGRAPHY & DATA MINING
Cryptography Practicals:
1. Classical Encryption Techniques
a) Implement Caesar Cipher, Playfair Cipher, and Vigenère Cipher for encryption and decryption.
b) Extend with a simple transposition cipher.
2. Block Cipher Operations
a) Implement DES (or simplified DES) encryption and decryption.
b) Demonstrate ECB, CBC, CFB, OFB, and Counter (CTR) modes with sample plaintext.
3. Public-Key Cryptography (RSA)
a) Implement RSA key generation, encryption, and decryption.
b) Verify with a short text message.
4. Hashing and Authentication
a) Implement SHA-256 and MD5 hashing.
b) Demonstrate HMAC for message authentication.
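As a starting point for practical 1, the following is a minimal sketch of the Vigenère cipher in Python, with the Caesar cipher expressed as the one-letter-key special case. The function names `vigenere` and `caesar`, the choice to upper-case all text, and the rule that non-letters pass through without consuming key letters are illustrative assumptions, not requirements of the syllabus.

```python
import string

ALPHABET = string.ascii_uppercase


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter of `text` by the matching letter of `key`.

    Assumptions of this sketch: output is upper-case, and characters
    outside A-Z pass through unchanged without consuming a key letter.
    """
    key = key.upper()
    if not key or any(k not in ALPHABET for k in key):
        raise ValueError("key must be a non-empty string of letters A-Z")
    shifts = [ALPHABET.index(k) for k in key]
    sign = -1 if decrypt else 1
    out = []
    used = 0  # number of letters processed so far; selects the key letter
    for ch in text.upper():
        if ch in ALPHABET:
            shift = sign * shifts[used % len(shifts)]
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
            used += 1
        else:
            out.append(ch)
    return "".join(out)


def caesar(text: str, shift: int, decrypt: bool = False) -> str:
    """Caesar cipher written as a Vigenere cipher with a one-letter key."""
    return vigenere(text, ALPHABET[shift % 26], decrypt)


if __name__ == "__main__":
    secret = vigenere("ATTACK AT DAWN", "LEMON")
    print(secret)                                  # LXFOPV EF RNHR
    print(vigenere(secret, "LEMON", decrypt=True)) # ATTACK AT DAWN
    print(caesar("HELLO", 3))                      # KHOOR
```

Decryption reuses the same routine with the shift negated, so a round trip through `vigenere` with `decrypt=True` should return the upper-cased plaintext.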
Data Mining Practicals:
5. Frequent Pattern Mining
a) Implement the Apriori algorithm for frequent itemset generation.
b) Generate and display association rules with confidence and support.
6. Classification
a) Implement Decision Tree Classifier using Gini Index or Entropy.
b) Train and test it on a dataset (e.g., Iris or UCI dataset).
7. Clustering
a) Implement K-Means clustering and visualize clusters.
b) Compare with Hierarchical clustering (single-link / complete-link).
8. Data Similarity and Visualization
a) Write a program to compute Euclidean, Cosine, and Jaccard similarity between data objects.
b) Visualize data distribution using histograms, scatter plots, and boxplots.
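For practical 8(a), the three measures can be sketched in plain Python as shown here. Note that the Euclidean measure is a distance (smaller means more alike), while cosine and Jaccard are similarities. The function names and the convention that two empty sets have Jaccard similarity 1.0 are assumptions made for this illustration.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets are identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(euclidean_distance([0, 0], [3, 4]))               # 5.0
    print(cosine_similarity([1, 0], [0, 1]))                # 0.0
    print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```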
MDS-406A : PAPER- VI : PRACTICAL PAPER E-I & E-II
ELECTIVE-I (A): SENTIMENT ANALYSIS
1. Implement a sentiment classification model to classify reviews in the IMDB movie review dataset as positive or negative using machine learning techniques such as Naïve Bayes and Logistic Regression.
2. Implement a Python program to classify sentences as subjective/objective and then by sentiment (positive/negative), using a movie review dataset split into sentences.
3. Implement a Python program to build a sentiment lexicon using dictionary-based and corpus-based approaches, and compare their effectiveness on test sentences.
4. Implement a Python program to identify and analyze comparative opinions using a laptop/mobile product reviews dataset.
5. Apply classification techniques to detect fake/deceptive/spam reviews using the Yelp Review Spam Dataset. (Hint: classifiers such as Logistic Regression and Random Forest.)
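To illustrate the kind of classifier practical 1 asks for, here is a minimal sketch of a multinomial Naïve Bayes sentiment classifier with add-one (Laplace) smoothing, written in plain Python. The four training sentences in `TOY_REVIEWS` are invented for this example and stand in for the IMDB data; the whitespace tokenizer, the function names, and the choice to ignore words not seen in training are likewise assumptions of the sketch. A real submission would more likely use a library implementation and the full dataset.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data, invented for illustration only.
TOY_REVIEWS = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("boring plot and terrible acting", "neg"),
    ("a dull and terrible film", "neg"),
]


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for `text`."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing keeps unseen (word, label) pairs non-zero.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train_naive_bayes(TOY_REVIEWS)
    print(predict(model, "great moving story"))
    print(predict(model, "terrible boring film"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together on long reviews.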
ELECTIVE-II (B): WEB MINING
1. Implement Association Rule Mining using the Apriori algorithm to generate frequent itemsets and association rules. Demonstrate rule generation with minimum support, confidence, and multiple minimum support thresholds.
2. Develop a program for Sequential Pattern Mining using algorithms such as GSP or PrefixSpan. Demonstrate pattern discovery and rule generation from sequential data.
3. Implement supervised and unsupervised learning techniques for web data mining by applying Decision Trees, Naïve Bayes for text classification, and K-means or Hierarchical Clustering. Analyze strengths and weaknesses of each method.
4. Design an information retrieval system using Boolean Model and Vector Space Model. Perform text preprocessing (stopword removal, stemming, inverted index) and evaluate retrieval performance using standard measures.
5. Implement web mining techniques for link analysis and web crawling by demonstrating PageRank or HITS algorithms, crawler strategies (BFS / focused crawling), and sentiment classification of web documents.
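For the link-analysis part of practical 5, PageRank can be sketched as a power iteration over a small link graph. This is an illustrative version under stated assumptions: the graph is a dictionary from page to list of out-links, the damping factor defaults to 0.85, and the rank of dangling pages (pages with no out-links) is redistributed evenly. The function name, parameters, and example graph are made up for the sketch.

```python
def pagerank(graph, damping=0.85, tol=1e-8, max_iter=200):
    """Power-iteration PageRank.

    `graph` maps each page to a list of pages it links to. Pages that
    appear only as link targets are added automatically.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Rank held by dangling pages is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out_links = graph.get(v, [])
            for target in out_links:
                new_rank[target] += damping * rank[v] / len(out_links)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:  # stop once the ranks have stabilised
            break
    return rank


if __name__ == "__main__":
    # Hypothetical four-page web: C is linked to by A, B and D.
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 4))
```

The ranks form a probability distribution, so they should sum to 1; in the toy graph, page C collects the most in-links and ends up with the highest score.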
MDS-407: PAPER- VII: CAPSTONE PROJECT-II
Objectives:
1. To enhance practical and professional skills.
2. To familiarize students with the tools and techniques of systematic literature survey and documentation
3. To expose the students to industry practices and team work.
4. To encourage students to work with innovative and entrepreneurial ideas
Outcomes:
Students will be able to:
1. Demonstrate the ability to synthesize and apply the knowledge and skills acquired in the academic program to real-world problems
2. Evaluate different solutions based on economic and technical feasibility
3. Effectively plan a project and confidently perform all aspects of project management
4. Demonstrate effective written and oral communication skills
Guidelines for Project:
1. Each student has to do one standalone statistical data analysis project in the IV semester (major project; it should be different from the III semester project) to become familiar with the practical usage of all statistical techniques covered at UG and PG level, using the statistical software Python and R, at any one industry/institution under a recognized supervisor (the supervisor must be eligible to teach Statistics at PG level in any university as per UGC norms).
2. Students are advised to collect a live, large-scale data set (primary or secondary source, with a minimum sample size of 2000 and at least 10 variables on different scales of measurement).
3. Each student has to submit the project report in two hard-bound copies (minimum 100 pages), along with the assignments, following the Ph.D. thesis norms of Osmania University. The report must carry the student's signed declaration and a certificate from the industry/institution, be certified by the recognized supervisor, and be supported by a plagiarism report (similarity index not exceeding 10%).
4. The project report should fulfil the norms of a statistical data analysis report. It should contain: (a) literature collected related to the study and its review (minimum 10 research articles), and a data domain description; (b) national and international significance of the problem; (c) detailed description of the data variables; (d) data set objectives/objectives of the study; (e) formulation of statistical hypotheses; (f) data visualization techniques applied to each variable or combination of variables; (g) descriptive statistics; (h) exploratory data analysis of each variable under study; (i) advanced data analysis tools applied (minimum 5); (j) analysis and interpretation of results; (k) sample data set (min. 20-50) and the Python/R programs written for implementation, placed in the appendix; (l) bibliography and analysis report. Python programs used should be presented in the appendix.
5. Projects will be evaluated by two subject experts through a project viva-voce conducted along with the other practicals. A presentation on the project may be required.
6. Project marks will be awarded based on viva-voce performance against the following criteria: (i) the project topic chosen and its significance; (ii) depth of theoretical and technical soundness, practical implementation, domain knowledge, and proper organization of chapters; (iii) seminar presentation and communication skills, and the student's role/contribution in the project.
Project Report guidelines:
1. Title Page in prescribed format
2. Declaration by the student in prescribed format
3. Certificate from the Industry / supervisor in prescribed format
4. Acknowledgments to the supervisor and others who helped in the project.
5. Contents in prescribed format
Chapters:
1. Introduction (Introduction, Motivation to topic, Significance and need of the study, Problem Statement, Objectives of the study, Chapter wise summary).
2. Review of the Literature (Introduction with a list of contributing authors (collected from journals) related to the topic, description of each author's method (5-10 methods), and a comparative study of the methods)
3. Data Domain (About the data set and its domain knowledge, with sample data description)
4. Statistical tools and software used (Basic and advanced tools used, minimum 5 advanced methods used and their procedures, software used, etc.)
5. Data visualizations (Tools applicable, justification, program Outputs/ results analysis and interpretations and conclusions)
6. Exploratory Data Analysis (Tools applicable, justification, program Outputs/ results analysis and interpretations and conclusions)
7. Model Building /Advanced data analysis techniques (Tools applicable, justification, program Outputs/ results analysis and interpretations and conclusions)
8. Conclusions & Future Scope
9. Appendix (Program Code for Implementation, sample data set if any)
10. Bibliography (Minimum 30 articles)
11. Plagiarism Report