Data Analysis for Cybersecurity (Advanced Information System Development)
This course is a mini-course within Advanced Information System Development.
It serves as an important introductory class for students in the PhD Program of Management Information Systems. In this course, we focus on data analysis topics, and students should become familiar with the tools, algorithms, concepts, and execution environments used to perform data analysis. In addition, we add one more important research field and application area to this course: cybersecurity. Students learn to act as architects who solve cybersecurity-related problems by using data analysis algorithms. Related cybersecurity and data analysis theories, research papers, and background knowledge will be covered in class.
Announcements (Spring 2018)
- 5/14: Homework-01 is announced (see schedule below). Hand in your Orange design in hard copy by 5/21.
Class Info
- Instructor: Shun-Wen Hsiao, NCCU MIS Dept., hsiaom at nccu.edu.tw
- Lectures: Monday EFG (18:10~21:00)
- Classroom: TBA.
- TA: TBA.
- Office Hours: By appointment only.
Course Objectives & Learning Outcomes
- Understand the relationship between cybersecurity and security management.
- Understand the concepts of detection, profiling subjects, profiling techniques, misuse detection, and anomaly detection.
- Become familiar with the data analysis environment, big data concepts, and cloud computing.
- Understand data analysis algorithms (distance, similarity, classification, clustering) for cybersecurity applications.
- Understand visual machine learning tools such as Orange.
- Understand the operation of cybersecurity-related information systems from the perspective of data-driven systems.
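The distance and similarity measures listed in the objectives can be illustrated with a minimal sketch. It assumes each host or connection has already been turned into a numeric feature vector (or, for Jaccard similarity, a set such as the destination ports it contacted); the function names and sample values are hypothetical and are not taken from the course materials.

```python
import math

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def jaccard_similarity(s, t):
    """|s & t| / |s | t| for two sets, e.g. the ports contacted by two hosts."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # convention: two empty sets are treated as identical
    return len(s & t) / len(s | t)

# Hypothetical example values, for illustration only.
print(euclidean_distance([0, 0], [3, 4]))                    # 5.0
print(cosine_similarity([1, 0], [0, 1]))                     # 0.0
print(jaccard_similarity({22, 80, 443}, {80, 443, 8080}))    # 0.5
```

Distance is small when two records are close; similarity is large when they are close, so a detector usually picks one convention and converts the other (for example, 1 minus cosine similarity) when needed.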
Schedule (Spring 2018)
- 4/09: Security Management
- Safety, Security, Assurance, Risk.
- 4/16: Data Analysis Environment. Python Execution Environment (pdf), NumPy.ipynb, pandas.ipynb, PythonDSPackages.ipynb.
- Google: Data Science Process, Data Science Lifecycle
- 4/23: Data Mining Algorithm: Supervised Learning
- DM-0.pdf: Distance, Confusion Matrix.
- Linear Regression, Logistic Regression, Decision Tree, Association Rule.
- 4/30: Data Mining Algorithm: Unsupervised Learning
- UPGMA (UPGMA Walkthrough by Dr. R. Edwards), k-means, k-NN.
- PCA & LDA
- 5/07: Visualized Machine Learning Tool: Orange
- Class note for Orange
- 5/14: Intrusion Detection System
- An Introduction to Intrusion Detection by Aurobindo Sundaram.
- R. A. Kemmerer and G. Vigna, "Intrusion Detection: A Brief History and Overview," Computer, vol. 35, 2002, suppl. pp. 27-30.
- Additional material: E. Hodo et al., "Shallow and Deep Networks Intrusion Detection System: A Taxonomy and Survey," arXiv:1701.02145 [cs.CR], Jan 2017.
- Security Datasets: Awesome Machine Learning for Cyber Security, SecRepo.com - Samples of Security Related Data, vizsec.org
- Homework-01: Use the NSL_KDD data to construct a classifier and hand in your Orange design (in hard copy) by 5/21.
- 5/21: Anomaly Detection System
- V. Chandola, A. Banerjee and V. Kumar, "Anomaly Detection: A Survey," ACM Computing Surveys, vol. 41, no. 3, July 2009.
- A. Lakhina, M. Crovella and C. Diot, "Diagnosing Network-Wide Traffic Anomalies," ACM SIGCOMM Computer Communication Review, vol. 34, no. 4, pp. 219--230, 2004.
- Here is an example of Network Analysis by Orange.
- 5/28: Spam Mail Filtering System
- R. Sommer and V. Paxson, "Outside the Closed World: On Using Machine Learning For Network Intrusion Detection," in Proc. IEEE Symposium on Security and Privacy, 2010, pp. 305-316.
- Homework-02: Use the Spambase Data Set to construct a classifier and hand in your Orange design (in hard copy) by 6/04.
- Homework-03: Write an essay to discuss the applicability of using ML algorithms on 1) security data (to perform anomaly detection) and 2) financial data (to perform supervised or unsupervised learning).
- 6/04: Sequence Analysis System
- S. Forrest, S. A. Hofmeyr, A. Somayaji and T. A. Longstaff, "A Sense of Self for Unix Processes," in Proc. IEEE Symposium on Security and Privacy, 1996.
- Homework-04: Here is a set of system call traces from several malware samples; try to classify them into malware families. Please email your homework before July 1st, 23:59.
- 6/11: Student Presentation
- 6/18: No class.
- 6/25: Final
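Homework-01 and Homework-02 ask for a classifier, and the 4/23 lecture covers the confusion matrix. As a minimal sketch (not the exact definitions used in the DM-0 slides), the code below counts true/false positives and negatives for a binary intrusion-detection task, treating the hypothetical label "attack" as the positive class, and derives common metrics from them.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, positive="attack"):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts["TP"], counts["FP"], counts["FN"], counts["TN"]

def _ratio(num, den):
    """Division that returns 0.0 instead of raising when the denominator is 0."""
    return num / den if den else 0.0

def metrics(tp, fp, fn, tn):
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # also called detection rate / true positive rate
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "false_positive_rate": _ratio(fp, fp + tn),  # false alarms among normal traffic
        "f1": _ratio(2 * precision * recall, precision + recall),
    }

# Hypothetical labels and predictions, for illustration only.
y_true = ["attack", "attack", "normal", "normal", "attack", "normal"]
y_pred = ["attack", "normal", "normal", "attack", "attack", "normal"]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print(tp, fp, fn, tn)  # 2 1 1 2
print({name: round(value, 3) for name, value in metrics(tp, fp, fn, tn).items()})
```

In intrusion detection the false positive rate often matters as much as accuracy, since normal traffic usually dominates and even a small rate can produce many false alarms.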
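The 4/30 lecture lists k-means among the unsupervised methods. The following is a minimal sketch of plain k-means (Lloyd's algorithm) with a naive "first k points" initialisation, which is an assumption made here for reproducibility; practical implementations usually use random or k-means++ initialisation. The 2-D points are hypothetical host features (for example, connections per minute and distinct ports contacted). `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, k, max_iters=100):
    """Plain k-means on a list of equal-length numeric tuples.

    Returns (centroids, assignment), where assignment[i] is the cluster of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Naive initialisation (assumption): the first k points become the centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Hypothetical host features, for illustration only.
points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

For anomaly detection, one common use is to flag points that lie far from every centroid, although the threshold for "far" has to be chosen per dataset.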
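The 6/04 reading by Forrest et al. profiles normal program behavior through short sequences of system calls. The sketch below is a simplified variant that stores whole length-k sliding windows seen in normal traces and scores a new trace by the fraction of its windows that never appeared; the paper itself records pairs of calls at fixed offsets within a window, so this is not a faithful reproduction of its method. The traces and function names are hypothetical.

```python
def windows(trace, k):
    """All contiguous length-k windows of a system-call trace, as tuples."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]

def build_normal_db(normal_traces, k):
    """Set of every length-k window observed in the normal (training) traces."""
    db = set()
    for trace in normal_traces:
        db.update(windows(trace, k))
    return db

def mismatch_rate(trace, db, k):
    """Fraction of the trace's windows that are absent from the normal database."""
    w = windows(trace, k)
    if not w:
        return 0.0  # trace shorter than k: nothing to compare
    return sum(1 for seq in w if seq not in db) / len(w)

# Hypothetical traces, for illustration only.
normal = [["open", "read", "mmap", "mmap", "open", "getrlimit", "mmap", "close"]]
db = build_normal_db(normal, k=3)
suspect = ["open", "read", "mmap", "open", "open", "getrlimit", "mmap", "close"]
print(mismatch_rate(normal[0], db, 3))  # 0.0
print(mismatch_rate(suspect, db, 3))    # 0.5
```

For Homework-04, one possible (untested) extension is to build one database per malware family and assign each trace to the family with the lowest mismatch rate; this is only a suggestion, not the method the assignment prescribes.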
Grading Policy
- Homework (30%): programming exercises and essays
- Participation (5%): attendance and discussion
- Term Project (15%): implement a data analysis application
- Midterm and Final (50%)
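The weights above sum to 100%, so a final score is a weighted average of the component scores. The sketch below assumes every component is scored on a 0-100 scale and, because the policy does not state how the 50% is split between the midterm and the final, treats them as one combined exam score; the component names and example scores are hypothetical.

```python
# Weights from the grading policy. The midterm/final split is not stated,
# so "exams" is an assumed single combined score (hypothetical key names).
WEIGHTS = {"homework": 0.30, "participation": 0.05, "term_project": 0.15, "exams": 0.50}

def final_grade(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical scores: 0.30*90 + 0.05*100 + 0.15*80 + 0.50*70 = 27 + 5 + 12 + 35
example = {"homework": 90, "participation": 100, "term_project": 80, "exams": 70}
print(round(final_grade(example), 2))  # 79.0
```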