Data Analysis for Cybersecurity (Advanced Information System Development)

This is a mini-course within Advanced Information System Development.

It serves as an important introductory class for students in the PhD Program of Management Information Systems. The course focuses on data analysis, and students should become familiar with the tools, algorithms, concepts, and execution environments needed to perform it. In addition, the course covers an important research field and application area: cybersecurity. Students learn to act as architects who solve cybersecurity-related problems using data analysis algorithms. Related cybersecurity and data analysis theory, research papers, and background knowledge will be taught in class.

Announcements (Spring 2018)

  • 5/14: Homework-01 is announced (see schedule below). Hand in a hard copy of your Orange design on 5/21.

Class Info

  • Instructor: Shun-Wen Hsiao, NCCU MIS Dept., hsiaom at nccu.edu.tw
  • Lectures: Monday EFG (18:10~21:00)
  • Classroom: TBA.
  • TA: TBA.
  • Office Hours: By appointment only.

Course Objectives & Learning Outcomes

  • Understand the relationship between cybersecurity and security management.
  • Understand the concept of detection, the profiling subject, profiling techniques, misuse detection, and anomaly detection.
  • Become familiar with the data analysis environment, big data concepts, and cloud computing.
  • Understand data analysis algorithms (distance, similarity, classification, clustering) for cybersecurity applications.
  • Understand visualized machine learning tools: Orange
  • Understand the operation of cybersecurity-related information systems from the perspective of a data-driven system.

Schedule (Spring 2018)

  1. 4/09: Security Management
    • Safety, Security, Assurance, Risk.
  2. 4/16: Data Analysis Environment. Python Execution Environment (pdf), NumPy.ipynb, pandas.ipynb, PythonDSPackages.ipynb.
    • Google: Data Science Process, Data Science Lifecycle
  3. 4/23: Data Mining Algorithm: Supervised Learning
  4. 4/30: Data Mining Algorithm: Unsupervised Learning
  5. 5/07: Visualized Machine Learning Tool: Orange
    • Class note for Orange
  6. 5/14: Intrusion Detection System
  7. 5/21: Anomaly Detection System
  8. 5/28: Spam Mail Filtering System
  9. 6/04: Sequence Analysis System
    • S. Forrest, S. A. Hofmeyr, A. Somayaji and T. A. Longstaff, "A Sense of Self for Unix Processes," in Proc. IEEE Symposium on Security and Privacy, 1996.
    • Homework-04: Here is a set of system call traces from several malware samples. Try to classify them into malware families. Please email your homework before July 1st, 23:59.
  10. 6/11: Student Presentation
  11. 6/18: No class.
  12. 6/25: Final

Grading Policy

  • Homework (30%): programming exercises and essays
  • Participation (5%): attendance and discussion
  • Term Project (15%): implement a data analysis application
  • Midterm and Final (50%)