Software Security via Program Analysis
Class: 16741 CS 6501 Section 008
Cyberattacks are becoming increasingly sophisticated. State-sponsored attackers spend tremendous time and effort infiltrating organizations (e.g., enterprises and government agencies) using stealthy and sophisticated attack mechanisms (e.g., zero-day exploits).
To fight back against these attackers, researchers and industry have proposed various advanced techniques. Because attackers break into systems in many different ways, building fundamental protection against them requires techniques across multiple layers of software and a fundamental understanding of both the system and the attackers.
- This course will cover recent advances in cyberattack prevention and analysis via program analysis and reverse engineering. In particular, we will focus on understanding recent advances in these topics by reading, presenting, and discussing the details of recent publications in top security conferences (S&P, USENIX Security, CCS, and NDSS).
- You will learn (1) how to analyze vulnerabilities and exploits to understand the root causes of attacks and come up with fundamental solutions, (2) how to investigate sophisticated cyberattacks to pinpoint how attackers infiltrated systems and what they did (e.g., leaked secrets), and (3) how to leverage program analysis techniques to automate the above tasks and make software more secure.
- You will (1) read recent academic papers carefully and present their essence, (2) learn how to implement the advanced dynamic and static analyses used in attack prevention and investigation, and (3) learn how to conduct systems security research that leads to fundamental solutions.
- It would be great if you have basic knowledge of or experience in systems programming in C (assembly is a big plus). Experience with dynamic program analysis tools (e.g., Intel Pin, DynamoRIO), static program analysis tools (e.g., LLVM), and/or reverse-engineering tools (e.g., IDA Disassembler, OllyDbg, Immunity Debugger) is very welcome.
- If your individual project is well developed, I will help you turn it into a research paper. Once the paper is accepted (at a conference or workshop), the expenses for your travel to the conference (or workshop) and for presenting the paper will be covered.
- If we both agree that your research interests and potential are well aligned with mine, we may seek potential funding sources for your Ph.D. program.
Information flow can be tracked via program analysis, meaning that we can understand how Siri and Alexa laughed.
Week 1: Class Introduction / Project Descriptions / How to Read/Critique/Present System Security Papers
Week 2: Program Tracing / (Project 1 and Pin introduction)
What is tracing? Why is it needed? How can we automatically trace a program?
- Project 1 Start -- 10% of the grade; Paper Assignment
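Intel Pin traces binaries at the instruction level. As a language-level analogy only (an assumption for illustration, not the Pin API used in the project), Python's built-in `sys.settrace` hook can record every source line a function executes. `trace_lines` and `sample` are hypothetical names.

```python
import sys


def trace_lines(func, *args):
    """Run func(*args) and record (function name, line number) for each executed line."""
    events = []

    def tracer(frame, event, arg):
        # 'line' events fire when the interpreter is about to execute a new source line.
        if event == "line":
            events.append((frame.f_code.co_name, frame.f_lineno))
        return tracer  # keep tracing inside the new frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always detach the hook, even if func raises
    return result, events


def sample(x):
    # Hypothetical function under test: the trace reveals which branch ran.
    if x > 0:
        return x * 2
    return -x
```

For example, `trace_lines(sample, 5)` returns `10` together with a trace covering the `if` line and the `return x * 2` line, while `trace_lines(sample, -3)` covers the other branch. A Pin-based tracer works on a similar principle, inserting a callback before each executed instruction instead of each source line.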
Week 3: Dynamic Analysis (Dynamic Slicing / Information Flow)
What is dynamic analysis? What are slicing and information flow? Why do we need them? How can we perform them effectively? What can we do with them?
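A dynamic backward slice keeps only the executed statements that actually influenced a value. The sketch below assumes the execution trace has already been recorded (for instance, by a program tracer) as tuples of statement label, defined variable, and used variables. It follows data dependences only; a full dynamic slicer would also follow control dependences. `backward_slice` and the example trace are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# One executed statement instance: (label, variable it defines or None, variables it reads).
TraceEntry = Tuple[str, Optional[str], Set[str]]


def backward_slice(trace: List[TraceEntry], criterion: str) -> List[str]:
    """Data-dependence-only dynamic backward slice for `criterion` at the end of `trace`."""
    needed = {criterion}  # variables whose most recent definition we still need
    slice_labels = []
    for label, defined, used in reversed(trace):
        if defined is not None and defined in needed:
            slice_labels.append(label)
            needed.discard(defined)  # this instance is the reaching definition
            needed |= used           # now we need whatever it read
    slice_labels.reverse()  # report in execution order
    return slice_labels


# Hypothetical trace of: a = input(); b = 10; c = a + 1; b = b * 2; out = c
EXAMPLE_TRACE: List[TraceEntry] = [
    ("s1", "a", set()),
    ("s2", "b", set()),
    ("s3", "c", {"a"}),
    ("s4", "b", {"b"}),
    ("s5", "out", {"c"}),
]
```

Slicing `EXAMPLE_TRACE` on `out` yields `["s1", "s3", "s5"]`: the two statements touching `b` are dropped because `b` never flows into `out`.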
Week 4: Dynamic Analysis (Information Flow) / Reverse Engineering (Disassembly)
Topic 1: How can we leverage our knowledge of information flow to build secure systems or make existing systems secure?
Topic 2: What is reverse engineering? Basic principles of disassembly. Recovering semantics
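Dynamic taint tracking is one common way to put information flow to work for security: label data from a source, propagate the labels through computations, and check them at a sink. The toy tracker below is a sketch under simplifying assumptions: it operates on named variables rather than on memory bytes and registers (as byte-level tools such as libdft do), and it tracks only explicit data flows, not implicit flows through branches. All names are hypothetical.

```python
from typing import Dict, FrozenSet

Taint = FrozenSet[str]


class TaintTracker:
    """Toy dynamic taint tracker: every variable carries a set of source labels."""

    def __init__(self) -> None:
        self.shadow: Dict[str, Taint] = {}  # shadow state: variable -> taint labels

    def source(self, var: str, label: str) -> None:
        # Data read from an untrusted or secret source gets a fresh label.
        self.shadow[var] = frozenset({label})

    def assign(self, dst: str, *srcs: str) -> None:
        # Explicit data flow: dst's taint is the union of its operands' taints.
        taint: Taint = frozenset()
        for s in srcs:
            taint |= self.shadow.get(s, frozenset())
        self.shadow[dst] = taint

    def sink(self, var: str, forbidden: str) -> bool:
        # Policy check: True means the value reaching the sink carries the forbidden label.
        return forbidden in self.shadow.get(var, frozenset())
```

For example, after `source("pw", "secret")` and `assign("msg", "pw", "greeting")`, checking `msg` at a network-send sink reports the leak, while a variable computed only from untainted data passes.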
Week 5: Reverse Engineering (Advanced Disassembly, Decompiler, Anti-Debugging Techniques)
Recovering semantics, Decompilers, Anti-debugging techniques and solutions against them
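As one concrete example of an anti-debugging technique (chosen for illustration; the course may cover others), a Linux program can read the `TracerPid` field of `/proc/self/status`, which is non-zero when another process, such as a debugger, is attached via `ptrace`. An analyst can defeat this check by, for instance, patching the comparison in the binary or intercepting the file read. The helper names below are hypothetical.

```python
from pathlib import Path
from typing import Optional


def tracer_pid(status_text: str) -> int:
    """Extract the TracerPid field from the text of a Linux /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("TracerPid field not found")


def debugger_attached() -> Optional[bool]:
    """True if some process is ptrace-attached to us; None if /proc is unavailable."""
    status = Path("/proc/self/status")
    if not status.exists():  # e.g., not running on Linux
        return None
    return tracer_pid(status.read_text()) != 0
```

Separating the parser from the file read keeps the check easy to test, and it also shows exactly which value an analyst would need to tamper with to hide a debugger.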
Week 6: Static Analysis / (Project 2 and LLVM introduction)
What is static analysis? How does it compare to dynamic analysis? What are the common static analysis techniques for security applications? => Answer: Value-set Analysis, Control-flow Integrity, Data-flow Integrity. How can we implement them? => Answer: LLVM
- Project 2 Start -- 10% of the grade.
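Value-set analysis over-approximates the set of values each variable (or memory location) may hold at every program point. The sketch below is a deliberately simplified stand-in: it uses plain intervals on a single counter loop, whereas value-set analysis as usually described tracks strided intervals per abstract memory region over binary code. It shows two ingredients such analyses rely on: widening to reach a fixpoint, and a narrowing step to recover lost precision. All names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Abstract value [lo, hi]; any interval with lo > hi represents 'no value' (bottom)."""
    lo: float
    hi: float

    @property
    def is_empty(self) -> bool:
        return self.lo > self.hi

    def leq(self, other: "Interval") -> bool:
        # Partial order: self is contained in other.
        return self.is_empty or (other.lo <= self.lo and self.hi <= other.hi)

    def join(self, other: "Interval") -> "Interval":
        # Least upper bound: merge values arriving from two control-flow paths.
        if self.is_empty:
            return other
        if other.is_empty:
            return self
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def widen(self, new: "Interval") -> "Interval":
        # Push any bound that is still growing to infinity so iteration terminates.
        if self.is_empty:
            return new
        if new.is_empty:
            return self
        lo = self.lo if new.lo >= self.lo else -math.inf
        hi = self.hi if new.hi <= self.hi else math.inf
        return Interval(lo, hi)

    def add_const(self, c: int) -> "Interval":
        return Interval(self.lo + c, self.hi + c)

    def meet_lt(self, bound: int) -> "Interval":
        # Refine by the branch condition x < bound (integers, so x <= bound - 1).
        return Interval(self.lo, min(self.hi, bound - 1))


def analyze_counter_loop(start: int, bound: int) -> Interval:
    """Interval of i at the loop header of: i = start; while i < bound: i = i + 1."""
    init = Interval(start, start)

    def transfer(h: Interval) -> Interval:
        # Back edge: take the loop branch (i < bound), then execute i = i + 1.
        return h.meet_lt(bound).add_const(1)

    header = init
    while True:  # ascending iteration with widening
        new_header = init.join(transfer(header))
        if new_header.leq(header):
            break
        header = header.widen(new_header)
    # One descending (narrowing) step recovers the upper bound lost to widening.
    return init.join(transfer(header))
```

For `analyze_counter_loop(0, 10)`, widening first jumps to `[0, inf]`, and the narrowing step restores the exact header range `[0, 10]`.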
Week 7: Static Analysis
Various algorithms, including value-set analysis, control-flow integrity, and data-flow integrity
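Control-flow integrity, as described in the CFI report in the reading list, checks at run time that every indirect control transfer goes to a target permitted by a control-flow graph computed ahead of time. The sketch below models only that check in Python: a table of allowed targets per indirect call site (in a real system, produced by static analysis, e.g., in an LLVM pass) and a guard that runs before each indirect call. Real CFI enforces this on machine code, for example with IDs embedded at valid targets; the call-site and function names here are hypothetical.

```python
from typing import Callable, Dict, Set


class CFIViolation(Exception):
    """Raised when an indirect call targets a function outside its allowed set."""


def make_guarded_call(
    allowed: Dict[str, Set[str]],
    functions: Dict[str, Callable[[int], int]],
) -> Callable[[str, str, int], int]:
    def guarded_call(call_site: str, target: str, arg: int) -> int:
        # The check CFI inserts before an indirect call: is this target in the
        # statically computed set for this particular call site?
        if target not in allowed.get(call_site, set()):
            raise CFIViolation(f"{call_site}: illegal target {target!r}")
        return functions[target](arg)

    return guarded_call


# Hypothetical program: a dispatcher whose function pointer should only be inc or dec.
FUNCTIONS: Dict[str, Callable[[int], int]] = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "spawn_shell": lambda x: 0,  # stands in for an attacker-chosen target
}
ALLOWED: Dict[str, Set[str]] = {"dispatch@site1": {"inc", "dec"}}
guarded = make_guarded_call(ALLOWED, FUNCTIONS)
```

A corrupted function pointer that redirects `dispatch@site1` to `spawn_shell` is stopped before the call executes, even though `spawn_shell` is a legitimate function elsewhere in the program.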
Week 8: Static Analysis + Dynamic Analysis / Probabilistic program analysis
Combining both analyses for effective and efficient problem solving. Runtime monitoring guided by static analysis / Static analysis guided by dynamic traces. Probabilistic approaches to improve program analysis techniques.
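Data-flow integrity is one example of runtime monitoring guided by static analysis: static analysis computes, for every read, the set of writes that may legitimately reach it (reaching definitions), and a monitor checks at run time that the last writer of the value being read is in that set. The Python model below is a sketch under stated assumptions: variables stand in for memory locations, and the site names and the `is_admin` scenario are hypothetical.

```python
from typing import Dict, Set


class DFIViolation(Exception):
    """Raised when a read observes a value written by an unexpected instruction."""


class DFIMonitor:
    def __init__(self, allowed_writers: Dict[str, Set[str]]) -> None:
        # Statically computed: read site -> write sites that may reach it.
        self.allowed_writers = allowed_writers
        self.last_writer: Dict[str, str] = {}  # runtime table: variable -> last write site
        self.memory: Dict[str, int] = {}

    def write(self, site: str, var: str, value: int) -> None:
        self.memory[var] = value
        self.last_writer[var] = site

    def read(self, site: str, var: str) -> int:
        # Check before every read: the last writer must be a statically allowed one.
        writer = self.last_writer.get(var)
        if writer not in self.allowed_writers.get(site, set()):
            raise DFIViolation(f"{site}: {var!r} last written by {writer!r}")
        return self.memory[var]
```

For example, with `DFIMonitor({"r_check": {"w_init", "w_login"}})`, a read at `r_check` succeeds after `w_init` writes `is_admin`, but fails if an out-of-bounds write at a site such as `w_overflow` overwrote the flag in between, even though the corrupted value itself looks valid.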
Week 9: Research proposal discussion
Open discussion sessions on the projects, with peer feedback.
Week 10-14: Research paper discussion (Topics are flexible and determined in the class)
A group of students will present a particular topic, which may cover a single research paper or multiple papers. Depending on the topic, the instructor may supplement the presentation to provide complete information.
- Project 3 Start -- 10% of the grade
Week 15: Final Presentations (Outcomes of your individual projects)
Each student presents the final results of their individual course project -- 10% of the grade for the presentation (the report and the project artifacts count for another 10% and 20%, respectively).
Reading Day and Thanksgiving: No class
This class has no exam. The grading is based on projects and presentations.
1. Presentation: 20% (10% for understanding of the paper, 10% for effective presentation)
2. Assignments: 30% (3 assignments; each 10%)
3. Independent Research Project: 40% (20% for the design and implementation, 10% for a presentation, 10% for a report)
4. Class participation: 10% (Questions and Reviews for the papers discussed in the class)
I will hand out my business card to people who ask good questions. At the end of the semester, return the cards to redeem your credits.
5. Extra credit: Extra assignments: TBD% (To be announced)
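The weights above sum to 100%, so the final grade is a weighted average of the category scores. A minimal sketch, assuming each category is scored from 0 to 100 and treating extra credit as a hypothetical number of bonus points, since its weight is still to be announced:

```python
from typing import Dict

# Category weights in percent, taken from the grading list above.
WEIGHTS: Dict[str, int] = {
    "presentation": 20,   # 10% understanding + 10% effective presentation
    "assignments": 30,    # 3 assignments x 10%
    "project": 40,        # 20% design/implementation + 10% presentation + 10% report
    "participation": 10,
}
assert sum(WEIGHTS.values()) == 100


def final_grade(scores: Dict[str, float], extra_credit: float = 0.0) -> float:
    """Weighted average of per-category scores (each 0-100) plus hypothetical bonus points."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return sum(WEIGHTS[c] * scores[c] / 100 for c in WEIGHTS) + extra_credit
```

For example, scores of 90, 80, 85, and 100 (presentation, assignments, project, participation) give 18 + 24 + 34 + 10 = 86.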
Dynamic Program Analysis
- Data-flow tracking
- Control-flow tracking
Static Program Analysis
- Data-flow analysis
- Control-flow analysis
- Pointer/alias analysis
- Evasive techniques
- Code obfuscation/de-obfuscation
Operating System Security
- Sandboxing/isolation, Fault localization
- Record and replay based analysis
- Script Language Security (JS/Flash)
- Browser Security
- Malicious Advertisement
- Security Issues on Android/iOS
- Program Analysis techniques for Mobile Platforms
- Security Issues on heterogeneous IoT platforms
- Improving IoT security via program analysis
This reading list includes representative publications that will be covered in this class; more papers will be added during the semester. Please use them to understand the high-level themes of the class topics.
Tips for reading papers, particularly systems security papers: (1) read the Abstract -> Introduction -> Conclusion; (2) find and read the motivating (representative) example or case studies. They tell a complete (and often realistic) story of how the newly proposed method solves the problem.
Dynamic/Static Analysis Frameworks
- Pin: building customized program analysis tools with dynamic instrumentation [PLDI'05]
- Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation [PLDI'07]
- LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation [CGO'04]
Data-flow tracking and Data-flow analysis
- libdft: Practical Dynamic Data Flow Tracking for Commodity Systems [VEE'12]
- A General Approach for Efficiently Accelerating Software-based Dynamic Data Flow Tracking on Commodity Hardware [NDSS'12]
- LDX: Causality Inference by Lightweight Dual Execution [ASPLOS'16]
Control-flow tracking and Control-flow analysis
- Control-Flow Integrity [MSR-TR-05-18]
- Efficient Path Profiling [MICRO'96]
- Precise Calling Context Encoding [ICSE'10]
- LDX: Causality Inference by Lightweight Dual Execution [ASPLOS'16]
- Evading android runtime analysis via sandbox detection [ASIACCS'14]
- X-Force: Force-Executing Binary Programs for Security Applications [USENIX'14]
- Revolver: An Automated Approach to the Detection of Evasive Web-based Malware [USENIX'13]
- Deobfuscation of virtualization-obfuscated software: a semantics-based approach [CCS'11]
- LOOP: Logic-Oriented Opaque Predicate Detection in Obfuscated Binary Code [CCS'15]
- Code obfuscation against symbolic execution attacks [ACSAC'16]
Record and replay / N-version systems
- Intrusion recovery using selective re-execution [OSDI'10]
- Record and transplay: partial checkpointing for replay debugging across heterogeneous systems [SIGMETRICS'11]
- Transparent Mutable Replay for Multicore Debugging and Patch Validation [ASPLOS'13]
- Varan the Unbelievable: An Efficient N-version Execution Framework [ASPLOS'15]
- High Accuracy Attack Provenance via Binary-Based Execution Partition [NDSS'13]
- LogGC: Garbage Collecting Audit Log [CCS'13]
- Efficient patch-based auditing for web application vulnerabilities [OSDI'12]
- The Security Architecture of the Chromium Browser
- UCognito: Private Browsing without Tears [CCS'15]
- WebCapsule: Towards a Lightweight Forensic Engine For Web Browsers [CCS'15]
- Riding out DOMsday: Toward Detecting and Preventing DOM Cross-Site Scripting [NDSS'18]
- FlashDetect: ActionScript 3 malware detection [RAID'12]
- The Postman Always Rings Twice: Attacking and Defending postMessage in HTML5 Websites [NDSS'13]
Sandboxing/isolation, Fault localization
- iRiS: Vetting Private API Abuse in iOS Applications [CCS'15]
- GUITAR: Piecing Together Android App GUIs from Memory Images. [CCS'15]
- Fear and Logging in the Internet of Things [NDSS'18]
- IoTFuzzer: Discovering Memory Corruptions in IoT Through App-based Fuzzing [NDSS'18]
- Sensitive Information Tracking in Commodity IoT [USENIX'18]
Machine Learning (Added)