Kexin Pei

Assistant Professor

Department of Computer Science

The University of Chicago

Contact: kpei@cs.uchicago.edu

Google Scholar, GitHub, Twitter

Prospective students: I am always looking for brilliant students. I am especially interested in working with self-motivated students who (1) have previously worked in Security or Software Engineering; (2) have experience with Program Analysis or Reverse Engineering; or (3) have a solid background in Machine Learning. Drop me an email if you are interested in working with me.

About Me

I am a Neubauer Family Assistant Professor in the Department of Computer Science at The University of Chicago. I received my PhD from the Department of Computer Science at Columbia University. I am broadly interested in Security, Software Engineering, and Machine Learning, with a focus on data-driven program analysis that improves the security and reliability of both traditional and AI-based software systems. I am most excited about developing machine learning models that reason about program structure and behavior to precisely and efficiently analyze, detect, and fix software bugs and vulnerabilities.

Publication

Kexin Pei, Weichen Li, Qirui Jin, Shuyang Liu, Scott Geng, Lorenzo Cavallaro, Junfeng Yang, Suman Jana. "Exploiting Code Symmetries for Learning Program Semantics" (Spotlight, Top 3.5%), in Proceedings of the 41st International Conference on Machine Learning (ICML 2024). [pdf]
Sally Junsong Wang, Kexin Pei, Junfeng Yang. "SMARTINV: Multimodal Learning for Smart Contract Invariant Inference", in Proceedings of the 45th IEEE Symposium on Security and Privacy (Oakland S&P 2024). [pdf][code]
John Yang, Akshara Prabhakar, Shunyu Yao, Kexin Pei, Karthik R Narasimhan. "Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag" (Best Paper Award, Oral), in the Multi-Agent Security Workshop, co-located with the 37th Conference on Neural Information Processing Systems (MASEC@NeurIPS 2023). [pdf]
Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan. "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" (Oral), in the 12th International Conference on Learning Representations (ICLR 2024). [pdf][code][website]
Yangruibo Ding, Ben Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, Baishakhi Ray. "TRACED: Execution-aware Pre-training for Source Code", in Proceedings of the 46th International Conference on Software Engineering (ICSE 2024). [pdf]
Kexin Pei, David Bieber, Kensen Shi, Charles Sutton, Pengcheng Yin. "Can Large Language Models Reason about Program Invariants?", in Proceedings of the 40th International Conference on Machine Learning (ICML 2023). [pdf][poster]
Kexin Pei, Dongdong She*, Michael Wang*, Scott Geng*, Zhou Xuan, Yaniv David, Junfeng Yang, Suman Jana, Baishakhi Ray. "NeuDep: Neural Binary Memory Dependence Analysis", in Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022). [pdf][code]
Xin Jin, Kexin Pei, Jun Yeon Won, Zhiqiang Lin. "SymLM: Predicting Function Names in Stripped Binaries via Context-Sensitive Execution-Aware Code Embeddings", in ACM Conference on Computer and Communications Security (CCS 2022). [pdf][code][poster]
Kexin Pei, Zhou Xuan, Junfeng Yang, Suman Jana, Baishakhi Ray. "Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity", in IEEE Transactions on Software Engineering (TSE 2022). [pdf][code][ghidra plugin][short video][slides]
Kexin Pei, Jonas Guan, Matthew Broughton, Zhongtian Chen, Songchen Yao, David Williams-King, Vikas Ummadisetty, Junfeng Yang, Baishakhi Ray, Suman Jana. "StateFormer: Fine-Grained Type Recovery from Binaries using Generative State Modeling", in Proceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021). [pdf][code][slides][video]
Kexin Pei*, Jonas Guan*, David Williams-King, Junfeng Yang, Suman Jana. "XDA: Accurate, Robust Disassembly with Transfer Learning", in Proceedings of the 2021 Network and Distributed System Security Symposium (NDSS 2021). [pdf][code][slides][video]
Dongdong She, Kexin Pei, Dave Epstein, Junfeng Yang, Baishakhi Ray, Suman Jana. "NEUZZ: Efficient Fuzzing with Neural Program Smoothing", in Proceedings of the 40th IEEE Symposium on Security and Privacy (Oakland S&P 2019). [pdf][code][slides][video]
Kexin Pei, Linjie Zhu, Yinzhi Cao, Junfeng Yang, Carl Vondrick, Suman Jana. "Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems", in the ICSE 2019 Workshop on Testing for Deep Learning and Deep Learning for Testing (DeepTest 2019). [pdf]
Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, Suman Jana. "Efficient Formal Safety Analysis of Neural Networks", in Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS 2018). [pdf][code][video]
Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, Suman Jana. "Formal Security Analysis of Neural Networks using Symbolic Intervals", in Proceedings of the 27th USENIX Security Symposium (USENIX Security 2018). [pdf][code][video]
Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray. "DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars", in Proceedings of the 40th International Conference on Software Engineering (ICSE 2018). [pdf][code][results]
Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. "DeepXplore: Automated Whitebox Testing of Deep Learning Systems" (Best Paper Award; Runner-Up among the Top-10 Finalists of the CSAW 2018 Applied Research Competition; ACM SIGMOBILE Research Highlight; MLSec@NIPS'17), in Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP 2017). [pdf][code][poster][slides][CACM research highlight][video]
Suphannee Sivakorn, George Argyros, Kexin Pei, Angelos D. Keromytis, Suman Jana. "HVLearn: Automated Black-box Analysis of Hostname Verification in SSL/TLS Implementations", in Proceedings of the 38th IEEE Symposium on Security and Privacy (Oakland S&P 2017). [pdf][code]
Kexin Pei, Zhongshu Gu, Brendan Saltaformaggio, Shiqing Ma, Fei Wang, Zhiwei Zhang, Luo Si, Xiangyu Zhang, Dongyan Xu. "HERCULE: Attack Story Reconstruction via Community Discovery on Correlated Log Graph", in Proceedings of the 32nd Annual Computer Security Applications Conference (ACSAC 2016). [pdf]
Zhaogui Xu, Xiangyu Zhang, Lin Chen, Kexin Pei, Baowen Xu. "Python Probabilistic Type Inference with Natural Language Support" (Distinguished Artifact Award), in Proceedings of the 24th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE 2016). [pdf]
Zhongshu Gu, Kexin Pei, Qifan Wang, Luo Si, Xiangyu Zhang, Dongyan Xu. "LEAPS: Detecting Camouflaged Attacks with Statistical Learning Guided by Program Analysis", in Proceedings of the 45th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2015). [pdf]
Haibo Hu, Jianliang Xu, Xizhong Xu, Kexin Pei, Byron Choi, Shuigeng Zhou. "Private Search on Key-Value Stores with Hierarchical Indexes", in Proceedings of the 30th IEEE International Conference on Data Engineering (ICDE 2014). [pdf]

Talks

"Analyzing and Securing Software via Robust and Generalizable Learning". USC, CMU, UMich, UCSD, UW Madison, UBC, UMass, Georgia Tech, Rice, UChicago, Purdue ECE, Purdue CS, HKU, HKBU, UMD, JHU, Nvidia, SKKU.
"Can Large Language Models Reason about Program Invariants?". Google Brain
"Scalable, Accurate, Robust Binary Analysis with Transfer Learning Trace Modeling". Northwestern, NSA, OSU, JHU, NUS
"Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity". UC Berkeley, University of Stuttgart
"DeepXplore: Automated Whitebox Testing of Deep Learning Systems". Stanford University
"XDA: Accurate, Robust Disassembly with Transfer Learning". Rutgers University
"Neural Program Smoothing for Fuzzing". The University of Hong Kong
"Bringing Engineering Rigor to Deep Learning". Nanjing University
"Towards Testing and Verification of Machine Learning (ML) Systems". MSR, NEC Lab
"LEAPS: Detecting Camouflaged Attacks with Statistical Learning Guided by Program Analysis". CERIAS Security Seminar Series

Work Experience

Research Intern @ Google Brain, Google Research, Mountain View, CA (2022), working with Charles Sutton, David Bieber, Kensen Shi, Pengcheng Yin, and Henryk Michalewski.
Research Intern @ RiSE, Microsoft Research, Redmond, WA (2018), working with Madan Musuvathi, Todd Mytkowicz, and Saeed Maleki.

Services

Program Committee. ASE (2024), USENIX ATC (2024), ISSTA (2024), Oakland Security & Privacy (2024), NDSS Poster (2024), ESEC/FSE-MAPS (2023), ESEC/FSE-AEC (2023), AITest (2023), DSN-DSML (2023), AISec (2022, 2021), CSAW Applied Research Competition (2023, 2022, 2021), USENIX Security AEC (2022), Binary Analysis Research (BAR) (2024, 2022), Deep Learning and Security (DLS) (2024, 2023, 2022), ESEC/FSE-Demo (2022)
Reviewer. TOPS (2023), ASPLOS (2023), USENIX ATC (2023), ICLR (2023, 2022), AAAI (2022), ICML (2022), OSDI (2022), NeurIPS (2022, 2021), USENIX Security (2020, 2017), CCS (2021, 2017), Oakland Security & Privacy (2020), NDSS (2020), ACM Computing Surveys (2019), TSE (2019), AsiaCCS (2018), DSN (2016), RAID (2016)

Media Coverage

Interviews with CACM, Scientific American, IEEE Spectrum, Newsweek, TechRadar, Columbia News, Sohu, Sina, and the CCTV "Hello, AI" documentary

Teaching

E6121: Reliable Software, Spring 2019
E6998: Security and Robustness of ML Systems, Spring 2018