Kexin Pei

Ph.D. Candidate

Department of Computer Science, Columbia University

kpei@cs.columbia.edu

Google Scholar, GitHub, Twitter

About Me

I am a Ph.D. student in the Department of Computer Science at Columbia University. I am co-advised by Suman Jana and Junfeng Yang, and I work closely with Baishakhi Ray. Before coming to Columbia, I completed a research-based master's degree in the Department of Computer Science at Purdue University, advised by Dongyan Xu, Xiangyu Zhang, and Luo Si. Prior to Purdue, I worked in the HKBU Database Group, advised by Haibo Hu and Jianliang Xu. I was a research intern at Google Brain, working with Charles Sutton, David Bieber, Kensen Shi, Pengcheng Yin, and Henryk Michalewski.


I am broadly interested in Security, Software Engineering, and Machine Learning, specifically focusing on developing data-driven program analysis approaches to improve the reliability and security of traditional and AI-based software systems. I get most excited about developing machine learning models that can reason about program behavior to precisely and efficiently analyze, detect, and fix software vulnerabilities.


[Update 2022] I am on the 2022-2023 academic job market.

Education

  • Ph.D., Department of Computer Science, Columbia University

  • M.S., Department of Computer Science, Purdue University

  • B.S., Department of Computer Science, Hong Kong Baptist University

Publications

  1. Kexin Pei, Dongdong She*, Michael Wang*, Scott Geng*, Zhou Xuan, Yaniv David, Junfeng Yang, Suman Jana, Baishakhi Ray. "NeuDep: Neural Binary Memory Dependence Analysis", in Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022, Acceptance Rate: 25%). [pdf][code]

  2. Xin Jin, Kexin Pei, Jun Yeon Won, Zhiqiang Lin. "SymLM: Predicting Function Names in Stripped Binaries via Context-Sensitive Execution-Aware Code Embeddings", in ACM Conference on Computer and Communications Security (CCS 2022). [pdf][code][poster]

  3. Kexin Pei, Jonas Guan, Matthew Broughton, Zhongtian Chen, Songchen Yao, David Williams-King, Vikas Ummadisetty, Junfeng Yang, Baishakhi Ray, Suman Jana. "StateFormer: Fine-Grained Type Recovery from Binaries using Generative State Modeling", in Proceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021, Acceptance Rate: 24.5%). [pdf][code][slides][video]

  4. Kexin Pei, Zhou Xuan, Junfeng Yang, Suman Jana, Baishakhi Ray. "Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity". [pdf][code][ghidra plugin][slides]

  5. Kexin Pei*, Jonas Guan*, David Williams-King, Junfeng Yang, Suman Jana. "XDA: Accurate, Robust Disassembly with Transfer Learning", in Proceedings of the 2021 Network and Distributed System Security Symposium (NDSS 2021, Acceptance Rate: 15.2%). [pdf][code][slides][video]

  6. Dongdong She, Kexin Pei, Dave Epstein, Junfeng Yang, Baishakhi Ray, Suman Jana. "NEUZZ: Efficient Fuzzing with Neural Program Smoothing", in Proceedings of the 40th IEEE Symposium on Security and Privacy (Oakland S&P 2019, Acceptance Rate: 13%). [pdf][code][slides][video]

  7. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. "Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems", in ICSE 2019 Workshop on Testing for Deep Learning and Deep Learning for Testing (DeepTest 2019). [pdf]

  8. Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, Suman Jana. "Efficient Formal Safety Analysis of Neural Networks", in Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018, Acceptance Rate: 20.8%). [pdf][code][video]

  9. Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, Suman Jana. "Formal Security Analysis of Neural Networks using Symbolic Intervals", in Proceedings of the 27th USENIX Security Symposium (USENIX Security 2018, Acceptance Rate: 19%). [pdf][code][video]

  10. Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray. "DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars", in Proceedings of the 40th International Conference on Software Engineering (ICSE 2018, Acceptance Rate: 21%). [pdf][code][results]

  11. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. "DeepXplore: Automated Whitebox Testing of Deep Learning Systems", in Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP 2017, Acceptance Rate: 16%). [pdf][code][poster][slides]

    • Best Paper Award

    • Research Highlights of Communications of the ACM (CACM). [article][video].

    • Runner-up and top-10 finalist in the CSAW 2018 Applied Research Competition (CSAW'18).

    • In Research Highlights of ACM SigMobile: Mobile Computing and Communications (GetMobile).

    • In Proceedings of the NIPS 2017 Workshop on Machine Learning and Computer Security (MLSec 2017).

  12. Suphannee Sivakorn, George Argyros, Kexin Pei, Angelos D. Keromytis, Suman Jana. "HVLearn: Automated Black-box Analysis of Hostname Verification in SSL/TLS Implementations", in Proceedings of the 38th IEEE Symposium on Security and Privacy (Oakland S&P 2017, Acceptance Rate: 13%). [pdf][code]

  13. Kexin Pei, Zhongshu Gu, Brendan Saltaformaggio, Shiqing Ma, Fei Wang, Zhiwei Zhang, Luo Si, Xiangyu Zhang, Dongyan Xu. "HERCULE: Attack Story Reconstruction via Community Discovery on Correlated Log Graph", in Proceedings of the 32nd Annual Computer Security Applications Conference (ACSAC 2016, Acceptance Rate: 22%). [pdf]

  14. Zhaogui Xu, Xiangyu Zhang, Lin Chen, Kexin Pei, Baowen Xu. "Python Probabilistic Type Inference with Natural Language Support", in Proceedings of the 24th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE 2016, Acceptance Rate: 27%). [pdf]

    • Diamond Artifact Award

  15. Zhongshu Gu, Kexin Pei, Qifan Wang, Luo Si, Xiangyu Zhang, Dongyan Xu. "LEAPS: Detecting Camouflaged Attacks with Statistical Learning Guided by Program Analysis", in Proceedings of the 45th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2015, Acceptance Rate: 21%). [pdf]

  16. Haibo Hu, Jianliang Xu, Xizhong Xu, Kexin Pei, Byron Choi, Shuigeng Zhou. "Private Search on Key-Value Stores with Hierarchical Indexes", in Proceedings of the 30th IEEE International Conference on Data Engineering (ICDE 2014, Acceptance Rate: 20%). [pdf]

Work Experience

Talks

  • "Can Large Language Models Reason about Program Invariants?". Google Brain

  • "Scalable, Accurate, Robust Binary Analysis with Transfer Learning Trace Modeling". Northwestern University

  • "Scalable, Accurate, Robust Binary Analysis with Transfer Learning Trace Modeling". NSA Vulnerability Research and Machine Learning Group

  • "Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity". UC Berkeley

  • "Scalable, Accurate, Robust Binary Analysis with Transfer Learning Trace Modeling". Ohio State University

  • "Scalable, Accurate, Robust Binary Analysis with Transfer Learning Trace Modeling". Johns Hopkins University

  • "Scalable, Accurate, Robust Binary Analysis with Transfer Learning Trace Modeling". National University of Singapore

  • "Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity". University of Stuttgart

  • "DeepXplore: Automated Whitebox Testing of Deep Learning Systems". Stanford University

  • "XDA: Accurate, Robust Disassembly with Transfer Learning". Rutgers University

  • "Neural Program Smoothing for Fuzzing". The University of Hong Kong

  • "Bringing Engineering Rigor to Deep Learning". Nanjing University

  • "Towards Testing and Verification of Machine Learning (ML) Systems". Microsoft Research Redmond

  • "Towards Testing and Verification of Machine Learning (ML) Systems". NEC Lab Princeton

  • "LEAPS: Detecting Camouflaged Attacks with Statistical Learning Guided by Program Analysis". CERIAS Security Seminar Series

Services

  • Program Committee. AISec (2021, 2022), CSAW Applied Research Competition (2021, 2022), USENIX Security Artifact Evaluation (2022), Binary Analysis Research (BAR) (2022), Deep Learning and Security (DLS) (2022), ESEC/FSE Demo Track (2022)

  • Reviewer. ICLR (2022, 2023), AAAI (2022), ICML (2022), OSDI (2022), NeurIPS (2021, 2022), USENIX Security (2017, 2020), CCS (2017, 2021), Oakland Security & Privacy (2020), NDSS (2020), ACM Computing Surveys (2019), TSE (2019), AsiaCCS (2018), DSN (2016), RAID (2016)

Media Coverage

Teaching

  • E6121: Reliable Software, Spring 2019

  • E6998: Security and Robustness of ML systems, Spring 2018