MACHINE LEARNING FOR PROGRAM ANALYSIS

Description

The Machine Learning for Program Analysis (MLPA) workshop is being held as an independent event.

The main objective of this workshop is to bring together researchers in the machine learning and program analysis communities and to serve as a platform for identifying cross-disciplinary problems of mutual interest.

Important Dates

Submission deadline: September 4, 2020

Author notifications: September 22, 2020

Camera-ready version: September 29, 2020

Live workshop: January 5th 2021, 8am-1pm PST

Program

(See below)

Registration

https://usc.zoom.us/meeting/register/tJcqfuuvqTktHdH3CxsRP7L3M2SepV_HP8Id

Call for Papers

Program analysis is an essential research area in software security. In addition to formal methods and compiler theory, a wide range of post-development techniques has been developed over time to solve software security problems such as vulnerability discovery, reverse engineering, code clone detection, and obfuscation/deobfuscation, among many other applications. Some approaches require source code in order to operate at the language or bytecode level, whereas others focus on binary code to cope with situations where source code and/or build environments are not accessible.

In both cases, methods for post-development program analysis have traditionally relied on manually defined heuristics, requiring human effort and limiting the scalability of the resulting models.

In recent years, as software has kept growing in size, complexity, and attack surface, there has been increasing interest in applying machine learning to further automate program analysis techniques and improve their scalability. Examples include the use of Conditional Random Fields for recovering debug information about binaries, and of deep neural networks for identifying function boundaries and function types, discovering new vulnerabilities, and decompilation. Graph-based methods have also been used for assessing the similarity between two binary inputs, code-duplicate detection, code classification, and vulnerability detection, among others.

The main objective of this workshop is to bring together researchers in the machine learning and program analysis communities and to serve as a platform for identifying cross-disciplinary problems of mutual interest. A partial list of topics covered at the workshop includes representation learning, natural language processing, and graph-based methods for source-level, binary-level, and bytecode-level program analysis.

Topics of interest

  • Representation learning for program analysis

  • Natural language processing for program analysis

  • Graph neural networks for intermediate representations

  • Supervised vs unsupervised problems in program analysis

  • Relevant applications, e.g.:

      • Code similarity detection

      • Vulnerability detection

      • Function boundary identification

  • Standardized datasets and benchmarks

  • Challenge problems

  • Automated analysis approaches for Go and Rust binaries

  • Automated analysis of smart contracts


Submissions

Submissions can be of two types:

  • Full-length papers (max 6 pages + 1 page for references) describing original research findings

  • Short papers (max 4 pages + 1 page for references) describing challenge problems.

A selected number of submissions will be accepted for oral presentation. All accepted papers can be presented as posters during a designated session.

Please note that MLPA submissions will not appear in proceedings.

All submissions are anonymous and must be made via the EasyChair website.

Program

Live event: Jan 5th, 2021, 8am-1pm PST

  • 8 am - Opening

  • 8:10 am - Invited Talk - Charles Sutton (Google AI, University of Edinburgh)

  • 9:10 am - Break

  • 9:30 am - Revisiting Function Identification with Machine Learning (Hyungjoon Koo, Soyeon Park and Taesoo Kim)

  • 9:50 am - AI-based Code Deobfuscation: Evaluation and Improvement (Grégoire Menguy, Cauim de Souza Lima, Sébastien Bardin and Richard Bonichon)

  • 10:10 am - Compiler and optimization level recognition using graph neural networks (Sébastien Bardin, Tristan Benoit and Jean-Yves Marion)

  • 10:30 am - Control Flow Graph Retrieval from Blackbox Execution of Embedded Software through Physical Side-channel Analysis (Alexis Rey, Roland Groz and Jean-Christophe Fonbonne)

  • 10:50 am - Break

  • 11 am - Keynote - Sergey Bratus (DARPA)

  • 12 pm - Panel/Discussion (chaired by Martin Rinard, MIT CSAIL)


Organizing Committee

  • Shushan Arakelyan, Information Sciences Institute/University of Southern California

  • Aram Galstyan, Information Sciences Institute/University of Southern California

  • Christophe Hauser, Information Sciences Institute/University of Southern California

  • Dawn Song, University of California at Berkeley

  • Heng Yin, University of California at Riverside

Program Committee

  • Sami Abu-El-Haija, Information Sciences Institute/University of Southern California

  • Miltiadis Allamanis, Microsoft

  • Uri Alon, Technion - Israel Institute of Technology

  • Davide Balzarotti, Eurecom

  • Tiffany Bao, Arizona State University

  • Sébastien Bardin, CEA LIST

  • Antonio Bianchi, Purdue University

  • Marc Brockschmidt, Microsoft

  • Lorenzo Cavallaro, King's College London

  • Philippe Charland, Defence Research and Development Canada

  • Huili Chen, University of California, San Diego

  • Scott Coull, FireEye

  • Yaniv David, Technion - Israel Institute of Technology

  • Yue Duan, Cornell University

  • Yanick Fratantonio, Eurecom

  • Palash Goyal, Samsung

  • Kevin Hamlen, The University of Texas at Dallas

  • Jingxuan He, ETH Zurich

  • Trent Jaeger, The Pennsylvania State University

  • Alex Jordan, Raytheon BBN Technologies

  • Christopher Kruegel, University of California, Santa Barbara

  • Zhen Li, Huazhong University of Science and Technology/Hebei University

  • Zhenkai Liang, National University of Singapore

  • Zhiqiang Lin, The Ohio State University

  • Mehrnoosh Mirtaheri, Information Sciences Institute/University of Southern California

  • Aravind Prakash, Binghamton University

  • William Robertson, Northeastern University

  • Edward Schwartz, Carnegie Mellon University

  • Giovanni Vigna, University of California, Santa Barbara

  • Gang Wang, University of Illinois at Urbana-Champaign

  • Ruoyu Wang, Arizona State University

  • Maverick Woo, Carnegie Mellon University

  • Dinghao Wu, The Pennsylvania State University

  • Xinyu Xing, The Pennsylvania State University

  • Sarah Zennou, Airbus

  • Mu Zhang, The University of Utah

For questions, email us at: mlpa@isi.edu
