ParserHunter: Identify Parsing Functions in Binary Code

Marco Scapin, Fabio Pinelli, Letterio Galletta

IMT School for Advanced Studies Lucca

Parsing and validation functions are crucial because they process untrusted data, e.g., user inputs. Due to their complexity, these functions are highly susceptible to bugs, making them a primary target for security audits. However, identifying such functions within a binary is time-intensive and challenging, given the numerous functions typically present and the lack of source code or supporting documentation. This paper presents an AI-based methodology for identifying functions with parser-like behavior and complex processing logic within a binary.

Our methodology analyzes each binary by identifying its functions, extracting their Control Flow Graphs (CFGs), and enriching them with features derived from an embedding model that captures both structural and semantic aspects of their behavior. These annotated CFGs are the input to a Graph Neural Network trained to identify parsing functions. We implement this methodology in the tool ParserHunter, which lets users train the model on labeled data, query the model with unseen binaries, and run a symbolic execution phase on the processed binary through a user interface. Our experiments on ten real-world projects from GitHub show that our tool effectively identifies parsers in binaries.
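To make the idea of classifying an annotated CFG more concrete, here is a minimal, purely illustrative sketch in Python. It is not ParserHunter's code or architecture. The names `CFG`, `message_pass`, `readout`, and `parser_score` are hypothetical, the per-block feature vectors stand in for the output of an embedding model, and the weights are hand-picked placeholders rather than trained parameters. The sketch only shows the general shape of a graph classifier: features are mixed along CFG edges, pooled into one vector per function, and mapped to a score.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class CFG:
    """Toy control flow graph of one function (hypothetical structure)."""
    edges: Dict[int, List[int]]       # basic-block id -> successor block ids
    features: Dict[int, List[float]]  # basic-block id -> feature vector (placeholder)


def neighbors(cfg: CFG) -> Dict[int, Set[int]]:
    """Treat edges as undirected so information flows both ways along a branch."""
    nbrs: Dict[int, Set[int]] = {block: set() for block in cfg.features}
    for src, dsts in cfg.edges.items():
        for dst in dsts:
            if dst != src:
                nbrs[src].add(dst)
                nbrs[dst].add(src)
    return nbrs


def message_pass(cfg: CFG) -> Dict[int, List[float]]:
    """One round of mean aggregation over each block and its neighbors."""
    nbrs = neighbors(cfg)
    updated = {}
    for block, feat in cfg.features.items():
        group = [feat] + [cfg.features[n] for n in sorted(nbrs[block])]
        updated[block] = [sum(col) / len(group) for col in zip(*group)]
    return updated


def readout(block_features: Dict[int, List[float]]) -> List[float]:
    """Pool all block vectors into a single function-level vector (mean)."""
    vectors = list(block_features.values())
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def parser_score(graph_vector: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Logistic score in (0, 1); the weights here are untrained placeholders."""
    z = sum(w * x for w, x in zip(weights, graph_vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Three basic blocks with a loop between blocks 1 and 2.
cfg = CFG(
    edges={0: [1], 1: [2], 2: [1]},
    features={0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]},
)
score = parser_score(readout(message_pass(cfg)), weights=[-1.0, 2.0])
print(f"parser-likeness score: {score:.3f}")
```

A trained Graph Neural Network would learn its aggregation and scoring weights from labeled functions and typically stack several such rounds; the sketch fixes them by hand only to keep the example self-contained.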

The figure shows the pipeline adopted by ParserHunter to train and test a classifier to detect parser functions in binary executables.

DEMO VIDEO

Web_ParserHunter.mp4

REFERENCE:

Official Publication (Journal of Systems & Software): https://doi.org/10.1016/j.jss.2026.112783

Open Access Version (Pre-print): [Download PDF from Zenodo]

CODE:

ParserHunter Core Tool & Datasets: GitHub: https://github.com/ScapMarco/ParserHunter

WEB INTERFACE

ParserHunter Web Interface: GitHub: https://github.com/ScapMarco/Web_Interface_ParserHunter
