Parsing and validation functions are crucial because they process untrusted data, e.g., user inputs. Due to their complexity, these functions are highly susceptible to bugs, making them a primary target for security audits. However, identifying such functions within a binary is time-intensive and challenging, given the numerous functions typically present and the lack of source code or supporting documentation. This paper presents an AI-based methodology for identifying functions with parser-like behavior and complex processing logic within a binary.
Our methodology analyzes each binary by identifying its functions, extracting their Control Flow Graphs (CFGs), and enriching them with features derived from an embedding model that captures both structural and semantic aspects of their behavior. These annotated CFGs are the input to a Graph Neural Network trained to identify parsing functions. We implement this methodology in the tool ParserHunter, which lets users train the model on labeled data, query the model with unseen binaries, and, through a user interface, supports a symbolic execution phase on the processed binary. Our experiments on ten real-world projects from GitHub show that our tool effectively identifies parsers in binaries.
The figure shows the pipeline adopted by ParserHunter to train and test a classifier to detect parser functions in binary executables.
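To make the idea of classifying an annotated CFG more concrete, the following is a minimal sketch in plain Python. It is not ParserHunter's implementation. The toy graph, the two-dimensional block features (standing in for the embedding model's output), the single round of mean aggregation, and the hand-picked weights are all assumptions chosen for illustration. A real Graph Neural Network would learn its weights from labeled data and use several layers.

```python
import math

# Toy CFG for one hypothetical function: basic-block id -> successor block ids.
cfg_edges = {0: [1, 2], 1: [3], 2: [3], 3: [1]}

# Hypothetical per-block feature vectors, standing in for embedding-model output.
node_features = {
    0: [1.0, 0.0],
    1: [0.0, 1.0],
    2: [0.5, 0.5],
    3: [1.0, 1.0],
}

def message_pass(edges, feats):
    """One round of mean aggregation over each block and its predecessors."""
    preds = {node: [] for node in feats}
    for src, dsts in edges.items():
        for dst in dsts:
            preds[dst].append(src)
    updated = {}
    for node, vec in feats.items():
        group = [vec] + [feats[p] for p in preds[node]]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated

def readout(feats):
    """Mean-pool all block vectors into a single graph-level vector."""
    vecs = list(feats.values())
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def parser_score(graph_vec, weights, bias):
    """Logistic score in (0, 1); the weights here are made up, not trained."""
    z = sum(w * x for w, x in zip(weights, graph_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))

hidden = message_pass(cfg_edges, node_features)
graph_vec = readout(hidden)
score = parser_score(graph_vec, weights=[0.8, 1.2], bias=-1.0)
print(f"parser-likeness score: {score:.3f}")
```

In this sketch, each block's vector is mixed with those of the blocks that can jump to it, so the graph-level vector reflects both the block features and the control-flow structure. A function would be flagged as a likely parser when its score exceeds a chosen threshold.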
DEMO VIDEO
REFERENCE:
Official Publication (Journal of Systems & Software): https://doi.org/10.1016/j.jss.2026.112783
Open Access Version (Pre-print): [Download PDF from Zenodo]
CODE:
ParserHunter Core Tool & Datasets: GitHub: https://github.com/ScapMarco/ParserHunter
WEB INTERFACE:
ParserHunter Web Interface: GitHub: https://github.com/ScapMarco/Web_Interface_ParserHunter