MTV2 2024

Journée MTV2 du 21 Novembre 2024

Localisation

EPITA (campus de Paris – Kremlin-Bicêtre), Amphi 0

Adresse : 14-16 Rue Voltaire, 94270 Le Kremlin-Bicêtre

Les informations pratiques se trouvent en bas de page.

Inscriptions

Les inscriptions sont désormais closes. La liste des inscrits est visible sur :

https://framadate.org/xM8uD51NKLGTNnoO

Pour toute demande tardive de modifications d'inscription, merci de contacter par mail les responsables du groupe MTV2.

Programme

Talks can be given in French or in English. Since some part of the audience is not fluent in French, it is recommended to have the presentation slides in English.

09h45-10h25 Accueil, café / Arrivals, welcome coffee

10h25-10h30 Ouverture de la journée / Opening. Nikolai Kosmatov, Natalia Kushik, Pascale Le Gall, Antoine Rollet

10h30-12h00 Session 1.

ROSA : Détection de Portes Dérobées avec du Fuzzing. Dimitri Kokkonis (CEA)
Vérification à l'Exécution de Propriétés de Sécurité de Haut Niveau: Etude de Cas sur la TPM Software Stack. Yani Ziani (Thales)
CUTECat: Concolic Execution for Computational Law. Pierre Goutagny (Inria Lille)

12h00-13h30 Déjeuner / Lunch

13h30-15h00 Session 2.

Exposé invité / Invited talk. Property-Based Testing of OCaml 5. Jan Midtgaard (Tarides)
An Efficient and Uniform CSP Solution Generator Generator. Ghiles Ziat (EPITA)

15h00-15h30 Pause café / Coffee break

15h30-17h00 Session 3.

Discovery of interaction models of distributed systems from their execution. Christian Joel Nguetoum (CEA)
Towards Efficient Verification of Parallel Applications with Mc SimGrid. Mathieu Laurent (Inria Rennes)
Fast identification of network protocol states with greybox active automata learning - Ongoing work. Yohan Pipereau (Télécom SudParis)

17h00 Fin de la journée / End of the workshop

The duration of a regular talk is 30 min (25 min talk +5 min questions & answers).

The duration of an invited talk is 60 min (50 min talk +10 min questions & answers).

Résumés des exposés

Exposé invité: Jan Midtgaard (Tarides) - Property-Based Testing of OCaml 5

Abstract:

OCaml 5, released in December 2022, added multicore parallelism to the OCaml programming language. We describe our work on building a property-based test suite to help ensure its correctness. To do so, we ended up developing two related testing libraries:

- Lin -- a library to test for sequential consistency

- STM -- a state-machine testing library

Both libraries build on QCheck, a black-box, property-based testing library in the style of QuickCheck. We share our findings and our experience in using these to eliminate bugs from both the standard library and from the run-time system of OCaml 5.

This is based on joint work with my Tarides colleagues Olivier Nicole, Nicolas Osborne, and Samuel Hym. An early version of this work was presented at the 2022 OCaml Workshop.

Biography:

Jan Midtgaard is a Principal Software Engineer at Tarides. Before joining industry, he was Associate Professor in Software Engineering at the Maersk Mc-Kinney Moller Institute, University of Southern Denmark. He holds a PhD in computer science from the University of Aarhus, and since postdoc'ed at INRIA Rennes, Roskilde University, University of Aarhus, and the Technical University of Denmark (DTU).

His work centers around programming languages and the technology surrounding them, in particular, functional programming, compilers, static analysis, and property-based testing. His paper "QuickChecking Patricia Trees" won the John McCarthy best paper award of TFP 2017.

Présentations régulières:

ROSA : Détection de Portes Dérobées avec du Fuzzing (Dimitri Kokkonis - CEA)

Une porte dérobée (backdoor en anglais) au niveau du code est un accès caché, injecté dans le code d'un programme. Par exemple, des identifiants codés en dur dans le code d'un serveur web pourraient donner l'accès à ce dernier pour toute personne étant au courant de la vulnérabilité. Des attaques de chaîne d'approvisionnement ont déjà mené à l'injection de portes dérobées dans des logiciels open-source populaires, ainsi que dans du firmware utilisé par plusieurs routeurs. L'audit manuel de code binaire à cette échelle présente un obstacle insurmontable, et les approches semi-automatiques existantes se concentrent uniquement sur un petit sous-ensemble de types de programmes et de portes dérobées, et en même temps nécessitent de la rétro-ingénierie manuelle des programmes binaires en question. Le fuzzing en boîte grise (génération automatisée semi-aléatoire de tests) est devenu très populaire grâce à sa capacité de découvrir un grand nombre de vulnérabilités dans des programmes variés, et devient ainsi une solution-candidate très adaptée pour l'amélioration de la détection de portes dérobées. Toutefois, les méthodes existantes en fuzzing n'offrent aucun moyen direct pour la détection dynamique de portes dérobées. Avec cette contribution on introduit ROSA, une nouvelle approche (et outil) qui combine un fuzzer de l'état de l'art (AFL++) avec un nouvel oracle métamorphique, capable de détecter des déclenchements de portes dérobées de façon dynamique. Pour faciliter l'évaluation de ROSA, on a aussi créé ROSARUM, le premier benchmark open-source pour l'évaluation de la détection de plusieurs types de portes dérobées se trouvant dans des programmes variés. Notre évaluation montre que ROSA atteint un niveau de robustesse, rapidité et automatisation similaire au fuzzing classique. Comparée aux outils existants, notre approche peut détecter une grande variété de portes dérobées dans plusieurs types de programmes sans nécessiter la rétro-ingénierie du code binaire.

A code-level backdoor is a hidden access, programmed and concealed within the code of a program. For instance, hard-coded credentials planted in the code of a web server would enable maliciously logging into all the deployed instances of this server. Confirmed software supply-chain attacks have led to the injection of backdoors into popular open-source projects, and backdoors have been discovered in various router firmware. Manual code auditing for backdoors is challenging and existing semi-automated approaches can handle only a limited amount of programs and backdoors, while requiring manual reverse-engineering of the audited (binary) program. Graybox fuzzing (automated semi-randomized testing) has grown in popularity due to its success in discovering vulnerabilities and hence stands as a strong candidate for improved backdoor detection. However, current fuzzing knowledge does not offer any means to detect the triggering of a backdoor at runtime. In this work we introduce ROSA, a novel approach (and tool) which combines a state-of-the-art fuzzer (AFL++) with a new metamorphic test oracle, capable of detecting runtime backdoor triggers. To facilitate the evaluation of ROSA, we have created ROSARUM, the first openly available benchmark for assessing the detection of various backdoors in diverse programs. Experimental evaluation shows that ROSA has a level of robustness, speed and automation similar to classical fuzzing. Compared to existing detection tools, it can handle a diversity of backdoors and programs and it does not rely on manually reverse-engineering the fuzzed binary code.

Vérification à l'Exécution de Propriétés de Sécurité de Haut Niveau: Etude de Cas sur la TPM Software Stack (Yani Ziani - Thales)

Le Trusted Platform Module (TPM) est un crypto-processeur conçu pour fournir un stockage sécurisé basé sur le matériel et pour protéger l'intégrité des ordinateurs modernes. Les communications avec le TPM se font par le biais de la TPM Software Stack (TSS), dont une implémentation populaire est la librairie tpm2-tss. Il est donc ainsi crucial d'assurer qu'aucune fuite de données sensibles ne puisse survenir dans la TSS lors de communications entre la plateforme hôte et le TPM. Plusieurs défis ont été rencontrés dans les travaux récents sur la vérification déductive de propriétés de sûreté et de propriétés fonctionnelles pour cette librairie. L'objectif de cette étude de cas est de se concentrer sur des propriétés de sécurité de haut niveau, et de proposer une approche de validation alternative de telles propriétés sur cette librairie, en utilisant la vérification à l'exécution. Nous décrivons l'approche proposée, l'appliquons pour spécifier et vérifier à l'exécution des propriétés de sécurité clés en utilisant la plateforme de vérification Frama-C, et nous présentons nos premiers résultats d'évaluation.

CUTECat: Concolic Execution for Computational Law (Pierre Goutagny - Inria Lille)

Many legal computations, including the amount of tax owed by a citizen, whether they are eligible to social benefits, or the wages due to civil state servants, are specified by computational laws. Their application, however, is performed by expert computer programs intended to faithfully transcribe the law into computer code. Bugs in these programs can lead to dramatic societal impact, e.g., paying employees incorrect amounts, or not awarding benefits to families in need. To address this issue, we consider concolic unit testing, a combination of concrete execution with SMT-based symbolic execution, and propose CUTECat, a concolic execution tool targeting implementations of computational laws. Such laws typically follow a pattern where a base case is later refined by many exceptions in following law articles, a pattern that can be formally modeled using default logic. We show how to handle default logic inside a concolic execution tool, and implement our approach in the context of Catala, a recent domain-specific language tailored to implement computational laws. We evaluate CUTECat on several programs, including the Catala implementation of the French housing benefits and Section 132 of the US tax code. We show that CUTECat can successfully generate hundreds of thousands of testcases covering all branches of these bodies of law. Through several heuristics, we improve CUTECat’s scalability and usability, making the testcases understandable by lawyers and programmers alike. We believe CUTECat thus paves the way for the use of formal methods during legislative processes.

Discovery of interaction models of distributed systems from their execution (Christian Joel Nguetoum - CEA)

Our purpose is to develop a method for discovering interaction models similar to UML Sequence Diagrams and Message Sequence Charts for distributed systems by observing their executions. Executions are collected as tuples of local traces, one per remote subsystem. A local trace is a sequence of emissions and receptions of messages logged at the interface of each subsystem. Our proposal encompasses three steps: the first looks to infer local specifications from local traces as regular expressions, the second translates regular expressions as local interaction models, and the third composes local interaction models to derive a global interaction. Regarding the first step, we consider on-the-shelf learning algorithms, providing different candidate regular expressions according to their underlying assumptions. The second is a straightforward translation. In the third step, we define a horizontal composition mechanism with different strategies. We begin with a brute-force approach that we call full horizontal composition, and we then optimize it by imposing constraints on the results. In particular, in the guided composition, we use a guideline to enforce the pairing of emissions and receptions occurring in different involved local interactions. We will present the initial investigations aiming to define this method of interaction model discovery.

Towards Efficient Verification of Parallel Applications with Mc SimGrid (Mathieu Laurent - Inria Rennes)

Assessing the correctness of distributed and parallel applications is notoriously difficult due to the complexity of the concurrent behaviors and the difficulty to reproduce bugs. In this context, Dynamic Partial Order Reduction (DPOR) techniques have proved successful in exploiting concurrency to verify applications without exploring all their behaviors. However, they may lack of efficiency when tracking non-systematic bugs of real size applications. In this paper, we propose two adaptations of the Optimal Dynamic Partial Order (ODPOR) algorithm with a particular focus on bug finding and explanation. The first algorithm proposes an out-of-order version called BeFS ODPOR which purpose is to avoid being stuck in uninteresting large parts of the state space. Once a bug is found, the second adaptation takes advantage of ODPOR principles to efficiently find the origins of the bug.

An Efficient and Uniform CSP Solution Generator Generator (Ghiles Ziat - EPITA)

Constraint-based random testing is a powerful technique which aims at generating random test cases that satisfy functional properties of a program. Its objective is to determine whether an object, a function for instance, will respect a given behavior according to an input. This approach requires firstly defining the property to satisfy, then secondly to provide a "generator of inputs" able to feed the program with test cases generated. In this paper, we are interested in the problem of manufacturing a uniform and efficient generator for the solutions of a CSP. In order to do that, we propose a specialized solving method that produces a well-suited representation for random sampling. Our solving method employs a dedicated propagation scheme based on the hypergraph representation of a CSP, and a custom split heuristic that emphasizes the interests of our propagation scheme. The generators we built are uniform by construction, iterative and auto-adaptative. We produced a prototype using the AbSolute constraint solving library and demonstrate its performances on several realistic examples.

Fast identification of network protocol states with greybox active automata learning - Ongoing work (Yohan Pipereau - Télécom SudParis)

Active automata learning (AAL) algorithms such as L* are used in various works to extract state machine models which describe the logic of a network protocol implementation. However, AAL algorithms used in a black box setting have an implicit tradeoff between exhaustiveness and learning duration. Concretely, AAL either takes multiple days or weeks to extract some protocol models or fail to identify some states. In this presentation, we present our proposal to extend active automata learning algorithm with the ability to read/write the memory of the network protocol implementation for fast and exhaustive learning of automata.

Informations pratiques

Les présentation auront lieu dans l'Amphi 0, au rez-de-chaussée du bâtiment Voltaire. Voir le plan local.

Pour se rendre à l'Epita, les informations sont disponibles sur le site de l'Epita.

Organisateurs

Les animateurs du groupe MTV2 : Nikolai Kosmatov, Natalia Kushik, Pascale Le Gall, Antoine Rollet

Les organisateurs locaux : Amazigh Amrane, Daniela Becker, Alexandre Duret-Lutz, Uli Fahrenberg et Ghiles Ziat

Page updated

Google Sites

Report abuse