Who?
Name: Arnaud Legout
Mail: arnaud.legout@inria.fr
Web page: https://www-sop.inria.fr/members/Arnaud.Legout/
Where?
Place of the project: DIANA team, Inria, Sophia Antipolis
Address: 2004 route des Lucioles
Team: DIANA
Web page: https://team.inria.fr/diana/
Pre-requisites if any: Python and web development, basic knowledge of how LLMs work,
openness to transdisciplinary work, eagerness to work at the edge of computer science
and social psychology.
Description:
Large Language Models (LLMs) have revolutionized AI over the past two years.
In particular, copilots (such as GitHub Copilot) assist humans by predicting
what they want to do next. However, little is known about how a copilot interacts
with the human cognitive process. In particular, can a copilot influence,
or even change the mind of, the assisted human?
The goal of this internship is to design, run, and analyze experiments on the
impact of a copilot on the cognitive process. The design task will be to refine
the research question we want to answer and to create an experiment that allows
us to answer it. For instance, to show that a copilot can influence basic
calculation tasks, we might design an experiment in which the copilot returns
an incorrect result at random. The run task consists of conducting the
experiment with real participants, recruited among the students of the
faculties involved in the project. The analysis task consists of a statistical
evaluation of the results using resampling techniques such as permutation
tests and bootstrap confidence intervals.
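To make the analysis step concrete, here is a minimal sketch of the two resampling techniques mentioned above, on made-up accuracy scores (only numpy is assumed; the data, group sizes and effect are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    control = rng.normal(0.80, 0.10, size=50)  # task accuracy without copilot
    copilot = rng.normal(0.72, 0.10, size=50)  # task accuracy with a misleading copilot

    # Permutation test: how often does a random relabelling of the two groups
    # produce a difference in means at least as extreme as the observed one?
    observed = control.mean() - copilot.mean()
    pooled = np.concatenate([control, copilot])
    n_perm, count = 10_000, 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(pooled[:50].mean() - pooled[50:].mean()) >= abs(observed):
            count += 1
    p_value = count / n_perm

    # Bootstrap 95% confidence interval for the difference in means,
    # resampling each group with replacement.
    boot = [rng.choice(control, 50).mean() - rng.choice(copilot, 50).mean()
            for _ in range(10_000)]
    ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
    print(f"p = {p_value:.4f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")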
During this internship, you will learn how to define a research question,
how to run a research experiment, and how to make the statistical
interpretation of its results.
For excellent and motivated students, this internship may be followed by a Ph.D. thesis.
Useful Information/Bibliography:
Erik Jones and Jacob Steinhardt. “Capturing failures of large language models via human cognitive biases”. In:
Advances in Neural Information Processing Systems 35 (2022), pp. 11785–11799.
Enkelejda Kasneci et al. “ChatGPT for good? On opportunities and challenges of large language models for
education”. In: Learning and individual differences 103 (2023), p. 102274.
Celeste Kidd and Abeba Birhane. “How AI can distort human beliefs”. In: Science 380.6651 (2023), pp. 1222–1223.
Bill Thompson and Thomas L Griffiths. “Human biases limit cumulative innovation”. In: Proceedings of the Royal
Society B 288.1946 (2021), p. 20202752.
Canyu Chen and Kai Shu. “Can LLM-Generated Misinformation Be Detected?” In: arXiv preprint arXiv:2309.13788
(2023).
Who?
Name: Arnaud Legout
Mail: arnaud.legout@inria.fr
Web page: https://www-sop.inria.fr/members/Arnaud.Legout/
Name: Damien Saucez
Mail: damien.saucez@inria.fr
Web page: https://team.inria.fr/diana/team-members/damien-saucez/
Where?
Place of the project: DIANA team, Inria, Sophia Antipolis
Address: 2004 route des Lucioles
Team: DIANA
Web page: https://team.inria.fr/diana/
Pre-requisites if any: Familiarity with Python and Linux, basic system performance knowledge,
high motivation to work in a research environment, and excitement to tackle hard problems.
Description:
Data science is not only a programming or machine-learning problem; for most
practical use cases it is also a systems problem. One complex case is making
the best use of the available RAM.
Computations whose memory needs exceed the available RAM require the OS to
swap memory pages to disk. Even when your process does not exceed the
available RAM, the Linux memory manager swaps pages proactively.
We observed that under certain circumstances the swap process dramatically
reduces performance and reaches a pathological behavior in which retrieving
pages from swap becomes much slower than the disk speed.
The goal of this internship is to understand and document the current Linux memory management,
how it interacts with the Python interpreter, and how to reproduce the
circumstances under which the pathological behavior occurs. Ultimately, you may propose a patch
to the Python interpreter to solve or work around the issue.
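As a first illustration of the kind of experiment involved, here is a minimal sketch (an assumption of method, not the internship's actual methodology) that allocates memory from Python while watching the kernel's swap counters in /proc/vmstat; it is Linux-only, and the allocation sizes must be adjusted to the machine:

    import time

    def swap_counters():
        """Return (pages swapped in, pages swapped out) since boot."""
        counters = {}
        with open("/proc/vmstat") as f:
            for line in f:
                key, value = line.split()
                counters[key] = int(value)
        return counters["pswpin"], counters["pswpout"]

    before = swap_counters()
    blocks = []
    for i in range(64):  # adjust count/block size so the total exceeds your RAM
        blocks.append(bytearray(256 * 1024 * 1024))  # allocate (zero-filled) 256 MiB
        time.sleep(0.1)
        swpin, swpout = swap_counters()
        print(f"allocated {(i + 1) * 256} MiB: "
              f"pswpin={swpin - before[0]}, pswpout={swpout - before[1]}")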
This internship requires a good understanding of the Linux operating system and
a good knowledge of C and Python. It will be mandatory to look at Linux
and Python source code (written in C) to understand the details and the undocumented
behavior.
During this internship, you will learn systems techniques and optimizations used
for data science on large and challenging datasets.
Excellent students will have the possibility to continue with a Ph.D. thesis.
Useful Information/Bibliography:
Ulrich Drepper. What Every Programmer Should Know About Memory.
https://lwn.net/Articles/250967/
Python memory management:
https://docs.python.org/3/c-api/memory.html
Brendan Gregg. Systems Performance.
Name: Frédéric Giroire and Davide Ferré
Mail: frederic.giroire@inria.fr
Web page: https://www-sop.inria.fr/members/Frederic.Giroire/
Place of the project:
Address: Inria, 2004 route des Lucioles, Sophia Antipolis
Team: COATI (common project Inria/I3S)
Web page: https://team.inria.fr/coati/
Pre-requisites:
Knowledge in networking and machine learning.
Python.
Description:
The exponential advances in Machine Learning (ML) are leading to the deployment of ML models in constrained and embedded devices to solve complex inference tasks. At the moment, there exist two main solutions to serve these tasks: running the model on the end device, or sending the request to a remote server. However, these solutions may not suit all possible scenarios in terms of accuracy or inference time, calling for alternative solutions.
Cascade inference is an important technique for performing real-time and accurate inference given limited computing resources such as MEC servers. It combines two or more models to perform inference: a highly accurate but expensive model with a low-accuracy but fast model, and determines whether the expensive model should make a prediction based on the confidence score of the fast model. A large pool of works has exploited this solution. The first to propose a sequential combination of models were [1], for face detection tasks; then, in the context of deep learning, cascades have been applied to numerous tasks [2,3].
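As an illustration, the deferral rule at the heart of cascade inference can be sketched in a few lines of Python; the two models below are toy stand-ins and the threshold value is an arbitrary assumption:

    import numpy as np

    def softmax(logits):
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def cascade_predict(x, fast_model, expensive_model, threshold=0.9):
        probs = softmax(fast_model(x))
        confidence = probs.max()           # confidence score of the fast model
        if confidence >= threshold:
            return probs.argmax(), "fast"  # cheap path: answer immediately
        return softmax(expensive_model(x)).argmax(), "expensive"  # accurate path

    # Toy stand-ins for the two models (placeholders, not real networks).
    fast = lambda x: np.array([5.0, 0.1, 0.1]) if x > 0 else np.array([0.4, 0.3, 0.3])
    slow = lambda x: np.array([0.1, 3.0, 0.1])

    print(cascade_predict(1.0, fast, slow))   # confident: the fast model answers
    print(cascade_predict(-1.0, fast, slow))  # low confidence: escalate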
Early Exit Networks take advantage of the fact that not all input samples are equally difficult to process, and thus invest a variable amount of computation based on the difficulty of the input and the prediction confidence of the Deep Neural Network [5]. Specifically, early-exit networks consist of a backbone architecture with additional exit heads (or classifiers) along its depth. At inference time, a sample propagates through the backbone and each of the exits, and the result that satisfies a predetermined criterion (exit policy) is returned as the prediction output, bypassing the rest of the model. In fact, the exit policy can also reflect the capabilities and load of the target device, and dynamically adapt the network to meet specific runtime requirements [6].
Our project is to use cascade models and/or early-exit models in the context of Edge Computing to improve the delay and reduce the resource usage of ML inference tasks at the edge. Of crucial importance for cascade or early-exit models is the confidence of the fast model. Indeed, if the prediction of the first model is used but wrong, it may lead to a low accuracy of the cascade, even if the accuracy of the best model is very high. Conversely, if the confidence of the first model is too low, it will never be used, and the computations will be higher than using only the second model by itself; additionally, we will use unnecessary network resources and incur higher delays than necessary. Researchers have proposed methods to calibrate such systems [4]. However, they have not explored the choice of the loss function of such systems in depth.
In this project, we will explore the use of a new loss function for the fast models (or first exits) of cascade networks (or early-exit models). Indeed, such models do not have the same goal as the global system, as they should only act as a first filter.
Useful Information:
The internship can be followed by a PhD for interested students. A PhD grant is already funded on the topic.
Bibliography:
[1] Viola, P., & Jones, M. (2001, December). Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001 (Vol. 1, pp. I-I). IEEE.
[2] Wang, X., Kondratyuk, D., Christiansen, E., Kitani, K. M., Alon, Y., & Eban, E. (2020). Wisdom of committees: An overlooked approach to faster and more accurate models. arXiv preprint arXiv:2012.01988.
[3] Wang, X., Luo, Y., Crankshaw, D., Tumanov, A., Yu, F., & Gonzalez, J. E. (2017). Idk cascades: Fast deep learning by learning not to overthink. arXiv preprint arXiv:1706.00885.
[4] Enomoro, S., & Eda, T. (2021, May). Learning to cascade: Confidence calibration for improving the accuracy and computational cost of cascade inference systems. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 8, pp. 7331-7339).
[5] Laskaridis, S., Kouris, A., & Lane, N. D. (2021, June). Adaptive inference through early-exit networks: Design, challenges and directions. In Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning (pp. 1-6).
[6] Laskaridis, S., Venieris, S. I., Almeida, M., Leontiadis, I., & Lane, N. D. (2020, September). SPINN: synergistic progressive inference of neural networks over device and cloud. In Proceedings of the 26th annual international conference on mobile computing and networking (pp. 1-15).
Name: Nicolas Nisse et Frédéric Giroire
Mail: nicolas.nisse@inria.fr
Web page: http://www-sop.inria.fr/members/Nicolas.Nisse/
Place of the project:
Address: Inria, 2004 route des Lucioles, Sophia Antipolis
Team: COATI (common project Inria/I3S)
Web page: https://team.inria.fr/coati/
This project is part of our study of the evolution of researchers' productivity and collaborations, and how they are affected by funding, whether national or international. In this context, we have collected all the publications (with at least one French author) in SCOPUS. We have developed various algorithms and metrics to assess the proximity between researchers, using the journals and conferences in which they publish. The data has already been greatly consolidated, but there are still some ambiguities (notably due to the names of the journals and conferences). In addition, the various measures of proximity need to be evaluated and compared so that an online tool can be offered, enabling researchers to situate themselves in relation to disciplines and/or their colleagues.
This work will be supervised by Frédéric Giroire (CNRS/I3S) and Nicolas Nisse (Inria/I3S) and is in collaboration with Michele Pezzoni (UniCA/GREDEG).
More specifically, the student(s) will have to carry out the following tasks:
• Online tool for estimating the distance between two researchers
◦ 2D projection of vectors representing researchers in publication space (see the sketch after this list).
◦ Options: main publications, publication period
◦ Consolidation of the list of conferences and journals for all disciplines. Tool: design of an AI model for classifying conference names.
• Exploration of the main components or clusters of a researcher's publications.
• Study the relationship / correlation between the different metrics.
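The sketch announced in the list above: one possible 2D projection of researcher vectors, assuming each researcher is represented by a vector of publication counts per venue. The venue names and data are invented, and scikit-learn's PCA is just one candidate (t-SNE or UMAP could be considered as well):

    import numpy as np
    from sklearn.decomposition import PCA

    venues = ["INFOCOM", "SIGCOMM", "ICALP", "STOC"]  # hypothetical venue list
    researchers = {                                    # publication counts per venue
        "A": [12, 8, 0, 1],
        "B": [10, 9, 1, 0],
        "C": [0, 1, 14, 9],
    }
    X = np.array(list(researchers.values()), dtype=float)
    X = X / X.sum(axis=1, keepdims=True)  # normalize to publication profiles

    coords = PCA(n_components=2).fit_transform(X)
    for name, (x, y) in zip(researchers, coords):
        print(f"{name}: ({x:+.3f}, {y:+.3f})")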
Expected skills:
• Python programming, web programming
• Data science/data analysis
• Skills in AI models would be a plus.
Advisors: Frédéric Giroire and Stéphane Pérennes
Emails: frederic.giroire@inria.fr , stephane.perennes@inria.fr
Web Site:
http://www-sop.inria.fr/members/Frederic.Giroire/
Laboratory: COATI project - INRIA (2004, route des Lucioles – Sophia Antipolis) and the
startup Hive (https://www.hivenet.com/)
Web page: https://team.inria.fr/coati/
Pre-requisites:
Knowledge in networking and probability and graph theory.
Programming in python, C/C++ or java.
Description:
The internship will be done in collaboration with the startup Hive (https://www.hivenet.com/)
and may be followed by a Ph.D. for interested students.
Large-scale peer-to-peer systems are foreseen as a way to provide highly reliable data
storage at low cost. To ensure high durability and high resilience over a long period of time,
the system must add redundancy to the original data. It is well known that erasure coding is a
space-efficient solution to obtain a high degree of fault tolerance by distributing encoded
fragments onto different peers of the network. A repair mechanism then needs to cope
with the dynamic and unreliable behavior of peers by continuously reconstructing the missing
redundancy. Consequently, the system depends on many parameters that need to be well
tuned, such as the redundancy factor, the placement policies, and the frequency of data repair.
These parameters impact the amount of resources, such as the bandwidth usage and the
storage space overhead, required to achieve a desired level of reliability, i.e., the
probability of losing data.
In this project, we will compare different repair policies and erasure codes. Indeed, some
erasure codes (maximum distance separable (MDS) codes) such as Reed-Solomon have
been shown to be optimal in terms of reception efficiency, i.e. the number of chunks required
for reconstructing a lost chunk in our context. This means that they have an optimal storage
space usage for a given number of tolerated failures in a distributed storage system.
However, they are not very efficient in terms of bandwidth usage when a reconstruction has to
be done. Indeed, the original data has to be fully reconstructed when a small chunk of data is
lost, in order to maintain the redundancy of the system. This operation happens constantly, as
disk failures are frequent in large distributed systems and peers may leave the
system. As bandwidth is a crucial resource in distributed systems, alternative repair policies
such as lazy reconstruction [2] and new codes such as hybrid codes [3], Hierarchical Codes [4],
and regenerating codes [1] have been proposed to decrease the bandwidth used for repair.
The latter are near-optimal in terms of bandwidth usage, but this comes at the cost of a
much higher computational cost [5]. We will thus explore which codes present the best trade-off
between storage space, bandwidth usage, computational cost, number of tolerated failures,
mean time to failure, data availability, and download speed.
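As a back-of-the-envelope illustration of the trade-off discussed above, the following sketch compares replication with an MDS code under the standard assumption that repairing one MDS fragment requires downloading k fragments; all figures are invented:

    def replication(object_mb, copies):
        storage = copies * object_mb
        repair_bw = object_mb          # re-copy one replica
        tolerated = copies - 1
        return storage, repair_bw, tolerated

    def mds_code(object_mb, k, n):
        fragment = object_mb / k
        storage = n * fragment         # n fragments of size object/k
        repair_bw = k * fragment       # classical repair reads k fragments
        tolerated = n - k
        return storage, repair_bw, tolerated

    obj = 100  # MB
    print("3-way replication:", replication(obj, 3))
    print("MDS (k=8, n=11):  ", mds_code(obj, 8, 11))

The output illustrates the point made above: the MDS code stores less than half as much as 3-way replication while tolerating more failures, but repairing a single lost fragment still costs the bandwidth of the full object.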
References.
[1] Papailiopoulos, D. S., Luo, J., Dimakis, A. G., Huang, C., & Li, J. (2012, March). Simple
regenerating codes: Network coding for cloud storage. In 2012 Proceedings IEEE INFOCOM (pp. 2801-2805). IEEE.
[2] Giroire, F., Monteiro, J., & Pérennes, S. (2010, December). Peer-to-peer storage systems:
a practical guideline to be lazy. In 2010 IEEE Global Telecommunications Conference
GLOBECOM 2010 (pp. 1-6). IEEE.
[3] Rodrigues, R., & Liskov, B. (2005, February). High availability in DHTs: Erasure coding vs.
replication. In International Workshop on Peer-to-Peer Systems (pp. 226-239). Springer,
Berlin, Heidelberg.
[4] Duminuco, A., & Biersack, E. (2008, September). Hierarchical codes: How to make erasure
codes attractive for peer-to-peer storage systems. In 2008 Eighth International Conference on
Peer-to-Peer Computing (pp. 89-98). IEEE.
[5] Duminuco, A., & Biersack, E. (2009, June). A practical study of regenerating codes for
peer-to-peer backup systems. In 2009 29th IEEE International Conference on Distributed
Computing Systems (pp. 376-384). IEEE.
Name: Sid Touati
Mail: sid.touati@inria.fr
Place of the project:
Address: INRIA Lagrange
Team: Kairos Team-Project
Web page: https://team.inria.fr/kairos/
• Supervisors: Joanna Moulierac (COATI); Alexandre Guitton (LIMOS, Université Clermont-Auvergne)
• Mail: joanna.moulierac@inria.fr, alexandre.guitton@uca.fr
• Place of the project: Inria Sophia Antipolis
• Address: COATI project-team, 2004 route des Lucioles
• Web pages:
http://www-sop.inria.fr/members/Joanna.Moulierac/ ; https://perso.isima.fr/~alguitto/
Description:
In recent years, Mobile Edge Computing (MEC) has emerged as a promising solution for enhancing
energy efficiency in network applications. As a matter of fact, by offloading computational tasks to a
server deployed near the base station and by caching both the outputs and the related code for completed
tasks, it is possible to effectively reduce the energy consumption of mobile devices while ensuring
adherence to their latency constraints. Although there have been some previous works on task caching
and task offloading on the cloud, most of them focus on only one of these two strategies or formulate
optimization problems that are hard to solve and propose suboptimal solutions.
In this internship, we will study a linear model for the joint task caching and offloading optimization problem.
The idea is to focus on the model proposed in [1], where the nearby, grid-powered computing and
storage facility can be used in two different manners. First, the mobile applications can decide to offload
part of their computation tasks, in terms of code and data, to the MEC server. This is formally called task
offloading. Second, the control plane of the applications, likely deployed in the central cloud, can decide
to cache some tasks in advance. This is called task caching, and aims at storing popular or
computationally intensive tasks.
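To fix ideas, here is a toy integer linear program in the spirit of the joint problem, written with the PuLP front-end; it is not the model of [1], and all numbers are invented (only offloaded tasks may be cached, and caching saves the transmission energy):

    import pulp

    tasks = ["t1", "t2", "t3"]
    local_e = {"t1": 8.0, "t2": 5.0, "t3": 9.0}  # device energy if executed locally
    tx_e = {"t1": 3.0, "t2": 4.0, "t3": 2.0}     # transmission energy if offloaded
    size_mb = {"t1": 40, "t2": 120, "t3": 60}    # code+data size of each task
    capacity = 100                                # MEC storage budget for cached tasks

    prob = pulp.LpProblem("joint_caching_offloading", pulp.LpMinimize)
    off = pulp.LpVariable.dicts("offload", tasks, cat="Binary")
    cache = pulp.LpVariable.dicts("cache", tasks, cat="Binary")

    # Device energy: local execution if not offloaded; transmission energy is
    # saved when the task's code and data are already cached at the server.
    prob += pulp.lpSum(local_e[t] * (1 - off[t]) + tx_e[t] * (off[t] - cache[t])
                       for t in tasks)
    for t in tasks:
        prob += cache[t] <= off[t]               # only offloaded tasks can be cached
    prob += pulp.lpSum(size_mb[t] * cache[t] for t in tasks) <= capacity

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    for t in tasks:
        print(t, "offload:", int(off[t].value()), "cache:", int(cache[t].value()))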
We will first analyze this model, and then try to extend it by proposing a more
complete one. For example, one extension will be to include the energy cost of the cloud (which is
not present in the initial model) for the transmission, the storage, and the computation of the tasks. Other
interesting extensions will be studied, and a performance evaluation will be carried out in order to assess
the efficiency of the proposed solution.
The internship can thus be followed by a PhD for a motivated student. The student
should have a taste for networking and optimization problems, algorithmics, and linear programming.
[1] Y. Hao, M. Chen, L. Hu, M. S. Hossain, and A. Ghoneim, “Energy efficient task caching and
offloading for mobile edge computing,” IEEE Access, 2018.
Who?
Name: Chadi Barakat, Yassine Hadjadj-Aoul, Sanaa Ghandi
Mail: Chadi.Barakat@inria.fr, yassine.hadjadj-aoul@irisa.fr
Web page: https://team.inria.fr/diana/chadi/ and https://people.irisa.fr/Yassine.Hadjadj-Aoul/
Where?
Place of the project: Inria centre at Université Côte d'Azur
Address: 2004, Route des Lucioles, 06902 Sophia Antipolis, France
Team: Diana team
Web page: https://team.inria.fr/diana/
Pre-requisites if any: Network programming and data analytics skills. Knowledge in video streaming and Machine Learning.
Description:
Video streaming is a key service in the Internet today, accounting by itself for more than half of the global Internet traffic. Generally, video streaming runs smoothly thanks to the different protocols involved, mainly the DASH protocol. However, when the experience degrades, either in the form of stalls or resolution switches, users are frustrated, especially since most often they do not know what exactly causes the degradation (is it a saturation of the WiFi, for example? a bad signal-to-noise ratio? a slow access to the Internet caused by a slow data plane? or any other cause?). Troubleshooting the network in this case is doable, but requires the deployment of specialized monitoring tools and techniques that not everyone is expert in (speedtest, Wireshark, WiFi Analyzer, etc.).

In this project, we will explore another approach based on data collected from within the browser (or the application running the video streaming) and its use to infer the network performance and the origin of the anomaly. The data we are seeking is about chunks and their resolutions, but also any other data available within the browser, such as CPU load or web-level measurements on page rendering provided by the Performance API of the browser. Data on the network is usually not available to the user (or hard to obtain for non-experts); that is why we will try to infer it by relying solely on whatever data is available within the browser about the video streaming experience. By running extensive experiments locally with real players and real videos under artificial and varied network conditions, we will collect both types of data, network level and video streaming level; then, with the help of supervised machine learning, we will bridge the gap between the two sets and propose models that can (i) infer the network performance from the video streaming experience, (ii) detect video streaming anomalies, and (iii) classify the origin of these anomalies.
We have developed this methodology in the past for web browsing (see references below) and have produced several models to infer network performance and classify network anomalies, with very good results regarding the capabilities of these models. The objective of this internship is to build upon this prior work and extend it to video streaming. Video streaming brings a new type of network traffic (greedier and lasting longer in time) and interacts differently with network performance (because of the DASH protocol, for example), so we do not expect our existing results and models to carry over directly to this new context. Our aim is to develop new network troubleshooting models for video streaming and compare their performance to those obtained for web browsing, hopefully with better performance for video streaming given the appealing characteristics of its traffic.

In this internship, the student will start by reviewing the literature on video streaming monitoring and troubleshooting, and by getting familiar with our prior work on web browsing. Next, the candidate will adapt our experimental testbed to this new scenario by integrating video streaming traffic into the testbed (server and player), together with tools to measure the QoE of video streaming and to collect measurements about its performance from within the browser. Then the student will define network scenarios that can impact the quality of video streaming and run experiments under these scenarios, collecting each time ground-truth data about the network and the rendering of the video on the screen of the end user. We will make sure our scenarios are general enough to represent different network conditions and different types of videos that users might request. With this data, the student will then calibrate machine learning models able to predict the network performance from the video streaming experience, and study the accuracy and robustness of these models. Along this work, we will compare our results to the performance obtained for web browsing in terms of network troubleshooting and check whether video streaming indeed brings new information in this regard. Our hope is, by the end of the internship, to be able to offer end users a web extension that quickly and in real time provides them with estimations of the performance of their network and hints on the origin of performance degradation when it occurs.
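As a sketch of the final calibration step, the following illustrates learning a mapping from browser-side video features to a network-level target on synthetic data; scikit-learn is assumed, and the feature set and target are placeholders for the ones the experiments will actually produce:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    n = 500
    # Hypothetical browser-side features per session: mean chunk resolution,
    # number of resolution switches, stall time, page CPU load.
    X = rng.uniform(0, 1, size=(n, 4))
    # Synthetic "ground truth" download bandwidth, loosely tied to the features.
    y = 20 * X[:, 0] - 5 * X[:, 1] - 8 * X[:, 2] + rng.normal(0, 1, n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("R^2 on held-out sessions:", round(model.score(X_te, y_te), 3))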
This work may be pursued in a PhD for motivated and skilled students, if funding opportunities exist.
Useful Information/Bibliography:
[1] Naomi Kirimi, Chadi Barakat, Yassine Hadjadj-Aoul, “Passive network monitoring and troubleshooting from within the browser: a data-driven approach“, in proceedings of the 20th International Wireless Communications & Mobile Computing Conference (IWCMC), Multimedia Symposium, Cyprus, May 2024.
[2] Imane Taibi, Yassine Hadjadj-Aoul, Chadi Barakat, “Data Driven Network Performance Inference From Within The Browser“, in proceedings of the 12th IEEE Workshop on Performance Evaluation of Communications in Distributed Systems and Web based Service Architectures (PEDISWESA), Rennes, July 2020.
[3] Muhammad Jawad Khokhar, Thibaut Ehlinger, Chadi Barakat, “From Network Traffic Measurements to QoE for Internet Video“, in proceedings of IFIP Networking, Warsaw, Poland, May 2019.
Who?
Name: Thierry Turletti and Chadi Barakat and Walid Dabbous
Mail: Thierry.Turletti@inria.fr and Chadi.Barakat@inria.fr and Walid.Dabbous@inria.fr
Telephone: 04 92 38 77 77
Web page: https://team.inria.fr/diana/
Where?
Place of the project: Diana Project-Team, Inria centre at Université Côte d'Azur
Address: 2004, route des Lucioles, 06902 Sophia Antipolis, France
Team: Diana team
Web page: https://team.inria.fr/diana/
What?
Pre-requisites if any: Strong programming skills, scripting, DevOps. Knowledge of Kubernetes is a plus. Knowledge in network protocols, cellular networks, data analytics.
Detailed description:
Nowadays, mobile networks (5G and beyond) are witnessing a revolution with the increase in bitrate, the densification of wireless cells, and the advent of virtualization and softwarization, which allow network functions and services to be deployed in data centres, most of them placed at the edge of the network. The emergence of new services and applications (e.g., AR/VR, drones, autonomous vehicles) poses serious service quality requirements on the network. Network slicing was introduced to help operators create dedicated, virtualized, and isolated logical networks on a common physical network to meet the differentiated capacity requirements of customers. Efficient network management that takes into account the quality of experience of end users does not depend only on simple network metrics such as delay or physical proximity, but rather on a complex set of metrics such as the bitrate in both directions, the jitter, the packet loss rate, the mobility context, the device properties, etc. Monitoring and accounting for all these metrics in an accurate and timely way represents a real challenge. Further, given the large number of devices foreseen at the edge and their mobility and time dynamics, any management plane has to be of low cost, able to scale with the number of users, devices and services, and must track the whole system in an efficient manner. In the end, the network should be able to meet the requirements of end users and to exploit to the maximum the available capacity of the underlying infrastructure.
In this internship, the student will learn how to deploy and experiment with a 5G sliced network (both the core network and the radio access network parts) on a real testbed composed of 5G radio hardware located on the SophiaNode platform, including the R2lab anechoic chamber and a Kubernetes cluster. In particular, she/he will review the literature on models for the Quality of Service (QoS) promises of sliced wireless networks; learn how to use the R2lab testbed to deploy scenarios; get familiar with the 5G OpenAirInterface (OAI) software; propose and execute a set of scenarios aiming to evaluate the slicing performance in the presence of different QoS requirements and traffic loads; and assess the QoS capacity of the platform by analyzing the key metrics previously identified. The final goal is to evaluate the extent to which the platform can satisfy the QoS needs of different applications and possibly provide hints for improvements.
This internship is proposed in the context of the SLICES European project and the national Priority Research Programme and Equipment (PEPR) on 5G, 6G and Networks of the Future.
References:
- Slicing 5G Blueprint, documentation available at https://doc.slices-sc.eu/blueprint/
- R2lab anechoic chamber, https://r2lab.inria.fr/
- OpenAirInterface, https://openairinterface.org/
- SLICES-RI project, https://gitlab.inria.fr/slices-ri
- PEPR on 5G, 6G and Networks of the Future, https://pepr-futurenetworks.fr/en/home/
- M. Lahsini, T. Parmentelat, T. Turletti, W. Dabbous and D. Saucez, "Sophia-node: A Cloud-Native Mobile Network Testbed," 2022 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), Phoenix, AZ, USA, 2022. https://inria.hal.science/hal-03907719v1/
- Arora, Sagar. Cloud Native Network Slice Orchestration in 5G and Beyond. Diss. Sorbonne Université, 2023. https://theses.hal.science/tel-04269161/
Who?
Name: Christelle CAILLOUET, Franck ROUSSEAU
Mail: christelle.caillouet@univ-cotedazur.fr, franck.rousseau@imag.fr
Telephone: 0492387929
Web page: http://www-sop.inria.fr/members/Christelle.Molle-Caillouet/
https://lig-membres.imag.fr/rousseau/
Where?
Place of the project: Inria
Address: 2004 Route des Lucioles, 06902 Sophia Antipolis
Team: Coati
Web page: https://team.inria.fr/coati/
What?
Pre-requisites if any: Algorithms for telecommunications;
operational research, linear programming;
strong programming skills (Python, scripting, Java, C, etc.)
Detailed description:
Low Power Wide Area Networks (LPWANs) are used in many Internet of Things (IoT) applications. LoRaWAN is a well-known LPWAN protocol, offering wide coverage with low power consumption and low data throughput. It is a data link layer protocol developed by the LoRa Alliance to provide long-range connectivity with low power consumption. This protocol uses the long-range physical layer (LoRa) developed by Semtech, based on the Chirp Spread Spectrum (CSS) modulation technique.
The default topology of the LoRaWAN network is a star of stars, where users transmit directly to gateways. The network consists of end nodes, gateways and a LoRaWAN network server. The gateway receives packets from the end nodes and forwards them to the network server. The network server manages the LoRaWAN network, including verifying end node addresses, acknowledgements, frame counts…
Numerous studies have shown that a LoRaWAN network can achieve coverage of several kilometers in open areas and rural environments. However, performance decreases in areas with obstacles, such as buildings or mountains. Dense networks with many users also degrade performance due to interference and packet collisions [1].
The LoRa Alliance has therefore recently proposed specifications for introducing relay nodes into LoRaWAN networks [2] and several works in the literature have also examined this development [3][4].
The aim of this study is to optimize the use of relay nodes in LoRaWAN networks, taking care to define a good mathematical model integrating:
* the characteristics defined in the LoRa Alliance specifications on the use of these relays,
* physical analytical models of propagation and interference in these networks.
The results will be used to assess the number of relays required, depending on the number of network users, their locations, and the cost and energy savings of their deployment.
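To give a feel for the physical analytical models involved, here is a rough link-budget check based on a standard log-distance path-loss model; the transmit power, per-SF sensitivity figures and path-loss exponent are typical values and should be treated as assumptions:

    import math

    TX_POWER_DBM = 14
    SENSITIVITY_DBM = {7: -123, 8: -126, 9: -129, 10: -132, 11: -134.5, 12: -137}

    def path_loss_db(distance_m, pl_d0=40.0, exponent=2.7):
        # Log-distance model with reference loss pl_d0 at 1 m.
        return pl_d0 + 10 * exponent * math.log10(distance_m)

    def best_sf(distance_m):
        # Smallest (fastest) spreading factor whose link budget closes.
        rx_dbm = TX_POWER_DBM - path_loss_db(distance_m)
        for sf in sorted(SENSITIVITY_DBM):
            if rx_dbm >= SENSITIVITY_DBM[sf]:
                return sf
        return None  # gateway unreachable: a relay would be needed

    for d in (500, 8000, 20000):
        sf = best_sf(d)
        print(f"{d} m:", f"SF{sf}" if sf else "out of range (relay candidate)")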
References:
[1] C. Caillouet, M. Heusse, F. Rousseau. Optimal SF Allocation in LoRaWAN Considering Physical Capture and Imperfect Orthogonality. IEEE Globecom 2019.
[2] LoRa Alliance, TS011 Relay Specification. https://resources.lora-alliance.org/technical-specifications/ts011-1-0-0-relay
[3] E. Lumet, A. Le Floch, R. Kacimi, M. Lihoreau, A.L. Beylot. LoRaWAN relaying: Push the cell boundaries. ACM MSWiM 2021.
[4] J.R. Cotrim, J.H. Kleinschmidt. An analytical model for multihop LoRaWAN networks. Elsevier Internet of Things, Volume 22, 100807, July 2023.
Who?
Name: MALLET Frédéric (and DE SIMONE Robert)
Mail: Frederic.Mallet@inria.fr
Telephone:
Web page: https://www-sop.inria.fr/members/Frederic.Mallet/
Where?
Place of the project:
Address: INRIA Lagrange
Team: Kairos Team-Project
Web page: https://team.inria.fr/kairos/
What?
Pre-requisites if any: First-order Logics, Temporal Logics
Detailed description:
The work is done in the context of the HAL4SDV European project, which intends to develop safe methods and temporal requirements for Software-Defined Vehicles. CCSL is a constraint language based on logical clocks that can be used to specify temporal and timed contracts. In the context of HAL4SDV, we would like to use CCSL and its recent extension (called RT-CCSL) to express contracts for autonomous vehicles. This is particularly tricky, as autonomous vehicles evolve in a very uncertain environment where it is difficult to predict what can happen. Autonomous vehicles also rely more and more on AI-based components whose behaviour may be difficult to predict or explain in the corner cases (best and worst); when safety is at stake, these corner cases cannot be left unattended. During the internship, we expect to define a formal domain-specific language able to capture temporal and timed scenarios and contracts for vehicles. We also expect to define tools to reason about that language, combining Boolean methods, temporal logics, logical time, and polyhedral models.
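To give the flavour of logical-clock reasoning, here is a toy checker for one CCSL-style precedence constraint over a finite trace; this is neither the TimeSquare tool nor actual (RT-)CCSL syntax, and the clock names are hypothetical:

    def precedes(trace, a, b):
        """Check 'a precedes b': at every step, clock a has ticked at least
        as often as clock b (counted over the prefix ending at that step)."""
        count_a = count_b = 0
        for step in trace:            # each step is the set of clocks that tick
            count_a += a in step
            count_b += b in step
            if count_b > count_a:
                return False
        return True

    # 'brake' may only be observed after 'obstacle_detected' (hypothetical clocks).
    trace_ok = [{"obstacle_detected"}, {"brake"}, {"obstacle_detected", "brake"}]
    trace_bad = [{"brake"}, {"obstacle_detected"}]
    print(precedes(trace_ok, "obstacle_detected", "brake"))   # True
    print(precedes(trace_bad, "obstacle_detected", "brake"))  # False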
References:
https://timesquare.inria.fr gives a list of useful articles about CCSL.
https://publications.pages.asam.net/standards/ASAM_OpenSCENARIO/ASAM_OpenSCENARIO_DSL/latest/index.html
Maksym Labzhaniia, Julien Deantoni, Marie-Agnès Peraldi-Frati, Frédéric Mallet:
Spatio-Temporal Framework for Verifying Safety Rules in Autonomous Vehicles. MoDELS (Companion) 2024: 700-709
Adrien Champion, Arie Gurfinkel, Temesghen Kahsai, Cesare Tinelli:
CoCoSpec: A Mode-Aware Contract Language for Reactive Systems. SEFM 2016: 347-366
Chaymae El Jabri, Marc Frappier, Thibaud Ecarot, Pierre-Martin Tardif:
Development of Monitoring Systems for Anomaly Detection Using ASTD Specifications. TASE 2022: 274-289
Chaymae El Jabri, Marc Frappier, Thibaud Ecarot, Pierre-Martin Tardif:
Development of monitoring systems for anomaly detection using ASTD specifications. CoRR abs/2207.11134 (2022)
Fabien Siron, Dumitru Potop-Butucaru, Robert de Simone, Damien Chabrol, Amira Methni:
Semantics foundations of PsyC based on synchronous Logical Execution Time. CPS-IoT Week Workshops 2023: 319-324
Xiaohong Chen, Zhi Jin, Min Zhang, Frédéric Mallet, Xiaoshan Liu, Tingliang Zhou:
A Scalable Approach to Detecting Safety Requirements Inconsistencies for Railway Systems. IEEE Trans. Intell. Transp. Syst. 25(8): 8375-8386 (2024)
1. Project Leaders
• Dino Lopez Pacheco <dino.lopez@univ-cotedazur.fr> - SigNet Team
• Fabrice Huet <fabrice.huet@univ-cotedazur.fr> - SCALE Team
Who?
Name: Françoise Baude & Fabrice Huet
Mail: francoise.baude@univ-cotedazur.fr, fabrice.huet@univ-cotedazur.fr
Telephone:
Web page: https://scale.i3s.unice.fr/
Where?
Place of the project: I3S laboratory
Address:
Team: Scale
Web page:https://scale.i3s.unice.fr/
Distributed event queues have become a central component in constructing large-scale and real-time cloud applications. They are currently employed in various latency-sensitive cloud applications, such as recording and analyzing web accesses for recommendations and ad placement, health care monitoring, fraud detection, smart grids, and intelligent transportation.
A distributed event queue comprises several partitions or sub-queues deployed across a cluster of servers. Applications (Event Consumers) that pull and process events from distributed queues are latency-sensitive. They necessitate a high percentile of events to be processed within a desired latency. Overprovisioning resources to meet this latency requirement is suboptimal, as it incurs substantial monetary costs for the service provider. Therefore, designing solutions for resource-efficient and latency-aware event consumers from distributed event queues is crucial. Such an architecture should dynamically provision and deprovision resources (event consumer replicas) to minimize resource usage while ensuring the required service level agreement (SLA).
To achieve this objective, we have framed the problem of autoscaling event consumers from distributed event queues to meet a desired latency as a bin-packing problem. This bin-packing problem depends on the arrival rate of events into the queues, the number of events in the queue backlogs, and the maximum consumption rate of the event consumers. We have validated our approach through extensive experiments in which a service dynamically scales based on the input load.
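For illustration, here is a minimal sketch of the bin-packing view: partitions with given event arrival rates are packed onto consumer replicas ("bins") whose capacity is the maximum consumption rate. First-fit decreasing is used as a stand-in heuristic, not necessarily the algorithm of our prior work, and the rates are invented:

    def first_fit_decreasing(partition_rates, replica_capacity):
        """Return a list of replicas, each as [current_load, [(partition, rate), ...]]."""
        replicas = []
        for part, rate in sorted(partition_rates.items(), key=lambda kv: -kv[1]):
            for entry in replicas:
                if entry[0] + rate <= replica_capacity:  # fits in an open replica
                    entry[0] += rate
                    entry[1].append((part, rate))
                    break
            else:
                replicas.append([rate, [(part, rate)]])  # open a new replica
        return replicas

    # Hypothetical arrival rates (events/s) per queue partition.
    rates = {"p0": 700, "p1": 300, "p2": 450, "p3": 250, "p4": 600}
    for i, (load, parts) in enumerate(first_fit_decreasing(rates, 1000)):
        print(f"replica {i}: load={load} partitions={[p for p, _ in parts]}")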
The aim of this internship is to expand this work to multi-service scenarios. Specifically, we aim to investigate how to dynamically scale a Directed Acyclic Graph (DAG) of microservices while maintaining the SLA. In this context, any scaling action on an upstream microservice will have repercussions downstream. The intern will be responsible for the following tasks:
- Conduct a comprehensive review of the existing literature on scaling DAGs of services, focusing on comparing RPC-based against event-queue-based interconnections between (micro)services in such DAGs.
- Propose modifications to our current algorithms to incorporate the complexities of a DAG.
- Execute thorough experiments to validate the proposed changes.
Pre-requisites if any:
- Docker
- Java
References:
Ezzeddine, M., Baude, F., Huet, F.: Tail-latency aware and resource-efficient bin pack autoscaling for distributed event queues. In: CLOSER 14th International Conference on Cloud Computing and Services Science (2024).
G. Shapira, T. Palino, R. Sivaram and K. Petty. Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale, second edition, O'Reilly Media, Inc., 2021.
Yi Hu, Hao Wang, Liangyuan Wang, Menglan Hu, Kai Peng, and Bharadwaj Veeravalli. Joint Deployment and Request Routing for Microservice Call Graphs in Data Centers. IEEE Transactions on Parallel and Distributed Systems, Nov 2023.
Who?
Name: Luigi Liquori
Mail: Luigi.Liquori@inria.fr
Telephone:
Web page:
Where?
Place of the project: Inria
Address:
Team: Kairos
Web page:
What?
Pre-requisites if any: The ideal candidate should have a solid knowledge of DLT, Blockchain, Programming smart contracts, Virtual machines & Compilers design, and Decentralized Finance. Some knowledge of legal issues in using digital assets could be beneficial.
Bitcoin [1] and Ethereum [2], just to mention two, increased the popularity of blockchains. Decentralized distributed ledgers brought two main innovations: 1) the capability of building decentralized peer-to-peer ledgers that record and store ordered transactions without the need for trusted, centralized third parties, and 2) the introduction of complex applications in which digital assets are directly controlled by a piece of code implementing arbitrary rules, known as Smart Contracts [2].
The panorama of digital currencies is growing rapidly, as illustrated by the taxonomy published by the Bank for International Settlements (2017).
Although the Bitcoin protocol already implements a first concept of “Smart Contract” [3], the term became popular with Ethereum and its implementation of Smart Contracts. The Ethereum Virtual Machine, which processes and executes Smart Contracts, made possible the creation of Decentralized Applications and extended the capabilities of blockchains, until then considered only as digital cash systems. Decentralized Applications, which do not require a middleman to function, open numerous opportunities: automatic settlement, treasury applications, voting systems and many others.
While public blockchains were getting more and more popular (and numerous) and new platforms were proposing their own implementations of Smart Contracts, much of the interest in the blockchain space was moving towards the use of blockchain technology in the enterprise world. Recent years have seen the development of different platforms that focus on so-called private (or permissioned) blockchains and digital ledgers. Some permissioned blockchain platforms are forks of public blockchains adapted to enterprise needs (e.g. Quorum), while others have been designed and built from scratch (e.g. Corda, Digital Asset, Hyperledger Fabric, etc.). Almost all private blockchains provide their own implementation of Smart Contracts: for example, DAML [4] supported by Digital Asset, Chaincode [5] in Fabric, Kotlin/Java [6] supported by Corda, and many others.
Between public and private blockchains, we are observing a wide variety of languages with different capabilities and limitations. Both public and private blockchains often lack maturity and a formal semantics, as they have been developed under the pressure of the sudden and rapid explosion of blockchain popularity.
While the blockchain industry has seen remarkable growth, standardized protocols and formal semantics remain relatively few but are steadily increasing. Notable initiatives, such as the ETSI ISG PDL standards (e.g. PDL-004, PDL-011 and PDL-018) [12], are leading efforts to establish guidelines for smart contract development and enhance blockchain interoperability. As Europe demonstrates a growing interest in blockchain-related projects and research, initiatives like the Digital Euro [13] and the European Digital Identity [14] underscore the importance of accelerating standardization efforts to ensure the seamless integration of blockchain technology into various sectors. It is important to note that digital identity is a critical asset in the digital landscape, yet it poses significant privacy challenges under GDPR regulations.
Research Objectives
The Master student will first focus on studying, understanding and assessing the state of the art of Smart Contract Languages (SCLs) and existing digital currencies, e.g. Bitcoin, Ethereum, stablecoins such as USD Tether and e-RMB, and the recent EU proposal of the Digital Euro. The candidate will focus on building a multi-dimensional framework to understand and classify the SCL landscape, considering public and private blockchains; Smart Contracts should be evaluated in their context and for their efficacy, taking into account a wide range of parameters, e.g. terminology, automation, enforceability, semantics, typing, inheritance, security, scalability, formal verifiability, extensibility, Turing completeness, etc. [7][8][9][10][11].
In the second phase, the candidate will focus on proposing and building safe and expressive extensions of existing SCLs and/or proposing a new SCL, with its formal semantics and execution environment, to overcome limitations and extend the capabilities of existing languages [15][16]. The work could also be oriented towards experimenting with programming language features and paradigms that have not been fully applied to Smart Contracts before, e.g. typed vs. untyped, compiled vs. interpreted, object-oriented vs. functional, etc. The focus for these features should be on permissioned and private blockchains.
To apply: send a CV, a motivations letter and some references to Luigi Liquori, Luigi.Liquori@inria.fr.
References:
[1] S. Nakamoto, "Bitcoin: A peer-to-peer electronic cash system," [Online]. Available: https://bitcoin.org/bitcoin.pdf. [Accessed 6 December 2018].
[2] V. Buterin, "Ethereum Whitepaper: A Next-Generation Smart Contract and Decentralized Application Platform," [Online]. Available: https://github.com/ethereum/wiki/wiki/White-Paper/f18902f4e7fb21dc92b37e8a0963eec4b3f4793a.
[3] "Bitcoin Contract," [Online]. Available: https://en.bitcoin.it/wiki/Contract.
[4] "The Digital Asset Platform: Non-Technical White Paper," Digital Asset, [Online]. Available: https://hub.digitalasset.com/hubfs/Documents/Digital%20Asset%20Platform%20-%20Non- technical%20White%20Paper.pdf.
[5] "Chaincode Tutorials," Hyperledger Fabric, [Online]. Available: https://hyperledger- fabric.readthedocs.io/en/release-1.3/chaincode.html.
[6] M. Hearn, "Corda: A distributed ledger," R3 Corda, 29 November 2016. [Online]. Available: https://www.corda.net/content/corda-technical-whitepaper.pdf.
[7] C. D. Clack, V. A. Bakshi and L. Braine, "Smart contract templates: foundations, design landscape and research directions," arXiv preprint arXiv:1608.00771, 2016.
[8] L. Luu, D.-H. Chu, H. Olickel, P. Saxena and A. Hobor, "Making Smart Contracts Smarter," Cryptology ePrint Archive, Report 2016/633, 2016.
[9] K. Bhargavan, A. Delignat-Lavaud, C. Fournet, A. Gollamudi, G. Gonthier, N. Kobeissi, N. Kulatova, A. Rastogi, T. Sibut-Pinote, N. Swamy and S. Zanella-Béguelin, "Formal Verification of Smart Contracts: Short Paper," PLAS '16: Proceedings of the 2016 ACM Workshop on Programming Languages and Analysis for Security, pp. 91-96, 24 October 2016.
[10] K. Delmolino, M. Arnett, A. Kosba, A. Miller and E. Shi, "Step by step towards creating a safe smart contract: Lessons and insights from a cryptocurrency lab," Cryptology ePrint Archive, Report 2015/460, 2015.
[11] P. Di Gianantonio, F. Honsell and L. Liquori, "A Lambda Calculus of Objects with Self-Inflicted Extension," Proceedings of OOPSLA'98, ACM Press, New York, pp. 166–178, 1998.
[12] “ETSI ISG PDL standards”, [Online]. Available: https://www.etsi.org/technologies/permissioned-distributed-ledgers.
[13] “Digital Euro”, [Online]. Available: https://www.ecb.europa.eu/euro/digital_euro/html/index.en.html.
[14] “EU Digital Identity Wallet”, [Online]. Available: https://ec.europa.eu/digital-building-blocks/sites/display/EUDIGITALIDENTITYWALLET/EU+Digital+Identity+Wallet+Home.
[15] “Openzeppelin contracts libraries”, [Online]. Available: https://www.openzeppelin.com/contracts.
[16] “Foundry toolchain”, [Online]. Available: https://book.getfoundry.sh/.
· Name: Diksha Gupta
· Mail: diksha.gupta@inria.fr
LOCATION
· Inria Sophia Antipolis - Méditerranée
· Address: 2004 route des Lucioles, 06902 Sophia Antipolis
· Team: COATI
· Webpage: https://team.inria.fr/coati/
Federated Learning (FL) empowers a multitude of IoT devices, including mobile phones and sensors, to collaboratively train a global machine learning model while retaining their data locally [1,2]. A prominent example of FL in action is Google's Gboard, which uses an FL-trained model to predict subsequent user inputs on smartphones [3].
Two primary challenges arise during the training phase of FL [4]:
Data Privacy: Ensuring user data remains confidential. Even though the data is kept locally by the devices, it has been shown that an honest-but-curious server can still reconstruct data samples [5,6], sensitive attributes [7,8], and the local model [9] of a targeted device. In addition, the server can conduct membership inference attacks [10] to identify whether a data sample is involved in the training or source inference attacks to determine which device stores a given data sample [11].
Security Against Malicious Participants: Ensuring the learning process is not derailed by harmful actors. Recent research has demonstrated that, in the absence of protective measures, a malicious agent can deteriorate the model performance by simply flipping the labels [12] and/or the sign of the gradient [13] and even inject backdoors into the model [14] (backdoors are hidden vulnerabilities, which can be exploited under certain conditions predefined by the attacker, like some specific inputs).
In this project, we aim to propose novel FL algorithms to effectively tackle these two mutually linked challenges.
In particular, we want to explore the potential of compression for FL training, as these techniques can greatly reduce the model dimension d, which may provide a solution for a computation-efficient, private and secure FL system.
Compression techniques were initially introduced to alleviate communication costs in distributed training processes, where only a proportion of the model parameters are sent from the device to the server in each communication round [15,16,17]. The primary objective of compression design is to ensure a communication-efficient machine learning/FL system, by providing parameter selection rules at the device side which optimize the trained model performance under a given communication budget. [18,19] combined Byzantine-resilient methods with compression to ensure a communication-efficient and secure FL system. However, in these studies, even though devices transmit compressed models to the server, the Byzantine-resilient methods still operate on the full models of dimension d. Consequently, adopting their solutions to build a private and secure FL system requires a high computation load.
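For concreteness, here is a minimal sketch of top-k sparsification, one common compression operator in this literature; PyTorch is assumed and the parameter values are arbitrary:

    import math
    import torch

    def topk_compress(update, k):
        # Keep only the k largest-magnitude coordinates of the update.
        flat = update.flatten()
        _, idx = torch.topk(flat.abs(), k)
        return idx, flat[idx]

    def topk_decompress(idx, values, shape):
        flat = torch.zeros(math.prod(shape))
        flat[idx] = values
        return flat.reshape(shape)

    g = torch.randn(4, 5)            # a fake model update of dimension d = 20
    idx, vals = topk_compress(g, 3)  # the device transmits 3 coordinates only
    print(topk_decompress(idx, vals, g.shape))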
In this project, our goal is different: we target the best compression strategy for a computation-efficient, private and secure FL system. More precisely, the goal of this project is to study a compression strategy which provides the best trade-off among privacy, robustness (against adversarial threats), computational complexity, and model performance.
This research topic can lead to an internship and then to a PhD position. We are actively looking for students with a strong motivation to pursue a research career.
For this internship, we expect the student to:
• Familiarize themselves with the intricacies of Federated Learning.
• Implement the Byzantine-resilient method using PyTorch (see the sketch after this list).
• Evaluate its effectiveness in maintaining privacy and its robustness against malicious threats within the FL framework.
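As a starting point for the implementation task, here is a sketch of one classical Byzantine-resilient aggregation rule, the coordinate-wise median, in PyTorch; the actual method to implement during the internship may differ:

    import torch

    def coordinate_wise_median(updates):
        """updates: list of model-update tensors, one per device."""
        stacked = torch.stack(updates)       # shape: (n_devices, *model_shape)
        return stacked.median(dim=0).values  # robust to a minority of outliers

    honest = [torch.full((3,), 1.0) + 0.01 * i for i in range(4)]
    byzantine = [torch.full((3,), 100.0)]    # a malicious, scaled-up update
    print(coordinate_wise_median(honest + byzantine))  # stays close to 1.0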
PREREQUISITES
We are looking for a candidate with coding experience in Python and good analytical skills.
[1] McMahan et al, Communication-Efficient Learning of Deep Networks from Decentralized Data, AISTATS 2017
[2] Li et al, Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, p.p. 50-60, 2020
[3] Hard, Andrew et al, Federated Learning for Mobile Keyboard Prediction. arxiv: 1811.03604, 2019
[4] Kairouz et al, Advances and Open Problems in Federated Learning. Now Foundations and Trends, 2021
[5] Geiping et al, Inverting gradients - how easy is it to break privacy in federated learning?, NeurIPS 2020
[6] Yin et al, See through gradients: Image batch recovery via GradInversion, CVPR 2021
[7] Lyu et al, A novel attribute reconstruction attack in federated learning, FTL-IJCAI 2021
[8] Driouich et al, A novel model-based attribute inference attack in federated learning, FL-NeurIPS22, 2022.
[9] Xu et al, What else is leaked when eavesdropping Federated Learning? PPML-CCS, 2021
[10] Zari et al, Efficient Passive Membership Inference Attack in Federated Learning, PriML-NeurIPS workshop, 2022
[11] Hu et al, Source inference attacks in federated learning, ICDM 2021
[12] Fang et al, Local model poisoning attacks to Byzantine-robust federated learning, in 29th USENIX Security Symposium, 2020
[13] Wu et al, Federated variance-reduced stochastic gradient descent with robustness to byzantine attacks, IEEE Transactions on Signal Processing, vol. 68, pp. 4583–4596, 2020
[14] Wang et al, Attack of the tails: yes, you really can backdoor federated learning, NeurIPS 2020
[15] Alistarh et al, QSGD: Communication-efficient sgd via gradient quantization and encoding. NeurIPS 2017.
[16] Alistarh et al, The convergence of sparsified gradient methods. NeurIPS 2018.
[17] Haddadpour et al, Federated learning with compression: unified analysis and sharp guarantees, AISTATS 2021
[18] Gorbunov et al, Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top, ICLR 2023
[19] Zhu, H et al. Byzantine-Robust Distributed Learning With Compression. IEEE Trans. on Signal and Inf. Process. over Networks 9, 280–294, 2023.
Who?
Name: Cinzia Di Giusto & Etienne Lozes
Mail: cinzia.di-giusto@univ-cotedazur.fr, Etienne.lozes@univ-cotedazur.fr
Telephone:
Web page: https://webusers.i3s.unice.fr/~cdigiusto/web/
Where?
Place of the project: Laboratoire I3S
Address: 2000 route des Lucioles
Team: MDSC
Web page: https://www.i3s.unice.fr/fr/
What?
Pre-requisites if any: none
Detailed description:
Context. The field of behavioral types [4, 5] has made significant advances in characterizing the information exchanged in distributed systems. Multiparty Session Types (MPST) orchestrate interactions among multiple participants using global types (protocol specifications) and their local projections. Implementability, which ensures properties like deadlock freedom and session conformance, is central to MPST.
Traditionally, research has focused on peer-to-peer communication models, but real-world systems often involve diverse models, such as mailbox-based or causally ordered systems. A key challenge is that implementability is not monotonic across communication models; global types implementable in one model may not work in another. In [1], we have introduced a parametric MPST theory adaptable to various communication models.
Key contributions include:
1. developing MPST frameworks for models like unordered, causal-order and bus-based communications;
2. defining implementability through three semantic conditions: i) local types form a quasi-synchronous (QS) system, ii) local types ensure deadlock freedom and absence of orphan messages, iii) the global type is implementable in synchronous semantics.
This approach provides a general framework to extend MPST theory beyond peer-to-peer systems, emphasizing quasi-synchronous systems for their decidability properties in verifying system correctness.
The internship. The main objective of the internship is to develop a tool that implements the algorithms of [1].
The results in [1] strongly rely on the notion of quasi-synchronous systems, which is a slight variant of the notion of systems realisable with synchronous communications introduced in [2]. The algorithms in [2] have been implemented within the tool ReSCu [3].
Hence, during the internship we plan to:
1. get acquainted with the literature (analysis of the state of the art), with a particular focus on communicating automata, monadic second-order logic and behavioral theory;
2. extend ReSCu to handle the new class of systems, and in particular the treatment of deadlock freedom (a toy illustration of this kind of check follows below).
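The toy illustration referred to in point 2: a synchronous-product exploration of two communicating automata that flags deadlocks. This is only meant to convey the kind of check ReSCu automates; it is not ReSCu's algorithm, and the automata are invented:

    # Each automaton: {state: [(action, next_state), ...]}; "!m" sends m, "?m" receives m.
    client = {0: [("!req", 1)], 1: [("?ack", 2)], 2: []}
    server = {0: [("?req", 1)], 1: [("!nack", 2)], 2: []}  # replies nack, not ack
    finals = {(2, 2)}

    def deadlocks(m1, m2, start=(0, 0)):
        seen, stack, found = set(), [start], []
        while stack:
            s1, s2 = state = stack.pop()
            if state in seen:
                continue
            seen.add(state)
            # Joint moves: a send in one machine synchronizes with the
            # matching receive in the other.
            succs = [(n1, n2)
                     for a1, n1 in m1[s1] for a2, n2 in m2[s2]
                     if a1 == "!" + a2[1:] and a2.startswith("?")
                     or a2 == "!" + a1[1:] and a1.startswith("?")]
            if not succs and state not in finals:
                found.append(state)
            stack.extend(succs)
        return found

    print(deadlocks(client, server))  # [(1, 1)]: client waits for ack, server sends nack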
References:
[1] Cinzia Di Giusto, Etienne Lozes, Pascal Urso. On the Impact of Communication Models on Multiparty Session Types Implementability. (Under submission)
[2] Cinzia Di Giusto, Loïc Germerie Guizouarn, and Étienne Lozes. 2023. Multiparty half-duplex systems and synchronous communications. J. Log. Algebraic Methods Program. 131 (2023), 100843.
[3] Loïc Desgeorges, Loïc Germerie Guizouarn: RSC to the ReSCu: Automated Verification of Systems of Communicating Automata. COORDINATION 2023: 135-143.
[4] Vasco Thudichum Vasconcelos and Kohei Honda: Principal Typing Schemes in a Polyadic pi-Calculus. CONCUR '93, 524–538, 1993.
[5] Nobuko Yoshida and Lorenzo Gheri : A Very Gentle Introduction to Multiparty Session Types. ICDCIT 2020, 73–93.