What?
Title: Troubleshooting video streaming slowness: An experimental approach backed by Machine Learning
Who?
Name: Chadi Barakat, Yassine Hadjadj-Aoul, Sanaa Ghandi
Mail: Chadi.Barakat@inria.fr, yassine.hadjadj-aoul@irisa.fr, Sanaa.Ghandi@inria.fr
Web page: https://team.inria.fr/diana/chadi/ https://people.irisa.fr/Yassine.Hadjadj-Aoul/
Where?
Place of the project: Inria centre at Université Côte d'Azur
Address: 2004, Route des Lucioles, 06902 Sophia Antipolis, France
Team: Diana team
Web page: https://team.inria.fr/diana/
Pre-requisites if any: Network programming and data analytics skills. Knowledge in video streaming and Machine Learning.
Description:
Video streaming is a key Internet service today, accounting by itself for more than half of global Internet traffic. Most often, video streaming runs smoothly thanks to the different protocols involved, mainly the DASH protocol. However, when the experience degrades, either in the form of stalls or resolution switches, users get frustrated, especially since most often they do not know what exactly causes the degradation (a saturated WiFi link, for example? a bad signal-to-noise ratio? slow Internet access caused by a slow data plane? some other cause?). Troubleshooting the network in this case is doable, but requires the deployment of monitoring tools and techniques that not everyone is expert in. In this project, we will explore another approach based on data collected from within the browser (or the application running the video streaming). The data we are seeking concerns chunks and their resolutions, but also any other data available within the browser, such as CPU load or web-level measurements on page rendering provided by the browser's Navigation API. Network data is usually not available to the user (or hard to obtain for non-experts), which is why we will try to infer it by relying solely on whatever data is available within the browser about the video streaming experience. By running extensive experiments locally, with real players and real videos, under artificial and varying network conditions, we will collect both types of data, network level and video streaming level; then, with the help of machine learning, we will bridge the gap between the two data sets and propose models that can infer network performance from the video streaming experience, detect video streaming anomalies, and classify the origin of these anomalies.
We have developed this methodology in the past for web browsing (see references below) and have produced several models to infer network performance and classify network anomalies, with very good results demonstrating the capability of these models. The objective of this PFE is to build upon this prior work and extend it to video streaming. Video streaming brings a new type of network traffic (greedier and longer lasting) and has a different dependence on network performance, so we do not expect our existing results and models to carry over to this new context; they will have to be reproduced, hopefully with better performance given the different characteristics of video streaming traffic. In the TER we will start exploring this topic by first reviewing the literature on video streaming monitoring and troubleshooting, together with our prior work on web browsing; we will then adapt our experimental testbed to this new experimentation scenario. We will then run first experiments to prove that we can indeed control the quality of video streaming and collect data about both the network and the streaming experience, and we will calibrate first machine learning models able to predict network performance from the video streaming experience. This work will be pursued later in an internship (for motivated and skilled students) with extensive experiments over different types of videos and network conditions, the introduction of degradations, and the classification of these degradations.
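As a purely illustrative sketch of the final machine-learning step, the following Python snippet shows how browser-side features could be mapped to a network-level target with an off-the-shelf regressor; the CSV file and column names are hypothetical placeholders for the data our testbed experiments will actually produce.

    # Minimal sketch (assumptions: a CSV with browser-side features and the
    # network throughput measured on the testbed; column names are illustrative).
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    # Browser-side features collected during each experiment run.
    FEATURES = ["avg_chunk_resolution", "resolution_switches", "stall_count",
                "stall_duration_s", "cpu_load", "page_render_time_ms"]
    TARGET = "downlink_throughput_mbps"  # network-level ground truth from the testbed

    df = pd.read_csv("experiments.csv")  # hypothetical file produced by the testbed
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[TARGET], test_size=0.2, random_state=0)

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"MAE: {mean_absolute_error(y_test, pred):.2f} Mbps")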
Useful Information/Bibliography:
[1] Naomi Kirimi, Chadi Barakat, Yassine Hadjadj-Aoul, “Passive network monitoring and troubleshooting from within the browser: a data-driven approach“, in proceedings of the 20th International Wireless Communications & Mobile Computing Conference (IWCMC), Multimedia Symposium, Cyprus, May 2024.
[2] Imane Taibi, Yassine Hadjadj-Aoul, Chadi Barakat, “Data Driven Network Performance Inference From Within The Browser“, in proceedings of the 12th IEEE Workshop on Performance Evaluation of Communications in Distributed Systems and Web based Service Architectures (PEDISWESA), Rennes, July 2020.
[3] Muhammad Jawad Khokhar, Thibaut Ehlinger, Chadi Barakat, “From Network Traffic Measurements to QoE for Internet Video“, in proceedings of IFIP Networking, Warsaw, Poland, May 2019.
Name: Nicolas Nisse and Frédéric Giroire
Mail: nicolas.nisse@inria.fr
Web page: http://www-sop.inria.fr/members/Nicolas.Nisse/
Place of the project:
Address: Inria, 2004 route des Lucioles, SOPHIA ANTIPOLIS
Team: COATI (common project Inria/I3S)
Web page: https://team.inria.fr/coati/
This project is part of our study of the evolution of researchers' productivity and collaborations, and how they are affected by funding, whether national or international. In this context, we have collected all the publications (with at least one French author) in SCOPUS. We have developed various algorithms and metrics to assess the proximity between researchers, using the journals and conferences in which they publish. The data has already been greatly consolidated, but there are still some ambiguities (notably due to the names of the journals and conferences). In addition, the various measures of proximity need to be evaluated and compared so that an online tool can be offered, enabling researchers to situate themselves in relation to disciplines and/or their colleagues.
This work will be supervised by Frédéric Giroire (CNRS/I3S) and Nicolas Nisse (Inria/I3S) and is in collaboration with Michele Pezzoni (UniCA/GREDEG).
More specifically, the student(s) will have to carry out the following tasks:
• Online tool for estimating the distance between two researchers (a minimal sketch of the distance computation and 2D projection is given after this list)
◦ 2D projection of vectors representing researchers in publication space.
◦ Options: main publications, publication period
◦ Consolidation of the list of conferences and journals for all disciplines (tool: design of an AI model for classifying conference names)
• Exploration of the main components or clusters of a researcher's publications.
• Study the relationship / correlation between the different metrics.
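As a rough illustration of the distance and projection tasks above (first bullet), the sketch below represents researchers as venue-count vectors, computes a cosine distance between two of them, and projects all of them in 2D with PCA; the venues, counts, and normalization choice are illustrative assumptions, not the project's final metric.

    # Minimal sketch (illustrative only): researchers represented as venue-count
    # vectors, cosine distance between two researchers, and a 2D PCA projection.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.metrics.pairwise import cosine_distances

    # Hypothetical data: rows = researchers, columns = venues (journals/conferences),
    # values = number of publications of that researcher in that venue.
    venues = ["INFOCOM", "SIGCOMM", "Networking", "TON"]
    counts = np.array([
        [5, 1, 3, 2],   # researcher A
        [4, 0, 2, 3],   # researcher B
        [0, 6, 0, 1],   # researcher C
    ], dtype=float)

    # Normalize rows so that prolific authors do not dominate the distance.
    profiles = counts / counts.sum(axis=1, keepdims=True)

    # Distance between two researchers in publication space.
    d_ab = cosine_distances(profiles[[0]], profiles[[1]])[0, 0]
    print(f"distance(A, B) = {d_ab:.3f}")

    # 2D projection of all researchers for the online visualization tool.
    xy = PCA(n_components=2).fit_transform(profiles)
    print(xy)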
Expected skills :
• Python programming, web programming
• Data science/data analysis
• Skills in AI models would be a plus.
Name: Frédéric Giroire and Davide Ferré
Mail: frederic.giroire@inria.fr
Web page: https://www-sop.inria.fr/members/Frederic.Giroire/
Place of the project:
Address: Inria, 2004 route de Lucioles, SOPHIA ANTIPOLIS
Team: COATI (common project Inria/I3S)
Web page: https://team.inria.fr/coati/
Pre-requisites:
Knowledge in networking and machine learning.
Python.
Description:
The exponential advances in Machine Learning (ML) are leading to the deployment of Machine Learning models in constrained and embedded devices, to solve complex inference tasks. At the moment, to serve these tasks, there exist two main solutions: run the model on the end device, or send the request to a remote server. However, these solutions may not suit all the possible scenarios in terms of accuracy or inference time, requiring alternative solutions.
Cascade inference is an important technique for performing real-time and accurate inference given limited computing resources such as MEC servers. It combines two or more models to perform inference: a highly accurate but expensive model with a low-accuracy but fast model, and it determines whether the expensive model should make a prediction based on the confidence score of the fast model. A large body of work has exploited this solution. A sequential combination of models was first proposed in [1] for face detection tasks; in the context of deep learning, cascades have since been applied to numerous tasks [2,3].
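The following minimal PyTorch sketch illustrates the confidence-gated cascade described above; the two toy models, the input shape, and the threshold are placeholders, not the system that will be studied in the project.

    # Minimal sketch of confidence-gated cascade inference (PyTorch).
    import torch
    import torch.nn.functional as F

    def cascade_predict(x, fast_model, accurate_model, threshold=0.8):
        """Use the fast model's prediction when it is confident enough;
        otherwise fall back to the expensive, accurate model."""
        with torch.no_grad():
            logits = fast_model(x)
            probs = F.softmax(logits, dim=-1)
            confidence, prediction = probs.max(dim=-1)
            if confidence.item() >= threshold:
                return prediction, "fast"
            logits = accurate_model(x)        # e.g., offloaded to a MEC server
            return logits.argmax(dim=-1), "accurate"

    # Toy models and input, for illustration only.
    fast_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
    accurate_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 256),
                                         torch.nn.ReLU(), torch.nn.Linear(256, 10))
    x = torch.randn(1, 28, 28)
    pred, used = cascade_predict(x, fast_model, accurate_model)
    print(pred.item(), used)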
Early Exit Networks take advantage of the fact that not all input samples are equally difficult to process, and thus invest a variable amount of computation based on the difficulty of the input and the prediction confidence of the Deep Neural Network [5]. Specifically, early-exit networks consist of a backbone architecture with additional exit heads (or classifiers) along its depth. At inference time, when a sample propagates through the network, it passes through the backbone and each of the exits in turn, and the first result that satisfies a predetermined criterion (exit policy) is returned as the prediction output, bypassing the rest of the model. In fact, the exit policy can also reflect the capabilities and load of the target device, and dynamically adapt the network to meet specific runtime requirements [6].
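Similarly, a minimal early-exit sketch (same caveats: toy architecture, illustrative sizes and threshold) could look as follows, with one exit head per backbone stage and a confidence-based exit policy.

    # Minimal early-exit sketch (PyTorch): a backbone split into stages, each
    # followed by an exit head; inference stops at the first exit whose softmax
    # confidence satisfies the exit policy.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EarlyExitNet(nn.Module):
        def __init__(self, num_classes=10, threshold=0.9):
            super().__init__()
            self.threshold = threshold
            self.stages = nn.ModuleList([
                nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU()),
                nn.Sequential(nn.Linear(128, 128), nn.ReLU()),
                nn.Sequential(nn.Linear(128, 128), nn.ReLU()),
            ])
            # One exit head (classifier) after each backbone stage.
            self.exits = nn.ModuleList([nn.Linear(128, num_classes) for _ in self.stages])

        def forward(self, x):
            for i, (stage, head) in enumerate(zip(self.stages, self.exits)):
                x = stage(x)
                logits = head(x)
                conf = F.softmax(logits, dim=-1).max().item()
                # Exit policy: return as soon as the head is confident enough,
                # bypassing the rest of the model (the last exit always returns).
                if conf >= self.threshold or i == len(self.stages) - 1:
                    return logits, i

    net = EarlyExitNet()
    logits, exit_index = net(torch.randn(1, 28, 28))
    print("exited at head", exit_index)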
Our project is to use cascade models and/or early-exit models in the context of Edge Computing to improve the delay and reduce the resource usage of ML inference tasks at the edge. Of crucial importance for cascade models or early-exit models is the confidence of the fast model. Indeed, if the prediction of the first model is used but wrong, it may lead to a low accuracy of the cascade, even if the accuracy of the best model is very high. Similarly, if the first model's confidence is too low, its predictions will never be used, and the total computation will be higher than using only the second model by itself; in addition, we will use unnecessary network resources and incur higher delays than necessary. Researchers have proposed methods to calibrate such systems [4]. However, they have not explored the choice of the loss function of such systems in depth.
In this project, we will explore the use of a new loss function for the fast models (or first exits) of cascade networks (or early-exit models). Indeed, such networks do not have the same goal as the global system, as they should only act as a first filter.
Useful Information:
The PFE can be followed by an internship for interested students.
A grant is also funded for a potential future PhD on the topic.
Bibliography:
[1] Viola, P., & Jones, M. (2001, December). Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001 (Vol. 1, pp. I-I). IEEE.
[2] Wang, X., Kondratyuk, D., Christiansen, E., Kitani, K. M., Alon, Y., & Eban, E. (2020). Wisdom of committees: An overlooked approach to faster and more accurate models. arXiv preprint arXiv:2012.01988.
[3] Wang, X., Luo, Y., Crankshaw, D., Tumanov, A., Yu, F., & Gonzalez, J. E. (2017). Idk cascades: Fast deep learning by learning not to overthink. arXiv preprint arXiv:1706.00885.
[4] Enomoro, S., & Eda, T. (2021, May). Learning to cascade: Confidence calibration for improving the accuracy and computational cost of cascade inference systems. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 8, pp. 7331-7339).
[5] Laskaridis, S., Kouris, A., & Lane, N. D. (2021, June). Adaptive inference through early-exit networks: Design, challenges and directions. In Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning (pp. 1-6).
[6] Laskaridis, S., Venieris, S. I., Almeida, M., Leontiadis, I., & Lane, N. D. (2020, September). SPINN: synergistic progressive inference of neural networks over device and cloud. In Proceedings of the 26th annual international conference on mobile computing and networking (pp. 1-15).
Advisors: Frédéric Giroire and Stéphane Pérennes
Emails: frederic.giroire@inria.fr , stephane.perennes@inria.fr
Web Site:
http://www-sop.inria.fr/members/Frederic.Giroire/
Laboratory: COATI project - INRIA (2004, route des Lucioles – Sophia Antipolis) and the
startup Hive (https://www.hivenet.com/)
Web page: https://team.inria.fr/coati/
Pre-requisites:
Knowledge in networking and probability and graph theory.
Programming in python, C/C++ or java.
Description:
The project will be carried out in collaboration with the startup Hive (https://www.hivenet.com/)
and may be followed by an internship for interested students.
Large scale peer-to-peer systems are foreseen as a way to provide highly reliable data
storage at low cost. To ensure high durability and high resilience over a long period of time the
system must add redundancy to the original data. It is well-known that erasure coding is a
space efficient solution to obtain a high degree of fault-tolerance by distributing encoded
fragments into different peers of the network. Therefore, a repair mechanism needs to cope
with the dynamic and unreliable behavior of peers by continuously reconstructing the missing
redundancy. Consequently, the system depends on many parameters that need to be well
tuned, such as the redundancy factor, the placement policies, and the frequency of data repair.
These parameters impact the amount of resources, such as the bandwidth usage and the
storage space overhead that are required to achieve a desired level of reliability, i.e.,
probability of losing data.
In this project, we will compare different repair policies and erasure codes. Indeed, some
erasure codes (maximum distance separable (MDS) codes), such as Reed-Solomon, have
been shown to be optimal in terms of reception efficiency, i.e., the number of chunks required
for reconstructing a lost chunk in our context. This means that they have an optimal storage
space usage for a given number of tolerated failures in a distributed storage system.
However, they are not very efficient in terms of bandwidth usage when a reconstruction has to
be done. Indeed, the original data has to be fully reconstructed when a small chunk of data is
lost in order to keep the redundancy of the system. This operation happens constantly, as disk
failures are frequent in large distributed systems and peers may leave the
system. As bandwidth is a crucial resource in distributed systems, alternative repair policies
such as lazy reconstruction [2] and new codes such as hybrid codes [3], Hierarchical Codes [4],
and regenerating codes [1] have been proposed to decrease the bandwidth used for repair.
The latter are near optimal in terms of bandwidth usage, but this comes at the cost of a
much higher computational cost [5]. We will thus explore which codes present the best trade-off
between storage space, bandwidth usage, computational cost, number of tolerated failures,
mean time to failure, data availability, and download speed.
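To fix ideas, here is a small back-of-the-envelope sketch, under the usual simplifying assumption of independent peer availability, of the quantities we want to trade off for an (n, k) MDS code such as Reed-Solomon: storage overhead, classical repair traffic, and data availability. The numbers are illustrative only.

    # Back-of-the-envelope comparison for an (n, k) MDS code: storage overhead,
    # classical repair traffic, and data availability under independent peer
    # availability p (illustrative parameters only).
    from math import comb

    def mds_metrics(n, k, p, fragment_size_mb):
        overhead = n / k                       # stored bytes per original byte
        repair_traffic = k * fragment_size_mb  # classical RS repair: fetch k fragments
        # Data is recoverable iff at least k of the n fragments are available.
        availability = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
        return overhead, repair_traffic, availability

    for n, k in [(12, 8), (16, 8), (14, 10)]:
        o, r, a = mds_metrics(n, k, p=0.9, fragment_size_mb=4)
        print(f"(n={n}, k={k})  overhead={o:.2f}x  repair={r} MB/lost fragment  "
              f"availability={a:.6f}")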
References.
[1] Papailiopoulos, D. S., Luo, J., Dimakis, A. G., Huang, C., & Li, J. (2012, March). Simple
regenerating codes: Network coding for cloud storage. In 2012 Proceedings IEEE INFOCOM (pp. 2801-2805). IEEE.
[2] Giroire, F., Monteiro, J., & Pérennes, S. (2010, December). Peer-to-peer storage systems:
a practical guideline to be lazy. In 2010 IEEE Global Telecommunications Conference
GLOBECOM 2010 (pp. 1-6). IEEE.
[3] Rodrigues, R., & Liskov, B. (2005, February). High availability in DHTs: Erasure coding vs.
replication. In International Workshop on Peer-to-Peer Systems (pp. 226-239). Springer,
Berlin, Heidelberg.
[4] Duminuco, A., & Biersack, E. (2008, September). Hierarchical codes: How to make erasure
codes attractive for peer-to-peer storage systems. In 2008 Eighth International Conference on
Peer-to-Peer Computing (pp. 89-98). IEEE.
[5] Duminuco, A., & Biersack, E. (2009, June). A practical study of regenerating codes for
peer-to-peer backup systems. In 2009 29th IEEE International Conference on Distributed
Computing Systems (pp. 376-384). IEEE
Who?
Name: Arnaud Legout
Mail: arnaud.legout@inria.fr
Web page: https://www-sop.inria.fr/members/Arnaud.Legout/
Name: Damien Saucez
Mail: damien.saucez@inria.fr
Web page: https://team.inria.fr/diana/team-members/damien-saucez/
Where?
Place of the project: DIANA team, Inria, Sophia Antipolis
Address: 2004 route des Lucioles
Team: DIANA
Web page: https://team.inria.fr/diana/
Pre-requisites if any: Familiar with Python, Linux, basic system performance knowledge, highly
motivated to work in a research environment and excited to tackle hard problems.
Description:
Doing data science is not only a programming or machine learning issue; for most practical
use cases it is also a systems issue. One of these issues is making the best use of the available RAM.
Computations requiring an amount of memory that exceeds the available RAM
force the OS to swap memory pages to disk. Even when your process does not exceed the
available RAM, Linux memory management may proactively swap memory pages.
We observed that under certain circumstances, the swap process dramatically
reduces the performance leading to a pathological behavior in which retrieving pages
from the swap becomes much slower than the disk speed.
The goal of this PFE is to understand and document the current Linux memory management,
how it interacts with the Python interpreter, and how to reproduce the
circumstances under which we enter a pathological behavior.
This PFE requires a good understanding of the internals of the Linux operating system and
a good knowledge of C and Python. It will be mandatory to look at Linux
and Python source code (written in C) to understand the details and the undocumented
behavior.
Part of the PFE will also consist in running experiments to reproduce and understand the conditions
under which we observe this pathological behavior.
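As a starting point, a possible (Linux-only) experiment driver could look like the sketch below: it allocates memory in steps while watching the kernel swap counters exposed in /proc/vmstat. The allocation sizes and step count are arbitrary and would have to be tuned to the machine under test.

    # Minimal experiment driver sketch (Linux only): allocate memory in steps
    # and watch swap activity through the pswpin/pswpout counters in /proc/vmstat.
    import time

    def read_vmstat(keys=("pswpin", "pswpout")):
        """Return selected counters (in pages) from /proc/vmstat."""
        counters = {}
        with open("/proc/vmstat") as f:
            for line in f:
                name, value = line.split()
                if name in keys:
                    counters[name] = int(value)
        return counters

    chunks = []
    before = read_vmstat()
    for step in range(64):
        chunks.append(bytearray(256 * 1024 * 1024))  # allocate (and zero-fill) 256 MiB
        time.sleep(1)
        now = read_vmstat()
        swapped_out = now["pswpout"] - before["pswpout"]
        swapped_in = now["pswpin"] - before["pswpin"]
        print(f"step={step:02d} pswpout+={swapped_out} pswpin+={swapped_in}")
        before = now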
For motivated students, this PFE will be continued with an internship in which the intern will
tackle the pathological behavior and propose a solution. Excellent students will
have the possibility to continue with a Ph.D. thesis.
Useful Information/Bibliography:
What every programmer should know about memory
https://lwn.net/Articles/250967/
Python memory management
https://docs.python.org/3/c-api/memory.html
Memory Management
https://docs.kernel.org/admin-guide/mm/index.html
Who?
Name: Arnaud Legout
Mail: arnaud.legout@inria.fr
Web page: https://www-sop.inria.fr/members/Arnaud.Legout/
Where?
Place of the project: DIANA team, Inria, Sophia Antipolis
Address: 2004 route des Lucioles
Team: DIANA
Web page: https://team.inria.fr/diana/
Pre-requisites if any: Python and web development, basic knowledge of how LLMs work
Description:
Large Language Models (LLMs) have been a revolution in AI in the past two years.
In particular, copilots (such as GitHub Copilot) are used to assist humans by predicting
what they want to do next. However, little is known about how a copilot interacts
with the cognitive process. In particular, is it possible for a copilot to influence
and even change the mind of the assisted human?
The goal of this TER is to explore the literature in this domain and to start implementing
an experimental prototype to test how a copilot can influence the cognitive process.
The prototype will be a simple Web page simulating a copilot, for which we control
the possible sentence completions.
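A minimal sketch of such a prototype backend is given below, assuming a small Flask service whose endpoint name and scripted completions are illustrative; the point is only that the experimenter, not a real LLM, controls what the simulated copilot suggests.

    # Minimal sketch of the experimental prototype backend: a Flask endpoint that
    # serves scripted "copilot" completions fully controlled by the experimenter.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Scripted completions, keyed by the prefix typed by the participant.
    # In a real experiment these would encode the condition (neutral vs. biased).
    SCRIPTED_COMPLETIONS = {
        "the best way to": ["learn is to practice daily", "learn is to watch videos"],
        "i think that": ["remote work increases productivity",
                         "remote work decreases productivity"],
    }

    @app.route("/complete", methods=["POST"])
    def complete():
        prefix = (request.get_json(silent=True) or {}).get("prefix", "").strip().lower()
        suggestions = SCRIPTED_COMPLETIONS.get(prefix, [])
        return jsonify({"prefix": prefix, "suggestions": suggestions})

    if __name__ == "__main__":
        app.run(port=5000, debug=True)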
For motivated students, this TER will be continued with an internship. Excellent students will
have the possibility to continue with a Ph.D. thesis.
Useful Information/Bibliography:
Erik Jones and Jacob Steinhardt. “Capturing failures of large language models via human cognitive biases”. In:
Advances in Neural Information Processing Systems 35 (2022), pp. 11785–11799.
Enkelejda Kasneci et al. “ChatGPT for good? On opportunities and challenges of large language models for
education”. In: Learning and individual differences 103 (2023), p. 102274.
Celeste Kidd and Abeba Birhane. “How AI can distort human beliefs”. In: Science 380.6651 (2023), pp. 1222–1223
Bill Thompson and Thomas L Griffiths. “Human biases limit cumulative innovation”. In: Proceedings of the Royal
Society B 288.1946 (2021), p. 20202752.
Canyu Chen and Kai Shu. “Can LLM-Generated Misinformation Be Detected?” In: arXiv preprint arXiv:2309.13788
(2023).
Who?
Name: Frédéric MALLET
Mail: Frederic.Mallet@univ-cotedazur.fr
Web page: https://www-sop.inria.fr/members/Frederic.Mallet/
Where?
Place of the project: Inria Lagrange
Address: 2000 route des Lucioles
Team: Kairos (i3S/Inria)
Web page: https://team.inria.fr/kairos/
Pre-requisites if any:
Description: CCSL is a constraint language based on logical clocks that can be used to specify temporal and timed contracts. In the context of the European project HAL4SDV, we would like to use CCSL and its recent extension (called RT-CCSL) to express contracts for autonomous vehicles. This is particularly tricky, as autonomous vehicles evolve in a very uncertain environment where it is difficult to predict what can happen. Autonomous vehicles also use more and more AI-based components whose behaviour may be difficult to predict or explain (in the corner cases: best and worst), but when safety is at stake, these corner cases cannot be left unattended. The PFE consists in making a thorough bibliography of languages that can be used to specify temporal and timed scenarios and contracts for vehicles and comparing them to CCSL and RT-CCSL. In particular, the bibliography should focus on languages used to explain the behaviour of AI-based components in the context of Explainable AI (not necessarily meant for real-time or critical systems). If the PFE goes well, there could be a continuation as an internship and possibly a PhD.
Useful Information/Bibliography:
• https://timesquare.inria.fr: gives a list of useful articles about CCSL.
• https://publications.pages.asam.net/standards/ASAM_OpenSCENARIO/ASAM_OpenSCENARIO_DSL/latest/index.html
• Supervisors: Joanna Moulierac (COATI) ; Alexandre Guitton (LIMOS, Université Clermont-
Auvergne)
• Mail: joanna.moulierac@inria.fr, alexandre.guitton@uca.fr
• Place of the project: Inria Sophia Antipolis
• Address: COATI project-team, 2004 route des Lucioles
• Web pages:
http://www-sop.inria.fr/members/Joanna.Moulierac/ ; https://perso.isima.fr/~alguitto/
Description:
In recent years, Mobile Edge Computing (MEC) has emerged as a promising solution for enhancing
energy efficiency in network applications. As a matter of fact, by offloading computational tasks to a
server deployed near the base station and by caching both the outputs and the related code for completed
tasks, it is possible to effectively reduce the energy consumption of mobile devices while ensuring
adherence to their latency constraints. Although there have been some previous works on task caching
and task offloading on the cloud, most of them focus on only one of these two strategies or formulate
optimization problems that are hard to solve and propose suboptimal solutions.
In this PFE, we will study a linear model for the joint task caching and offloading optimization problem.
The idea is to focus on the model proposed in [1], where the nearby, grid-powered computing and
storage facility can be used in two different manners. First, the mobile applications can decide to offload
part of their computation tasks, in terms of code and data, to the MEC server. This is formally called task
offloading. Second, the control plane of the applications, likely deployed in the central cloud, can decide
to cache some tasks in advance. This is called task caching, and it aims at storing popular or
computationally intensive tasks.
We will first propose to analyze this model, and then we will try to extend it by proposing a more
complete model. As an example, one extension will be to include the energy cost of the cloud (which is
not present in the initial model) for the transmission, the storing and the computation of the model. Other
interesting extensions will be studied, and a performance evaluation will be proposed in order to evaluate
the efficiency of the proposed solution.
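To give a concrete, deliberately simplified flavour of such a linear model (a toy instance, not the model of reference [1]), the sketch below uses PuLP to decide, for each task, whether it is executed locally, offloaded, or served from a cached copy on the MEC server, under a storage budget; all numbers are illustrative.

    # Toy joint task caching / offloading program (PuLP). Each task is served in
    # exactly one mode: locally, offloaded (code + data uploaded), or from a
    # pre-cached copy on the MEC server. Illustrative numbers only.
    import pulp

    tasks = ["t1", "t2", "t3", "t4"]
    e_local = {"t1": 8.0, "t2": 5.0, "t3": 9.0, "t4": 4.0}    # device energy, local execution (J)
    e_offload = {"t1": 3.0, "t2": 2.5, "t3": 4.0, "t4": 2.0}  # device energy, upload code+data (J)
    e_cached = {"t1": 0.5, "t2": 0.5, "t3": 0.5, "t4": 0.5}   # device energy, request only (J)
    code_size = {"t1": 60, "t2": 40, "t3": 80, "t4": 30}      # MB stored if the task is cached
    cache_capacity = 100                                       # MEC storage budget (MB)

    prob = pulp.LpProblem("joint_task_caching_and_offloading", pulp.LpMinimize)
    local = pulp.LpVariable.dicts("local", tasks, cat="Binary")
    offload = pulp.LpVariable.dicts("offload", tasks, cat="Binary")
    cached = pulp.LpVariable.dicts("cached", tasks, cat="Binary")

    # Objective: total energy spent by the mobile devices.
    prob += pulp.lpSum(e_local[t] * local[t] + e_offload[t] * offload[t]
                       + e_cached[t] * cached[t] for t in tasks)
    # Each task is served in exactly one way; cached tasks fill the MEC storage.
    for t in tasks:
        prob += local[t] + offload[t] + cached[t] == 1
    prob += pulp.lpSum(code_size[t] * cached[t] for t in tasks) <= cache_capacity

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    for t in tasks:
        mode = max(("local", local[t]), ("offload", offload[t]), ("cached", cached[t]),
                   key=lambda kv: kv[1].value())[0]
        print(t, "->", mode)
    print("total device energy:", pulp.value(prob.objective))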
The PFE can thus be followed by a master internship and by a PhD for a motivated student. The student
should have a taste for networking and optimization problems, algorithmics and linear programming.
[1] Y. Hao, M. Chen, L. Hu, M. S. Hossain, and A. Ghoneim, “Energy efficient task caching and
offloading for mobile edge computing,” IEEE Access, 2018.
Who?
Name: Thierry Turletti and Chadi Barakat and Walid Dabbous
Mail: Thierry.Turletti@inria.fr and Chadi.Barakat@inria.fr and Walid.Dabbous@inria.fr
Telephone: 04 92 38 77 77
Web page: https://team.inria.fr/diana/
Where?
Place of the project: Diana Project-Team, Inria centre at Université Côte d'Azur
Address: 2004, route des Lucioles, 06902 Sophia Antipolis, France
Team: Diana team
Web page: https://team.inria.fr/diana/
What?
Pre-requisites if any: Strong programming skills, scripting, DevOps. Knowledge of Kubernetes is a plus. Knowledge in network protocols, cellular networks, data analytics.
Detailed description:
Nowadays, mobile networks (5G and beyond) are witnessing a revolution with the increase in bitrates, the densification of wireless cells, and the advent of virtualization and softwarization, which allow network functions and services to be deployed in data centres, most of them placed at the edge of the network. The emergence of new services and applications (e.g., AR/VR, drones, autonomous vehicles) poses serious service quality requirements on the network. Network slicing was introduced to help operators create dedicated, virtualized, and isolated logical networks on a general physical network to meet the differentiated network capacity requirements of customers. Efficient network management that takes into account the quality of experience of end users does not depend only on simple network metrics such as delay or physical proximity, but rather on a complex set of metrics such as the bitrate in both directions, the jitter, the packet loss rate, the mobility context, the device properties, etc. Monitoring and taking all these metrics into account in an accurate and timely way represents a real challenge. Further, given the large number of devices foreseen at the edge and their mobility and time dynamics, any management plane for the network has to be low cost, able to scale with the number of users, devices and services, and must track the whole system in an efficient manner. In the end, the network should be able to meet the requirements of end users and to exploit to the maximum the available capacity of the underlying infrastructure.
In this TER, the student will learn how to deploy a 5G sliced network (both the core network and the radio access network parts) on a real testbed composed of 5G radio hardware located on the SophiaNode platform, which includes the R2lab anechoic chamber and a Kubernetes cluster. In particular, she/he will review the literature on models for the Quality of Service (QoS) promises of sliced wireless networks; learn how to use the R2lab testbed to deploy scenarios; get familiar with the 5G OpenAirInterface (OAI) software; and propose and execute a scenario aiming to evaluate the slicing performance and assess its QoS capacity by analyzing key metrics previously identified.
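As an illustration of the last step, the post-processing sketch below computes per-slice throughput, loss and a simplified jitter estimate from a hypothetical per-packet log collected on the testbed; the CSV layout and column names are assumptions made for the example.

    # Minimal post-processing sketch: per-slice throughput, packet loss and a
    # simplified jitter estimate from a hypothetical per-packet log.
    import pandas as pd

    # Assumed columns: slice_id, seq, tx_time_s, rx_time_s, size_bytes
    df = pd.read_csv("slice_packets.csv")

    for slice_id, g in df.groupby("slice_id"):
        g = g.sort_values("seq")
        duration = g["rx_time_s"].max() - g["rx_time_s"].min()
        throughput_mbps = g["size_bytes"].sum() * 8 / duration / 1e6
        expected = g["seq"].max() - g["seq"].min() + 1
        loss_rate = 1 - len(g) / expected
        # Jitter: mean absolute variation of one-way transit times
        # (a simplified, unsmoothed variant of the RFC 3550 estimator).
        transit = g["rx_time_s"] - g["tx_time_s"]
        jitter_ms = transit.diff().abs().mean() * 1000
        print(f"slice {slice_id}: {throughput_mbps:.2f} Mbps, "
              f"loss {loss_rate:.2%}, jitter {jitter_ms:.2f} ms")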
This TER is proposed in the context of the SLICES European project and the national Priority Research Programme and Equipment (PEPR) on 5G, 6G and Networks of the Future. It will be followed by an internship if satisfactory results are obtained.
References: set of bibliographical references (article, books, white papers, etc) to be read by the student before starting to work on this subject
- R2lab anechoic chamber, https://r2lab.inria.fr/
- OpenAirInterface, https://openairinterface.org/
- ESFRI SLICES European project, https://www.slices-ri.eu/what-is-esfri/
- PEPR on 5G, 6G and Networks of the Future, https://pepr-futurenetworks.fr/en/home/
- M. Lahsini, T. Parmentelat, T. Turletti, W. Dabbous and D. Saucez, "Sophia-node: A Cloud-Native Mobile Network Testbed," 2022 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), Phoenix, AZ, USA, 2022. https://inria.hal.science/hal-03907719v1/
- Arora, Sagar. Cloud Native Network Slice Orchestration in 5G and Beyond. Diss. Sorbonne Université, 2023. https://theses.hal.science/tel-04269161/
Who?
Name: Fabrice Huet, Françoise Baude
Mail: fabrice.huet@univ-cotedazur.fr,francoise.baude@univ-cotedazur.fr
Telephone: +33 4 92 94 26 91
Web page: https://sites.google.com/site/fabricehuet/
Where?
Place of the project: I3S Laboratory, Sophia Antipolis
Address: 2000 route des lucioles
Team: Scale
Web page: https://scale-project.github.io/
Description:
Distributed event queues have become a central component in constructing large-scale and
real-time cloud applications. They are currently employed in various latency-sensitive
cloud applications, such as recording and analyzing web accesses for recommendations and ad
placement, health care monitoring, fraud detection, smart grids, and intelligent
transportation.
A distributed event queue comprises several partitions or sub-queues deployed across a
cluster of servers. Applications (Event Consumers) that pull and process events from
distributed queues are latency-sensitive. They necessitate a high percentile of events to
be processed within a desired latency. Overprovisioning resources to meet this latency
requirement is suboptimal, as it incurs substantial monetary costs for the service
provider. Therefore, designing solutions for resource-efficient and latency-aware event
consumers from distributed event queues is crucial. Such an architecture should dynamically
provision and deprovision resources (event consumer replicas) to minimize resource usage
while ensuring the required service level agreement (SLA).
To achieve this objective, we have framed the problem of autoscaling event consumers from
distributed event queues to meet a desired latency as a bin-packing problem. This bin-packing
problem depends on the arrival rate of events into the queues, the number of events in the
queues' backlog, and the maximum consumption rate of the event consumers. Our solution has
been implemented in Java and runs on a Kubernetes infrastructure.
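Although the actual system is implemented in Java, the following language-agnostic Python sketch illustrates the bin-packing view with a simple first-fit-decreasing heuristic: each partition's demand combines its arrival rate and the rate needed to drain its backlog within a latency target, and each consumer replica is a bin with a maximum consumption rate. The sizing rule and numbers are illustrative.

    # First-fit-decreasing sketch of the bin-packing view (illustrative only).
    def required_rate(arrival_rate, backlog, latency_target_s):
        """Events/s a partition must receive to meet the latency target."""
        return arrival_rate + backlog / latency_target_s

    def pack_partitions(partitions, max_consumption_rate, latency_target_s=5.0):
        demands = sorted(
            ((p, required_rate(a, b, latency_target_s)) for p, a, b in partitions),
            key=lambda x: x[1], reverse=True)
        replicas = []  # each replica: {"free": remaining rate, "parts": [...]}
        for part, demand in demands:
            for r in replicas:                       # first fit
                if r["free"] >= demand:
                    r["free"] -= demand
                    r["parts"].append(part)
                    break
            else:                                    # open a new replica (bin)
                replicas.append({"free": max_consumption_rate - demand, "parts": [part]})
        return replicas

    # (partition, arrival rate in events/s, backlog in events)
    partitions = [("p0", 400, 1000), ("p1", 150, 0), ("p2", 900, 1500), ("p3", 300, 200)]
    for i, r in enumerate(pack_partitions(partitions, max_consumption_rate=1500)):
        print(f"replica {i}: partitions={r['parts']} spare_rate={r['free']:.0f} ev/s")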
The objectives of this project are:
- Study the literature to identify important papers which evaluate the performance of distributed
event queues like Apache Kafka
- Extract information about the experimental setups (workload used, number of partitions, ...)
- Implement and test some of these setups on our system and analyze the results.
References
Mazen Ezzeddine, Gael Migliorini, Françoise Baude, Fabrice Huet. Cost-Efficient and Latency-Aware Event Consuming in Workload-Skewed Distributed Event Queues. 6th International Conference on Cloud and Big Data Computing (ICCBDC’2022)
Mazen Ezzeddine, Francoise Baude, Fabrice Huet. Tail-latency aware and resource-efficient bin pack autoscaling for distributed event queues. CLOSER 2024 - 14th International Conference on Cloud Computing and Services Science, May 2024, Angers, France. hal-04478363
G. Shapira , T. Palino, R. Sivaram and K. Petty. Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale, second edition, O’Reilly Media, Inc., 2021.
Y.Ajiro, A.Tanaka. Improving packing algorithms for server consolidation. In int. CMG Conference,Vol. 253, 2007.
· Name: Chuan Xu
· Mail: chuan.xu@inria.fr
· Webpage: Chuan Xu
LOCATION
· Inria Sophia-Antipolis Mediterranee
· Address: 2004 route des Lucioles, 06902 Sophia Antipolis
· Team: COATI
· Webpage: Coati
Federated Learning (FL) empowers a multitude of IoT devices, including mobile phones and sensors, to collaboratively train a global machine learning model while retaining their data locally [1,2]. A prominent example of FL in action is Google's Gboard, which uses a FL-trained model to predict subsequent user inputs on smartphones [3].
Two primary challenges arise during the training phase of FL [4]:
Data Privacy: Ensuring user data remains confidential. Even though the data is kept locally by the devices, it has been shown that an honest-but-curious server can still reconstruct data samples [5,6], sensitive attributes [7,8], and the local model [9] of a targeted device. In addition, the server can conduct membership inference attacks [10] to identify whether a data sample is involved in the training or source inference attacks to determine which device stores a given data sample [11].
Security Against Malicious Participants: Ensuring the learning process is not derailed by harmful actors. Recent research has demonstrated that, in the absence of protective measures, a malicious agent can deteriorate the model performance by simply flipping the labels [12] and/or the sign of the gradient [13] and even inject backdoors into the model [14] (backdoors are hidden vulnerabilities, which can be exploited under certain conditions predefined by the attacker, like some specific inputs).
In this project, we aim to propose novel FL algorithms to effectively tackle these two mutually linked challenges.
In particular, we want to explore the potential of compression for FL training, as these techniques can greatly reduce the model dimension d, which may provide a solution for a computation-efficient, private and secure FL system.
Compression techniques were initially introduced to alleviate communication costs in distributed training, where only a proportion of the model parameters is sent from the device to the server in each communication round [15,16,17]. The primary objective of compression design is to ensure a communication-efficient machine learning/FL system, by providing model parameter selection rules at the device side which optimize the trained model performance under a given communication budget. [18,19] combined Byzantine-resilient methods with compression to ensure a communication-efficient and secure FL system. However, in these studies, even though devices transmit compressed models to the server, the Byzantine-resilient methods still operate on the full models of dimension d. Consequently, adopting their solutions to build a private and secure FL system requires a high computation load.
In this project, our goal is different: we target the best compression strategy for a computation-efficient, private and secure FL system. More precisely, the goal of this project is to study a compression strategy which provides the best trade-off among privacy, robustness (against adversarial threats), computational complexity and model performance.
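As a simplified illustration of the two ingredients discussed above, the PyTorch sketch below combines top-k sparsification of device updates with a coordinate-wise median aggregation at the server (one classical Byzantine-resilient rule); the flat-vector view of the model, the dimension, and the crude attack are illustrative assumptions, not the project's target design.

    # Sketch: top-k sparsification of device updates + coordinate-wise median
    # aggregation at the server (illustrative simplification).
    import torch

    def top_k_compress(update: torch.Tensor, k: int) -> torch.Tensor:
        """Keep only the k largest-magnitude coordinates of a flat update."""
        flat = update.flatten()
        _, idx = torch.topk(flat.abs(), k)
        sparse = torch.zeros_like(flat)
        sparse[idx] = flat[idx]
        return sparse.view_as(update)

    def coordinate_median(updates: list[torch.Tensor]) -> torch.Tensor:
        """Byzantine-resilient server-side aggregation of (possibly sparse) updates."""
        return torch.stack(updates).median(dim=0).values

    d = 1000                    # model dimension
    k = 50                      # communication budget: 5% of the coordinates
    honest = [torch.randn(d) * 0.1 for _ in range(8)]
    byzantine = [torch.full((d,), 10.0) for _ in range(2)]   # crude scaling attack
    received = [top_k_compress(u, k) for u in honest + byzantine]

    aggregated = coordinate_median(received)
    print("aggregated update norm:", aggregated.norm().item())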
This research topic can lead to an internship and then to a PhD position. We are actively looking for students with a strong motivation to pursue a research career.
For this internship, we expect the student to:
• Familiarize himself/herself with the intricacies of Federated Learning.
• Implement the Byzantine resilient method using PyTorch.
• Evaluate its effectiveness in maintaining privacy and its robustness against malicious threats within the FL framework.
PREREQUISITES
We are looking for a candidate with coding experience in Python and good analytical skills.
[1] McMahan et al, Communication-Efficient Learning of Deep Networks from Decentralized Data, AISTATS 2017
[2] Li et al, Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, p.p. 50-60, 2020
[3] Hard, Andrew et al, Federated Learning for Mobile Keyboard Prediction. arxiv: 1811.03604, 2019
[4] Kairouz et al, Advances and Open Problems in Federated Learning. Now Foundations and Trends, 2021
[5] Geiping et al, Inverting gradients - how easy is it to break privacy in federated learning?, NeurIPS 2020
[6] Yin et al, See through gradients: Image batch recovery via gradinversion, CVPR 2021
[7] Lyu et al, A novel attribute reconstruction attack in federated learning, FTL-IJCAI 2021
[8] Driouich et al, A novel model-based attribute inference attack in federated learning, FL-NeurIPS22, 2022.
[9] Xu et al, What else is leaked when eavesdropping Federated Learning? PPML-CCS, 2021
[10] Zari et al, Efficient Passive Membership Inference Attack in Federated Learning, PriML-NeurIPS workshop, 2022
[11] Hu et al, Source inference attacks in federated learning, ICDM 2021
[12] Fang et al, Local model poisoning attacks to Byzantine-robust federated learning, in 29th USENIX Security Symposium, 2020
[13] Wu et al, Federated variance-reduced stochastic gradient descent with robustness to byzantine attacks, IEEE Transactions on Signal Processing, vol. 68, pp. 4583–4596, 2020
[14] Wang et al, Attack of the tails: yes, you really can backdoor federated learning, NeurIPS 2020
[15] Alistarh et al, QSGD: Communication-efficient sgd via gradient quantization and encoding. NeurIPS 2017.
[16] Alistarh et al, The convergence of sparsified gradient methods. NeurIPS 2018.
[17] Haddadpour et al, Federated learning with compression: unified analysis and sharp guarantees, AISTATS 2021
[18] Gorbunov et al, Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top, ICLR 2023
[19] Zhu, H et al. Byzantine-Robust Distributed Learning With Compression. IEEE Trans. on Signal and Inf. Process. over Networks 9, 280–294, 2023.
Name: Giovanni Neglia, Sara Alouf
Mail: firstname.familyname@inria.fr
Web page: https://www-sop.inria.fr/members/Giovanni.Neglia/, https://www-sop.inria.fr/members/Sara.Alouf/
Place of the project: Inria
Address: 2004 route des Lucioles, 06902 Sophia Antipolis
Team: NEO team
Web page: https://team.inria.fr/neo/
Pre-requisites:
The ideal candidate should have strong programming skills, particularly in Python, with a propensity for simulation and experimental work. Additionally, a solid foundation in mathematics and analytical reasoning is essential, especially for candidates considering a follow-up internship or PhD. Experience with optimization techniques, machine learning, and deep learning models would be highly beneficial.
Description: Text-to-image generation using diffusion models [1] has seen widespread adoption due to its ability to generate high-quality images from textual prompts. However, the iterative denoising process inherent to diffusion models makes them computationally expensive, leading to increased latency and resource consumption, especially in real-time production environments. To address these challenges, systems like GPT-Cache [2] and Pinecone [3] have been developed, reducing latency by caching previously generated images and matching new prompts to cached images based on prompt similarity. More recently, NIRVANA [4] introduced a more efficient approach by reusing intermediate states of the diffusion process. This enables a reduction in the number of denoising steps required to generate new images, significantly lowering GPU usage and latency. Building on the supervisors' expertise on similarity caching [5-10], this student project will focus on the development and optimization of a novel caching technique designed to further minimize the computational cost and latency in text-to-image diffusion models.
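As a purely illustrative sketch of the caching idea, the snippet below matches incoming prompts to cached ones by cosine similarity of their embeddings and returns the cached artifact (final image or intermediate diffusion state) when the similarity exceeds a threshold; the embedding function is a placeholder for any real text encoder, and the threshold and eviction policy are arbitrary.

    # Minimal prompt-similarity cache sketch (illustrative only).
    import numpy as np

    def embed(prompt: str, dim: int = 64) -> np.ndarray:
        """Placeholder embedding: a deterministic pseudo-random unit vector per prompt."""
        rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
        v = rng.standard_normal(dim)
        return v / np.linalg.norm(v)

    class SimilarityCache:
        def __init__(self, threshold: float = 0.85, capacity: int = 1000):
            self.threshold, self.capacity = threshold, capacity
            self.keys: list[np.ndarray] = []   # prompt embeddings
            self.values: list[object] = []     # cached images / intermediate states

        def lookup(self, prompt: str):
            if not self.keys:
                return None
            q = embed(prompt)
            sims = np.stack(self.keys) @ q     # cosine similarity (unit vectors)
            best = int(sims.argmax())
            return self.values[best] if sims[best] >= self.threshold else None

        def insert(self, prompt: str, artifact: object):
            if len(self.keys) >= self.capacity:
                self.keys.pop(0); self.values.pop(0)   # naive FIFO eviction
            self.keys.append(embed(prompt))
            self.values.append(artifact)

    cache = SimilarityCache()
    cache.insert("a cat wearing sunglasses", "image_or_latent_0")
    print(cache.lookup("a cat wearing sunglasses"))   # hit (identical prompt)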
Key research areas for this project include:
- Designing and implementing new caching algorithms specifically tailored for text-to-image diffusion models.
- Optimizing the balance between computational savings and image quality by tuning parameters and improving cache management strategies.
- Evaluating the performance of the system under real-world conditions, using production traces to assess improvements in efficiency and latency.
This project offers the opportunity to contribute to cutting-edge advancements in machine learning and system optimization. With the potential for further exploration, it could expand into an internship or even a PhD project for motivated students interested in pursuing research in this area.
[1] Chenshuang Zhang, Chaoning Zhang, Mengchun Zhang, and In So Kweon. Text-to-image diffusion model in generative ai: A survey. arXiv preprint arXiv:2303.07909, 2023.
[2] zilliztech/gptcache: Semantic cache for LLMs, fully integrated with LangChain and llama_index. https://github.com/zilliztech/GPTCache
[3] Making stable diffusion faster with intelligent caching | Pinecone. https://www.pinecone.io/learn/faster-stable-diffusion/
[4] Shubham Agarwal and Subrata Mitra and Sarthak Chakraborty and Srikrishna Karanam and Koyel Mukherjee and Shiv Kumar Saini, Approximate Caching for Efficiently Serving {Text-to-Image} Diffusion Models, 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)
[5] Giovanni Neglia, Michele Garetto, and Emilio Leonardi, Similarity Caching: Theory and Algorithms, IEEE/ACM Transactions on Networking, online, December 2021
[6] Michele Garetto, Emilio Leonardi, Giovanni Neglia, Content Placement in Networks of Similarity Caches, Elsevier Computer Networks, online, November 2021
[7] Damiano Carra and Giovanni Neglia, Taking two Birds with one k-NN Cache, Proc. of The 2021 IEEE Global Communications Conference (Globecom 2021), Madrid, Spain, December 7-11, 2021
[8] Anirudh Sabnis, Tareq Si Salem, Giovanni Neglia, Michele Garetto, Emilio Leonardi, and Ramesh K. Sitaraman, GRADES: Gradient Descent for Similarity Caching, IEEE/ACM Transactions on Networking, online, November, 2022
[9] Tareq Si Salem, Giovanni Neglia, Damiano Carra, Ascent Similarity Caching with Approximate Indexes, IEEE/ACM Transactions on Networking, online, November, 2022
[10] Younes Ben Mazziane, Sara Alouf, Giovanni Neglia, Daniel Sadoc Menasche, "TTL Model for an LRU-Based Similarity Caching Policy" Computer Networks, Volume 241, March 2024
Name: Giovanni Neglia, Alain Jean-Marie
Mail: firstname.familyname@inria.fr
Web page: https://www-sop.inria.fr/members/Giovanni.Neglia/, https://www-sop.inria.fr/members/Alain.Jean-Marie/me.html
Place of the project: Inria
Address: 2004 route des Lucioles, 06902 Sophia Antipolis
Team: NEO team
Web page: https://team.inria.fr/neo/
Project Title: Confidential Computing for Distributed Machine Learning
Pre-requisites: Strong programming skills, particularly in Python, with a keen interest in research on distributed systems. A background in machine learning or cryptography is highly desirable. Experience with cloud platforms such as AWS or Azure is a significant advantage.
Description:
Confidential computing is a security paradigm designed to protect sensitive data during processing by keeping it encrypted even while in use. This is achieved through trusted execution environments (TEEs), such as AWS Nitro Enclaves [1] and Azure Confidential VMs [2], which isolate sensitive data from unauthorized access by cloud providers, administrators, and external attackers. This approach is particularly valuable in sectors with stringent data privacy requirements, including healthcare, finance, and government. By utilizing confidential computing, organizations can collaboratively train machine learning models without directly sharing raw data, ensuring compliance with privacy and security standards. This technology enables multiple stakeholders to pool their datasets and gain insights through secure multiparty computation, preserving data confidentiality throughout the process. This project investigates confidential computing as an alternative to federated learning [3] for enhancing the privacy and security of distributed machine learning systems. Key goals include:
- Evaluating privacy guarantees: Analyze the privacy protections offered by confidential computing solutions.
- Exploring trade-offs: Study the balance between performance, security, and resource overhead in different confidential computing architectures.
- Efficiency comparison: Benchmark the computational efficiency of confidential computing against federated learning in distributed ML scenarios, evaluating factors like latency, scalability, and resource usage.
With the potential for further exploration, this project could expand into an internship or even a PhD project for motivated students interested in pursuing research in this area.
[1] https://aws.amazon.com/ec2/nitro/
[2] https://azure.microsoft.com/en-us/solutions/confidential-compute/#overview
[3] T. Li, A. K. Sahu, A. Talwalkar and V. Smith, "Federated Learning: Challenges, Methods, and Future Directions," in IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50-60, May 2020
Understanding the Energy Consumption of Generative AI models
1. Project Leaders
• Dino Lopez Pacheco <dino.lopez@univ-cotedazur.fr> - SigNet Team
• Fabrice Huet <fabrice.huet@univ-cotedazur.fr > - SCALE Team
2. Introduction
Information and Communication Technology (ICT) accounts for between 4% and 9% of
worldwide energy consumption, which represents between 1.4% and 4% of greenhouse
gas (GHG) emissions. Energy consumption and GHG emissions from ICT are also growing
year by year [1]. Hence, decreasing the GHG footprint of ICT is essential to tackling the
current global warming trend.
One of the main factors behind the ever-growing energy demand of ICT is the incredibly
high popularity of generative Artificial Intelligence (AI). Indeed, as the popularity of
generative AI increases, the number of models behind it increases, as well as their
size (measured in number of parameters), which is believed to have a direct impact on
model performance.
The increasing number of ever-larger AI models has driven both an increase in the number of
Data Center (DC) facilities and the expansion of existing ones, leading to huge energy
demands [4].
3. Project Objectives
Some reports exist on the energy consumption of training and inference for very large
Large Language Models (such as the 176B-parameter BLOOM [3]). However, in this project we aim at
exploring in more detail the energy consumption of different generative AI models, not
only the biggest ones, and under different hardware conditions.
In this project, the student will be required to:
1. Continue the state of the art on the energy consumption of AI models.
2. Expand the current state of the art on generative AI models to get a clear view of the
evolution of such models in terms of chronology, size, purpose, software availability, etc.
3. Set up a testbed where the experiments will be conducted. Configure the server
and inverter to extract the energy-related parameters (a minimal power-sampling
sketch is given after this list).
4. Test the available models by designing multiple cases of training and utilization. Tests
will be done on the configured testbed and on Grid5000.
5. Explore the impact of the different hardware models on which generative AI is trained
and/or deployed.
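The minimal power-sampling sketch referred to in task 3 is given below; it assumes an NVIDIA GPU with nvidia-smi available, samples the reported power draw while a workload runs, and integrates it into a rough energy estimate. The polling interval and the dummy workload are placeholders for the real training/inference runs.

    # Minimal GPU power-sampling sketch (assumes nvidia-smi on the PATH).
    import subprocess
    import threading
    import time

    def gpu_power_watts() -> float:
        """Instantaneous GPU power draw reported by nvidia-smi (first GPU), in W."""
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True)
        return float(out.stdout.strip().splitlines()[0])

    def measure_energy(workload, interval_s=0.5):
        """Sample power while `workload()` runs and integrate it into Joules (rough estimate)."""
        samples = []
        t = threading.Thread(target=workload)
        t.start()
        while t.is_alive():
            samples.append(gpu_power_watts())
            time.sleep(interval_s)
        t.join()
        return sum(samples) * interval_s, samples  # energy (J), raw samples (W)

    def dummy_inference():
        time.sleep(10)  # placeholder for model loading + prompt generation

    energy_j, trace = measure_energy(dummy_inference)
    print(f"~{energy_j:.0f} J over {len(trace)} samples, peak {max(trace):.0f} W")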
4. Required Skills
• Python
• Networking
• System and Containers
5. Bibliography
[1] Erol Gelenbe. “Electricity Consumption by ICT: Facts, trends, and measurements. ”
Ubiquity 2023, August, Article 1, 15 pages. https://doi.org/10.1145/3613207
[2] Pablo Villalobos, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Anson Ho, Marius
Hobbhahn. "Machine Learning Model Sizes and the Parameter Gap." arXiv:2207.02852v1
[cs.LG] (2022).
[3] Alexandra Sasha Luccioni, Sylvain Viguier, and Anne-Laure Ligozat. “Estimating the
carbon footprint of BLOOM, a 176B parameter language model.” J. Mach. Learn. Res. 24,
1, Article 253 (January 2023), 15 pages.
[4] The Washington Post, “AI is exhausting the power grid. Tech firms are seeking a miracle
solution”. Last access: sept 2024.
https://www.washingtonpost.com/business/2024/06/21/artificial-intelligence-nuclear-
fusion-climate/