2021-2022

1- Evolution over time of the structure of social graphs: Clustering


Who?

Name: Nicolas Nisse

Mail: nicolas.nisse@inria.fr

Web page: http://www-sop.inria.fr/members/Nicolas.Nisse/

Co-advisors : Frédéric Giroire (frederic.giroire@inria.fr) and Malgorzata Sulkowska (malgorzata.sulkowska@inria.fr)


Where?

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: COATI

Web page: https://team.inria.fr/coati/


Pre-requisites if any:

Description: The goal of the project is to develop methods to analyse the evolution over time of a social network. As an example, we will consider the graph of scientific collaborations, since it can be crawled freely.

The project will have two phases:

- Data collection. In the first phase, the student will use the available bibliographic research tools (SCOPUS, Web of Science, Patstat) to create data sets: one corresponding to the current situation and others corresponding to past moments. The data sets will correspond mainly to networks (annotated graphs) of scientific collaborations.

- Data analysis. In the second phase, the student will analyse this data. First, they will focus on simple metrics (number of publications, number of patent applications, ...) and compare their evolution across time. Then, if there is time, they will start studying the evolution of the structure of the network and look at whether its clustering changes due to the emergence of new collaborations (a minimal sketch of such an analysis is given below).
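A minimal sketch (toy edge lists; the real snapshots would come from the SCOPUS data collected in the first phase) of how a simple structural metric such as the average clustering coefficient can be compared across time with networkx:

    import networkx as nx

    # Toy yearly snapshots of the collaboration graph, given as edge lists
    snapshots = {2015: [("a", "b"), ("b", "c")],
                 2020: [("a", "b"), ("b", "c"), ("a", "c")]}

    for year, edges in sorted(snapshots.items()):
        g = nx.Graph(edges)
        print(year, g.number_of_nodes(), g.number_of_edges(),
              nx.average_clustering(g))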

The project will be part of a larger project on the evaluation of the impact of funding on scientific research, which involves researchers in economics, sociology, and computer science.


Keywords: graph algorithms, big data, network analysis


Useful Information/Bibliography: The project is the continuation of the internship whose report can be found here: http://www-sop.inria.fr/members/Nicolas.Nisse/ReportsStudents/Shelest21.pdf


2- Evolution over time of the structure of social graphs: Fitting


Who?

Name: Malgorzata Sulkowska

Mail: malgorzata.sulkowska@inria.fr

Web page: https://team.inria.fr/coati/new-team-member-malgorzata-sulkowska/

Co-advisors: Frédéric Giroire (frederic.giroire@inria.fr) and Nicolas Nisse (nicolas.nisse@inria.fr)


Where?

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: COATI

Web page: https://team.inria.fr/coati/


Pre-requisites if any:

Description: The goal of the project is to develop methods to analyse large-scale networks and to develop models that generate random graphs with similar properties. On the practical side, we will analyse the evolution of the degree of the nodes in a large collaboration network of scientific publications (extracted from SCOPUS). On the theoretical side, we will test and analyse different preferential attachment models with varying attachment functions (a minimal simulation sketch is given below). The results of the simulations will help us choose families of attachment functions that model our data well.
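As an illustration only (the model details are assumptions to be refined during the project), a minimal simulator of preferential attachment with a user-supplied attachment function f(degree), whose output can be compared with the degree evolution observed in the SCOPUS network:

    import random
    from collections import defaultdict

    def simulate(n_nodes, f):
        """Grow a graph node by node; each new node attaches one edge to an
        existing node chosen with probability proportional to f(degree)."""
        degree = defaultdict(int, {0: 1, 1: 1})       # start from a single edge
        for new in range(2, n_nodes):
            nodes = list(degree)
            weights = [f(degree[v]) for v in nodes]
            target = random.choices(nodes, weights=weights, k=1)[0]
            degree[new] += 1
            degree[target] += 1
        return degree

    deg_linear = simulate(10_000, lambda k: k)         # linear (Barabasi-Albert-like)
    deg_sublin = simulate(10_000, lambda k: k ** 0.5)  # sublinear attachment
    print(max(deg_linear.values()), max(deg_sublin.values()))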

The project will be part of a larger project on the evaluation of the impact of funding on scientific research, which involves researchers in economics, sociology, and computer science.



Keywords: graph algorithms, big data, network analysis


Useful Information/Bibliography: The project is the continuation of the internship whose report can be found here: http://www-sop.inria.fr/members/Nicolas.Nisse/ReportsStudents/Ohulanski21.pdf


3- Leveraging the wealth of data available in the browser for network monitoring and troubleshooting

Who?

Name: Chadi Barakat (Inria, Diana project-team) and Yassine Hadjadj-Aoul (Inria, Dionysos project-team)

Mail: Chadi.Barakat@inria.fr, yhadjadj@irisa.fr

Web page: https://team.inria.fr/diana/chadi/, http://people.irisa.fr/Yassine.Hadjadj-Aoul/

Where?

Place of the project: Inria Sophia Antipolis

Address: 2004, route des Lucioles, 06902 Sophia Antipolis, France

Team: Diana

Web page: https://team.inria.fr/diana/

Prerequisites if any: Basic knowledge in web and network programming

Description:

Context:

Despite the considerable improvements seen in recent years in Internet access performance and in the quality of the physical and virtualised infrastructures hosting Internet services, we still face situations where the Internet service degrades and the end-user Quality of Experience (QoE) is lower than expected. The reasons are many: the slowness of the user's device, a bad configuration of the WiFi at home, interference caused by neighbouring WiFi networks, saturation of the access link by the many devices and applications running at home, congestion in the ISP network, especially on its peering links, or the overload of content providers' servers following a sudden increase in user activity. There are also situations where the QoE degrades for reasons other than congestion or lack of resources, as when the ISP or the content providers decide to reduce the quality of their service to prioritise some part of the traffic over the rest (a.k.a. network traffic differentiation), or to cope with heavy service usage (e.g., the video resolution reduction by major video streaming platforms during the confinement period). These situations, and many others, are common today and will not disappear in the immediate future despite the considerable advances seen and foreseen at both the network and the cloud levels. The problem is not only the frustration they cause for the end user, but also the difficulty for the end user to distinguish between them so as to take the appropriate actions against their origins, as far as possible.

A long list of solutions and tools has been proposed over the years to shed light on some of these problems, e.g., SpeedTest, MobiPerf, RTR-NetTest, ACQUA, the WiFi Analyzer of MS Windows, the WiFi Scanner of Apple, QoE Doctor. These tools and many others contribute to answering the questions of the end user and of the network and service providers, but they have two limitations: first, each tool is limited to the specific problem for which it was designed, requiring the user to install and master all of them; second, a large part of them are intrusive, injecting traffic into the network and thus adding load to a network that is already loaded at the moment of the problem.

In this project, we want to explore a new approach that leverages the wealth of data available in the end user's device as a result of her/his normal activity. These data, collected at almost no cost, are shaped by what is going on inside the network, in the device of the user, and on the service provider's side. For example, congestion results in an increase in packet delay and packet loss rate and in a drop in network throughput. Saturation of the device shows up as a high CPU load and/or memory usage, whereas saturation of the server on the other side results in part of the traffic exchanged with this server being considerably delayed while the rest of the traffic behaves normally. All these signatures and others exist together, and the challenge is to identify and isolate them from each other and to build appropriate classifiers using machine learning techniques, in an effort to understand the origins of any problem causing a drop in Internet service performance and end-user Quality of Experience.

PFE objectives:

This PFE will address the problem by focusing on web browsing as a service and leveraging the wealth of measurements that can be collected from within the browser while browsing the Web. A long list of these measurements is available in our browsers, such as the time to connect to the server, the time to download the DOM (Document Object Model) describing the page, and the time to load the page. Information on the web page itself is also available, such as the number of objects and the size of each object. In addition, other information regarding the machine itself, such as its CPU load and memory usage, can also be available. Studied together, all this information can help shed light on the performance of the underlying network and point to the origins of any service degradation (an illustrative sketch of how such measurements can be accessed is given below).
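For illustration only (the PFE itself targets a web extension, not Selenium): a sketch of the kind of page-load metrics the browser exposes through the W3C Navigation/Resource Timing APIs, collected here from Python via Selenium; the driver setup and the visited URL are assumptions.

    from selenium import webdriver

    driver = webdriver.Firefox()          # assumes geckodriver is installed
    driver.get("https://www.inria.fr/")
    nav = driver.execute_script(
        "return performance.getEntriesByType('navigation')[0].toJSON();")
    print("connect time:", nav["connectEnd"] - nav["connectStart"], "ms")
    print("DOM loaded  :", nav["domContentLoadedEventEnd"] - nav["startTime"], "ms")
    print("page loaded :", nav["loadEventEnd"] - nav["startTime"], "ms")
    print("#resources  :", len(driver.execute_script(
        "return performance.getEntriesByType('resource');")))
    driver.quit()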

The PFE will start by reviewing the state of the art and establishing the list of measurements that can be collected from within the browser, and the information that can be extracted from these measurements. These measurements will then be categorized based on the aspects they cover (device, network, etc.). The PFE will then move to setting up a web extension to collect these measurements in the wild. Based on this web extension, we will plan experiments to crawl the web under controlled network conditions where we know the ground truth about the network. Setting up these experiments can go well beyond the PFE. If the results are satisfactory, an internship will follow to run these experiments and analyse the obtained data towards a solid understanding of the link between network performance and browser-level measurements. The final objective of this project, in addition to the lessons learned and the models developed, is to transform the web extension into a lightweight, browser-level, non-intrusive network monitoring and troubleshooting tool.



4- Universal detectors or Counter-Detectors: efficiency vs. expressiveness

Supervisors: F. Mallet and G. Zholtkevych, Frederic.Mallet@univ-cotedazur.fr, http://www-sop.inria.fr/members/Frederic.Mallet/

Where?

Place of the project: Inria, Lagrange

Team: Kairos

Web page: https://team.inria.fr/kairos/

Description: The Clock Constraint Specification Language (CCSL) deals with logical clocks and logical time to specify the expected behavior of critical systems. The goal is to find sets of valid schedules that satisfy a set of constraints. Most of the time, there is an infinite number of valid schedules, and selecting the "most adequate" one requires the definition of additional, non-functional constraints and an effective method to explore this infinite set of solutions.

Universal detectors for safe and recursively enumerable schedules give maximum expressiveness, even though there is no efficient method to deal with them. Counter-detectors appear as a good alternative for obtaining efficient methods; however, they have reduced expressiveness. The reducibility of the violation-recognition problem for counter-detectors to the solvability of a Diophantine system calls this impression into question. For example, the Diophantine approach explains the necessity of having bounded constraints.

The goal of the work is to study various detectors from the literature and to propose a method to adapt the detector to the requirements. Determining the class of primitive detectors and the different types of their assembly in the detector category is a first step. The determined class and the assembly methods would give us a compositional theory of detectors. Later, it should give a good classification of clock constraints, which would assess the difficulty of recognizing violations.

  1. Frédéric Mallet, Charles André, and Robert de Simone. CCSL: specifying clock constraints with UML/Marte. Innovations in Systems and Software Engineering, 4(3):309–314, 2008.

  2. Judith Peters, Robert Wille, Nils Przigoda, Ulrich Kühne, and Rolf Drechsler. A generic representation of CCSL time constraints for UML/MARTE models. In 52nd Annual Design Automation Conference, DAC, pages 122:1–122:6. ACM, June 2015.

  3. Ling Yin, Jing Liu, Zuohua Ding, Frédéric Mallet, and Robert de Simone. Schedulability analysis with CCSL specifications. In Pornsiri Muenchaisri and Gregg Rothermel, editors, 20th Asia-Pacific Software Engineering Conference, APSEC, pages 414–421. IEEE Computer Society, December 2013.

  4. Min Zhang, Feng Dai, and Frédéric Mallet. Periodic scheduling for MARTE/CCSL: theory and practice. Sci. Comput. Program., 154:42–60, 2018.

  5. Min Zhang, Fu Song, Frédéric Mallet, and Xiaohong Chen. SMT-based bounded schedulability analysis of the clock constraint specification language. In Reiner Hähnle and Wil M. P. van der Aalst, editors,Fundamental Approaches to Software Engineering, FASE/ETAPS, volume 11424 of Lecture Notes in Computer Science, pages 61–78. Springer, April 2019.


5- Verification environment for the behavior of communicating systems

Who?

Name: Eric Madelaine

Mail: eric.madelaine@inria.fr

Web page: http://www-sop.inria.fr/members/Eric.Madelaine/

Together with: Rabea Ameur-Boulifa, Telecom Paris, Rabea.Ameur-Boulifa@telecom-paris.fr


Where?

Place of the project: INRIA

Address: Sophia-Antipolis (06), France

Team: KAIROS

Web page: https://team.inria.fr/kairos/


Pre-requisites if any:

The student should have knowledge of and interest in software development, in particular on model-driven development (MDE) platforms, in a Java environment. A taste for distributed or concurrent systems, logic, and / or formal methods will be appreciated.


Description:

Context: Despite the progress of techniques and software tools, the problem of verifying the behavioral properties of software, and in particular of communicating systems, still comes up against scaling issues. This is particularly true when we are interested in realistic, large (or even industrial) systems that take into account parameters of various types. The model known as “open networks of synchronized parameterized automata (Open pNets)” [FORTE'16, AVOCS'18] is a semantic model describing, in a symbolic and hierarchical manner, the behaviors of communicating systems, where the word “open” means that some of the components of the system are unknown. The explicit consideration of the data and the symbolic nature of the behaviors allow describing, in a finite (and compact) way, state spaces that would otherwise be infinite.

The originality of compositional verification is that behavioral properties can be proven on these (small) open systems, which will then be instantiated and composed to constitute a complete system. The semantics of an Open pNet is an "open", generally finite, symbolic automaton, which makes it possible to define analysis algorithms that terminate in finite time (state space generation, model-checking, equivalence checking). The magic comes from the fact that the relationships between the data (parameters) of the systems are treated separately, by techniques called "Satisfiability Modulo Theories (SMT)", which today benefit from very powerful analysis engines.

In recent years we have developed the theory, then algorithms, to test several types of equivalence between Open pNets, and typically to prove that an implementation of a component or a subsystem is equivalent to its specification. We have also developed a "pNet editor" and a concrete formalism allowing users to directly develop their examples in an MDE environment.

Objectives of the project:

The trainee, after familiarization with the subject and our development environments, will:

- set up a high-level interface allowing the coding of pNets-based systems and the execution of the various analysis algorithms available to us,

- package the entire toolbox in the form of plugins that can be easily distributed and deployed by our users, typically in an academic environment,

- if time allows, validate and document all the tools thus assembled, as well as a library of examples.

The PFE may be continued as a full internship if both parties agree…


Useful Information/Bibliography:

[FORTE'16] A Theory for the Composition of Concurrent Processes, DOI: 10.1007/978-3-319-39570-8_12, extended version: https://hal.inria.fr/hal-01299562

[AVOCS'18] "Using SMT engine to generate symbolic automata", extended version: https://hal.inria.fr/hal-01823507v1

[PEPM’20] Symbolic Bisimulation for Open and Parameterized Systems - Extended version, 2020, https://hal.inria.fr/hal-02376147


6- The study of spectra of random graphs

Who?

Name: Konstantin Avrachenkov and Maximilien Dreveton

Mail: K.Avrachenkov@inria.fr and Maximilien.Dreveton@inria.fr

Web page:

https://www-sop.inria.fr/members/Konstantin.Avratchenkov/me.html

https://maximiliendreveton.fr/


Where?

Place of the project:

Address: Inria SAM, 2004 Route des Lucioles, Sophia Antipolis

Team: NEO

Web page:

https://team.inria.fr/neo/presentation/


Pre-requisites if any: A good knowledge of probability theory and/or linear algebra is desirable, Python programming


Description:


Many tasks in machine learning, data mining and network analysis rely on the properties of the spectrum of a graph. Some examples of such tasks are graph clustering, network embedding, metric learning, graph matching and graph generation. The first goal of this internship is to survey the state of the art on available analytical descriptions of random graph spectra. Some available results are described, e.g., in [1,2,3]. The second goal of the internship is, building upon [4,5], to extend analytical results from homogeneous to inhomogeneous or block settings. Numerical experiments with random graph models are also expected, either to confirm new theoretical results or to construct working hypotheses (a minimal example of such an experiment is sketched below).
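A minimal illustrative experiment (toy parameters, not a project deliverable): the empirical adjacency spectrum of an Erdős–Rényi graph, which can be compared with known limiting laws such as the semicircle law (see, e.g., [1]):

    import numpy as np
    import networkx as nx

    n, p = 2000, 0.01
    A = nx.to_numpy_array(nx.erdos_renyi_graph(n, p, seed=0))
    eigvals = np.linalg.eigvalsh(A)                      # adjacency spectrum
    # after this scaling, the bulk of the spectrum approaches the semicircle on [-2, 2]
    scaled = eigvals / np.sqrt(n * p * (1 - p))
    hist, edges = np.histogram(scaled, bins=50, range=(-2.5, 2.5), density=True)
    print(scaled.min(), scaled.max(), hist.max())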


Bibliography:

[1] Arnold, L. (1967) On the asymptotic distribution of the eigenvalues of random matrices. Journal of Mathematical Analysis and Applications, 20(2), 262–268.

[2] McKay, B. D. (1981). The expected eigenvalue distribution of a large regular graph. Linear Algebra and its Applications, 40, 203-216.

[3] Bordenave, C. (2008). Eigenvalues of Euclidean random matrices. Random Structures & Algorithms, 33(4), 515-532.

[4] Avrachenkov, K., Cottatellucci, L., & Kadavankandy, A. (2015). Spectral properties of random matrices for stochastic block model. In Proceedings of 2015 IEEE 13th WiOpt Conference (pp. 537-544).

[5] Avrachenkov, K., Bobu, A., & Dreveton, M. (2021). Higher-order spectral clustering for geometric graphs. Journal of Fourier Analysis and Applications, 27(2), 1-29.



7- How Wi-Fi SSIDs and Bluetooth device names reveal political opinions

Who?

Name: Arnaud Legout

Mail: arnaud.legout@inria.fr

Telephone: +33 4 92 38 78 15

Web page: http://www-sop.inria.fr/members/Arnaud.Legout/


Where?

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles

Team: DIANA

Web page: https://team.inria.fr/diana/


Pre-requisites if any: Python, machine learning is a plus

Description:

Wi-Fi access points and Bluetooth devices broadcast their IDs tens of meters around them. However, this information is considered public and can be measured by apps or companies in order to build databases. Such IDs have never been studied as a way to express opinions, but recent research suggests that we can detect political opinions from such IDs. This is a huge privacy issue that has been overlooked.


Over the past 5 years, the ElectroSmart project has collected hundreds of millions of Wi-Fi and Bluetooth IDs worldwide. The PFE student will have to analyze the kind of private information that is disclosed by these IDs and correlate it with trends in social networks and news sources.

As a first step, the student will explore the occurrence of terms related to the Trump election and to the Covid-19 pandemic (a minimal sketch of such a term count is given below). Then, the student will correlate the temporal appearance of such terms with social networks and news sites to understand how such opinions propagate into Wi-Fi and Bluetooth IDs.
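A minimal sketch (toy data; the column names are hypothetical) of the first step, counting monthly occurrences of a list of terms in observed SSIDs with pandas:

    import pandas as pd

    # In the project, this table would be extracted from the ElectroSmart data
    df = pd.DataFrame({
        "ssid": ["MAGA2020", "StayHome_WiFi", "Livebox-1234", "NoCovidHere"],
        "date": pd.to_datetime(["2020-01-03", "2020-03-20", "2020-03-21", "2020-04-02"]),
    })

    for term in ["maga", "covid", "trump"]:
        hits = df[df["ssid"].str.lower().str.contains(term)]
        monthly = hits.set_index("date").resample("M")["ssid"].count()
        print(term, monthly.to_dict())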

The student will have the possibility to work with unique real-world data.

This PFE can be continued by an internship and a Ph.D. thesis for excellent students.

8- Could we get rid of 5G with Wi-Fi?

Who?

Name: Arnaud Legout / Damien Saucez

Mail: arnaud.legout@inria.fr / damien.saucez@inria.fr

Telephone: +33 4 92 38 78 15 / +33 4 89 73 24 18

Web page:http://www-sop.inria.fr/members/Arnaud.Legout/

https://team.inria.fr/diana/team-members/damien-saucez/

Where?

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles

Team: DIANA

Web page:https://team.inria.fr/diana/

Pre-requisites if any: Python, machine learning is a plus

Description:

More than ever, we need continuous, high-bandwidth, low-latency Internet connectivity, even when we are on the move. 5G has been proposed to offer such connectivity. Unfortunately, the adoption of the technology takes time, as it requires operators to deploy new antennas and end users to buy new equipment.


In this work we will determine whether, in densely populated areas, we could leverage Wi-Fi access points to provide high-quality Internet connectivity to everyone. To answer this question, we will study the ElectroSmart project dataset. Over the past 5 years, the ElectroSmart project has collected hundreds of millions of Wi-Fi and cellular signals worldwide. More precisely, we will determine whether, whenever users observe cellular connectivity, they also observe good-enough Wi-Fi signals. The project is two-fold. On the one hand, we will define what a “good-enough" Wi-Fi signal is with respect to the cellular signal. On the other hand, we will determine whether combining all Wi-Fi signals in large areas can be as effective as using cellular technology.

The student will have the possibility to work with unique real-world data.


9- Federated Learning for IoT Devices


#SUPERVISORS

Name: Giovanni Neglia, Alain Jean-Marie, Othmane Marfoq

Mail: {firstname.familyname}@inria.fr

Telephone:

Web page: www-sop.inria.fr/members/Giovanni.Neglia/


Where?

Place of the project: Inria

Address: 2004 route des Lucioles

Team: NEO

Web page: https://team.inria.fr/neo/


#PRE-REQUISITES

The student should have good programming skills in Python to be able to run experiments with PyTorch. The project can lead to a theoretically-oriented internship to prove convergence and optimality of the proposed algorithms. For this reason, the student should also have good analytical skills and enjoy mathematical reasoning along the lines of the course "Machine Learning: Theory and Algorithms."



# DESCRIPTION

Federated learning (FL) “involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while keeping data localized” [1], because of privacy concerns or limited communication resources. FL is at the core of many machine learning models running on smartphones, like Google keyboard [2]. FL algorithms (e.g., FedAvg [3]) train a common machine learning model through multiple rounds: at each round, a central orchestrator sends the current model to (a subset of) the clients, each client updates the model using its own local dataset to compute a stochastic gradient and then sends the updated model back to the orchestrator, where all updates are aggregated.


Most FL algorithms consider that the clients' local datasets do not change over time. In the case of IoT devices, new data is continuously generated and a device may be able to store only part of it. As a result, some devices with high data generation rates and/or low storage capacity may have completely different datasets from one round to the other, while others may have their local datasets (almost) unchanged. Intuitively, when the orchestrator aggregates the updates, it should give higher weight to devices with more fresh data than to devices reusing old data (a minimal sketch of such a weighted aggregation is given below).
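A minimal sketch of the aggregation step only (not a full FedAvg implementation): each client update is weighted by the fraction of fresh samples in its current local dataset. This weighting rule is an assumption to be refined during the project.

    import torch

    def aggregate(global_model, client_states, fresh_fractions):
        """client_states: list of client state_dicts; fresh_fractions: one scalar per client."""
        weights = torch.tensor(fresh_fractions, dtype=torch.float)
        weights = weights / weights.sum()                        # normalize
        new_state = {}
        for key in global_model.state_dict():
            stacked = torch.stack([s[key].float() for s in client_states])
            shape = (-1,) + (1,) * (stacked.dim() - 1)           # broadcast weights
            new_state[key] = (weights.view(shape) * stacked).sum(dim=0)
        # note: integer buffers (e.g., BatchNorm counters) would need special handling
        global_model.load_state_dict(new_state)
        return global_model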


The first goal of this project is to overview related work on FL [4,5,6], combinations of online and batch learning [7,8,9], and the general framework of online-to-batch conversion [10, chapter 5]. The student may find useful the following surveys on online learning [11,12] and on FL [1,13]. The second goal of this project is to propose and test some heuristics for FL in the scenario of interest, using a PyTorch framework developed in the NEO team.



# REFERENCES

[1] Tian Li, et al. "Federated learning: Challenges, methods, and future directions." IEEE Signal Processing Magazine 37.3 (2020): 50-60.

[2] Andrew Hard et al. "Federated Learning for Mobile Keyboard Prediction." arXiv preprint arXiv:1811.03604 (2018).

[3] Jakub Konečný, et al. "Federated optimization: Distributed machine learning for on-device intelligence." arXiv preprint arXiv:1610.02527 (2016).

[4] Fernando Casado et al. "Federated and continual learning for classification tasks in a society of devices." arXiv preprint arxiv:2006.07129 (2020).

[5] Anastasiia Usmanova et al. "A distillation-based approach integrating continual learning and federated learning for pervasive services." arXiv preprint arXiv:2109.04197 (2021).

[6] Sannara Ek et al. "Evaluation of Federated Learning Aggregation Algorithms: Application to Human Activity Recognition." UbiComp-ISWC '20, 2020.

[7] A. Agarwal et al. "A Reliable Effective Terascale Linear Learning System." In: J Mach Learn Res 15.1 (Jan. 2014), pp. 1111–1133.

[8] O. Chapelle et al. "Simple and Scalable Response Prediction for Display Advertising." In: ACM Trans Intell Syst Technol 5.4 (Dec. 2014), 61:1–61:34.

[9] H. B. McMahan et al. "Ad Click Prediction: A View from the Trenches." In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’13. 2013, pp. 1222–1230.

[10] Shai Shalev-Schwartz "Online Learning and Online Convex Optimization." Foundations and Trends in Machine Learning Vol. 4, No. 2 (2011) 107–194

[11] H. Brendan McMahan "A Survey of Algorithms and Analysis for Adaptive Online Learning."

[12] Steven C.H. Hoi et al. "Online learning: A comprehensive survey." Neurocomputing, Volume 459, 2021, Pages 249-289

[13] Peter Kairouz et al. "Advances and Open Problems in Federated Learning." Foundations and Trends® in Machine Learning, Vol 14, Issue 1–2

10 - Online Algorithms with Predictions


#SUPERVISORS

Name: Giovanni Neglia, Eitan Altman, and Tareq Si Salem

Mail: {firstname.familyname}@inria.fr

Telephone:

Web page: www-sop.inria.fr/members/Giovanni.Neglia/


Where?

Place of the project: Inria

Address: 2004 route des Lucioles

Team: NEO

Web page: https://team.inria.fr/neo/


#PRE-REQUISITES

The student should have good analytical skills, a solid knowledge of algorithms, and basic programming skills in Python, C or Java. A background on optimization is a plus. The project can lead to a research internship to propose new algorithms and study their guarantees in terms of regret or competitive ratios.


#DESCRIPTION

A recent trend in networking is to apply online convex optimization [1] to design online algorithms with regret guarantees against an adversary which may arbitrarily select the input sequence. The regret is defined as the difference between the costs experienced---over the time horizon of interest T---by the online algorithm and by the optimal static solution in hindsight (see the formula below). If the algorithm's regret grows sublinearly with T, then the time-averaged cost experienced by the online algorithm asymptotically coincides with the cost of the optimal static solution, and the algorithm is said to have no regret. Online no-regret algorithms have been applied with success to caching problems [2,3].
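Formally, denoting by f_t the cost function revealed at step t, by x_t the algorithm's decision, and by X the feasible set (notation ours):

    R_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in X} \sum_{t=1}^{T} f_t(x),

and the algorithm is said to have no regret when R_T = o(T).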


Some no-regret algorithms can also exploit available predictions about the future input sequence (e.g., the future content requests in a caching network); such predictions may be provided, for example, by machine learning models trained on historical data. In the best cases, these algorithms enjoy the same regret achievable in the absence of any prediction when predictions are unreliable, and a smaller regret as the quality of the predictions improves.


The goal of this project is to adapt a specific no-regret online algorithm (Follow-the-Regularized-Leader [4]) to caching problems. The student will need to implement and test the algorithm, but also to work on deriving the regret guarantees of the proposed caching algorithm.

# REFERENCES

[1] E. Hazan, “Introduction to online convex optimization,” Found. Trends Optim., vol. 2, no. 3–4, p. 157–325, Aug. 2016.

[2] G. S. Paschos, A. Destounis, L. Vigneri, and G. Iosifidis, “Learning to cache with no regrets,” in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, 2019, pp. 235–243.

[3] T. Si Salem, G.Neglia, and S.Ioannidis,“No-Regret Caching via Online Mirror Descent,” in Proc. of IEEE ICC, 2021

[4] H. B. McMahan, “A survey of algorithms and analysis for adaptive online learning,” J. of Machine Learning Res., vol. 18, pp. 1–50, 2017.


11 - Federated learning on heterogeneous systems


SUPERVISORS

Name: Chuan Xu (chuan.xu@inria.fr)


Web pages: https://sites.google.com/view/chuanxu


LOCATION

Inria Sophia-Antipolis Méditerranée

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: COATI

Webpage: https://team.inria.fr/coati/



Federated learning (FL) enables a large number of IoT devices (mobiles, sensors) to cooperate in learning a global machine learning model while keeping the devices' data local. For example, Google has applied FL in its Gboard application to predict the next word that users enter on their smartphones [1].

The traditional algorithms for FL (e.g., FedAvg [2], FedProx [3]) assume that the involved devices have the same storage and computation capacities, i.e., every device holds a neural network of the same architecture and spends the same amount of computational load during training. However, due to the variability in the hardware (GPU, CPU, RAM) of these federated devices [4], this assumption is no longer realistic, which forces the traditional algorithms to either drop the low-tier devices [5] or limit the global model's size to accommodate the weakest devices [6]. These adapted strategies either bias the global model or degrade its performance.

To mitigate the above FL issues with heterogeneous devices, recently, new algorithms were proposed to efficiently train multiple neural networks of different sizes in a federated network.

In [7], the authors propose a framework called FjORD where the neural network is pruned by channels to generate nested submodels of different sizes which can fit into heterogeneous devices. An FL algorithm is designed and shown experimentally to outperform the state-of-the-art baselines. A similar idea can also be found in [8]. On the other hand, networks with early exits [9] can be seen as another prominent approach to this problem, although their original motivation is faster inference in an IoT environment. Early-exit networks consist of a backbone architecture with additional exit heads (or classifiers) plugged in at different depths (a minimal sketch is given below). Inspired by the FL algorithm in FjORD, we have proposed a way to train these early-exit networks in a federated network (ongoing work).
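A minimal PyTorch sketch (the architecture and sizes are placeholders, not the ongoing work mentioned above) of a backbone with two exit heads: a low-tier device can train and evaluate only up to the first exit, while stronger devices use the full network.

    import torch
    import torch.nn as nn

    class EarlyExitNet(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
            self.exit1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes))
            self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
            self.exit2 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))

        def forward(self, x, max_exit=2):
            h = self.block1(x)
            out1 = self.exit1(h)
            if max_exit == 1:                  # weak device: stop at the first head
                return [out1]
            return [out1, self.exit2(self.block2(h))]

    model = EarlyExitNet()
    x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
    outs = model(x, max_exit=2)
    loss = sum(nn.functional.cross_entropy(o, y) for o in outs)   # joint loss over exits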


In this internship, the student is first required to acquire knowledge on federated learning and understand the ideas of FjORD and of networks with early exits. He/she then needs to implement the methods mentioned above using PyTorch and compare their performance.



PREREQUISITES

We are looking for a candidate with strong coding experience in Python for a machine learning task.


REFERENCES

[1] Hard, Andrew et al, Federated Learning for Mobile Keyboard Prediction. arxiv: 1811.03604, 2019


[2] McMahan et al, Communication-Efficient Learning of Deep Networks from Decentralized Data, AISTATS 2017, pages 1273-1282


[3] Tian Li et al, Federated Optimization in Heterogeneous Networks, Proceedings of Machine Learning and Systems (MLSys), 2020


[4] Tian Li et al, Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, pages 50-60, 2020


[5] Kairouz, P et al, Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977, 2019.


[6] Caldas, S. et al, Expanding the Reach of Federated Learning by Reducing Client Resource Requirements. In NeurIPS Workshop on Federated Learning for Data Privacy and Confidentiality, 2018b.


[7] Samuel Horvath, Stefanos Laskaridis, Mario Almeida, Ilias Leontiadis, Stylianos I. Venieris, and Nicholas D. Lane, FjORD: Fair and Accurate Federated Learning under heterogeneous targets with Ordered Dropout, NeurIPS 2021.


[8] Enmao Diao et al, HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients, ICLR 2021.


[9] Enzo Baccarelli et al, Optimized training and scalable implementation of Conditional Deep Neural Networks with early exits for Fog-supported IoT applications, Information Sciences, Volume 521, June 2020, Pages 107-143.


12- Efficient Monitoring Service for 5G Wireless Access Networks

Who?

Name: Walid Dabbous & Thierry Turletti & Navid Nikaein

Mail: walid.dabbous@inria.fr & thierry.turletti@inria.fr & navid.nikaein@eurecom.fr

Phone: 0492387718

Web pages: https://team.inria.fr/diana/team-members/walid-dabbous/ & https://team.inria.fr/diana/team-members/thierry-turletti/ & https://www.eurecom.fr/fr/people/nikaein-navid


# Where? Place of the project: Inria

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: Diana project-team

Web page: https://www.inria.fr/equipes/diana


# Pre-requisites: RF communications; proficient programming skills in C/C++/Python and Linux. Experience with cellular networks is desired, but not required.


# Description:


Digital infrastructures, such as the future Internet, constitute the cornerstone of the digital transformation of our society. Thanks to such infrastructures, new unforeseen applications will run on mobile devices with stringent latency and throughput requirements. To fulfill such requirements, where large amounts of data are expected, major cellular network operators (e.g., Orange, Vodafone, T-Mobile or Telefonica) have joined their efforts and formed the O-RAN Alliance [1]. The O-RAN Alliance has defined a Near-Real-Time Controller to dynamically control and monitor the cellular network.

The goal of this project is to monitor and analyze 5G wireless access networks. We will use the SDN-based Near-Real-Time Controller developed at Eurecom, running on the most advanced open-source 5G cellular network platform (i.e., OpenAirInterface [2]), which is capable of connecting to and transferring data with 5G mobile devices. The student will develop a library to retrieve the large amount of parameters generated by this platform in order to control and monitor the traffic flows of different applications using real hardware (i.e., antennas and mobile phones).

If time permits, we will also explore data-driven Machine Learning and Artificial Intelligence algorithms that control the cellular network to achieve stringent latencies, improving the mobile phone user's experience.


# Work plan:

The student will start with a review of the various 5G parameters to monitor and analyze, following the O-RAN architecture [1].

Then, she/he will develop an application that can store and analyze the 5G data streams using the controller APIs so as to (a) build the 5G network topology showing the distribution of base stations and connected terminals, and (b) compute per-base-station and per-terminal performance in terms of data rate and resource usage.

This work is proposed in the context of the Slices-RI European project (https://slices-ri.eu/) including both INRIA and Eurecom partners. SLICES is a flexible platform designed to support large-scale, experimental research focused on networking protocols, radio technologies, services, data collection, parallel and distributed computing and in particular cloud and edge-based computing architectures and services.

This PFE study may be continued in an internship and a PhD for excellent students.


# References:

[1] https://www.o-ran.org/membership

[2] https://openairinterface.org/


13- Data acquisition and collection in harsh environment


Who?

Name: Christelle Caillouet and David Coudert

Mail: christelle.caillouet@inria.fr, david.coudert@inria.fr

Web page: http://www-sop.inria.fr/members/Christelle.Molle-Caillouet, http://www-sop.inria.fr/members/David.Coudert


Where?

Place of the project: Inria

Address: 2004 route des Lucioles, 06903 Sophia Antipolis

Team: Coati

Web page: https://team.inria.fr/coati/


# Pre-requisites if any: Combinatorial optimization, Algorithmics, Programming, Wireless networks


# Description:

Data collection is at the heart of the integrated management of road infrastructures and engineering structures. However, by definition, this data collection is carried out in very restrictive environments (e.g.: reduced accessibility), or even hostile environments (e.g.: weather conditions, luminosity, humidity, ...). The sensors are all of different natures with heterogeneous size, sensitivity and communication paradigms, providing heterogeneous data in size, type and frequency of acquisition. Collecting this data is a real challenge that requires the use of agile and adaptive communication protocols, the deployment of fleets of autonomous robots, and the planning of service vehicle routing.


The subject of the project combines the two following problems.


1. Sensor deployment: Given application needs and technical constraints, the goal is to determine the most suitable locations for each data acquisition. This study will take into account the business needs but also the physical deployment constraints (accessibility, radio environment, possibility of power supply or ambient energy recovery) and the data collection requirements.

This will have an impact on the choice of the most suitable means to collect the data according to the sensor locations, the required frequency and the collection costs: radio collection, multi-hop, potential intervention of robots/drones, etc.


2. Use of drones to collect monitoring data: In this part we will seek to design self-deployment techniques to allow each entity of a fleet of autonomous robots (either ground robots or drones) to decide where and how to move, while keeping connectivity with the others and carrying out the requested task. These local and adaptive algorithms will take as input the application constraints of the place in which the data must be collected (via pipes, at height under bridges, at different points of a dam, etc.).




14- Elastic datastream with stateful microservices


Name: Fabrice Huet and Françoise Baude

Mail: fabrice.huet@univ-cotedazur.fr, Francoise.Baude@univ-cotedazur.fr

Web page: https://sites.google.com/site/fabricehuet/home, https://www.i3s.unice.fr/~baude/newIndex.html


Where?

Place of the project: I3S Laboratory

Address: 2000 route des lucioles

Team: Scale

Web page: https://scale-project.github.io/


# Description:

Event streaming and event-driven microservices are gaining attention for architecting scalable cloud software and data systems. Event-driven microservices communicate asynchronously over a distributed event bus (also called a distributed event broker). In event-driven microservice architectures, a producer microservice creates an event and pushes it into a distributed event broker. On the other side, a consumer microservice pulls the event out of the broker and performs the required business logic on the event. Minimizing the time an event spends in the waiting queue and its processing time by the consumer microservice is critical to achieve low response times for a high percentile of events.

There has been a large effort by researchers to tackle microservice autoscaling, aiming at guaranteeing performance service-level agreements (SLAs) [1,2,3]. Autoscaling of consumer event-driven microservices requires synchronization among the service replicas. This is needed to distribute the load of the events waiting in the queues among the scaled microservice replicas [4,5,6].

Most of the work published so far assumes stateless services. Hence, the synchronization when adding or removing replicas can be performed quickly. In the case of stateful services, the replicas have to manage their state when scaling down or up. This might require additional synchronization between them or the use of some external storage.

The goal of this PFE is to investigate recently published work on the stateful scaling of microservices. If time permits, some experiments will be performed to evaluate the cost of existing solutions.


#References

[1] G. Yu, P. Chen and Z. Zheng, "Microscaler: Cost-effective scaling for microservice applications in the cloud with an online learning approach," IEEE Transactions on Cloud Computing, 2020.

[2] B. Choi, J. Park, C. Lee and D. Han, "pHPA: A Proactive Autoscaling Framework For Microservice Chain," in 5th Asia-Pacific Workshop on Networking (APNet 2021). Association for Computing Machinery, Inc, 2021.

[3] G. Yu, P. Chen, H. Chen, G. Zijie , Z. Huang, L. Jing, T. Weng, X. Sun and X. Li, "MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments.," Proceedings of the Web Conference 2021, pp. 3087-3098, 2021.

[4] N. Narkhede, G. Shapira and T. Palino, Kafka: the definitive guide: real-time data and stream processing at scale, O'Reilly Media, Inc., 2017.

[5] G. Shapira , T. Palino, R. Sivaram and K. Petty, Kafka: The Definitive Guide Real-Time Data and Stream Processing at Scale, second edition, O’Reilly Media, Inc., 2021.

[6] S. Blee-Goldman, "From Eager to Smarter in Apache Kafka Consumer Rebalances," Confluent, 11 5 2020. [Online]. Available: https://www.confluent.io/blog/cooperative-rebalancing-in-kafka-streams-consumer-ksqldb/. [Accessed 11 9 2021].



15- On the digital footprint of Internet Users

Advisors: Dino Lopez Pacheco <dino.lopez@univ-cotedazur.fr>, Guillaume Urvoy-Keller <urvoy@univ-cotedazur.fr>

Description :

The digital footprint of computer networks, and especially the Internet (from the cloud to the end user, with all intermediate AS networks), represents 4% of the world's greenhouse gas (GHG) emissions [Shift1]. Forecasts for the coming years estimate that it could reach 8%, which is equivalent to total worldwide road traffic.

In France, think tanks like the Shift Project [Shift1] and official organizations like ARCEP [ARCEP] have tackled this issue, producing reports, e.g., on the crucial impact of video traffic [Shift2].

In the SigNet team, we recently started studies aiming at evaluating the digital footprint of the end user. We have devised a tool following the idea of the Carbonalyzer [Carbo] Web plugin (also a mobile app). The current version of the tool works on macOS and enables capturing, with a minimal impact on the user experience, the user's traffic, determining the networks (national, European, international) that conveyed this traffic, each network having a specific energy footprint, and regularly sending summaries to a centralized server.

The objectives of this PFE are to:

  • Extend the server-side data visualization capabilities of the tool;

  • Develop Linux and Windows versions of the client;

  • Refine the energy models that relate the number of exchanged bytes to the electrical consumption [Modèle] (a toy sketch of such a model is given after this list);

  • Carry out longitudinal studies of specific users and mine the results along specific dimensions, e.g., application or service level (which application consumes the most, depending on its servers' location) or access network (fixed vs. mobile).
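As announced above, a toy sketch of the kind of energy model to refine; the intensity and carbon factors below are placeholders, not values taken from [Modèle]:

    # kWh per GB for each network segment (placeholder values)
    INTENSITY_KWH_PER_GB = {"national": 0.1, "european": 0.2, "international": 0.3}

    def footprint(bytes_per_segment, carbon_g_per_kwh=80):
        """bytes_per_segment: dict segment -> bytes exchanged; returns (kWh, gCO2e)."""
        kwh = sum(INTENSITY_KWH_PER_GB[s] * b / 1e9 for s, b in bytes_per_segment.items())
        return kwh, kwh * carbon_g_per_kwh

    print(footprint({"national": 5e9, "european": 1e9, "international": 0.5e9}))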

This PFE will start with a study of the state of the art. Next, the student will have to test the server and client sides of the tool and carry out initial measurements. In parallel, extensions (Windows version, graphical interface) will be developed.

A 6-month internship is possible after this PFE.

Expected skills: Python and client-server programming; computer networks.

References:

[Shift1] https://theshiftproject.org/article/pour-une-sobriete-numerique-rapport-shift/

[Shift2] https://theshiftproject.org/article/climat-insoutenable-usage-video/

[ARCEP] https://www.arcep.fr/uploads/tx_gspublication/reseaux-du-futur-empreinte-carbone-numerique-juillet2019.pdf

[Carbo] https://theshiftproject.org/carbonalyser-extension-navigateur/

[Modèle] Coroama, Vlad C., and Lorenz M. Hilty. "Assessing Internet energy intensity: A review of methods and results." Environmental impact assessment review 45 (2014): 63-68.

16- Bayesian graph clustering in degree-corrected block models


Who?

Name: Maximilien Dreveton and Konstantin Avrachenkov

Mail: Maximilien.Dreveton@inria.fr and K.Avrachenkov@inria.fr

Web page: https://maximiliendreveton.fr/


https://www-sop.inria.fr/members/Konstantin.Avratchenkov/me.html



Where?

Place of the project:

Address: Inria SAM, 2004 Route des Lucioles, Sophia Antipolis

Team: NEO

Web page:

https://team.inria.fr/neo/presentation/


Pre-requisites if any: A good knowledge of probability theory and/or linear algebra is desirable, Python programming


Description:


Networks represent systems with pairwise interactions, with applications in many fields (sociology, biology, internet, physics, etc.). Most networks display community structures in which network individuals (nodes) are joined together in densely connected groups, between which there are only looser connections. Graph clustering aims to recover the nodes’ community assignment by observing the graph interactions (edges). In this project, the student will explore Bayesian clustering methods. Those methods typically achieve good performance, avoid overfitting, and are computationally efficient (using Monte-Carlo simulations).

The student can start with reference [1], describing the stochastic block model and its degree-corrected variant. Reference [2] provides a review of Bayesian clustering, and references [3]-[4] can be used as additional material if needed. Finally, the documentation of graph-tool [5] provides a good entry point to the topic (a minimal example is sketched below).
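A minimal example with graph-tool [5] (the dataset and options are only an illustration): fit a degree-corrected stochastic block model by minimizing the description length, then read the inferred partition.

    import graph_tool.all as gt

    g = gt.collection.data["football"]        # small example graph shipped with graph-tool
    state = gt.minimize_blockmodel_dl(g, state_args=dict(deg_corr=True))
    b = state.get_blocks()                    # community label of each node
    print(state.get_B(), "blocks; description length =", state.entropy())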

If time permits, the student will examine various regularization approaches for the choice of the number of clusters.


Bibliography:


[1] Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical review E, 83(1), 016107.

[2] Peixoto, T. P. (2019). Bayesian stochastic blockmodeling. Advances in network clustering and blockmodeling, 289-332.

[3] Peixoto, T. P. (2014). Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. Physical Review E, 89(1), 012804.

[4] Peixoto, T. P. (2020). Merge-split Markov chain Monte Carlo for community detection. Physical Review E, 102(1), 012305.

[5] https://graph-tool.skewed.de/static/doc/demos/inference/inference.html


17- Learning sparse autoencoder networks for image coding

Contact: Michel Barlaud (barlaud@unice.fr), Marc Antonini (am@i3s.unice.fr)

Location: I3S laboratory, 2000 route des Lucioles, Bât. Algorithmes/Euclide B, 06903 Sophia Antipolis

Subject

Recently, researchers at Twitter proposed an autoencoder network structure for image compression [1]. The compression performance of this autoencoder is better than the JPEG 2000 image coding standard. However, the memory requirements to store the network in a smartphone and the computational cost are too high to consider real embedded applications.

In order to solve this problem, we proposed in the I3S lab, in the MediaCoding team, to train sparse variational autoencoders (VAEs) [3]. Sparse VAEs reduce the number of weights needed for learning, thereby reducing the cost of the network.

VAEs have found many applications in learning the latent distribution of high-dimensional data. However, they assume Gaussian distributions, which gives a poor approximation of the latent distribution. The I3S/MediaCoding team has recently developed a new efficient coding method using a sparse VAE that relaxes the Gaussian assumption [2]. Our new method involves a novel nonparametric supervised autoencoder [2], but it still needs some improvement to be competitive with the image coding standard (a generic sketch of a sparsity-constrained autoencoder is given below).
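As a generic illustration of the idea of sparsity constraints (a plain L1 penalty here, not the method of [2] or [3]; the sizes and penalty weight are placeholders):

    import torch
    import torch.nn as nn

    class SparseAE(nn.Module):
        def __init__(self, dim=784, latent=64):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, latent))
            self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))

        def forward(self, x):
            z = self.enc(x)
            return self.dec(z), z

    model, lam = SparseAE(), 1e-3
    x = torch.rand(32, 784)                                # toy batch of flattened images
    recon, z = model(x)
    l1 = sum(p.abs().sum() for p in model.parameters())    # promotes sparse weights
    loss = nn.functional.mse_loss(recon, x) + lam * l1
    loss.backward()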

This TER project aims to adapt our new autoencoder to this image coding challenge. The student will provide Python code and compare different solutions. This work can be continued as a Master internship depending on the work provided by the student and their motivation.

The student will work at I3S Laboratory.

Skills

  • Background in Machine Learning and Deep Learning (DNNs, autoencoders, ...)

  • Coding skills in Python, Pytorch

  • Fluent English

    References

    [1] Lucas Theis, Wenzhe Shi, Andrew Cunningham,and Ferenc Huszár,
    Lossy image compression with compressive autoencoders, arXiv stat.ML/1703.00395, 2017.

    [2] M. Barlaud and F. Guyard, Learning a sparse non-parametric supervised autoencoder, ICASSP 2021 Toronto Canada.

    [3] M. Barlaud, Learning sparse structured auto-encoder for image coding, CORESA 2021 Sophia Antipolis.

18- Improving QoE estimation with content level information in video streaming applications

Who?

Name: Chadi Barakat (Inria, Diana project-team)

Mail:Chadi.Barakat@inria.fr

Web page:https://team.inria.fr/diana/chadi/

Where?

Place of the project: Inria Sophia Antipolis

Address: 2004, route des Lucioles, 06902 Sophia Antipolis, France

Team: Diana

Web page:https://team.inria.fr/diana/

Prerequisites if any: Standard knowledge in network programming and machine learning

Description:

Context:

Video streaming is the dominant contributor to today’s Internet traffic. Consequently, estimating Quality of Experience (QoE) for video streaming is of paramount importance for network operators. The QoE of video streaming is directly dependent on the network conditions (e.g., bandwidth, delay, packet loss rate), referred to as the network Quality of Service (QoS). This inherent relationship between the QoS and the QoE motivates the use of supervised Machine Learning (ML) to build models that map the network QoS to the video QoE, such as the video stalls, the video resolution, or more generally the end-user QoE modelled with the Mean Opinion Score (a minimal sketch of such a model is given below). This was the main focus of the ACQUA project at Inria (http://project.inria.fr/acqua/), which, with the help of controlled network experimentation, targeted the establishment of such models linking network-level QoS to either application-level QoS or directly to end-user QoE. The models built were shown to provide a good level of accuracy for the majority of targeted metrics. Still, the study highlighted a dependency between the prediction accuracy and the content of the video itself, with the latter spanning a wide range of variability depending on the type of scene encoded within the video (category, motion, colours, etc.). The purpose of this PFE is to enrich these models with content-level information and evaluate how much their performance can be improved when video content is taken into consideration.
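A minimal sketch (the dataset file and column names are hypothetical) of the kind of supervised QoS-to-QoE model meant here, using a random forest:

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("qoe_dataset.csv")                  # hypothetical dataset
    X = df[["bandwidth_kbps", "rtt_ms", "loss_rate"]]    # network QoS features
    y = df["mos"]                                        # end-user QoE label (MOS)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("R^2 on held-out data:", model.score(X_te, y_te))
    # Content-level features (e.g., a per-video motion/complexity score) would be
    # added as extra columns of X to quantify the accuracy gain.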


PFE objectives:


The PFE will start by reviewing the state of the art in the area of QoE modeling and by understanding, in particular, the models and experiments that were carried out during the ACQUA project. Then, the candidate will look for metrics able to capture the video content itself, add these metrics to the available datasets, retrain the machine learning models and evaluate the gain in prediction accuracy they allow. The candidate might need to redo parts of the video streaming experiments. Overall, the objective of the PFE is to evaluate how the type of content encoded in the video stream can impact the level of QoE for the same network conditions, and vice versa, to evaluate by how much one can improve the level of QoE prediction by adding some content-level information.


This PFE can continue towards an internship and even more, by leveraging content-level information for better network management and a better streaming experience. It can also be extended to account for further context-level information, such as the mobile context.


Useful Information/Bibliography:


Othmane Belmoukadam, Muhammad Jawad Khokhar, Chadi Barakat, “On excess bandwidth usage of video streaming: when video resolution mismatches browser viewport“, in proceedings of the 11th IEEE International Conference on Networks of the Future (NoF 2020), Bordeaux, France, October 2020.


Muhammad Jawad Khokhar, Thibaut Ehlinger, Chadi Barakat, “From Network Traffic Measurements to QoE for Internet Video“, in proceedings of IFIP Networking, Warsaw, Poland, May 2019.


Muhammad Jawad Khokhar, Thierry Spetebroot, Chadi Barakat, “A Methodology for Performance Benchmarking of Mobile Networks for Internet Video Streaming“, in proceedings of the 21st ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWIM), Montreal, Canada, October 2018.


19- Network simulator for reinforcement learning


Advisor: APARICIO PARDO Ramon

Mail: raparicio@i3s.unice.fr

Telephone: 04 92 94 27 72

Web page: http://www.i3s.unice.fr/~raparicio/


Place of the project: I3S: Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis

Address:2000, route des Lucioles - Les Algorithmes - bât. Euclide B, 06900 Sophia Antipolis

Team: SIGNET

Web page: http://signet.i3s.unice.fr


Pre-requisites if any:

  • An object-oriented language, ideally C++, but knowledge of Java can also be sufficient.

  • Python language (recommended)

  • Notions of networks (and their protocols)

  • Notions of computer simulations, in particular discrete event simulations.

Description:

In reinforcement learning, the training process is guided by interactions with an environment. The mechanics are simple. During training, the learning algorithm makes a decision which is sent to the environment. This, in turn, processes the decision, changing the internal state of the environment and providing the algorithm with an assessment of the quality of the decision (a reward). The environment can be a real system (a robotic arm) or a simulation of that system (a simulation of a robotic arm). Within the framework of this internship, the target environment is a network of computers sending packets to each other. Since we are not a telecommunications network operator, we have to rely on simulation.

The objective of this internship is to design a packet network simulator that can be used as an environment for reinforcement learning. The intended platform and library for this task is the NS3 network simulator (https://www.nsnam.org/), which is programmed in C++. This simulator has a module called “ns3-gym” (https://apps.nsnam.org/app/ns3-gym/) that allows integrating the NS3 network simulator with “OpenAI Gym” (https://gym.openai.com/), a popular Python reinforcement learning toolkit (a minimal sketch of the Gym-side loop is given below).
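A minimal sketch (not part of the subject itself) of the Gym-side control loop, assuming the ns3-gym Python bindings are installed and an NS3 simulation script is listening on the chosen port; the random agent is only a placeholder:

    from ns3gym import ns3env

    env = ns3env.Ns3Env(port=5555, startSim=True, simSeed=0)
    obs = env.reset()                           # initial network statistics
    done = False
    while not done:
        action = env.action_space.sample()      # placeholder: random (e.g., routing) decision
        obs, reward, done, info = env.step(action)   # apply it in NS3, get new statistics
    env.close()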

The simulator must be able to do the following at a minimum:

  1. Take as input a network topology (a graph) to place and connect IP network routers, for example, using the graph files from this site: http://sndlib.zib.de

  2. Take as input a traffic matrix which controls the intensity of the generation of packets between a source <i> and a destination <j> according to the value t (i, j) of the matrix. Again, this data can be extracted from the site: http://sndlib.zib.de

  3. Implement the traffic as UDP traffic (the simplest option).

  4. Allow the IP routing tables of the routers to be reconfigured from instructions outside the simulator (an interface will be required, typically in Python).

  5. Output statistics on the state of the network, in particular the number of UDP packets stored in the buffers of the output interfaces of network routers and the average packet wait times in these buffers.

20- Quantum Entanglement Routings

Advisor: APARICIO PARDO Ramon

Mail: raparicio@i3s.unice.fr

Telephone: 04 92 94 27 72

Web page: http://www.i3s.unice.fr/~raparicio/


Place of the project: I3S: Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis

Address:2000, route des Lucioles - Les Algorithmes - bât. Euclide B, 06900 Sophia Antipolis

Team: SIGNET

Web page: http://signet.i3s.unice.fr


Pre-requisites if any:

Languages:

  • Python language absolutely

  • Deep Learning libraries (like TensorFlow, Keras, rllab, OpenAI Gym)

Theory:

  • Machine Learning, Data Science, particularly Neural Networks theory

  • Classical optimisation theory (Linear Programming, Dual Optimisation, Gradient Optimisation, Combinatorial Optimization)

Technology:

  • Computer networking notions


Description:

In the long term, Quantum Communications promise to connect Quantum Processors placed at remote locations, giving rise to a Quantum Cloud able to perform very complicated computation tasks in much shorter processing times. In the short term, Quantum Communications are applied to tasks such as cryptographic key distribution or clock synchronization [1]. In both cases, the basic “operation” to carry out as a first step is to quantum-entangle the source and the destination of the communication. To do this, we first need to find a sequence of links (a path) connecting the source and the destination; second, to entangle the end nodes of each link; and finally, to entangle the end-to-end path. Unfortunately, this is a probabilistic process whose result cannot be foreseen (a toy illustration is sketched below). In this project, we aim to study this so-called Quantum Entanglement Routing problem.
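A toy illustration (our own, not taken from [2-5]): if each link e generates entanglement independently with probability p_e, a path succeeds in one attempt (ignoring swapping failures and memory decoherence) with probability equal to the product of the p_e, so maximizing this product amounts to a shortest-path computation on the weights -log(p_e).

    import math
    import networkx as nx

    G = nx.Graph()
    G.add_edge("A", "B", p=0.8)
    G.add_edge("B", "C", p=0.9)
    G.add_edge("A", "C", p=0.5)
    for u, v, data in G.edges(data=True):
        data["w"] = -math.log(data["p"])    # additive weight for a multiplicative objective

    path = nx.shortest_path(G, "A", "C", weight="w")
    success = math.prod(G[u][v]["p"] for u, v in zip(path, path[1:]))
    print(path, success)                    # ['A', 'B', 'C'], success probability ~0.72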


In a first phase, we will review the literature [2-5] and identify the most relevant entanglement routing algorithms in order to compare them. In a second phase, we will develop our own proposal to tackle this problem.


Useful Information/Bibliography:


  1. "Quantum Networks: From a Physics Experiment to a Quantum Network System" with Stephanie Wehner : https://www.youtube.com/watch?v=yD193ZPjMFE

  1. M. Pant et al., “Routing entanglement in the quantum internet,” npj Quantum Inf, vol. 5, no. 1, pp. 1–9, Mar. 2019, doi: 10.1038/s41534-019-0139-x.

  2. K. Chakraborty, F. Rozpedek, A. Dahlberg, and S. Wehner, “Distributed Routing in a Quantum Internet,” arXiv:1907.11630 [quant-ph], Jul. 2019, Accessed: Sep. 16, 2021. [Online]. Available: http://arxiv.org/abs/1907.11630

  3. S. Shi and C. Qian, “Concurrent Entanglement Routing for Quantum Networks: Model and Designs,” in Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication, New York, NY, USA, Jul. 2020, pp. 62–75. doi: 10.1145/3387514.3405853.

  4. C. Li, T. Li, Y.-X. Liu, and P. Cappellaro, “Effective routing design for remote entanglement generation on quantum networks,” npj Quantum Inf, vol. 7, no. 1, pp. 1–12, Jan. 2021, doi: 10.1038/s41534-020-00344-4.

21- Deep Reinforcement Learning for Video caching


Advisor: APARICIO PARDO Ramon

Mail: raparicio@i3s.unice.fr

Telephone: 04 92 94 27 72

Web page: http://www.i3s.unice.fr/~raparicio/


Place of the project: I3S: Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis

Address: 2000, route des Lucioles - Les Algorithmes - bât. Euclide B, 06900 Sophia Antipolis

Team: SIGNET

Web page: http://signet.i3s.unice.fr


Pre-requisites if any:

Languages:

  • Python language (mandatory)

  • Deep Learning libraries (like TensorFlow [6], Keras, rllab, OpenAI Gym) appreciated

Theory:

  • Machine Learning, Data Science, particularly Neural Networks theory (strongly recommended)

  • Classical optimisation theory (Linear Programming, Dual Optimisation, Gradient Optimisation, Combinatorial Optimization) appreciated

Technology:

  • Computer networking notions are welcome, but they are not necessary.


Description:

The application of novel machine learning techniques, such as deep reinforcement learning [1], has gained the attention of the computer networking community in recent years [2]. One of the problems that has been addressed is the caching of video content at locations close to the users [3]. Caching has a twofold objective: to improve the experience of all users regardless of their location, and to reduce the traffic load on backbone networks.

Recently, Dai et al. [4] and Mittal et al. [5] have shown the interest of deep reinforcement learning for learning heuristic algorithms that solve classical NP-hard problems on graphs, by combining RL with graph embedding (GE) [6], a kind of representation learning applied to graphs. GE yields a more compact, lower-dimensional graph representation in which the RL scheme can solve the optimization problem more easily. The work of Mittal et al. [5] is particularly interesting because it addresses the set cover problem with this approach, proposing the GCOMB algorithm. Caching decisions can themselves be reformulated as set cover problems.
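To make the reformulation concrete, here is a sketch of the classical greedy heuristic for the underlying coverage problem; it could also serve as one of the classical baselines mentioned in Phase 4 below. The toy request sets are purely illustrative.

  def greedy_cache_selection(requests_per_video, cache_size):
      """Greedily pick cache_size videos covering as many distinct users as possible.
      requests_per_video: dict mapping a video id to the set of users requesting it."""
      candidates = {v: set(users) for v, users in requests_per_video.items()}
      covered, selected = set(), []
      for _ in range(cache_size):
          best = max(candidates, key=lambda v: len(candidates[v] - covered), default=None)
          if best is None or not (candidates[best] - covered):
              break   # no remaining video adds new coverage
          selected.append(best)
          covered |= candidates.pop(best)
      return selected, covered

  requests = {"v1": {1, 2, 3}, "v2": {3, 4}, "v3": {5}, "v4": {1, 4, 5}}
  print(greedy_cache_selection(requests, cache_size=2))   # -> (['v1', 'v4'], {1, 2, 3, 4, 5})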


Steps:

Phase 1: Get familiar with the bibliography and with the code of the GCOMB algorithm [7]

Phase 2: Prepare the dataset to be used: Trending YouTube Video Statistics [8]

Phase 3: Apply the GCOMB algorithm to this dataset to solve the caching problem

Phase 4: Benchmark GCOMB against other classical set cover algorithms on this caching problem.


Goal:

To apply the approach of Mittal et al. [5] to select the “best” videos to cache, using as input the local distribution of content popularity.
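As a hint for Phase 2, a local popularity distribution can be derived from the Kaggle files along the following lines; the column names (“video_id”, “views”) are assumed to match the downloaded CSVs and should be checked.

  import pandas as pd

  def local_popularity(csv_path):
      """Popularity distribution of the videos trending in one country file."""
      df = pd.read_csv(csv_path)
      views = df.groupby("video_id")["views"].max()   # a video can trend on several days
      return (views / views.sum()).sort_values(ascending=False)

  # One CSV per country (e.g. USvideos.csv, FRvideos.csv) gives one “local” distribution:
  # pop_us = local_popularity("USvideos.csv")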


Useful Information/Bibliography:

  1. Lil Log, Reinforcement Learning: https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html

  2. N. Vesselinova, R. Steinert, D. F. Perez-Ramirez and M. Boman, "Learning Combinatorial Optimization on Graphs: A Survey With Applications to Networking," in IEEE Access, vol. 8, pp. 120388-120416, 2020, doi: 10.1109/ACCESS.2020.3004964. [Online]: https://arxiv.org/abs/2005.11081

  3. Y. Wang and V. Friderikos, “A Survey of Deep Learning for Data Caching in Edge Network,” Informatics, vol. 7, no. 4, p. 43, Oct. 2020. [Online]: https://www.mdpi.com/2227-9709/7/4/43

  4. H. Dai, E. Khalil, Y. Zhang, B. Dilkina, and L. Song, “Learning combinatorial optimization algorithms over graphs,” in Proc. Advances in Neural Inf. Process. Syst. (NeurIPS), 2017, pp. 6348–6358. [Online]: https://arxiv.org/abs/1704.01665

  5. A. Mittal, A. Dhawan, S. Manchanda, S. Medya, S. Ranu, and A. Singh, “Learning heuristics over large graphs via deep reinforcement learning,” 2019. [Online]. Available: arXiv:1903.03332

  6. W. L. Hamilton, R. Ying and J. Leskovec. Representation Learning on Graphs: Methods and Applications. arXiv:1709.05584, Apr. 2018.

  7. GCOMB code: https://github.com/idea-iitd/GCOMB

  8. Trending YouTube Video Statistics: https://www.kaggle.com/datasnaek/youtube-new

22- Human-centred Security and Privacy

Who?

Name: Karima Boudaoud, Université Nice Côte d’Azur

Mail: karima.boudaoud@unice.fr

Where?

Place of the project: Polytech’Nice Sophia

Pre-requisites if any:


Description:

The goal of this project is to conduct a systematic literature review (SLR) (you can find the definition of an SLR here [1]) of existing works using a user-centred, and more generally human-centred, approach to manage the security and privacy of applications, devices and systems. Examples of such works are studies conducted to identify usability issues with security tools and applications. You can find more examples here [2][3].


Keywords: Security, Privacy, usability.


Useful Information/Bibliography:

[1] https://journals.sagepub.com/doi/full/10.1177/0739456X17723971

[2] https://www.usenix.org/conference/soups2021

[3] https://eurousec2021.secuso.org/


23- Cybersecurity in Health sector


Who?

Name: Karima Boudaoud, Université Nice Côte d’Azur

Mail: karima.boudaoud@unice.fr

Where?

Place of the project: Polytech’Nice Sophia

Pre-requisites if any:


Description:

The main goal of this project is to conduct a systematic literature review (SLR) (you can find the definition of an SLR here [1]) regarding cybersecurity in the health sector, i.e. existing works focusing on attacks, security issues, and security solutions for the health sector.


Keywords: Security, Privacy, Attacks, Pandemic, Hospitals, IoT


Useful Information/Bibliography:

[1] https://journals.sagepub.com/doi/full/10.1177/0739456X17723971



24- Cybersecurity in maritime transport


Who?

Name: Karima Boudaoud, Université Nice Côte d’Azur

Mail: karima.boudaoud@unice.fr

Where?

Place of the project: Polytech’Nice Sophia

Pre-requisites if any:


Description:

The main goal of this project is to conduct a systematic literature review (SLR) (you can find the definition of an SLR here [1]) regarding cybersecurity in the maritime transport domain, i.e. existing works focusing on security issues and security solutions for maritime transport.


Keywords: Security, Privacy, Attacks, maritime transport


Useful Information/Bibliography:

[1] https://journals.sagepub.com/doi/full/10.1177/0739456X17723971


25- Modeling and lifting into RDF of data automatically extracted from scientific papers about wheat genetics and phenotyping

Who?

Name: Catherine Faron

Mail: faron@i3s.unice.fr

Web page: https://www.i3s.unice.fr/~faron/

Co-supervisors: Franck Michel (fmichel@i3s.unice.fr, http://sparks.i3s.unice.fr/fmichel), Robert Bossy (robert.bossy@inrae.fr, https://maiage.inrae.fr/en/robert-bossy)


Where?

Place of the project: Building Templiers 1

Address: Campus SophiaTech

Team: WIMMICS

Web page: https://team.inria.fr/wimmics/


Pre-requisites if any: Motivation to get initiated to Knowledge Representation and Linked Open Data. Basic knowledge of RDF and of other languages of the Linked Open Data is a plus.

Description:

This work takes place within the ANR D2KAB research project. This project aims to transform data in agronomy and biodiversity into knowledge - systematically described, interoperable, exploitable, open - and to study scientific methods and tools to exploit this knowledge for applications in science and agriculture.


Wheat is one of the oldest and most widely grown crops. It is the main source of protein for one third of the human population. Wheat cultivation is therefore a major component of food security in the world. For thousands of years, wheat producers have been selecting and hybridizing varieties to obtain the best possible properties: adaptation to the climate of a particular region, seed productivity, tolerance to drought or flooding, resistance to pests, etc. Today, the search for new seeds with desirable properties is becoming an urgent issue as climate change is disrupting the growing conditions.


Modern techniques of phenotyping and genomic screening allow better target selection and hybridization. They make it possible to obtain productive and resistant seeds without genetic modification or editing. Part of the research results in this domain is recorded in genomic databases that are easily and publicly accessible. Another part is only available in published scientific papers. It is impossible for researchers to absorb all the current scientific production (more than 4000 articles published per year). Researchers have to use text-mining, information extraction and automatic language processing methods to synthesize and extract structured information from a large number of documents.


The aim of the project is to lift into RDF the data extracted with text-mining methods from scientific articles on the genetics and phenotyping of wheat. The resulting knowledge graph will make it possible to link heterogeneous knowledge from different sources: databases and the scientific literature. At a large scale, it will provide uniform access to all the information needed to better target research on wheat phenotyping and genetics.


The data extracted from the scientific texts consist of entities and of relations between entities:

  • Entities are elements anchored in text frames. They are proper names or expressions that designate objects or concepts of the domain. In this project they are genes, alleles, phenotypes, genetic markers, varieties, etc.

  • Entities are linked to repositories shared by the research community. In this project we will focus on WTO, an ontology of wheat traits and phenotypes, and on GrainGenes, the repository of known wheat genes. Entity linking will make it possible to relate the text annotations to data in genomic databases.

  • Relationships link pairs of entities and represent knowledge expressed in the text. For example, relationships will link a variety to the phenotypes it exhibits, or a gene to the phenotype it controls. These relationships can be found only in scientific texts and are therefore particularly valuable to researchers.


In this project, you will be given a dataset automatically extracted from abstracts in the PubMed bibliographic database using an annotation pipeline developed by the MaIAGE unit of the French National Research Institute for Agriculture, Food and the Environment (Inrae). The lifting of the data will use the xR2RML language and tool developed by WIMMICS, a joint research team between Inria and I3S/CNRS at University Côte d’Azur. It requires (1) writing mapping rules in xR2RML to transform the MaIAGE dataset into RDF, using a previously defined data model, and (2) applying some previously identified transformations to the WTO ontology used in this model. Overall, this project requires learning the basics of RDF and of other related languages of the Web of Linked Open Data.
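To illustrate the kind of output targeted, here is a minimal rdflib sketch of what a lifted gene-phenotype annotation could look like. The mapping itself will be written in xR2RML, and every URI and property name below is a hypothetical placeholder rather than the project's actual data model.

  from rdflib import Graph, Literal, Namespace, RDF

  EX = Namespace("http://example.org/wheat/")       # placeholder namespace
  g = Graph()

  gene = EX["gene/ExampleGene"]                     # would be linked to a GrainGenes record
  phenotype = EX["phenotype/ExampleTrait"]          # would be linked to a WTO class

  g.add((gene, RDF.type, EX.Gene))
  g.add((phenotype, RDF.type, EX.Phenotype))
  g.add((gene, EX.controls, phenotype))                     # relation extracted from an abstract
  g.add((gene, EX.mentionedIn, Literal("PMID:00000000")))   # provenance of the annotation

  print(g.serialize(format="turtle"))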


Useful Information/Bibliography:

https://www.fun-mooc.fr/en/courses/introduction-web-linked-data/

http://www.d2kab.org/

https://pubmed.ncbi.nlm.nih.gov/32634868/

https://graingenes.org, https://pubmed.ncbi.nlm.nih.gov/31210272/

https://pubmed.ncbi.nlm.nih.gov/

https://www.i3s.unice.fr/~fmichel/xr2rml_specification_v5.html


26- Using semantic graph clustering to explore study and career paths, with the goal of helping student and professional counselling


Who?

Name: Nicolas Nisse

Mail: nicolas.nisse@inria.fr

Web page: http://www-sop.inria.fr/members/Nicolas.Nisse/

Co-advisors : Frédéric Giroire (frederic.giroire@inria.fr)


Where?

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: COATI

Web page: https://team.inria.fr/coati/


Description:

The internship will be part of a larger project involving the Inria team COATI and a startup, MillionRoads. The goal of the project is to build digital tools to help students and professionals make the right choices in their study and career orientation. Career guidance is commonly recognized by governments and institutions as a key to good professional and social integration and to the reduction of unemployment and inequality.

In particular, study and career changes are very frequent among students and may be the sign of, or may lead to, important difficulties [Gati et al. (2019)]. It is thus crucial to help students either to avoid such changes by guiding them towards an adequate first career choice, or to support them when they choose their new path [Masdonati (2017)].

We plan to use learning techniques and graph algorithms to detect course breaks, in parallel with or in addition to a semantic study. The method is to use a semantic graph to represent the millions of career paths retrieved by the startup, and then to use graph clustering for the detection. Similar paths (for example, Bac S, preparatory classes, engineering diploma, engineering job) will generate many edges between similar nodes of the graph. These nodes, which are thus closely connected to each other, will form clusters. Detecting breaks in a path can then be achieved by (i) computing the clusters of the graph and (ii) detecting the paths that change clusters. We will also carry out a semantic study of the clusters and of the detected paths.

Scientific stakes and challenges. The career path graph is a very large graph, with several hundred million vertices (either path-step nodes or semantic nodes). Such a graph is already difficult even to store efficiently in memory. Most clustering algorithms, such as the Louvain method [Louvain method 2008], one of the most widely used in practice, can hardly handle more than a few hundred thousand vertices. It will therefore be crucial to find a method adequate for our graph. There are statistical-mechanics methods based on label propagation that are appropriate for some large graphs [Statistical mechanics 2006, Community detection 2010]. Another technique allowing computations on very large graphs is to decompose the graph recursively into sub-graphs, apply clustering methods in these sub-graphs, and then recompose the results [Distributed Louvain 2018].
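A toy version of this two-step detection is sketched below, with an off-the-shelf modularity clustering standing in for the large-scale methods discussed above (Louvain/Leiden, label propagation, distributed decomposition); the example paths are purely illustrative.

  import networkx as nx
  from networkx.algorithms.community import greedy_modularity_communities

  paths = [
      ["Bac S", "Preparatory classes", "Engineering diploma", "Engineering job"],
      ["Bac S", "Preparatory classes", "Engineering diploma", "Nursing school"],  # a likely break
      ["Bac ST2S", "Nursing school", "Nurse"],
  ]

  # Build the career-path graph: consecutive steps of a path are connected.
  G = nx.Graph()
  for path in paths:
      nx.add_path(G, path)

  # (i) Compute the clusters.
  communities = greedy_modularity_communities(G)
  cluster_of = {node: i for i, community in enumerate(communities) for node in community}

  # (ii) Flag the paths that visit more than one cluster.
  for path in paths:
      if len({cluster_of[step] for step in path}) > 1:
          print("possible break:", " -> ".join(path))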


Keywords: graph algorithms, big data, network analysis, natural language processing, semantics


References: Vincent A. Traag, Ludo Waltman, Nees Jan van Eck: From Louvain to Leiden: guaranteeing well-connected communities. CoRR abs/1810.08473 (2018)