2019-2020

The goal of the project is to develop methods to analyse the evolution across time of a social network. We will consider as example the graph of scientific collaborations as it can be crawled freely.

The project will have two phases:

- Data collection. In the first phase, the student will use the available bibliographic research tools (SCOPUS, Web of Science, Patstat) to create data sets. One corresponding to the current situation and others corresponding to past moments. The data sets will correspond mainly to networks (annotated graphs) of scientific collaborations.

- Data analysis. In the 2nd phase, the student will analyse this data. First, they will focus on simple metrics (number of publications, number of patent applications...) and compare the evolutions across time. Then, if there is time, she will start studying the evolution of the structure of the network and will look at whether they are observing an evolution of its clustering due to the emergence of new collaborations.

The PFE will be part of a larger project on the evaluation of the impact of funding on scientific research. The project involve researchers in economics, sociology, and computer science.

The PFE can also be done in a group of two students.

The PFE may be followed by an internship for interested students.

Computational Complexity of Puzzles and Games

Who?

Name: Emanuele NATALE

Mail: natale@unice.fr

Telephone:

Web page: enatale.name

Where?

Place of the project: I3S

Address:

2004, route des Lucioles, B.P. 93

F-06902 Sophia Antipolis Cedex

Team: COATI Team (INRIA)

Web page:

Pre-requisites if any: Fundamental notions of Complexity Theory (P vs NP problem, hardness proofs)

Description:

The topic of this project is to contribute to the classification of the computational complexity of games and puzzles.

Since games and puzzles have been studied under a computational lens, researchers unearthed a rich landscape of complexity results showing deep connections between games and fundamental problems and models in computer science. Complexity of Games (CoG) is essentially a wiki of complexity results on games and puzzles which aims to serve as a reference guide for enthusiasts and researchers on the topic. The compendium indexes games and puzzles for which hardness results are known. However, the list is far from complete.

The main goal of this project is to contribute to CoG by studying papers which proves complexity-theory results on puzzles and games and by including in CoG clear expositions of such results. Hence, along the way, the student will acquire a good knowledge of NP-complete problems, computational complexity classes and other key concepts in computational complexity. Moreover, her/his work will be immediately useful to the research community interested in the complexity of games (such as the communities around the FUN and Complexity of Games conferences).

CoG also hosts related material such as implementations of hardness reductions, simulations, cool visualizations of proofs, etc. The student will be expected to work on the production of such material, for example by implementing complexity reductions in order to include them in CoG as playable games.

In a later stage of the project, the student is expected to work on novel complexity results for games that have not yet been studied.

Useful Information/Bibliography:

The student is encouraged to explore CoG at www.isnphard.com to get an idea of the topic that she/he will be learning about and contributing to.

Mining the ACQUA dataset: understanding mobile user Quality of Experience in the wild

Who?

Name: Chadi Barakat

Mail: Chadi.Barakat@inria.fr

Telephone: 04 92 38 75 96

Web page: http://team.inria.fr/diana/chadi/

Where?

Place of the project: Diana team @ Inria Sophia Antipolis

Address: 2004, route des lucioles, 06902 Sophia Antipolis

Team: Diana

Web page: http://team.inria.fr/diana/

Pre-requisites if any: Good knowledge in statistics, machine learning, and data analysis. The knowledge of Android programming is not required for this PFE, but it is more than welcome for the internship.

Description:

Context – ACQUA (http://project.inria.fr/acqua/) [1] is a framework and mobile Application for prediCting Quality of User Experience at Internet Access. It is developed by the Diana team at Inria Sophia Antipolis – Méditerranée. ACQUA presents a new way for the evaluation of the performance of Internet access. Starting from network-level measurements as the ones we often do today (bandwidth, delay, loss rate, jitter, signal strength, etc), ACQUA targets the estimated Quality of Experience related to the different applications of interest to the user without the need to run them (e.g., estimated Skype quality, estimated video streaming quality). An application in ACQUA is a function, or a model, that links the network-level and device-level measurements to the expected quality of experience. Supervised machine learning techniques are used to establish such link between measurements both at the network level and the device level, and estimations of the Quality of Experience for different Internet applications. The required data for such learning can be obtained either by controlled experiments as we did in two recent communications on Skype and YouTube Quality of Experience [2,3], or by soliciting the crowd (i.e. crowdsourcing) for combinations (i.e. tuples) of measurements and corresponding application-level quality of experience.

Today, the ACQUA dataset counts around two millions network performance tests carried out in different countries (mostly in France) and with different network access technologies. Each test contains a set of measurements, both active and passive, and can be seen as an enhanced SpeedTest. In addition, this dataset contains around 4K reports by end users on their real Quality of Experience for different types of applications, together with what ACQUA estimates as Quality of Experience for those users (only using the performance of the network). A first analysis has been carried out in [1], but a deep analysis is still missing, which is the main purpose of this PFE, and of any internship that follows.

Objectives – We seek in this study to answer the following questions:

- What is the current situation of mobile network performance, and how this performance compares to what is actually known in the literature.

- How the different measurements carried out by ACQUA correlate with each other. In particular we would like to know if we can apply any feature selection over these measurements.

- Understand how people use ACQUA and identify ways to improve the behavior of the application towards increasing the rate of measurements while reducing their overhead.

- Most importantly, understand how the reports of the users regarding their real QoE compare to the estimated QoE provided by ACQUA. In case of mismatch, which is normally to be expected given the subjective nature of the QoE, try to understand this mismatch and identify means to improve it.

- Finally, start looking at the troubleshooting aspect of the problem and propose methods to identify the root cause of QoE degradation. We are here concerned by both types of methods (i) network-wide methods that correlate the measurements coming from different users to detect and localize network anomalies, and (ii) methods that enhance the ACQUA application with more measurements at the physical layer (CQI and SNR in particular) that complement what is already collected by the application and shed light on the local network conditions that can lead to bad QoE scenarios.

To be noted here that the last point, which requires Android programming skills to add the new wireless features to the ACQUA application, will be fully developed within an internship that follows the PFE if the results of the latter are satisfactory and the conditions of a good work are satisfied.

Useful Information/Bibliography:

[1] Othmane Belmoukadam, Thierry Spetebroot, Chadi Barakat, “ACQUA: A user friendly platform for lightweight network monitoring and QoE forecasting“, in proceedings of the 3rd International Workshop on QoE Management, Paris, February 2019.

[2] Thierry Spetebroot, Salim Afra, Nicolas Aguilera, Damien Saucez, Chadi Barakat, “From network-level measurements to expected Quality of Experience: the Skype use case“, in proceedings of the IEEE International Workshop on Measurement and Networking (M&N), Coimbra, Portugal, October 2015.

[3] Muhammad Jawad Khokhar, Nawfal Abbasi Saber, Thierry Spetebroot, Chadi Barakat, “On active sampling of controlled experiments for QoE modeling“, in proceedings of ACM SIGCOMM Workshop on QoE-based Analysis and Management of Data Communication Networks (Internet-QoE), Los Angeles, August 2017.

Datatype axiomatisation for SMT engines

Who?

Name: Eric Madelaine

Mail: eric.madelaine@inria.fr

Telephone: +33 6 87 47 99 80

Web page: http://www-sop.inria.fr/oasis/Eric.Madelaine/

Where?

Place of the project: INRIA

Address: Sophia-Antipolis

Team: KAIROS

Web page: https://team.inria.fr/kairos/

Pre-requisites if any: Taste for rigourous reasonning. Some background in logics would be a plus.

Description:

SAT and SMT engines are software programs for solving systems of logical constraints. More precisely,

given a set of constraints, (equations, inequations, etc. over some symbolic expressions)

they will search for a set of values of the variables (called a model) that _satisfies_ the constraints.

SAT solvers work on boolean constraint systems, while SMT means "Satisfiability Modulo Theory", and can

solve satisfiability problems involving any kind of data, providing that the operations and predicates used on these

data are properly axiomatized.

Typical SMT engines (see e.g. Z3 below) provide libraries that can handle usual data types, including Integers,

Real numbers, BitVectors, Arrays, Records, Enumeration types, etc...

Each of these theories provide a specific set of operators and predicates over a given data domain, and some

axiomatisation used by the system to decide satisfiability of formulas. Adding new types, or new operations, requires

adding the axioms to reason about them.

The Kairos team uses SMT engines as solvers for several algorithms dealing with time-sensitive systems (simulator,

provers, model-checking engine, etc.). These algorithms make use of Z3 standard libraries, but also some extensions,

for which we would like to build theories.

The student will study existing bibliography and search previous work that can be useful. Then he/she will

build specific theories for the data types we needed, including:

- Additional Integer operations

- Intervals of Integers

- Sets

- User-defined structures (= records), including inductive structures beyond the capabilities of the standard library.

Useful Information/Bibliography:

Z3 Tutorial and online solver: https://www.rise4fun.com/Z3

Using SMT engine to generate Symbolic Automata: https://hal.inria.fr/hal-01823507v1

Streaming 360° VR videos with HTML5 and MPEG-OMAF

Advisor: Lucile Sassatelli

Emails: sassatelli@i3s.unice.fr

Webpage: http://www.i3s.unice.fr/~sassatelli

Laboratory: I3S (2000, route des Lucioles – Sophia Antipolis)

Description:

VR is growing fast with different companies rolling out cheap and not-so-cheap head-mounted sets, from dedicated headsets like Oculus Rift and HTC Vive down to smartphone-dependent headsets (e.g., Samsung Gear VR, Google Cardboard and alike). VR represents a tremendous revolution in the user’s experience, but VR is also a significant challenge for streaming transmission over the Internet (that is, Youtube-like, without download). The bit rates entailed by 360° videos are indeed much higher than for conventional videos.

In the context of several research projects, we have designed innovative streaming strategies for 360° videos, which are meant to both decrease the required bandwidth and improve the user experience [1-6].

Recently, specifications for the storage and delivery format for 360° videos have been standardized with MPEG-Omnidirectional Media Format (MPEG-OMAF). However, only very recent works [7,8] have looked into how to stream with such media format. Some works focus on how to produce the content [8], other on how to stream it to players embedded in HTML5-enabled Web browsers [7].

This PFE is made in collaboration with Franck Adrai, CEO of MyTourLive company.

Objective of the PFE: Progress towards a complete processing chain that would allow to efficiently process live 360° video feeds, and deliver them to Web-based players.

- 1st phase: Deploy and understand the code of [7] related to the delivery (an HTTP/2 server will be used).

- 2nd phase: Deploy and understand the code of [8] related to the production.

- 3rd phase: Propose a way to build an integrated delivery chain using both components. The architecture of the system of MyTourLive will be considered.

Pre-requisites:

Mastery of Javascript, knowledge of content distribution os a plus

Technical tools:

Javascript, W3C Media Source Extensions (MSE), HTML, DASH

Additional information:

This PFE can be followed by an intership.

References:

[1] L. Sassatelli, M. Winckler, T. Fisichella, R. Aparicio and D. Trevisan. New Interactive Strategies for Virtual Reality Streaming in Degraded Context of Use. Accepted for publication in Elsevier Computers & Graphics, Sep. 2019.

[2] L. Sassatelli, M. Winckler, T. Fisichella and R. Aparicio-Pardo. User-Adaptive Editing for 360 degree Video Streaming with Deep Reinforcement Learning. ACM International Conference on Multimedia, Demo track, Nice, France, October 2019.

[3] L. Sassatelli, M. Winckler, T. Fisichella, R. Aparicio-Pardo and A.-M. Pinna-Déry. A New Adaptation Lever in 360° Video Streaming. 29th ACM SIGMM Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), Amherst, MA, USA, June 2019.

[4] M. Romero, L. Sassatelli, F. Precioso and R. Aparicio-Pardo. Foveated Streaming of Virtual Reality Videos. Demo track, ACM International Conference on Multimedia Systems (MMSys), Amsterdam, The Netherlands, June 2018.

[5] S. Dambra, G. Samela, L. Sassatelli, R. Pighetti, R. Aparicio-Pardo and A.-M. Pinna-Déry. Film Editing: New Levers to Improve Virtual Reality Streaming. ACM International Conference on Multimedia Systems (MMSys), Amsterdam, The Netherlands, June 2018. (Acc. rate: 21%)

2nd place of the "Excellence in DASH award", awarded by the DASH Industry Forum.

ACM Reproducibility Badge.

[6] L. Sassatelli, A.-M. Pinna-Déry, M. Winckler, S. Dambra, G. Samela, R. Pighetti and R. Aparicio-Pardo. Snap-changes: a Dynamic Editing Strategy for Directing Viewer's Attention in Streaming Virtual Reality Videos. ACM International Conference on Advanced Visual Interfaces, Grosseto, Italy, May 2018.

[7] D. Podborski, J. Son, G. Singh Bhullar, R. Skupin, Y. Sanchez. HTML5 MSE Playback of MPEG 360 VR Tiled Streaming. ACM MMSys, Amherst, MA, June 2019.

[8] T. Ballard, C. Griwodz, R. Steinmetz and A. Rizk. RATS: Adaptive 360-degree Live Streaming. ACM MMSys, Amherst, MA, June 2019.

Studying network challenges of Microservices Environments.

Team: SigNet / I3S Advisors: Dino Lopez Pacheco <dino.lopez@univ-cotedazur.fr>, Guillaume Urvoy-Keller <urvoy@univ-cotedazur.fr>Introduction Nowadays, cloud services rely on complex ecosystems. For instance, a car-sharing service will rely

on multiple modules, like driver manager, passenger manager, billing, trip management, etc [1].

The complexity of cloud services has brought major challenges to the software development and

information technology operations (DevOps), as maintenance tasks (e.g. bug fixes) or application

upgrades might require the modifications of several functions. Consequently, software is moving

from a monolithic architecture, where one single program is responsible for implementing the entire

set of subservices as a unified whole, to a microservice architecture, where a service is composed of

multiple small independent interconnected functions or microservices [2].

Microservices ease the DevOps workflow since developers need to focus on small pieces of code,

instead of a huge unwieldy program.

As cloud services can be subject to high loads, microservices must be able to be dynamically

and automatically deployed and scaled across multiple servers to enhance the applications overall

performance. However, a recent study [3] showed that in a serverless environment, microservices QoS

decrease in presence of high load. Indeed, due to the nature of microservice-based systems where

functions intercommunicate to deliver a result, a bottleneck encountered on one of the (or multiple)

functions can slow-down the entire system.

The performance problem of microservices would be also exacerbated by the network overhead of virtual

overlay networks, which are deployed by clouds to create extensible and multiple subnets.

Objectives The objective of this project is to explore the complexity of microservices ecosystems and

understand the challenge that the network overlays introduces on such environments. In this PFE, the student will: * Explore the state of the art on the performance and architecture of microservices systems. * Deploy and explore DeathStarBench [4] * Analyze and explore different solutions to deploy microservices (e.g. Kubernetes). * Study the impact of different overlay network solutions on Microservices environments. References [1]: Gan, Y., Pancholi, M., Hu, S., Cheng, D., He, Y., & Delimitrou, C. (2018).

Seer: Leveraging Big Data to Navigate the Increasing Complexity of Cloud Debugging.

In 10th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 18). [2]: Thones, Johannes. Microservices. IEEE software 32.1 (2015): 116-116. [3]: Gan, Yu, et al. "An Open-Source Benchmark Suite for Microservices and Their Hardware-Software

Implications for Cloud & Edge Systems." Proceedings of the Twenty-Fourth International Conference on

Architectural Support for Programming Languages and Operating Systems. ACM, 2019. [4]: [DeathStarBench: An Open-Source End-to-End Microservices Benchmark Suite.]

(http://microservices.ece.cornell.edu/)

Security Testing for Cryptographic Protocols

Advisor:

Yves Roudier <Yves.Roudier@i3s.unice.fr>

Description:

Cryptographic protocols are an essential component of today’s distributed applications and networked infrastructures. The stacks implementing such cryptographic protocols therefore become subject to a wide variety of attacks. This has for instance been witnessed in the case of the TLS protocol [1], a central element of the Internet and of electronic commerce. Many attacks like Heartbleed, SMACK, FREAK, Logjam, or SLOTH have been discovered over the last years. Security testing has become an increasingly important area of investigation in this context (for instance [2], [3])

The objective of this work is to evaluate and experiment with security testing techniques for exploring implementations of cryptographic protocols. Two directions will be explored:

- test generation: diverse techniques, ranging from dumb fuzzing to smart fuzzing (based on the protocol specification), or even software engineering techniques based on code variability (as experienced on multiple versions of the protocol stack), will be deployed based on existing tools (e.g. FlexTLS [2] or [4]) or developed.

- test monitoring : one problem of testing is the need to implement oracles to decide on the results. Here again different techniques will be explored to understand their feasibility and adequacy, notably blackbox testing, greybox testing through stack code instrumentation and the insertion of checks, or through the use of hypervisors.

These techniques should be evaluated and compared in order to understand which one is best suited for covering the different behaviors of a given protocol. TLS will constitute an obvious target, but other cryptographic protocols may be investigated, notably recent protocols used in IoT devices.

Expected skills: Development skills and interest for cybersecurity and networking.

References:

[1] A Study of the TLS Ecosystem, PhD Thesis (Olivier Levillain)

https://tel.archives-ouvertes.fr/tel-01454976/document

[2] FLEXTLS: A Tool for Testing TLS Implementations (Benjamin Beurdouche, Antoine Delignat-Lavaud, Nadim Kobeissi, Alfredo Pironti, Karthikeyan Bhargavan), In USENIX Workshop on Offensive Technologies (WOOT), 2015.

https://www.usenix.org/system/files/conference/woot15/woot15-paper-beurdouche.pdf

[3] A Messy State of the Union: Taming the Composite State Machines of TLS (Benjamin Beurdouche, Karthikeyan Bhargavan, Antoine Delignat-Lavaud, Cédric Fournet, Markulf Kohlweiss, Alfredo Pironti, Pierre-Yves Strub, Jean Karim Zinzindohoue), In IEEE Symposium on Security & Privacy 2015 (Oakland), 2015.

http://www.ieee-security.org/TC/SP2015/papers-archived/6949a535.pdf

[4] Protocol State Fuzzing of TLS Implementations. Joeri de Ruiter, Erik Poll.

https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-de-ruiter.pdf

Error correction techniques for application to DNA digital data storage.

Advisors :

Marc Antonini, DR CNRS, Laboratoire I3S UCA/CNRS, Equipe SIS/MediaCoding

am@i3s.unice.fr, 06 32 13 24 69

Co-advisors : Melpomeni Dimopoulou, Eva Gil San Antonio

Description:

Living in the age of the Internet of Things and the constant use of social media, the data that is being generated and stored every minute is growing tremendously every year. However, the technological advances on storage devices has reached a certain limit while also facing the problem of short-term reliability. Conventional means of storage such as hard drives and servers can last for about 15 years maximum before they need to be replaced. Furthermore, running data centers requires huge amounts of energy. In short, we are about to have a serious data-storage problem that will only become more severe over time. An alternative to hard drives is the use of DNA, which is life’s information-storage material and consists of long chains of the nucleotides A, T, C and G, as a means of digital data storage. Recent works have proven that the storage of digital data into DNA is both feasible and very promising. The process of DNA data storage consists of 3 main steps. The encoding of a digital binary sequence into a new sequence of the 4 symbols A, T, C and G, the synthesis of this sequence into synthetic DNA (in vitro) and the storage of the synthesized DNA into specifically designed storage capsules. The reading of the stored information can be performed by special machines, the sequencers, which can decode DNA strands. The most challenging part of this project is the fact that the sequencing process is error-prone and thus it is highly important to apply error correction techniques to deal with the errors introduced in the decoding.

In the context of this PFE the student will work, jointly with two PhD students in the team, on the application of Machine Learning and error correction algorithms to correct erroneous DNA sequences to allow the correct decoding of the stored information.

Keywords:

DNA coding, DNA digital data storage, data decoding, Error Correction, Machine Learning

References:

[1] M. Dimopoulou, M. Antonini, P. Barbry, R. Appuswamy, A biologically constrained encoding solution for long-term storage of images onto synthetic DNA, in: EUSIPCO 2019, 2019.

[2] M. Dimopoulou, M. Antonini, P. Barbry, R. Appuswamy, DNA coding for image storage using image compression techniques, in: CORESA 2018, 2018.

[3] G. M. Church, Y. Gao, S. Kosuri, Next-generation digital information storage in DNA, Science (2012) 1226355.

[4] N. Goldman, P. Bertone, S. Chen, C. Dessimoz, E. M. LeProust, B. Sipos,E. Birney, Towards practical, high-capacity, low-maintenance information storage in synthesized DNA, Nature 494 (7435) (2013) 77

[5] Hawkins, John A., et al. "Error-correcting DNA barcodes for high-throughput sequencing." bioRxiv (2018): 315002.

Video coding using Spiking Neural Networks.

Advisors : Marc Antonini, DR CNRS and Jean Martinet, PR UCA, Laboratoire I3S UCA/CNRS, Equipe SIS/MediaCoding et SPARKS

am@i3s.unice.fr, jean.martinet@univ-cotedazur.fr

Description:

During the past three decades, research in lossy compression for video yielded several coding algorithms like the well-known MPEG standard. Since then, subsequent efforts for conceiving lossy video coders followed almost the same paradigm. These coding algorithms were mostly designed in a signal processing way of thinking and do not account for the actual biological visual systems behavior. To our point of view, the basic drawback of the conventional architecture stems from the fact that videos are dynamic signals which are processed by methods proposed for static images (DCT, wavelets, scalar quantization, etc.) whilst at the same time an increasing number of techniques is proposed in order to reduce temporal redundancy. As a result, we believe that a video should be processed dynamically. Yet, computational neuroscience made substantial progress during the same period in better understanding the internal representation of the sensory world. In particular, concerning the visual stimuli sensing, one can find many results and heuristics on how information is encoded, transmitted, and interpreted within the mammalians’ visual system. Based on those results, it is our conviction that the mammalians’ visual system developed efficient coding strategies that could be used as a source of inspiration to imagine novel video compression algorithms.

In the context of this PFE the student will work on the development of a video compression tool that combines spike-based quantizers [1,2] and spiking neural networks [3,4] in order to (i) extract relevant perceptual information contained by an image and (ii) encode simultaneously this information.

Keywords:

Compression, image analysis, Integrate and fire neuron, Spike-based coding, Spiking neural network.

References:

[1] E. Doutsi (2017), Retina-inspired image and video coding, Ph.D thesis, University Côte d’Azur.

[2] Gerstner, W. and Kistler, W. (2002). Spiking Neuron Models: An Introduction. Cambridge

University Press, New York, NY, USA.

[3] Filip Ponulak, Andrzej Kasinski. Introduction to spiking neural networks: Information processing, learning and applications. Acta Neurobiol Exp (Wars). 2011;71(4):409-33.

[4] Hélène Paugam-Moisy, Sander M. Bohte. Computing with Spiking Neuron Networks. G. Rozenberg, T. Back, J. Kok. Handbook of Natural Computing, Springer-Verlag, pp.335-376, 2012, ⟨10.1007/978-3-540-92910-9_10⟩. ⟨hal-01587781⟩

Brain-like connection of micro-services

Supervisor: Alexandre Muzy, I3S lab, muzy@i3s.unice.fr

Description:

Brain learning relies on the reinforcement of the connections between neurons based on their activity dynamics. This allows the brain to be able to aggregate on-the-fly billions of neurons to achieve new functions. In the Internet, micro-services are being intensively developed to increase reusability and decrease code complexity by connecting modules of codes together. However, connecting together automatically many micro-services is still a challenge. It is hard to predict the global behavior of the program obtained and also the way to connect them automatically is still unknown.

To solve this problem, a brain inspired learning algorithm, Activity-based Credit Assignment [ACA] can be used. ACA allows searching automatically the right connections and components, in the space of possible networks, to achieve a global behavior. Using both the activity of the components (here micro- services) and the performances of the networks (of micro-services), the credit of right components is increased while the credit of wrong components is decreased.

On the Internet, an existing company allows the users to set flow charts for connecting apps: Zapier. For example, it is possible to connect micro-services from Google Maps (e.g., your position with respect to a map) with your Phone Contacts (e.g., asking you to set a new appointment when your position is close to a contact). Currently, several platforms exist for implementing microservices: Kubernetes, Google cloud, .NET, Zeebe, etc.

The goal of this project is first to compare the existing microservice platforms. Based on this comparison, a platform will be chosen to implement a prototype in which ACA search will connect automatically few services. The prototype will be able to evaluate and compare different microservices achieving the same function.

References:

[ACA] A. Muzy (2019) "Exploiting activity for the modeling and simulation of dynamics and learning processes in hierarchical (neurocognitive) systems", IEEE Magazine of Computing in Science & Engineering (CISE), vol. 21, no. 1, pp. 84-93.

Automatic inference of Machine Learning (ML) pipelines

Who?

Name: Blay-Fornarino Mireille

Mail: mireille.blay@univ-cotedazur.fr

Telephone: +33 4 89 15 42 43

Web page: http://mireilleblayfornarino.i3s.unice.fr/

Where?

Place of the project: I3S - Templiers

Address: Bat Templiers, 930 route des colles,06903 Sophia Antipolis Cedex

Team: SPARKS

Web page: http://sparks.i3s.unice.fr/

Pre-requisites if any: Software Engineering Skills, practical knowledge in ML would help

Description:

Research and industry are increasingly using large amounts of data to inform their decisions. To do this, however, the data must be analyzed in typically non-trivial refinement processes, which require technical expertise on methods and algorithms, experience in how accurate analysis should be carried out and knowledge of an increasing number of analytical approaches. This project aims to mitigate these problems by reducing the choices available to users to only ML pipelines appropriate for their data sets.

Machine Learning's algorithm libraries expose a very large number of algorithms that must be combined to build ML models. Not all combinations are valid (i.e. they are not executable) and not all are effective (i.e. they do not give an « interesting » result). How to automatically propose valid and effective compositions from a large database of algorithms? This is the question we wish to address in this project.

By focusing on the problem of supervised classification, we want to build a model that allows to:

1- automatically complete the knowledge on algorithms (preconditions, post-conditions, properties,...) from algorithm libraries such as scikit-learn and weka

2- eliminate invalid and/or inefficient pipelines

3- propose potentially valid and efficient pipelines based on a data set

4- discover new valid and efficient pipelines not present in our study databases

The difficult points are:

- the space of algorithms which is large and not closed,

- the variability of meta-data,

- the combinatorics of possible pipelines

- effectiveness measurement

- the impossibility of testing everything

- …

In the context of this PFE, we will consider that we have validated the model if we are able to (1) automatically annotate a library of algorithms with a first set of pre and post-conditions, (2) build a model of all potentially valid pipelines, (3) compare this model with a set of experiments from existing platforms, (4) enrich our library of algorithms using the results obtained in 4, (5) reduce pipeline space by using heuristics.

Useful Information/Bibliography:

Serban, Floarea, Joaquin Vanschoren, Jörg-Uwe Kietz, and Abraham Bernstein. 2013. “A Survey of Intelligent Assistants for Data Analysis.” ACM Computing Surveys. https://doi.org/10.1145/2480741.2480748. (Overall objective)

Bilalli, Besim, Alberto Abelló, and Tomàs Aluja-Banet. 2017. “On the Predictive Power of Meta-Features in OpenML.” International Journal of Applied Mathematics and Computer Science 27 (4).

Olson, Randal S., and Jason H. Moore. 2019. “TPOT: A Tree-Based Pipeline Optimization Tool for Automating Machine Learning.” In , 151–60. Springer, Cham. https://doi.org/10.1007/978-3-030-05318-5_8. (To see the problem but especially not this solution)

Su, Jingyi, Mohd Arafat, and Robert Dyer. 2018. “Using Consensus to Automatically Infer Post-Conditions.” In Proceedings of the 40th International Conference on Software Engineering Companion Proceeedings - ICSE ’18, 202–3. New York, New York, USA: ACM Press. https://doi.org/10.1145/3183440.3195096

Khairunnesa, Samantha Syeda, Hoan Anh Nguyen, Tien N. Nguyen, and Hridesh Rajan. 2017. “Exploiting Implicit Beliefs to Resolve Sparse Usage Problem in Usage-Based Specification Mining.” Proceedings of the ACM on Programming Languages 1 (OOPSLA): 1–29. https://doi.org/10.1145/3133907.

Santiputri, Metta, Aditya K. Ghose, and Hoa Khanh Dam. 2017. “Mining Task Post-Conditions: Automating the Acquisition of Process Semantics.” Data and Knowledge Engineering. https://doi.org/10.1016/j.datak.2017.03.007.

Graph algorithms for abstraction in gene regulatory networks

Who?

Name:Hélène Collavizza

Mail: Helene.COLLAVIZZA@univ-cotedazur.fr

Co-supervisor: Gilles Bernot, Jean-Paul Comet

Where?

Place of the project: I3S

Address:

2004, route des Lucioles, B.P. 93

F-06902 Sophia Antipolis Cedex

Team: SPARKS (bio-info)

Pre-requisites if any: Graph algorithms. Open-minded and interested in research, because the project is exploratory. An interest for abstraction, modelling activity and biology would be a plus.

Description:

Gene regulatory networks defines the interactions of a biological system i.e. the individual influences of a gene x on the expression of a gene y https://en.wikipedia.org/wiki/Gene_regulatory_network.

If these static interactions are generally known, the dynamics of the network is governed by parameters that we have to identify. We have proposed formal methods to study this dynamic behavior [1,2]. The objective is to help the biologists to select some in-vivo or in-vitro experiments that would allow to verify the hypothesis under study. Within this formal modeling framework, the exploration of the qualitative dynamic of the system can be described as path exploration in a finite state space. However, the number of possible parameterizations can be huge, even for small interaction networks.

The objective of this project is to study how some parts of the regulatory network could be abstracted in order to ease the parameter identification of the whole network. More precisely, we already have established some preservation properties when embedding a sub-system in a larger one [3,4]. In that case, the behavior of the sub-system is studied in detail, and we want to know under which condition this behavior is preserved when embedded in the large system. The study here is orthogonal : the aim is to select a sub-system that could be abstracted while preserving some properties of the whole system. One application is the regulatory network of energy metabolism. A first model has been developed. A current work is to analyze how the dynamics of this system is disturbed in metastasized cells in pancreatic cancer (work in collaboration with an INSERM unit in Marseille).

In a first time, usual algorithms on graphs (e.g. strongly connected components, cycles search, metrics of connectivity, …) will be implemented in order to get information on the regulatory graph of energy metabolism. In a second time, in collaboration with bioinformaticist, this information will be used to select which sub-parts of the graph are good candidates for abstraction. Last, the temporal properties that we want to preserve will be confronted to these candidates.

Useful Information/Bibliography:

[1] G. Bernot, J.-P. Comet and E.H. Snoussi. Formal methods applied to gene network modelling. In Logical Modeling of Biological Systems (eds. K. Inoue and L. Fariñas). pp. 245-289, ISBN 978-1-84821-680-8, ISTE & Wiley, 2014.

http://www.i3s.unice.fr/~comet/publications/ChapLivres/chapterLogic2014.pdf

[2] G. Bernot, F. Tahi, Behaviour Preservation of a Biological Regulatory Network when Embedded into a Larger Network . Fundamenta Informaticae, IOS Press Amsterdam, Vol.91, Issue.3-4, p.463-485, ISSN:0169-2968 , 2009

http://www.i3s.unice.fr/~bernot/Biomodels/2009-FundamentaInformaticae.pdf

[3] M. Mabrouki, M. Aiguier, J.-P. Comet, P. Le Gall and A. Richard. Embedding of biological regulatory networks and properties preservation. Mathematics in Computer Science. 5(3):263-288, 2011. special issue

http://www.i3s.unice.fr/~comet/publications/ConfIntl-actes/confAB2008.pdf

Precise in-door localization in uncalibrated environments

Who?

Name: Arnaud Legout and Chadi Barakat

Mail: arnaud.legout@inria.fr and chadi.barakat@inria.fr

Telephone: +33 4 92 38 78 15

Web page: http://www-sop.inria.fr/members/Arnaud.Legout/

Where?

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles

Team: DIANA

Web page: https://team.inria.fr/diana/

Pre-requisites if any: Python, machine learning is a plus

Description:

Precise in-door localization is a challenge. Today, such a localization is performed either with

dedicated Bluetooth beacons, or with a pre-calibration phase that is expensive to perform and

hard to scale.

The goal of this PFE is to study how machine learning can be used in combination

with re-training to perform accurate in-door localization. The goal is to leverage on

the measured power of surrounding Wi-Fi, Bluetooth and cellular sources. The challenge

is to be robust to the churn of the equipments with time and to accurately train

the classifier with minimum user feedback.

The student will have access billions of real measurements to study and train a

classifier and will have the opportunity to join a project designing a real in-door

positioning application based on a consumer app for smartphones.

This PFE can be continued by an internship and a Ph.D. thesis or an engineering

position for excellent students.

Synaptic delays for spatio-temporal pattern recognition

Who?

Name: Jean Martinet

Mail: jean.martinet@univ-cotedazur.fr

Telephone:

Web page: https://www.i3s.unice.fr/jmartinet/fr/accueil

Where?

Place of the project: I3S - Les Algorithmes - Euclide B

Address: Laboratoire I3S, Sophia Antipolis

Team: SPARKS

Web page: https://www.i3s.unice.fr/jmartinet/fr/node/9

Pre-requisites if any: Programming skills in Python, interest in research, machine learning, bio-inspiration and neurosciences are required.

Description: Spiking Neural Networks (SNN) represent a special class of artificial neural networks, where neurons communicate by sequences of spikes [Ponulak, 2011]. Contrary to deep convolutional networks, spiking neurons do not fire at each propagation cycle, but rather fire only when their activation level (or membrane potential, an intrinsic quality of the neuron related to its membrane electrical charge) reaches a specific threshold value. Therefore, the network is asynchronous and allegedly likely to handle well temporal data such as video. When a neuron fires, it generates a non-binary signal that travels to other neurons, which in turn increases their potentials. The activation level either increases with incoming spikes, or decays over time.

Regarding inference, SNN does not rely on stochastic gradient descent and backpropagation. Instead, neurons are connected through synapses, that implement learning mechanisms inspired from biology for updating synaptic weights (strength of connections) or delays (propagation time for an action potential).

In this project, we wish to design a specific SNN and tune synaptic delays to recognise prototypical temporal patterns, inspired by Reichardt detectors. The objective is to recognize features such as motion direction and speed. The implementation will use Brian 2 simulator.

Useful Information/Bibliography:

- [Ponulak, 2011] Filip Ponulak, Andrzej Kasinski. Introduction to spiking neural networks: Information processing, learning and applications. Acta Neurobiol Exp (Wars). 2011;71(4):409-33.

- [Paredes, 2019] Paredes-Vallés, F., Scheper, K. Y. W., & De Croon, G. C. H. E. (2019). Unsupervised learning of a hierarchical spiking neural network for optical flow estimation: From events to global motion perception. IEEE transactions on pattern analysis and machine intelligence.

- [Oudjail, 2019] Veïs Oudjail, Jean Martinet. Bio-inspired event-based motion analysis with spiking neural networks. International Conference on Computer Vision Theory and Applications, 2019.

- [URL] : https://isle.hanover.edu/Ch08Motion/Ch08ReichardtDetectors.html

Study of the scalability of a Jetson-based architecture for deep learning.

Who?

Advisor: Michel.Riveill@univ-cotedazur.fr

Where?

Inria-I3S research team MAASAI (actually ‘Templiers 4’ 4th floor) but moving in INRIA building soon.

http://www.i3s.unice.fr/~riveill

Description:

Even if the figure of 20 to 50% of global electricity consumption could be due to digital in 2030 (Cédric Villani's AI report published in March 2019) is controversial, the impact of digital on energy consumption cannot be ignored, particularly in the field of AI or the design of deeper and deeper architecture requires the use of increasingly voluminous data to build the matrix of weights that have become huge.

The objective of the project is to validate the hypothesis that a Low Power GPU-based architecture can be used to drive a deep network.

After a detailed study of the architecture, it will be necessary to define the main elements to be taken into account in order to minimize energy consumption and in particular to minimize memory movements. We propose a mixed approach to this end:

1- a model of the operation of the coarse-grained platform

2- an experiment in distributed TensorFlow to validate the experiment

For this purpose, we will provide a data set and a deep reference architecture. Code parallelization can be done using distributed TensorFlow or MPI/Keras/Tensorfow which are already installed on the target architecture. Different data placement strategies as well as different 'gradient descent' strategies can be considered to minimize energy consumption during model construction. For reasons of simplification, we will not ask ourselves the question of minimizing the space of the hyper-parameters to be covered.

Recommended Skills: Programming in python with tensorflow/keras

Reference:

- Energy and Policy Considerations for Deep Learning in NLP, https://drive.google.com/file/d/1v3TxkqPuzvRfiV_RVyRTTFbHl1pZq7Ab/view

- Distributed training with TensorFlow, https://www.tensorflow.org/guide/distributed_training

- Parallel and Distributed Deep Learning, https://stanford.edu/~rezab/classes/cme323/S16/projects_reports/hedge_usmani.pdf

- Distributed Deep Learning (Part 1 to 4), https://blog.skymind.ai/distributed-deep-learning-part-1-an-introduction-to-distributed-training-of-neural-networks/

Big Data Stream Processing in Multi-Source Cloud/Fog Environments

Who?

Name: Alessio Pagliari, Fabrice Huet

Mail: alessio.pagliari@univ-cotedazur.fr, fabrice.huet@univ-cotedazur.fr

Telephone:

Web page: www.i3s.unice.fr/~pagliari, https://sites.google.com/site/fabricehuet/

Where?

Place of the project: Laboratoire I3S

Address: 2000 route des Lucioles, Euclide A

Team: SCALE project

Web page: https://team.inria.fr/scale/

Pre-requisites if any: minimum knowledge of BigData systems especially Data Streaming, Unix systems, Java programming, scripting languages (e.g. python, bash)

Description:

Data Streaming is nowadays a cornerstone in most applications (e.g. social Networks, e-commerce, sensors networks). Allowing to collect data and immediately process it, giving almost real-time results. These properties are useful especially in sensible contexts that we can find in IoT networks, where multiple devices or sensor are continuously generating data and feeding the Data Streaming Platforms (DSP) [1].

The trend has always been to offload all the work of the data stream platform on the clouds where is possible to easily scale computing power. However, with these new applications having multiple data sources, the impact of this data moving between edge and cloud may impact the network and the application performances, in particular latency. Thus, new solutions are moving the computation, or part of it, directly in the edge devices, namely Fog Computing [2].

In this context, the objective of the student in this PFE is to:

· Perform the State of the Art of distributed Data Streaming solutions in Fog environments (e.g. platforms, scheduling, typical scenarios)

· Implement a testing environment in a large-scale testbed (e.g. Grid 5000)

· Evaluate the impact of multiple distributed sources over current DSPs (e.g. Flink, Storm, Spark)

Useful Information/Bibliography:

[1] Shukla, Anshu, Shilpa Chaturvedi, and Yogesh Simmhan. "Riotbench: An iot benchmark for distributed stream processing systems." Concurrency and Computation: Practice and Experience 29.21 (2017)

[2] Cardellini, Valeria, et al. "New Landscapes of the Data Stream Processing in the era of Fog Computing." (2019).

Deep Reinforcement Learning for Cloud Networks

Advisor: APARICIO PARDO Ramon

Mail: raparicio@i3s.unice.fr

Telephone: 04 92 94 27 72

Web page: http://www.i3s.unice.fr/~raparicio/

Place of the project: I3S: Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis

Address:2000, route des Lucioles - Les Algorithmes - bât. Euclide B, 06900 Sophia Antipolis

Team: SIGNET

Web page: http://signet.i3s.unice.fr

Pre-requisites if any:

Languages:

• Python language absolutely

• Deep Learning libraries (like TensorFlow [6], Keras, rllab, OpenAI Gym) appreciated

Theory:

• Machine Learning, Data Science, particularly Neural Networks theory very recommendable

• Classical optimisation theory (Linear Programming, Dual Optimisation, Gradient Optimisation, Combinatorial Optimization) appreciated

Technology:

• Computer networking notions are welcome, but they are not necessary.

Project size team: 2 people are expected.

Description:

In the last years, Deep Reinforcement Learning [1] have obtained ground-breaking results at solving highly complex tasks, such as beating AlphaGo world champion or achieving state of the art results at video games (Atari, Doom).

In the Cloud network paradigm, NP problems arise when we try to smartly route the IP flows and to optimally place the CPU resources to process them in the Cloud. Such problems are usually tackled with classical heuristic methods providing approximated solutions [2].

A few years ago, Dai et al. [2] has shown the interest of Deep RL to learn heuristic algorithms to solve some classical NP-hard problems on graphs by combining RL with graph embedding (GE) [3], [4], a kind of representation learning applied to graphs. GE obtains a more compacted and lower dimensional graph representation where the RL scheme can solve easier the optimization problem.

In this PFE, we want to assess how much we can gain if we adopt this RL+GE architecture to solve the pivotal control problem in the cloud network paradigm: the dynamic allocation of Service Chains (SC) of Virtual Network Functions (VNF), as stated in [5]. A SC of VNFs is an ordered sequence of virtualized functions (deported to the cloud DCs) required to accomplish a network service, e.g. a video delivery service. The dynamic allocation of SCs of VNFs can be defined as “given a set of SC requests, to find (i) the placement of the VNFs at the DCs and (2) the routing of traffic flows between the VNFs, respecting the ordered sequence, and optimizing a given objective (e.g. minimizing the infrastructure cost or the power consumption).” This is the reference problem that we will consider in this project,

Goal: The goal of the PFE is to apply bi first time a RL+GE approach to the dynamic allocation of SCs of VNFs.

Phase 1: Getting familiar with the Cloud network control problem, the envisioned solution and existing codes (GE and RL algorithms).

Phase 2: Implementation of a Python-based discrete event simulator representing the Cloud network environment

Phase 3: Integration of GE within the existing RL code used to solve the Cloud network problem

Phase 4: Assessment of the gain with RL+GE with respect to the classical algorithms.

Further work

• 1 or 2 internships are expected to be carry out in the subject under the framework of the French ANR Artic project

• In Oct. 2020, a PhD theses in the subject will start up under the framework of the French ANR Artic project.

Useful Information/Bibliography:

[1] V. Mnih et al.. Asynchronous Methods for Deep Reinforcement Learning. Int. Conf. On Machine Learning (ICML), 2016, https://arxiv.org/pdf/1602.01783.pdf

[2] A. Tomassilli, F. Giroire, N. Huin, S. Pérennes, “Provably Efficient Algorithms for Placement of Service Function Chains with Ordering Constraints,” in Proc. IEEE International Conference on Computer Communications, INFOCOM 2018, Honolulu, HI, USA, Apr. 2018

[3] H. Dai, E. B. Khalil, Y. Zhang, B. Dilkina and L. Song. Learning Combinatorial Optimization Algorithms over Graphs. Conf. On Neural Information Processing Systems (NIPS), Dec. 2017.

[4] W. L. Hamilton, R. Ying and J. Leskovec. Representation Learning on Graphs: Methods and Applications. arXiv:1709.05584, Apr. 2018. [4] H. Cai, V. W. Zheng and K. Chen-Chuan Chang. A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications. IEEE Transactions on Knowledge and Data Engineering 30 (2018): 1616-1637.

[5] H. Feng, J. Llorca, A. M. Tulino, and A. F. Molich, Optimal dynamic cloud network control,” in Communications (ICC), 2016 IEEE International Conference on. IEEE 2016, pp. 1-7.

[6] TensorFlow Guide, the TensorFlow's official documentation. (https://www.tensorflow.org/guide/)

Resource Discovery Mechanisms For Iot In Onem2m Standard

Advisor : Luigi Liquori, Directeur de Recherche Inria, Kairos Team

Email : Luigi.Liquori@inria.fr,

Web : https://luigiliquori.wixsite.com/atinria

Team : Kairos project (Inria)

Keywords : IoT, OneM2M, (discovery) protocols, simulation

Description. The activity proposed is the study and development of Resource Discovery mechanisms and Query languages for oneM2M IoT standard and its contribution to the oneM2M (http://onem2m.org/).

The goal is to enable an easy and efficient discovery of information and a proper interworking with external source/consumers of information (e.g. a distributed data base in a smart city or in a firm), or to directly search information in the oneM2M system for big data purposes.

oneM2M has currently native discovery capabilities that work properly only if the search is related to specific known sources of information (e.g. searching for the values of a known set of containers) or if the discovery is well scoped and designed (e.g. the lights in a house). When oneM2M is used to discover wide sets of data or unknown sets of data, the functionality is typically integrated by ad hoc applications that are expanding the oneM2M functionality. This means that this core function may be implemented with different flavours and this is not optimal for interworking and interoperability.

This activity requires a little expertise in protocols, routing mechanisms, distributed data bases topologies, query languages. Knowledge in network simulators could be useful. It could require also expertise in Syntax and Types and Ontologies of Query Languages to capture discovery mechanisms in a fully distributed scenario and abstract features objects, security and control access mechanisms, complexity issues management to increase efficiency.

The work will look at the query and discovery mechanisms, complexity, queries exhaustiveness already available in industrial solutions, to extract (and adapt) the applicable components and to assure as smooth interworking with relevant non-oneM2M solutions.

The supporting companies active in oneM2M will provide the oneM2M architectural, functional and knowledge needed to integrate the specific expertise on discovery and distribute languages, and to support the contribution to oneM2M IoT standard. This PFE could continue with a 6-months internship.

References : https://www.dropbox.com/s/461w048d0jr61pj/links-submitted.pdf?dl=0 (and references therein)

Security Mechanisms In Resource Discovery Protocols Over Internet

Advisor : Luigi Liquori, Directeur de Recherche Inria, Kairos Team

Email : Luigi.Liquori@inria.fr,

Web : https://luigiliquori.wixsite.com/atinria

Team : Kairos project (Inria)

Description: Internet in recent years has become a huge set of channels for content distribution highlighting limits and inefficiencies of the current protocol suite originally designed for host-to-host communication.

In particular, we propose a discovery service, extending the current TCP/IP hourglass Internet architecture, that provides a new network aware content discovery service. The CNS behavior and architecture is partly inspired by the DNS service, whose discovery process logic uses the BGP protocol inter-domain routing information.

The service registers and discovers resource names in each Autonomous System (AS): contents are discovered by searching through the augmented AS graph representation classifying ASes into customer, provider, and peering, as the BGP protocol does: our protocol guarantees the so called “valley-free” property, namely that the discover process does not generate any supplementary cost for the AS involved in the discovery. Performance of the proposed CNS is represented by the hit probability (that is defined as the fraction of ASes that successfully locate a requested object) and the average lookup length representing the average number of CNS servers explored during the search phase. A C-based simulator of CNS is developed and is run over real ASes topologies provided by CAIDA to provide estimates of both performance indexes.

In section 2.1 of the paper https://www.dropbox.com/s/461w048d0jr61pj/links-submitted.pdf?dl=0

we need to understand security issues in our ad-hoc query syntax

[fing_princ:][fing_cont:][hosts:][tags:]cont_name

where

– cont_name is a (possibly human readable) string denoting a content name (e.g. “openoffice.iso”, “traffic light”, “defibrillator”, “plastic bottle”, “pedes- trian”, URI, MAC, GUID, etc.);

– tags is an optional (possibly human readable) list of keywords (e.g. “sell”, “buy”, “rent”, “cars”, etc) associated with a given content;

– hosts is an optional list of hostnames being the purveyors of the content: when a hypername contains a list of hostnames, then the content name is retrieved from one of the hostnames: the local CNS perform a DNS query, transforms one (or all) hostname(s) into IP address(es) and return that list to the sender of the discovery request;

– fing_cont is an optional digital signature (hash) denoting the integrity of the content to be retrieved;

– fing_princ is an optional digital signature denoting the public asymmetric key of the principal, i.e. the owner of the content: it allow to identify the identity of the latter as soon as we retrieve the content itself.

Therefore, a hypername is characterized by an external and an internal view: the external view only includes a content name and a tag list.

This PFE could continue with a 6-months internship.

References : https://www.dropbox.com/s/461w048d0jr61pj/links-submitted.pdf?dl=0 (and references therein)

Modeling Safety in Autonomous Vehicles

Who?

Name: MALLET Frédéric

Mail: Frederic.Mallet@unice.fr

Telephone: 04 92 38 79 66

Web page: http://www-sop.inria.fr/members/Frederic.Mallet/

Where?

Place of the project: INRIA Lagrange

Address:

Team: Kairos (I3S/Inria)

Web page: http://team.inria.fr/kairos/

Description: Safety is a fundamental issue for autonomous vehicles. The acceptance of autonomous vehicles in an urban environment depends on the reliability and the robustness of their behavior. It is important to show that the autonomous vehicle will make better decisions than a human driver under any circumstances. However, this security strongly depends on the quality of the sensors. When we talk about sensor qualities, we talk about areas of confidence, measurement errors, etc. Example of sensors: camera, lidar and radar. In an autonomous and connected vehicle, the multi-sensor fusion is in fact a necessity to carry out the various tasks of perception. All sensors have limitations and reliability requirements that make them unusable under certain conditions: masking, reduced range, bias and inaccuracies… For example, for cameras this efficiency can be limited by conditions such as light, heavy rain or darkness that affects a vehicle’s ability to accurately detect and react to changes in its environment. Indeed, the metrological characteristics help us to know which sensor to use, in which range, with what precision the result will be given to us and other precisions which can be very useful in order to choose the sensor. But before seeing all these characteristics, it is good to see the errors that appear during the measurement and their degree of confidence.

In addition, data fusion remains today a big challenge in terms of scientific and technical parts. Multiple fusion techniques are required by the applications, but also by the vehicle evaluation systems. Thus, at the industrial level, functional and safety requirements will push manufacturers to develop sensor fusion systems to qualify their driver assistance systems (ADAS). Whatever the data is, defining the dependence of the quality of the data fusion on the results is a must to determine the risk criticality and confidence degree.

Mission:

At the initiative of this project, the student who wishes to discover the research will be part of the ADAS team in Renault Software Labs. (S)He will first write a state of art on the reliability of sensors that are essential for the future of autonomous driving and will also explain how to ensure the operational efficiency of these vehicles. This state of art will also serve to determine and study the compromise between the number of sensors and their price-related qualities. We need answers to the following questions: can we use the least number of sensors with an acceptable cost to have reliable data? And what is the impact of errors and uncertainties of these sensors on the degradation of the result?

Once the security rules are built, we must then know which data we will analyze or take into consideration in our work and decide whether we will apply stochastic and probabilistic aspects on them in order to aggregate to better solutions regarding confidence degree.

The student will also propose fusion and aggregation models by choosing appropriate input data and compare them to those of the ADAS team.

Useful Information/Bibliography:

- https://newsroom.intel.com/newsroom/wp-content/uploads/sites/11/2017/10/autonomous-vehicle-safety-strategy.pdf

- https://newsroom.intel.com/wp-content/uploads/sites/11/2019/07/Intel-Safety-First-for-Automated-Driving.pdf

- Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua, “On a Formal Model of Safe and Scalable Self-driving Cars”, Mobileye 2017

- Siwar Kriaa, Ludovic Pietre-Cambacedesa, Marc Bouissou, Yoran Halganda, “A survey of approaches combining safety and security for industrial control systems”, Elsevier, 2015.

Stack overflow

Who?

Name: Sid TOUATI

Mail: Sid.Touati@inria.fr

Telephone:

Web page: http://www-sop.inria.fr/members/Sid.Touati/

Where?

Place of the project: INRIA-Sophia

Team: Kairos

Web page: https://team.inria.fr/kairos/

Pre-requisites if any: processor architecture, operating systems, compilation

Description:

This internship is devoted to computer scientists wanting to have a first experience in hacking. You will learn a common method called "Stack overflow", well known attack that exploits program bugs (buffer overflow).

With this sort of hacking, someone can introduce malicious codes in any software, even operating systems, webservers, etc.

The internship will be done on Linux. The candidate must enjoy low level stuff in computing, like assembly languages and so on.

Useful Information/Bibliography:

[1] Smashing the Stack For Fun and Profit. Aleph One.

[2] https://travisf.net/smashing-the-stack-today

Probabilistic extensions for Logical Time in CPS design

Who?

Name: Robert de Simone / Frederic Mallet

Mail: rs@inria.fr

Telephone: 04 92 38 79 41 ou 04 92 38 79 66

Web page: https://team.inria.fr/kairos/

Where?

Place of the project: Inria Lagrange

Address:

Team: Kairos

Web page: https://team.inria.fr/kairos/

Description:

Cyber-Physical Systems (CPS) design combines modeling and programming of both physical environment and digital cyber controllers. While real-time controllers are desirably time predictable, modular and provable trustworthy, modeling of the environment contains uncertainties due to a partial understanding of the underlying physical process.

In the past we have designed an approach to cyber embedded design based on logical time as a way to express timely constraints between the main meaningful time events involved. The next challenge is to couple this modeling with a probabilistic environment modeling, while retaining, as many as possible, of the features that made logical time originally able to provide a deep understanding of system correctness by formal analysis.

The student will explore existing early proposals and will develop an temporal logical time approach for probabilistic CPS chosen on the ground of available analysis tools and related algorithmic methods.

Useful Information/Bibliography:

PRISM : http://www.prismmodelchecker.org/

PLASMA Lab: http://plasma-lab.gforge.inria.fr/plasma_lab_tutorial/plasma_tutorial.pdf

PROPHESY (from Joost-Pieter Katoen): https://moves.rwth-aachen.de/research/tools/prophesy/

Holger Hermanns, Ulrich Herzog, and Joost-Pieter Katoen. 2002. Process algebra for performance evaluation. Theor. Comput. Sci. 274, 1-2 (March 2002), 43-87. DOI=http://dx.doi.org/10.1016/S0304-3975(00)00305-4

Eun-Young Kang, Dongrui Mu, Li Huang: Probabilistic Verification of Timing Constraints in Automotive Systems Using UPPAAL-SMC. IFM 2018: 236-254

Dehui Du, Ping Huang, Kaiqiang Jiang and F. Mallet. pCSSL: a Stochastic Extension to MARTE/CCSL for Modeling Uncertainty in Cyber Physical Systems. Science of Computer Programming 166:71-88, 2018.

Privacy-preserving data analysis

Supervisors

Names:

Giovanni Neglia (giovanni.neglia@inria.fr), Michela Chessa (michela.chessa@gredeg.cnrs.fr)

Web pages:

http://www-sop.inria.fr/members/Giovanni.Neglia/

http://michelachessa.fr/

Location:

Inria Sophia-Antipolis Méditerranée

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Teams:

Neo, Inria (https://team.inria.fr/neo/)

GREDEG, CNRS laboratory, UMR 7321, http://unice.fr/laboratoires/gredeg

Decription:

Many official statistics organizations are traditionally charged with collecting information from individuals or establishments and publishing aggregate data to serve the public interest. Nowadays, many companies collect information about users’ online behavior, to sell average profiles (e.g. customers’ profiles) to third parties. Individuals may be reticent to provide personal information if there is the risk that such information may be extracted by the aggregate data. The recent Cambridge Analytica scandal [Gre18] has once more raised attention on the concrete risk of misuse of the data collected by information technology companies.

Privacy-preserving data analysis—-also known as statistical disclosure control, inference control, privacy-preserving data mining, and private data analysis—-addresses the problem of how to release statistical information without compromising the privacy of the individual respondents. This expression denotes a set of techniques used in data-driven research to ensure no person or organization is identifiable from the results of an analysis of survey or administrative data, or in the release of microdata, while still releasing the maximal amount of information for further analysis from third-parties [Hun12]. Adding noise to quantitative variables and swapping categorical ones are two basic approaches in this field, and have been studied in a number of papers (see [Rub93] and [Fie98] as starting points). Recently, the most established framework of aggregated data perturbation for privacy protection has become the one of differential privacy [Dwo06].

Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. Roughly, an algorithm is differentially private if an observer seeing its output cannot tell if a particular individual's information was used in the computation.

Differentially private algorithms are now used by some government agencies (including the United States Census Bureau [USCB]) to publish demographic information and other statistical aggregates while ensuring confidentiality of survey responses, and by some companies to collect information about user behavior while controlling what is visible even to internal analysts. For example, Google is using differential private algorithms for sharing historical traffic statistics [Ela15] as well as pushing them as a key element for federated learning of AI models [McM18].

The student should overview the existing literature to get a clear understanding of the main concepts and algorithms in the field. He/she should also implement one of the differential privacy algorithms and illustrate its behavior in a toy-example.

Prerequisite

We are looking for a candidate with a strong background on probability and statistics

Other information

This subject is research oriented and can lead to a following internship.

References

[Dwo08] Dwork, Cynthia. "Differential Privacy: A Survey of Results." Theory and Applications of Models of Computation (2008): 1-19.

[Che15] M. Chessa, J. Grossklags, P. Loiseau, A game-theoretic study on non-monetary incentives in data analytics projects with privacy implications. In: Proceedings of the 2015 IEEE 28th Computer Security Foundations Symposium (CSF). pp. 90-104, 2015

[Ela15] A. Eland, Tackling Urban Mobility with Technology, Google Policy Europe Blog, Nov 18, 2015. https://europe.googleblog.com/2015/11/tackling-urban-mobility-with-technology.html

[Fie98] S. E. Fienberg, U. E. Makov, and R. J. Steele, Disclosure Limitation Using Perturbation and Related Methods for Categorical Data, Journal of Official Statistics, Vol. 14, No. 4, 1998, pp. 485- 502

[Gre18] The Cambridge Analytica files: the story so far https://www.theguardian.com/news/2018/mar/26/the-cambridge-analytica-files-the-story-so-far

[Hun12] A. Hundepool, J. Domingo-Ferrer, L. Franconi, S. Giessing, E. Schulte Nordholt, K. Spicer, P. de Wolf, Statistical Disclosure Control, John Wiley & Sons, Jul 5, 2012

[McM18] McMahan, H. B., Ramage, D., Talwar, K., and Zhang, L. Learning differentially private recurrent language models. In International Conference on Learning Representations (ICLR), 2018.

[Rub93] D. B. Rubin, Discussion statistical disclosure limitation, Journal of official Statistics, 1993

[USCB] Protecting Privacy with math https://www.youtube.com/watch?time_continue=6&v=pT19VwBAqKA

Estimating Content Popularity in Cache Networks

Supervisors

Name: Sara Alouf

Mail: sara.alouf@inria.fr

Web page: http://www-sop.inria.fr/members/Sara.Alouf/

Location

Inria Sophia-Antipolis Méditerranée

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: Neo, https://team.inria.fr/neo/

Description

This internship is in the framework of Neo’s research cooperation with Akamai Technologies, the world leader in the field of Content Delivery Networks.

Caching policies try, implicitly or explicitly, to estimate the popularities of the different contents, in order to store those more likely to be requested in the near future. [1] advocates that popularity estimation will play a fundamental role in future cellular networks, while [2] stresses the importance in such scenario to perform the estimation at the right level of the cache hierarchy.

Efficient estimation of popularities can be done with counting extensions [3] of Bloom filters. The specific variant in [4] is conceived to quantify request rates through an auto-regressive filter that can track also time-variant popularities. [5] suggests that the counting error floor (due to false positives) does not allow to evaluate correctly the popularity but for the most popular m contents, where m is the number of counters used. A similar remark on how memory affects estimation quality is in [6]. In [7], the request rate for content i is estimated simply as r_i = 1/T_i where T_i is the most recent time-interval between two consecutive requests. [8] proposes a new caching policy relying on more sophisticated estimation techniques. [9] suggests a novel approach to implicitly estimate popularities that does not require additional memory. [10] presents an interesting framework to estimation techniques by looking

at both the learning rate and the learning accuracy.

The student will read the papers below and compare the popularity estimation techniques proposed in these papers in terms of their algorithmic complexity as well as of their caching performance (e.g. their hit rate). The last aspect will be carried on by simulations using synthetic traffic traces, but also real

ones provided by Akamai Technologies.

Pre-requisite:

The student should have good programming and analytical skills (probability, algorithms).

Other information:

This subject is research oriented and can be continued with a longer internship. In particular, some preliminary results from a Ubinet internship during the previous academic year have questioned the usefulness of neural networks for caching, when very precise popularity information is available [11]. The results could be different in the case when memory constraints require the use of popularity estimators with small memory footprint.

References:

[1] E. Zeydan, E. Bastug, M. Bennis, M. Abdel Kader, I. Alper Karatepe, A. Salih Er, and M. Debbah, “Big Data Caching for Networking: Moving from Cloud to Edge,” IEEE Communications Magazine, Volume: 54 Issue: 9, 2016

[2] M. Leconte, G. Paschos, L. Gkatzikis, M. Draief, S. Vassilaras, S. Chouvardas Placing, “Dynamic Content in Caches with Small Population,” in Proc. of IEEE INFOCOM 2016, San Francisco, USA

[3] A. Broder and M. Mitzenmacher, “Network applications of bloom filters: A survey,” Internet Math., vol. 1, no. 4, pp. 485–509, 2003. [Online]. Available: http://projecteuclid.org/euclid.im/1109191032

[4] G. Bianchi, N. d’Heureuse, and S. Niccolini, “On-demand time-decaying bloom filters for telemarketer detection,” Computer Communication Review, vol. 41, no. 5, pp. 5–12, 2011.

[5] G. Bianchi, K. Duffy, D. J. Leith, and V. Shneer, “Modeling conservative updates in multi-hash approximate count sketches,” in 24th International Teletraffic Congress, ITC 2012, Krakow, Poland, September 4-7, 2012, 2012, pp. 1–8.

[6] G. Neglia, D. Carra, P. Michiardi, Cache Policies for Linear Utility Maximization, Proc. of INFOCOM 2017, Atlanta, GA, USA, 1-4 May 2017

[7] M. Dehghan, L. Massoulie, D. Towsley, D. Menasche, and Y. Tay, “A Utility Optimization Approach to Network Cache Design,” in Proc. of IEEE INFOCOM 2016, San Francisco, USA.

[8] S. Li, J. Xuy, M. van der Schaarz, W. Li, “Popularity-Driven Content Caching,” in Proc. Of IEEE INFOCOM 2016, San Francisco, USA

[9] G. Neglia, D. Carra, M. D. Feng, V. Janardhan, P. Michiardi, and D. Tsigkari, “Access-time aware cache algorithms,” Proceeding of ITC 28, Würzburg, September 2016. BEST PAPER AWARD

[10] J. Li, S. Shakkottai, J. Lui, V. Subramanian, “Accurate Learning or Fast Mixing? Dynamic Adaptability of Caching Algorithms,” CoRR abs/1701.02214 (2017)

[11] V. Fedchenko, G. Neglia, B. Ribeiro, "Feedforward Neural Networks for Caching: Enough or Too Much?," under submission, available upon request.

Building realistic datasets for similarity caching

Supervisors:

Giovanni Neglia, giovanni.neglia@inria.fr, http://www-sop.inria.fr/members/Giovanni.Neglia/

Alain Jean-Marie, Alain.jean-marie@inria.fr, http://www-sop.inria.fr/members/Alain.Jean-Marie/

Location:

Inria Sophia-Antipolis Méditerranée

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: Neo, https://team.inria.fr/neo/

Description:

Caching at the network edge plays a key role in reducing user-perceived latency, in-network traffic, and server load. In the most common setting, when a user requests a given content c, the cache provides c if locally available (hit), and retrieves it from a remote server (miss) otherwise. In other cases, a user request can be (partially) satisfied by a similar content c’. For example, a request for a high-quality video can still be met by a lower resolution version. In other scenarios, a user query is itself a query for contents similar to a given object c. This situation goes under the name of similarity searching, proximity searching, or also metric searching [1]. Similarity searching plays an important role in many application areas, like multimedia retrieval [2], recommender systems [3], [4], genome study [5], machine learning training [6], [7], [8], and serving [9], [10]. In all these cases, a cache can deliver to the user one or more contents similar to c among those locally stored, or decide to forward the request to a remote server. The answer provided by the cache is in general an approximate one in comparison to the best possible answer the server could provide.

The goal of this project is to build realistic datasets that can be used to test some new similarity caching algorithms developed in the team.

We envisage two possible approaches. The first one is to use Netflix challenge's dataset [11] to generate an undirected weighted graph, whose nodes are movies and whose edge weights indicate how close the corresponding movies are.

The other approach is to i) develop a crawler that downloads a set of wikipedia pages and ii) generate an undirected weighted graph, whose nodes are pages and whose edge weights indicate how close the corresponding pages are using some text similarity metric, like cosine similarity of bag-of-words feature vectors.

Pre-requisites:

The student should have good programming and analytical skills (mostly algorithms).

Other info:

This subject is research oriented and can be continued with a longer internship.

References:

[1] Edgar Chavez, Gonzalo Navarro, Ricardo Baeza-Yates, and Jose Luis Marroquin. Searching in metric spaces. ACM Comput. Surv., 33(3):273–321, September 2001.

[2] Fabrizio Falchi, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, and Fausto Rabitti. A metric cache for similarity search. In Proceedings of the 2008 ACM Workshop on Large-Scale Distributed Systems for Information Retrieval, LSDS-IR ’08, pages 43–50, New York, NY, USA, 2008. ACM.

[3] Sandeep Pandey, Andrei Broder, Flavio Chierichetti, Vanja Josifovski, Ravi Kumar, and Sergei Vassilvitskii. Nearest-neighbor caching for content-match applications. In Proceedings of the 18th International Conference on World Wide Web, WWW ’09, pages 441–450, New York, NY, USA, 2009. ACM.

[4] Pavlos Sermpezis, Theodoros Giannakas, Thrasyvoulos Spyropoulos, and Luigi Vigneri. Soft cache hits: Improving performance through recommendation and delivery of related content. IEEE Journal on Selected Areas in Communications, 36(6):1300–1313, June 2018.

[5] Alexander F Auch, Hans-Peter Klenk, and Markus Goker. Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs. Standards in genomic sciences, 2(1):142, 2010.

[6] Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.

[7] Alex Graves, Greg Wayne, and Ivo Danihelka. Neural Turing Machines. arXiv preprint arXiv:1410.5401, 2014.

[8] Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1842–1850, New York, New York, USA, 20–22 Jun 2016. PMLR.

[9] Daniel Crankshaw, Xin Wang, Joseph E Gonzalez, and Michael J Franklin. Scalable training and serving of personalized models. In NIPS 2015 Workshop on Machine Learning Systems (LearningSys), 2015.

[10] Daniel Crankshaw, Xin Wang, Guilio Zhou, Michael J. Franklin, Joseph E. Gonzalez, and Ion Stoica. Clipper: A low-latency online prediction serving system. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pages 613–627, Boston, MA, 2017. USENIX Association.

[11] Netflix prize https://netflixprize.com/index.html

OCaml tool development for the automatic verification of communicating systems

Supervisors:

Cinzia Di Giusto & Etienne Lozes

etienne.lozes@unice.fr cinzia.digiusto@gmail.com

Location:

I3S - Les Algorithmes - Euclide B

Team: SCALE/C&A

https://www.i3s.unice.fr/fr/node/518 https://team.inria.fr/scale/

Pre-requisites if any:

Notions of finite automata and functional programming are welcome.

Description:

Communicating systems are a simple yet powerful model of message-passing programs. Although such systems only allow two basic operations, namely message sending and message reception, the communication errors of such systems, like message loss or unspecified receptions, are difficult to detect fully automatically due to the asynchrony of communications. Actually, this problem has even been proved undecidable by Brand and Zafiropoulo in the last century [1].

Nevertheless, several tools, like SPIN [2] or CADP [3] were designed for the analysis of communicating systems and are commonly used to detect bugs in industrial software. The major limitation of these tools, however, is that they require communication buffers to be bounded a priori. In particular, these tools miss the bugs that happen for larger buffers than the ones they assumed during the analysis. To overcome this limitation, several theoretical foundations were proposed in the recent years.

The goal of this PFE is to join and actively contribute to the development of an analyzer of communicating systems based on one of these theoretical foundations [4].

The analyzer will be developed reusing bricks of OCaml code written for two other well-documented analyzers, namely Scm and McScm [5].

The exact task of the PFE student in this project will be determined depending on the experience with OCaml programming [6].

References:

[1] Daniel Brand, Pitro Zafiropulo: On Communicating Finite-State Machines. J. ACM 30(2): 323-342 (1983) [2] http://spinroot.com [3] https://cadp.inria.fr [4] Ahmed Bouajjani, Constantin Enea, Kailiang Ji, Shaz Qadeer: On the Completeness of Verifying Message Passing Programs Under Bounded Asynchrony. CAV (2) 2018: 372-391 [5] https://svn.labri.fr/repos/acs/www/redmine/projects/mcscm/wiki.html [6] https://ocaml.org

Enhancing Geolocation with Circular Antenna Arrays in LoRa Low Power Wide Area Networks.

Who?

Name: Walid Dabbous & Thierry Turletti

Mail: walid.dabbous@inria.fr & thierry.turletti@inria.fr

Telephone: 0492387718 & 0492387879

Web page: https://team.inria.fr/diana/team-members/walid-dabbous/

& https://team.inria.fr/diana/team-members/thierry-turletti/

Where? Place of the project: Inria

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: Diana project-team

Web page: https://www.inria.fr/equipes/diana

Pre-requisites if needed: Signal Processing, RF communication, Matlab.

Description: The Internet of Things (IoT) is playing an increasingly

important role today and more than half of major new business systems

are expected to incorporate IoT elements by 2020.

LoRa [1,2] is an emerging communication technology for Low Power Wide

Area Network (LPWAN) which is known to be particularly efficient for

long range communication links (several kilometers) at very low cost.

In a large campus or building environment with dynamic nodes deployment

it is important to localise the end nodes with high accuracy. Previous work in

the Diana team showed that Angle of Arrival (AoA) of the signal can be detected

using MIMO transmission at the gateways [3,4]. The higher the number of antennas,

the better is the estimation of the angle of arrival. A linear virtual antenna array technique

was used to enhance AoA estimation accuracy with a relatively small device size [5].

However, this technique requires gateway mobility and may not be applicable in some situations.

In this PFE, we will study the possibility to extend previous work in two directions:

the first one is by using "circular" antenna arrays. The aim here is to reduce further

the device size and antenna mobility, while still providing higher accuracy AoA estimation.

The second direction is to investigate the use of techniques such as Time Modulated

Arrays [6] to provide spatial diversity and hence higher accuracy without the need for antenna mobility.

Work plan:

The student will start by a state-of-the-art review on LoRa geolocation, virtual antenna arrays and Time Modulated Arrays.

The result of this study is a comparison of the potential benefits of the two techniques.

Then she/he will perform indoor/outdoor tests with circular antenna arrays based on the modules developed by last years interns [3,4].

This PFE study may be continued in an internship and a PhD for excellent students.

References:

[1] Wireless Communications, Andrea Goldsmith, Cambridge University

Press, 2005.

[2] N. Sornin, M. Luis, T. Eirich, T. Kramp, O.Hersent , “LoRa

Specification 1.0,” LoRa Alliance Standard specification.,

2016. https://www.lora-alliance.org/

[3] Geolocation for LoRa Low Power Wide Area Network, Othmane Bensouda

Korachi, Ubinet Internship report. August 2018.

[4] Mahfoudi MN, Sivadoss G, Korachi OB, Turletti T, Dabbous W.

Joint range extension and localization for low-powerwide-area network.

Internet Technology Letters. 2019. https://doi.org/10.1002/itl2.120

[5] Enhancing geolocation accuracy in LoRa Low Power Wide Area Networks,

Ubinet Internship report. August 2019.

[6] Maneiro-Catoira, Roberto et al. “Time Modulated Arrays:

From their Origin to Their Utilization in Wireless Communication

Systems.” Sensors (2017).

Realistic Emulation of Mobile Edge Computing and Vehicular Networks (VANET)

Who? Name: Thierry Turletti & Thierry Parmentelat. Mail: thierry.turletti@inria.fr & thierry.parmentelat@inria.fr

Telephone: 0492387718 Web https://team.inria.fr/diana/team-members/thierry-turletti/

Where?

Place of the project: Inria

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: Diana project-team

Web page: https://www.inria.fr/equipes/diana

Pre-requisites : Good programming skills, Script programming (Python, Bash).

Detailed description:

Context:

The Diana Project-Team at INRIA works on wireless network experimentation platforms. A few years ago we built R2lab[1], a wireless testbed based on an anechoic chamber. The room includes about 40 wireless nodes with RF absorbers that prevent radio waves reflections and a Faraday cage for blocking external interferences. This tesbed represents an ideal environment to perform wireless experiments in a reproducible way, which is key in research. Along with the testbed, we developed the nepi-ng software[2], a python library used to quickly script experiment scenarios that is complete from nodes provisioning to data collection. Very recently, we also designed Distrinet[3], a software that allows running network emulation on a set of nodes. More precisely, this tool makes scalable the well-known mininet[4] network emulator by enabling running the mininet scenario on multiple nodes so that running large-scale scenarios such as real-size Data Centers is now possible.

Work plan:

In this PFE, the student will first analyse features of our testbed[1,2] and emulation tools (mininet[4], mininet-wifi[5] and Distrinet[3]) and learn how to use them. Then, she/he will study how to integrate Mininet-wifi and Distrinet within R2lab to extend the set of possible experimentation scenarios (e.g., adding mobility features with mininet-wifi[5]). Possible target use cases include scenarios with node mobility such as as vehicular networks (VANETs) and Mobile Edge Computing (MEC).

This PFE study may be continued in an internship and a PhD for excellent students.

References:

[1] FIT R2lab Wireless Tesbed : http://fit-r2lab.inria.fr/

[2] Parmentelat, T., Turletti, T., Dabbous, W., Mahfoudi, M. N., & Bronzino, F. "nepi-ng: an efficient experiment control tool in R2lab". In ACM WiNTECH 2018-12th ACM International Workshop on Wireless Network Testbeds, Experimental evaluation & CHaracterization (pp. 1-8), November 2, 2018, New Delhi, India.

[3] Giuseppe di Lena., Andrea Tomassilli., Damien Saucez., Frederic Giroire., Thierry Turletti., and Chidung Lac. "Mininet on steroids: exploiting the cloud for Mininet performance", IEEE CloudNet, 4-6 November 2019, Coimbra, Portugal.

[4] B. Lantz, B. Heller, and N. McKeown. "A network in a laptop: Rapid prototyping for software-defined networks". In ACM SIGCOMM Workshop HotNets, New York, NY, USA, 2010. ACM.

See also Mininet: https://github.com/mininet/mininet

[5] Fontes, Ramon R., et al. "Mininet-WiFi: Emulating software-defined wireless networks." 2015 11th International Conference on Network and Service Management (CNSM). IEEE, 2015.

Cooperative Localization in LoRa Low Power Wide Area Networks

Who?

Name: Walid Dabbous & Thierry Parmentelat

Mail: walid.dabbous@inria.fr & thierry.parmentelat@inria.fr

Telephone: 0492387718

Web page: https://team.inria.fr/diana/team-members/walid-dabbous/

Where?

Place of the project: Inria

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: Diana project-team

Web page: https://www.inria.fr/equipes/diana

Pre-requisites if needed: Signal Processing, RF communication, Matlab.

Description: The Internet of Things (IoT) is playing an increasingly

important role today and more than half of major new business systems

are expected to incorporate IoT elements by 2020.

LoRa [1,2] is an emerging communication technology for Low Power Wide Area

Network (LPWAN) which is known to be particularly efficient for long

range communication links (several kilometers) at very low cost.

MIMO techniques applied in LoRa context were proven beneficial to estimate

the angle of arrival (AoA) of the signal [3]. In order to provide full localisation

information distance information is needed in addition to the AoA. Ranging

techniques can provide distance information based on Time of flight but need

costly synchronisation. Another way to obtain distance information is to use

RSSI measurements.

The goal of this PFE is to explore a way to combine AoA and distance information

to provide precise localisation information in a collaborative way. MIMO equipped

gateways detect the AoA of the signal coming for a target node and ask a number

of relays to collaborate in localising this target node by providing RSSI information.

This work is proposed in the context of the I-LL-WIN project to develop intelligent

wireless IoT networks capable of self-reconfiguration to optimise the application

scenario in collaboration with LEAT. There is a possibility to propose a continuation

of this internship in PhD for excellent students.

Work plan:

The student will start by a state-of-the-art review on LoRa ranging techniques[4].

Then she/he will perform simulations with channel models corresponding to different environments

(Indoor, outdoor) to evaluate the interest of the collaborative localisation in these environments.

This PFE study may be continued in an internship and a PhD for excellent students.

References:

[1] N. Sornin, M. Luis, T. Eirich, T. Kramp, O.Hersent , “LoRa

Specification 1.0,” LoRa Alliance Standard specification.,

2016. https://www.lora-alliance.org/

[2] Augustin, A., Yi, J., Clausen, T., & Townsley, W. M. (2016). A

study of LoRa: Long range & low power networks for the internet of

things. Sensors, 16(9),

1466. http://www.mdpi.com/1424-8220/16/9/1466/pdf

[3] Mahfoudi MN, Sivadoss G, Korachi OB, Turletti T, Dabbous W.

Joint range extension and localization for low-powerwide-area network.

Internet Technology Letters. 2019. https://doi.org/10.1002/itl2.120

[4] An Introduction to Ranging with the SX1280 Transceiver. Semtch document.

https://www.semtech.com/uploads/documents/introduction_to_ranging_sx1280.pdf

Analysis and optimization of drones trajectory in wireless flying ad-hoc networks

Who?

Name: Christelle Caillouet

Mail: christelle.caillouet@unice.fr

Telephone: +33 4 92 38 79 29

Web page: http://www-sop.inria.fr/members/Christelle.Molle-Caillouet/

Where?

Place of the project: COATI, joint project team between Inria and I3S lab

Address: Inria, 2004 route des lucioles, Sophia Antipolis

Team: COATI

Web page: https://team.inria.fr/coati/

What?

Pre-requisites if needed: Linear programming, Algorithmic, Wireless Networks

Description: Recent advances of technology have led to the development of flying drones

that act as wireless base stations to track objects lying on the ground. This kind of robots

(also called Unmanned Aerial Vehicles or UAVs) can be used in a variety of applications

such as vehicle tracking, traffic management and disaster management systems.

Deploying these Unmanned Aerial Vehicles to detect and cover targets is a complex problem

related to network planning with coverage and connectivity constraints, while minimizing

several parameters such that deployment cost, UAV's altitudes to ensure good communication

quality, energy consumed, UAV's move, ...

The project direction is to compute effective drone trajectories in terms of flying distance,

energy consumption, and covered targets. Several optimization can be investigated in order to

optimize the deployement of effective wireless flying ad-hoc networks (FANET).

The guideline of the proposed project is the following :

* Bibliographic analysis and understanding of papers [1] and [2]

* Development of a linear model extending [1] with trajectory modelling and scheduling constraints

* Implementation and analysis of obtained solutions

Useful Information:

[1] C. Caillouet, F. Giroire, T. Razafindralambo, "Efficient Data Collection and Tracking with

Flying Drones", in Ad Hoc Networks, Elsevier, 89(C), pages 35-46, 2019.

[2] L. Di Puglia Pugliese, F. Guerriero, D. Zorbas, T. Razafindralambo, "Modelling the mobile

target covering problem using flying drones", Optimization Letters, Springer Verlag, volume

10(5), pages 1021–1052, June 2016.

This project can be followed by an internship.

Machine learning for dynamic network resource allocation

Advisors : Frédéric Giroire and Hicham Lesfari

Emails : frederic.giroire@cnrs.fr

Laboratory : INRIA Sophia Antipolis, COATI team-project, https://team.inria.fr/coati/

Pre-requisites if any:

Languages:

- Python language absolutely

- Deep Learning libraries (like TensorFlow [6], Keras, rllab, OpenAI Gym) appreciated

Theory:

- Machine Learning, Data Science, particularly Neural Networks theory very recommendable

- Classical optimisation theory (Linear Programming, Dual Optimisation, Gradient Optimisation, Combinatorial Optimization) appreciated

Description :

With the latest wave of Cloud and IoT adoption, a sweeping technological change has been affecting our daily uses and opening up new opportunities for people and businesses. According to Cisco [4], it is estimated that there will be 28.5 billion connected devices by 2022, up from 18 billion in 2017. Furthermore, the appearance of these new heterogeneous devices is leading to a wide range of applications (e.g., wearable activity monitors, autonomous cars, industrial robotics) that can be built to enable smart healthcare, intelligent transportation and smart logistics. However, these kinds of applications are typically latency-sensitive and require intensive computation resources as well as high energy consumption for processing.

In addition, several emerging approaches promise to simplify the management of the network and increase network capability, primarily Software-Defined Network (SDN) and Network Function Virtualization (NFV). By enabling scaled on demand instantiation of Virtual Network Functions (VNFs), they facilitate cost efficiency of the network and deliver more agile programmable networks.

These two paradigms give new opportunities but also lead to challenges of resource provisioning and service placement. The latter have been studied using various optimization methods such as approximation algorithms [1,3], and column generation [2] with theoretically proven performance. Still, due to the high dynamicity of the demands and resources within the IoT environment, traditional algorithms are limited by new arising network reconfiguration challenges. To this end, recent trends in networking are proposing the use of machine learning approaches [5,6] for the control and operation of networks.

A fundamental problem that arises in this context is how to transform a network into a set of features, to be then able to apply classic machine learning technics such as deep neural networks. This problem is referred to as the graph embedding problem. Several solutions have been proposed in the literature, see [7] for a survey.

The goal of the PFE will be to test the efficiency of different network embeddings for dynamic network resource allocation problems.

The PFE can be followed by an internship for motivated students.

Some references:

[1] A. Tomassilli, F. Giroire, N. Huin, and S. Perennes, “Provably Efficient Algorithms for Placement of Service Function Chains with Ordering Constraints”, in Proceedings of IEEE Infocom, 2018.

[2] N. Huin, B. Jaumard, F. Giroire, “Optimization of Network Service Chain Provisioning”, In IEEE/ACM Transactions on Networking (ToN), vol. 26, nb. 3, pp. 1320-1333, 2018.

[3] R. Cohen, L. Lewin-Eytan, J. S. Naor, and D. Raz, “Near optimal placement of virtual network functions”, in Proceedings of IEEE Infocom, 2015.

[4] Cisco visual networking index: Forecast and methodology, 2017–2022.

[5] T. Ouyang, R. Li, X. Chen, Z. Zhou, X. Tang, “Adaptive User-managed Service Placement for Mobile Edge Computing: An Online Learning Approach”, in Proceedings of IEEE Infocom, 2019.

[6] S. Wang, T. Tuor, T. Salonidis, KK. Leung, C. Makaya, T. He, K. Chan, “When Edge Meets Learning: Adaptive Control for Resource-Constrained Distributed Machine Learning”, in Proceedings of IEEE Infocom, 2018.

[7] W. L. Hamilton, R. Ying and J. Leskovec. Representation Learning on Graphs: Methods and Applications. arXiv:1709.05584, Apr. 2018.

Title: Gateway placement optimization in LoRaWAN

Who?

Name: Christelle Caillouet and Walid Dabbous

Mail: christelle.caillouet@unice.fr, walid.dabbous@inria.fr

Telephone: +33 4 92 38 79 29

Web page: http://www-sop.inria.fr/members/Christelle.Molle-Caillouet/

https://team.inria.fr/diana/team-members/walid-dabbous/

Where?

Place of the project: COATI, joint project team between Inria and I3S lab

Address: Inria, 2004 route des lucioles, Sophia Antipolis

Teams: COATI and DIANA

Web page: https://team.inria.fr/coati/

What?

Pre-requisites if needed: Linear programming, Algorithmic, Wireless Networks, LPWAN

Description: LoRa networks enable long range communications at low power and low cost for Internet of Things (IoT) applications.

The performances of such networks depends on various parameters such as the location of the gateways, energy consumption of end

devices and radio configuration of the communications. The goal is to deploy the network gateways in order to ensure the coverage

of end devices, limiting congestion and interferences. Additionnaly the network capacity can be improved by properly allocating radio

ressources like channel bandwidth, spreading factor, coding rate, and transmission power.

In order to maximize the LoRa network capacity, cross-layer optimization approaches have to be investigated. The goal of this project

is to review recent work about planning LoRa networks, and analyze the various parameters to consider for an accurate cross-layer

model to garanty good network performances with a large number of gateways and end devices.

The guideline of the proposed project is the following :

* Bibliographic analysis and understanding of papers [1]-[3]

* Focus on gateway deployment for performance optimization

* Development of a linear model or algorithm

* Implementation and analysis of obtained solutions

Reference:

[1] M. Cesana, A. Redondi, J. Ortin, "A Framework for Planning LoRaWAN Networks", IEEE PIMRC, Sep. 2018.

[2] D. Zorbas, G. Papadopoulos, P. Maille, N. Montavont, C. Douligeris, "Improving LoRa Network Capacity Using Multiple Spreading Factor Configurations", ICT 2018.

[3] C. Caillouet, M. Heusse, F. Rousseau, "Optimal SF Allocation in LoRaWAN Considering Physical Capture and Imperfect Orthogonality, IEEE Globecom, 2019.

Deep Learning for Software Security

Supervision

- Laboratoires I3S
- Supervisor:
  - - Prof. Yves Roudier, yves.roudier@univ-cotedazur.fr
    - Prof. Frédéric Precioso, frederic.precioso@univ-cotedazur.fr

Context

Researchers have defined software security in various ways: Gary McGraw for example described software security as a system-wide issue that takes into account both security mechanisms (such as access control) and design for security (such as robust design that make software attacks difficult) [1]. Greg Hoglund presented it as “an approach to defend against software exploit by building software to be secure in the first place, mostly by getting the design right and avoiding common mistakes" [2].

And despite the variety of definitions, there is a shared characteristic between them all: Software security is an important part of the development process and not just an add-on feature.

According to CERT Coordination Centre (CERT/CC) of SEI¹, about 90% of the reported security issues are a result of exploits in design and development flaws [3]. Most of these flaws are the result of implementation bugs for example buffer overflows, race condition to name a few and poor design style: low cohesion and strong coupling between classes/methods.

Hackers can exploit these system weaknesses even in the presence of the several security techniques like firewalls, intrusion detectors and anti-virus. Thus, software static metrics (which define software quality in terms of design and complexity) can reflect in so many ways the software security vulnerability: more complex code equals less secure and attack-prone system, because complex system are hard to test and more probably to be left untested. And according to McCabe rule: “Miss a Test Path and You Could Get Hacked" complexity is the number one enemy of software security [4].

Related Work

After the attainment of all the quality metrics and the vulnerability, the Empirical Analysis of Static Code Metrics for Predicting Risk Scores in Android Applications [5] presents machine learning algorithms as attractive solutions to predict the number of issues obtained from the vulnerability study based on all the quality metrics issued. As a matter of fact, the article aims to answer three main questions, the first one is to discover the correlation between the quality metrics, the second is to see whether these metrics can produce an effective prediction model for the vulnerability and the final question is about discovering which of the metrics are more relevant in our prediction. These questions are central as well as evaluating the evolution of a predictive model with every enlargement of the considered database. To have a broader idea about the predictive models and accuracy metrics used to judge the models, the project will focus on clarifying the framework in which software vulnerability can be determined with deep learning approaches.

Indeed, deep learning techniques have revolutionized the field of data analysis, impacting almost all scientific domains thanks to amazing performances. This is the same for software vulnerability analysis which has seen many publications of research works involving deep learning [6-12]. A lot of strategies based on deep networks have been proposed using generative adversarial deep networks, using recurrent deep networks, using deep convolutional neural networks, etc.

Project

In this project, we want to explore deep learning solutions for software vulnerability detections. To this end, you will first have to clarify some starting points with an extensive literature study:

• Is there any standard benchmark(s) identified to learn deep models for software vulnerability detection?

• What are the different deep learning based strategies to address which vulnerability problem?

• What are the available codes (github, or other) of these strategies?

Once this first step achieved, we will design together one or few vulnerabilities to tackle with deep learning approaches and an evaluation plan to compare different strategies for the same type of vulnerability.

Skills

- Fluent in Python
- Being interested by more research-oriented projects

Next Steps

This project can be purused with an internship in I3S Lab.

References

[1] G. McGraw, "Software security," in IEEE Security & Privacy, vol. 2, no. 2, pp. 80-83, March-April 2004.

[2] Hoglund, G., & McGraw, G. (2004). Exploiting software: How to break code. Pearson Education India.

[3] CERT. \https://www.sei.cmu.edu/about/divisions/cert/".

[4] McCabe. \www.McCabe.com".

[5] Alenezi, Mamdouh, and Iman Almomani. "Empirical analysis of static code metrics for predicting risk scores in android applications." 5th International Symposium on Data Mining Applications. Springer, Cham, 2018.

[6] Li, Z., Zou, D., Xu, S., Ou, X., Jin, H., Wang, S. & Zhong, Y. (2018). VulDeePecker: A deep learning-based system for vulnerability detection. arXiv preprint arXiv:1801.01681.

[7] Lee, Y. J., Choi, S. H., Kim, C., Lim, S. H., & Park, K. W. (2017, December). Learning binary code with deep learning to detect software weakness. In KSII The 9th International Conference on Internet (ICONI) 2017 Symposium.

[8] Lin, G., Zhang, J., Luo, W., Pan, L., Xiang, Y., De Vel, O., & Montague, P. (2018). Cross-project transfer representation learning for vulnerable function discovery. IEEE Transactions on Industrial Informatics, 14(7), 3289-3297.

[9] Dam, H. K., Tran, T., Pham, T. T. M., Ng, S. W., Grundy, J., & Ghose, A. (2018). Automatic feature learning for predicting vulnerable software components. IEEE Transactions on Software Engineering.

[10] Ban, X., Liu, S., Chen, C., & Chua, C. (2019). A performance evaluation of deep‐learnt features for software vulnerability detection. Concurrency and Computation: Practice and Experience, e5103.

[11] Harer, J., Ozdemir, O., Lazovich, T., Reale, C., Russell, R., & Kim, L. (2018). Learning to repair software vulnerabilities with generative adversarial networks. In Advances in Neural Information Processing Systems (pp. 7933-7943).

[12] Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Yawei Zhu, Zhaoxuan Chen, Sujuan Wang, Jialai Wang, "SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities", IEEE Dataport, 2018.

1^. SEI is the coordination center of the computer emergency response team (CERT) for the Software Engineering Institute (SEI), a non-profit United States federally funded research and development center.

Design of a reconfigurable rectenna for adaptive RF energy harvesting systems

Supervisor:

Leonardo Lizzi, leonardo.lizzi@univ-cotedazur.fr, +33489154436

Location:

LEAT, 930 Route des Colles, 06903 Sophia Antipolis

Description:

Ambient RF energy harvesting is an attractive and green solution to provide sufficient energy to power small IoT devices and sensors. This technique recycles ambient RF energy associated to the electromagnetic waves that originate from mobile cell phone towers or abundantly available Wi-Fi routers and access points.

RF energy harvesting systems rely on efficient and well-designed “rectennas” that are the combination of an antenna and a rectifier. One of the main challenges in designing an efficient rectenna is the incorporation of an appropriate matching network that achieves adequate matching between the Schottky diode used in the rectifier and the antenna. However, the Schottky diode’s impedance varies with both frequency and input power. This implies that a reference design point must be chosen and any variation of the operating frequency or the input power level will results in a sub-optimal efficiency.

The aim of the PFE research activity is the design of a reconfigurable antenna solution capable of modifying its input impedance to adapt to the variation of the Schottky diode characteristics. Towards this end, an active reconfigurable component will be integrated into the antenna structure. The choice of the specific component will be made by the students based on the analysis of the state of the art of the domain. During the PFE, one of student will be focus on the design of the antenna geometry, while the second one will concentrate on the control of the reconfiguration mechanism.

Pre-requisites:

Antenna basics

References:

[1] A. Eid et al., "A Flexible Compact Rectenna for 2.40Hz ISM Energy Harvesting Applications," 2018 IEEE International Symposium on Antennas and Propagation & USNC/URSI National Radio Science Meeting, Boston, MA, 2018, pp. 1887-1888. doi: 10.1109/APUSNCURSINRSM.2018.8608525

[2] A. Eid et al., "An efficient RF energy harvesting system," 2017 11th European Conference on Antennas and Propagation (EUCAP), Paris, 2017, pp. 896-899. doi: 10.23919/EuCAP.2017.7928573