2018-2019

Title: A web application to enable biologists to aggregate data from multiple biodiversity data sources (Wimmics)

Who?

Name: Catherine Faron, Franck Michel

Mail: faron@i3s.unice.fr, franck.michel@cnrs.fr

Telephone:

Web page:

Where?

Place of the project: I3S lab, Campus SophiaTech, Building Templiers 1, 4th floor

Address:

Team: SPARKS / Wimmics

Web page: http://wimmics.inria.fr/

Pre-requisites if needed:

- skills in Web programming

- knowledge or interest in Semantic Web standards

Description:

This PFE takes place within a collaborative project between the I3S lab and the Museum of Natural History in Paris. The Museum has developed a web application for biologists to edit information about living species known in France and its overseas territories. A recurring problem is comparing this information with related information coming from other web portals and data aggregators in the biodiversity area. Today, developers have to write specific pieces of code for each Web API they want to get data from.

In this PFE, the students will develop a prototype web application that investigates how Semantic Web technologies can make this data integration more uniform. Using an existing framework to query JSON-based Web APIs with the SPARQL query language, they will develop an application that aggregates information from multiple Web APIs, compares it with the information in the Museum database, and suggests updates to the biologists. The information concerned includes species names and bibliographic references, media such as photos discovered on Flickr or audio/video recordings, life traits (e.g. body size and length, total life span), etc.

Useful Information:

As an example, this web page is generated dynamically by gathering data from multiple Web APIs within a single SPARQL query: http://sms.i3s.unice.fr/demo-sms?param=Delphinapterus+leucas

The project will rely on the same technology to go further: develop a web application that not only displays information but also compares it and lets a biologist decide which pieces are relevant and integrate them into a local database.
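The aggregate-and-compare logic described above can be sketched as follows, with mocked responses standing in for real biodiversity Web APIs; the field names and the `suggest_updates` helper are illustrative, not the Museum's actual schema or the SPARQL micro-services framework itself:

```python
# Hypothetical sketch: aggregate species records from several (mocked) Web APIs,
# compare them with a local record, and emit update suggestions for a biologist.

def aggregate(sources):
    """Merge per-source records into {field: {value: [sources reporting it]}}."""
    merged = {}
    for name, record in sources.items():
        for field, value in record.items():
            merged.setdefault(field, {}).setdefault(value, []).append(name)
    return merged

def suggest_updates(local, merged):
    """Suggest an update when no external source agrees with the local value."""
    suggestions = []
    for field, values in merged.items():
        local_value = local.get(field)
        if local_value not in values:
            # propose the value backed by the most sources
            best = max(values, key=lambda v: len(values[v]))
            suggestions.append((field, local_value, best))
    return suggestions

# Mocked responses standing in for real Web APIs in the biodiversity area.
sources = {
    "api_a": {"scientific_name": "Delphinapterus leucas", "body_size_m": "5.5"},
    "api_b": {"scientific_name": "Delphinapterus leucas"},
}
local = {"scientific_name": "Delphinapterus leuca", "body_size_m": "5.5"}

# The misspelled local name is flagged; the agreeing body size is not.
print(suggest_updates(local, aggregate(sources)))
```

In the actual project, `sources` would be filled by SPARQL queries against the Web APIs rather than hard-coded dictionaries.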

These scientific papers present the results of our previous work, showing how Semantic Web technologies can ease the development and maintenance of a web application when it comes to integrating new data sources:

[Michel et al. 2018a] SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data. LDOW 2018. https://hal.archives-ouvertes.fr/hal-01722792

[Michel et al. 2018b] Integration of Biodiversity Linked Data and Web APIs using SPARQL Micro-Services. Biodiversity Information Standards (TDWG). https://hal.archives-ouvertes.fr/hal-01856365

Title: Study and Development of tools for annotating artifacts used to develop interactive systems (Sparks)

Who?

Name: Marco Winckler

Mail: winckler@unice.fr

Telephone: +33 (0)4.89.15.42.99

Web page: http://www.i3s.unice.fr/~winckler/

Where?

Place of the project: I3S Laboratory, Polytech

Address: Site des Templiers, Bât. ESSI, Bureau 446, 930 Route des Colles, BP 145 | 06903 Sophia Antipolis Cedex, France

Team: SPARKS Team

Web page: https://sparks.i3s.unice.fr/

Pre-requisites if needed: Basic knowledge of the User-Centered Design process and (at least) curiosity about Human-Computer Interaction is required.

Description: This PFE aims at improving the annotation process of artifacts (such as user interface prototypes, dialog models, architectural models, data models, task models, etc.) used along the development process of interactive systems. It is motivated by the fact that much of the information used to make decisions along the development of interactive systems is not necessarily connected to the artifacts describing the system. The PFE requires a review of tools and approaches for annotating artifacts. Moreover, it also requires the design and development of a proof-of-concept tool that allows users to annotate models and to trace annotations to all artifacts where such annotations are relevant.
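The tracing requirement above can be illustrated with a minimal annotation store that attaches one annotation to several artifacts at once; the `AnnotationStore` class and artifact identifiers are invented for illustration and are not PANDA's actual API:

```python
# Illustrative data model: an annotation recorded once is traced to every
# artifact (prototype, task model, dialog model, ...) it concerns.

class AnnotationStore:
    def __init__(self):
        self._annotations = []          # list of (text, set of artifact ids)

    def annotate(self, text, artifact_ids):
        self._annotations.append((text, set(artifact_ids)))

    def for_artifact(self, artifact_id):
        """All annotation texts traced to a given artifact."""
        return [t for t, ids in self._annotations if artifact_id in ids]

store = AnnotationStore()
store.annotate("Confirmation dialog required by client",
               ["proto-v2", "task-model", "dialog-model"])
store.annotate("Button colour undecided", ["proto-v2"])

# The design decision is visible from the task model too, not just the prototype.
print(store.for_artifact("task-model"))
```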

Useful Information:

• Jean-Luc Hak, Marco Winckler, and David Navarre. 2016. PANDA: prototyping using annotation and decision analysis. In Proceedings of the 8th ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS '16). ACM, New York, NY, USA, 171-176. DOI: http://dx.doi.org/10.1145/2933242.2935873

• Thiago Rocha Silva, Jean-Luc Hak, Marco Antonio Winckler, Olivier Nicolas. A Comparative Study of Milestones for Featuring GUI Prototyping Tools. Dans : Journal of Software Engineering and Applications, Scientific Research Publishing, Irvine - USA, Vol. 10 N. 06, p. 564-589, 2017. http://doi.org/10.4236/jsea.2017.106031

• Marisela Gutierrez, Gustavo Rovelo, Mieke Haesen, Kris Luyten and Karin Coninx. Capturing Design Decision Rationale with Decision Cards. In proceedings of INTERACT 2017, September 2017, Lecture Notes in Computer Science . DOI: 10.1007/978-3-319-67744-6_29

Title: Study and Development of visualization techniques for tracing requirements and decisions along the development process of interactive systems (Sparks)

Who?

Name: Marco Winckler

Mail: winckler@unice.fr

Telephone: +33 (0)4.89.15.42.99

Web page: http://www.i3s.unice.fr/~winckler/

Where?

Place of the project: I3S Laboratory, Polytech

Address: Site des Templiers, Bât. ESSI, Bureau 446, 930 Route des Colles, BP 145 | 06903 Sophia Antipolis Cedex, France

Team: SPARKS Team

Web page: https://sparks.i3s.unice.fr/

Pre-requisites if needed: Basic knowledge of the User-Centered Design process and (at least) curiosity about Human-Computer Interaction is required.

Description: This PFE aims at investigating visualization techniques and tools for following the evolution of requirements (raised by clients and users) and the decisions and actions made (by the development team) along the development process of interactive systems. The PFE requires a review of existing tools (especially those used in agile processes). Moreover, it also requires the design and development of a proof-of-concept tool that allows users to visualize the evolution of requirements and decisions along the multiple iterations. The development should be done using the D3.js framework.

Useful Information:

• Data-Driven documents. https://d3js.org/

• Telea, A. C., Voinea, L., & Sassenburg, H. (2010). Visual Tools for Software Architecture Understanding: A Stakeholder Perspective. Ieee software, 27(6), 46-53. Also available at: www.cs.rug.nl/~alext/PAPERS/IEEESW10/paper.pdf

• Towards an Integrated Web-based Visualization Tool. Available at: http://www.ep.liu.se/ecp/065/009/ecp11065009.pdf


Title: ElectroSmart: Taking care of people sensitive to electromagnetic waves (Diana)

Who?

Name: Arnaud Legout (Inria)

Mail: arnaud.legout@inria.fr

Telephone: 04 92 38 78 15

Web page: http://www-sop.inria.fr/members/Arnaud.Legout/

Where?

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles

Team: DIANA

Web page: https://team.inria.fr/diana/

Pre-requisites if needed: Python programming, Android programming, statistical analysis

(depending on the task)

Description:

The Internet and new devices such as smartphones have fundamentally changed the way people communicate, but this technological revolution comes at the price of a higher exposure of the general population to microwave electromagnetic fields (EMF). This exposure is a concern for health agencies and epidemiologists who want to understand its impact on health, for the general public who wants more transparency about their exposure and the health hazard it might represent, but also for cellular operators and regulation authorities who want to improve cellular coverage while limiting exposure. Despite the fundamental importance of understanding the exposure of the general public to EMF, it is poorly understood because of the formidable difficulty of measuring, modeling, and analyzing this exposure.

The goal of the ElectroSmart project is to develop the instruments, methods, and models to compute and analyze the exposure of the general public to the microwave electromagnetic fields used by wireless protocols and infrastructures such as Wi-Fi, Bluetooth, or cellular networks. Then, using crowd-based supervised learning, we want to propose personalized recommendations to our users.

We currently have an Android application deployed on Google Play: ElectroSmart. We have 91k downloads, a rating of 4.4/5, and 1 billion measurements. We have a team of 5 persons working full time on the project and we are in the process of creating a startup. This PFE will take place in that context. We can propose a broad spectrum of subjects that we will adapt to the competencies of the candidate. Possible subjects are: i) data-science analysis of the huge amount of collected data to understand people's exposure (requirements: Python, statistical analysis, machine learning), ii) Android development to improve the ElectroSmart application (requirements: Android), iii) contributing to calibration experiments in an anechoic chamber (requirements: knowledge of electromagnetic fields, physical experimental skills).

You can find details on the ElectroSmart project at https://es.inria.fr/

Useful Information:

This PFE might be continued with an internship, a Ph.D. thesis, or an engineering position in the startup for excellent candidates.

Title: Automatic parser generator for real-time networking languages (Diana)

Who?

Name: Damien Saucez

Mail: Damien.Saucez@inria.fr

Telephone: +33 4 89 73 24 18

Web page: https://team.inria.fr/diana/team-members/damien-saucez/

Where?

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles, 06560 Valbonne

Team: DIANA

Web page: https://team.inria.fr/diana/

Pre-requisites if needed: Fair knowledge of network protocols (e.g., Ethernet, IP, MPLS); fair knowledge of computer language principles (e.g., programming paradigms, compiler and interpreter structure); fair understanding of formal verification and logic. Good level of programming in Python or Java.

Description:

Recent manufacturing trends have highlighted the need to adapt to volatile, fast-moving and customer-driven markets. As a result, the "factory of the future" will constitute a large cyber-physical system whose spine will be a programmable network. The idea of programmable networks is becoming a reality in the Internet and in data centers, but not yet in industrial networks. The main reason is that industrial networks require strict determinism and real-time guarantees, which are not required in the other types of networks. In this PFE, we will make a comprehensive study of the different programming languages that exist to program networks and those that exist to program real-time cyber-physical systems. Based on a comprehensive understanding of their striking features, we will sketch a proven generic parser that accepts the feature set of the two types of languages (i.e., networking systems + real-time cyber-physical systems). The novelty of this work lies in the fact that we aim at proposing an automatic parser generator that is proven. There is indeed a plethora of automatic parser generators, but most of them are not proven correct, and none can cover the requirements incurred by both network and real-time cyber-physical programming languages.

Useful Information: 1 student

Title: Building a Nvidia Jetson TK1 Cluster for image detection machine learning algorithms (Sparks)

Who?

Name: Michel RIVEILL

Mail: michel.riveill@univ-cotedazur.fr

Telephone: 06 15 61 34 49

Web page: http://www.i3s.unice.fr/~riveill

Where?

Place of the project: I3S Laboratory

Address: Polytech Building, 930 route des Colles, 06900 Sophia Antipolis

Team: Sparks

Web page: https://sparks.i3s.unice.fr/

What?

Pre-requisites if needed:

    • MPI, OpenCV, Python will be used in the internship period.

    • Basic knowledge of Linux Terminal will be helpful.

Description:

The objective of the internship is to build a Jetson cluster, portable and usable for machine learning algorithms, if possible written in Python.

We already have experience using a cluster with the TensorFlow programming library, which supports transparent but inefficient data placement. While this approach is adequate when resources (memory, CPU, GPU) are abundant, it becomes problematic when using modules with low capacities. Unfortunately, this is the case with the Jetsons.

The Jetson is indeed an excellent AI computing platform for deploying high-performance parallel computing on GPUs in embedded mobile solutions. Its high computing capacity and low power consumption for deep learning make it an ideal solution for embedded projects requiring intensive computing, especially image processing. However, its low memory capacity becomes problematic when processing very large image databases, hence the advantage of being able to combine several Jetsons.

As part of this project, we would like to experiment with the gain we could achieve by using an MPI-based approach, which allows us to distribute the tasks on the different nodes of the cluster rather than leaving it to TensorFlow to make this placement.
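The explicit scatter/gather placement the project would implement with MPI can be sketched with the standard library's multiprocessing pool instead of mpi4py (illustrative only: `process_batch` stands in for per-node inference, and the dataset is a placeholder for image batches):

```python
# Sketch of MPI-style explicit data placement: the driver decides which chunk
# goes to which worker, rather than delegating placement to TensorFlow.
from multiprocessing import Pool

def process_batch(batch):
    """Stand-in for per-node work (e.g. inference); here, just count items."""
    return len(batch)

def scatter(data, n_nodes):
    """Split the dataset into one contiguous chunk per node."""
    k, r = divmod(len(data), n_nodes)
    chunks, start = [], 0
    for i in range(n_nodes):
        end = start + k + (1 if i < r else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

if __name__ == "__main__":
    images = list(range(10))                      # placeholder image dataset
    chunks = scatter(images, 3)                   # driver-controlled placement
    with Pool(3) as pool:
        partial = pool.map(process_batch, chunks)  # one chunk per worker
    print(sum(partial))                           # gather the partial results
```

On the real cluster, each worker would be an MPI rank on a different Jetson, but the scatter/process/gather structure is the same.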

Useful Information:

In recent years, the field of deep learning has grown exponentially, both in terms of the number of articles published each year and in terms of public attention. What is interesting in this area is the performance of neural networks in classifying different types of data such as images, audio and video.

One of the main advantages of deep learning over other techniques is that it has proven very effective on raw data: it does not apply heavy pre-processing, but rather feeds the raw data directly to a learner (i.e. a machine learning model) in the hope that it will learn to characterize the target's properties and solve a given problem (e.g. classification or prediction). This approach is often referred to as "end-to-end machine learning" because it is based on the idea of letting the model automatically learn all the necessary features during all or most of the problem-solving steps. It is also called an "agnostic approach", because developing this type of model does not require specific knowledge of the field.

This internship is part of a series of projects whose objective is to propose a medical data processing chain. The use of a low-cost (and low-energy footprint) GPU cluster is driven by the need to preserve data confidentiality and one way to address this issue is to be able to process data in situ at the production site.

References:

For cluster building:

For eHealth machine learning:

    • Edward Choi, Mohammad Taha Bahadori, and Jimeng Sun. “Doctor AI: Predicting Clinical Events via Recurrent Neural Networks”. In: CoRR abs/1511.05942 (2015). arXiv: 1511.05942. URL: http://arxiv.org/abs/1511.05942.

    • Riccardo Miotto et al. “Deep patient: an unsupervised representation to predict the future of patients from the electronic health records”. In: Scientific reports 6 (2016), p. 26094.

    • Benjamin Shickel et al. “Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis”. In: arXiv preprint arXiv:1706.03446 (2017).

Title: Enabling large scale network experiments with Mininet (SigNet)

Who?

Dino Lopez

dino.lopez@unice.fr

http://www.i3s.unice.fr/~lopezpac

Where?

SigNet team / I3S Laboratory

Les Algorithmes

Bât. Euclide B - BP 121

2000, route des Lucioles

06903 Sophia Antipolis Cedex

Web : http://signet.i3s.unice.fr/

Description:

Mininet [1,2] enables emulating a complete network with hosts, switches and routers within one or several physical servers. It relies on Linux namespaces, which form the basis of container solutions like Docker, to achieve efficient emulation of real networks.

Mininet is currently widely used in the networking research community to evaluate the performance of current network protocols, as well as new (and obviously exciting) proposals. Indeed, emulation technologies promise experimental results close to real life. The ability to experiment with large network scenarios using only a few physical servers has motivated the wide adoption of Mininet by researchers. Hence, we observe a move away from classical simulators (ns-2 or ns-3) and from expensive hardware-based network testbeds towards virtual testbeds based on Mininet.

In the SigNet team, we have extensively relied on Mininet to evaluate some of our proposals. However, in our experience, one needs to carefully tune the virtual network parameters to obtain sound results with Mininet. For instance, we have observed that Mininet might fail when one needs to set up networks with heterogeneous link capacities, leading to "strange" results (e.g. the observed end-to-end delay can sometimes be incredibly large). More generally, the research community would benefit from a more in-depth understanding of the limits, shortcomings and underpinning technologies of Mininet.

The objective of this PFE is to conduct a large set of experiments in order to understand the limits of Mininet. The student is expected to explore (i) several network topologies, i.e. different data center topologies and ISP topologies; (ii) several network link parameters, i.e. different bandwidth capacities and delays; and (iii) different network sizes. A deep exploration of the technologies underlying Mininet and their relation to the Linux kernel is also desirable, as it will explain the root of Mininet's inaccuracies. Later, once the limitations of Mininet have been identified, the student shall propose solutions to remove or mitigate their impact on experimental results.
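The sweep over topologies, link parameters and sizes described above can be sketched as an experiment grid; the topology names and parameter values below are illustrative placeholders, not validated Mininet settings:

```python
# Generate one configuration per experiment run by taking the cross product
# of the dimensions to explore.
from itertools import product

topologies = ["fat-tree", "leaf-spine", "isp-like"]
bandwidths_mbps = [10, 100, 1000]
delays_ms = [1, 10, 50]
sizes = [8, 32, 128]                    # number of emulated hosts

def experiment_grid():
    for topo, bw, delay, n in product(topologies, bandwidths_mbps, delays_ms, sizes):
        yield {"topology": topo, "bw_mbps": bw, "delay_ms": delay, "hosts": n}

grid = list(experiment_grid())
print(len(grid))                        # 3 * 3 * 3 * 3 = 81 runs
```

Each generated dictionary would then parametrize one Mininet run, with results logged against the configuration for later comparison.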

Useful Information:

[1] Brandon Heller. Reproducible network research with high-fidelity emulation. Ph.D. Thesis, Stanford University, 2013.

[2] http://mininet.org/

Title: Understanding Web Tracking using Graph Mining Techniques (Coati/Indes)

Who?

Frédéric Giroire and Nataliia Bielova

Emails: frederic.giroire@inria.fr, nataliia.bielova@inria.fr

Where?

Laboratory: COATI project - INRIA (2004, route des Lucioles – Sophia Antipolis)

Web Site:

http://www-sop.inria.fr/members/Frederic.Giroire/

http://www-sop.inria.fr/members/Nataliia.Bielova/

Pre-requisites if any:

Knowledge and/or taste for graph algorithms, web, and data mining

Description:

When a user browses the web, information about her interests and/or profile is collected by a large number of advertising companies. Moreover, these companies exchange information among themselves. This makes it very hard for an Internet user to control the use of her personal information. Our aim is to understand how advertising companies exchange data, with the goal of helping users regain control.

To this end, we have extracted tens of millions of small chains describing the interactions happening between advertising companies when users browse the web. We are interested in using graph algorithms and data mining techniques to analyze these interactions. In particular, we want to categorize the millions of small interaction chains.

The work will be done following several steps:

    1. Test classical algorithms to compare graphs (isomorphism and graph edit distance) to determine the number of different interaction chains and/or the list of frequent chains or subchains.

    2. Test data mining methods such as graph kernels and clustering techniques to categorize them.
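For the special case where a chain is a simple path (an ordered sequence of trackers), graph isomorphism reduces to sequence equality, so step 1 can be sketched with simple counting; the chains below are invented, and real chains may be richer graphs needing edit distance or graph kernels:

```python
# Count distinct interaction chains and frequent sub-chains.
from collections import Counter

chains = [
    ("siteA", "ad1", "ad2"),
    ("siteB", "ad1", "ad2"),
    ("siteA", "ad1", "ad2"),
    ("siteC", "ad3"),
]

distinct = Counter(chains)              # chains equal iff identical sequences

def subchains(chain, length):
    """All contiguous sub-chains of the given length."""
    return [chain[i:i + length] for i in range(len(chain) - length + 1)]

pair_counts = Counter(p for c in chains for p in subchains(c, 2))

print(distinct.most_common(1))          # most frequent full chain
print(pair_counts.most_common(1))       # most frequent pairwise interaction
```

Frequent pairwise sub-chains already reveal which pairs of companies interact most often, a first step before the clustering techniques of step 2.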

The PFE may be followed by an internship for interested students.

Keywords: data mining, web privacy, graph algorithms, big data.

Title: Root cause analysis of measurement cost in virtualized environments (Diana/SigNet)

Who?

Name: Chadi Barakat, Guillaume Urvoy-Keller

Mail: Chadi.Barakat@inria.fr, urvoy@unice.fr

Web page: http://team.inria.fr/diana/chadi/

Web page: http://www.i3s.unice.fr/~urvoy/

Where?

Place of the project: Inria Sophia Antipolis

Address: 2004, route des lucioles, 06902 Sophia Antipolis

Team: Diana project-team

Web page: http://team.inria.fr/diana/

Pre-requisites if needed: system programming skills

Description:

The current trend in application development and deployment is to package applications and services within containers or virtual machines. This results in a blend of virtual and physical resources with complex network interconnection schemas mixing virtual and physical switches, along with specific protocols to build virtual networks spanning several servers. While the complexity of this setup is hidden by private/public cloud management solutions, e.g. OpenStack, this new environment makes monitoring and debugging performance-related issues challenging.

In this project, we are interested in the problem of measuring traffic in a virtualized environment and in assessing the cost of this measurement, in terms of both access to the physical resources of the server and impact on the efficiency of the virtual machines themselves. In a recent contribution to be presented this October at the CloudNet 2018 conference [1], we established the presence of such a cost and evaluated its amplitude. The cost manifests itself as slowness of the virtual machines and the applications that run therein. There is however a need to reach a clear understanding of the origin of this slowness: whether it comes from competition for server resources, or from slowness of the measurement path between the virtual switch (in this case, OvS [2,3]) and the measurement tool itself. We would like this PFE to explore this issue and pinpoint the root causes of this slowness in the application traffic when measurement is enabled on board the virtual switch.

The PFE will first review the state of the art on the topic (e.g. [4]) and get familiar with our previous work, then set up an experimental roadmap to shed light on the root causes of traffic slowness. Once these root causes are identified, the plan is to build upon these findings to propose solutions that reduce the cost of measurement, and to pinpoint the optimal configuration of the measurement plane that balances measurement accuracy against impact on the data plane of the virtual machines. This latter plan can be developed in a master internship following the PFE.

[1] Karyna Gogunska, Chadi Barakat, Guillaume Urvoy-Keller, Dino Lopez Pacheco, “On the Cost of Measuring Traffic in a Virtualized Environment“, in proceedings of IEEE CloudNet, Tokyo, October 2018. Available at https://hal.inria.fr/hal-01870293

[2] http://www.openvswitch.org/

[3] Ben Pfaff, Justin Pettit, Teemu Koponen, Ethan J. Jackson, Andy Zhou, Jarno Rajahalme, Jesse Gross, Alex Wang, Joe Stringer, Pravin Shelar, Keith Amidon, Martín Casado. The Design and Implementation of Open vSwitch. NSDI 2015: 117-130

[4] Paul Emmerich, Daniel Raumer, Sebastian Gallenmüller, Florian Wohlfart, Georg Carle. Throughput and Latency of Virtual Switching with Open vSwitch: A Quantitative Analysis. J. Network Syst. Manage. 26(2): 314-338 (2018)

Title: Dealing with uncertainty and logical clocks (Kairos)

Who?

Name: Frederic Mallet & Julien DeAntoni

Mail: Frederic.Mallet@univ-cotedazur.fr

Telephone: 04 92 38 79 66

Web page: http://www-sop.inria.fr/members/Frederic.Mallet/

Where?

Place of the project: Equipe Kairos, I3S/INRIA

Address: Lagrange 2004 route des Lucioles

Team: Kairos

Web page: http://team.inria.fr/kairos/

Description: pCCSL builds on CCSL to deal with uncertain behaviors. This is done by a fine characterization of the flexibility given by logical clocks, through the use of stochastic descriptions. While the ground for the theoretical foundations has been laid, the implementation is still partial. The goal of the project is to restrict, where necessary, the expressiveness of CCSL to allow for practical implementation and verification. The first step is to understand the operational semantics of CCSL and adapt it to deal with the stochastic description. This can be done either with a deep embedding inside TimeSquare or through a light one, by defining proper simulation policies. Both solutions should be explored. An export to the PRISM statistical model checker is a third option to explore. This project is meant for one student but can accommodate 2 students with good coordination.
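The "light" simulation-policy option can be sketched as follows; this is an illustrative toy, not TimeSquare's actual API, and it ignores CCSL's clock constraints (alternation, precedence, etc.), keeping only the stochastic resolution of which clocks tick at each step:

```python
# Toy stochastic simulation policy: at each simulation step, each clock ticks
# with a given probability, resolving the nondeterminism left by the spec.
import random

def simulate(clocks, steps, seed=0):
    """clocks: {name: tick probability}; returns tick counts per clock."""
    rng = random.Random(seed)           # seeded for reproducible runs
    counts = dict.fromkeys(clocks, 0)
    for _ in range(steps):
        for name, p in clocks.items():
            if rng.random() < p:        # stochastic choice for this step
                counts[name] += 1
    return counts

counts = simulate({"fast": 0.9, "slow": 0.3}, steps=1000)
print(counts["fast"] > counts["slow"])  # 'fast' ticks far more often
```

A real pCCSL policy would draw only among the tick sets allowed by the operational semantics; the stochastic description would then weight those allowed sets rather than independent clocks.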

Useful Information:

- Dehui Du, Ping Huang, Kaiqiang Jiang and F. Mallet. pCSSL: a Stochastic Extension to MARTE/CCSL for Modeling Uncertainty in Cyber Physical Systems. Science of Computer Programming, 2018. DOI

- TimeSquare: timesquare.inria.fr

Title: Deployment and scalability of a web service composition architecture (Scale)

Who?

Name: Alain Tchana

Mail: alain.tchana@unice.fr

Telephone: 06 19 56 04 52

Web page:

Where?

Place of the project: I3S lab

Address: 2000, route des Lucioles - Les Algorithmes - bât. Euclide B 06900 Sophia Antipolis - France

Team: Scale

Web page:

Description:

Service-oriented architectures are based on 3 entities: a client, a service provider, and a registry. The service provider publishes information about its services in the registry. When a client needs a service, she sends a request to the registry, which tells her whether the service is available and gives her the information needed to access it. These exchanges are done through a set of protocols and standards from web technologies. One of the major benefits of the SOA paradigm is service reuse. Indeed, when a service has already been developed, another developer can integrate it into a process of his or her own application. A service may therefore be combined with one or more other services to provide a more complex service. This method is referred to as service composition.

Service composition can be static or dynamic based on the time of construction of its path. It can be automatic if it is done by a computer or manual when done by the developer.

The first objective of the internship is to deploy an experimental application relying on both static and dynamic composition. Secondly, the student will evaluate the performance of the two service composition methods. The third objective (time permitting) is to improve the composition engine in order to make it scalable.
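The registry and static-composition pattern described above can be sketched in a few lines; the service names and the `compose` helper are invented for illustration and do not reflect any particular web-service standard:

```python
# Toy SOA: providers publish services in a registry, clients look them up,
# and composition chains existing services into a more complex one.

registry = {}

def publish(name, fn):
    """Provider side: register a service under a name."""
    registry[name] = fn

def lookup(name):
    """Client side: query the registry for a service."""
    return registry[name]

def compose(*names):
    """Static composition: the chain is fixed at construction time."""
    def composed(x):
        for n in names:
            x = lookup(n)(x)            # output of one service feeds the next
        return x
    return composed

publish("geocode", lambda city: {"city": city, "lat": 43.6, "lon": 7.0})
publish("weather", lambda loc: f"forecast for {loc['city']}")

trip_info = compose("geocode", "weather")   # reuse two services as one
print(trip_info("Sophia Antipolis"))
```

A dynamic composition engine would instead resolve the chain at call time, e.g. picking among several registered providers of the same service; comparing the performance of the two strategies is precisely the second objective.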

Title: Logical Clocks for Real-Time Schedulability through SMT (Kairos)

Who?

Name: Frederic Mallet & Marie-Agnes Peraldi-Frati

Mail: Frederic.Mallet@univ-cotedazur.fr

Telephone: 04 92 38 79 66

Web page: http://www-sop.inria.fr/members/Frederic.Mallet/

Where?

Place of the project: Equipe Kairos, I3S/INRIA

Address: Lagrange 2004 route des Lucioles

Team: Kairos

Web page: http://team.inria.fr/kairos/

Description:

Using SMT allows for efficient reasoning on logical time specifications. Some specific, well-behaved configurations allow reducing general (infinite) logical time specifications to finite representations. The project should explore the combination of CCSL and SMT to address a generic problem of task schedulability. While traditional schedulability analysis relies on specific task models and analytical results (Liu & Layland, 1973), other approaches rely on timed specifications to propose generic frameworks (e.g. TIMES). Most alternative solutions rely on a physical time model with over-constrained specifications. To relax these constraints, one usually needs to use parametric approaches. Logical time can play an intermediate role by providing the same flexibility as parametric approaches without their prohibitive cost. This project should explore this possibility. The subject is meant for one student willing to continue with a research project.
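For concreteness, the classical analytical result cited above is Liu and Layland's sufficient utilization test for rate-monotonic scheduling; a CCSL/SMT encoding would replace this closed-form sufficient condition with exact reasoning on logical-time specifications:

```python
# Liu & Layland bound: n periodic tasks are schedulable under rate-monotonic
# priorities if total utilization is at most n * (2^(1/n) - 1).

def rm_schedulable(tasks):
    """tasks: list of (computation_time, period); sufficient (not necessary) test."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)      # ~0.7798 for n = 3
    return utilization <= bound

print(rm_schedulable([(1, 4), (1, 5), (2, 10)]))  # U = 0.65 -> schedulable
print(rm_schedulable([(2, 4), (2, 5), (2, 10)]))  # U = 1.10 -> fails the test
```

Note the test is only sufficient: task sets above the bound may still be schedulable, which is one motivation for exact approaches such as the SMT-based one targeted by this project.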

Useful Information:

    • Min Zhang, Feng Dai and F. Mallet. Periodic scheduling for MARTE/CCSL: Theory and practice. Science of Computer Programming 154:42-60, Mar. 2018. DOI

    • Min Zhang, Yunhui Ying: Towards SMT-based LTL model checking of clock constraint specification language for real-time and embedded systems. LCTES 2017: 61-70

    • TimeSquare: timesquare.inria.fr

Title: Empirical Analysis of Docker Networking Solutions (SigNet)

Who?

Name: Guillaume Urvoy-Keller, Dino Lopez-Pacheco

Mail: urvoy@i3s.unice.fr, lopezpac@i3s.unice.fr

Telephone: https://annuaire.unice.fr/ ;-)

Web page: http://www.i3s.unice.fr/~urvoy/, http://www.i3s.unice.fr/~lopezpac/

Where?

Place of the project: I3S

Address: 2000, route des Lucioles - Les Algorithmes - bât. Euclide B 06900 Sophia Antipolis - France

Team: SigNet

Web page: http://signet.i3s.unice.fr/

Pre-requisites if needed: scripting in bash/python to set up experiments,

collect measurements and exploit.

Description: Modern applications are often modular with clear interfaces between components, e.g. REST APIs. A natural deployment method is to use containers. Container solutions such as Docker [1] rely on the ability of Linux (and other OSes) to logically separate groups of processes, assign them specific volumes/user id spaces and network stacks, and also to specify the amount of resources they can use (e.g. how many cores, etc.). In the end, containers are a lighter deployment solution than virtual machines, even though both can be combined in practice.

A typical deployment of a containerized application creates many containers, on the same or different physical (or virtual) hosts. As they need to communicate with one another, Docker and other projects provide different means to interconnect containers (NAT, local bridge, VXLAN overlays).

The objective of this project is twofold. First, perform an extensive state of the art on the performance of container networking solutions, e.g. [2], as well as software switches (Linux bridge [3] or OvS [4,5]). Second, we would also like to explore the combination of OvS with DPDK [6], which boosts switching performance by creating a direct path between the physical NIC and the virtual switch. DPDK attracts a lot of attention (e.g. refer to the last OvS conference [7]) as it promises near line-rate performance when OvS is combined with Network Function Virtualization (NFV).

Useful Information:

[1] https://www.docker.com/

[2] Suo, Kun, et al. "An Analysis and Empirical Study of Container Networks." Proceedings of IEEE Conference on Computer Communications (INFOCOM). 2018.

[3] Varis, Nuutti. "Anatomy of a Linux bridge." Proceedings of Seminar on Network Protocols in Operating Systems. 2012.

[4] http://www.openvswitch.org/

[5] Pfaff, Ben, et al. "The Design and Implementation of Open vSwitch." NSDI. Vol. 15. 2015.

[6] https://www.dpdk.org/

[7] http://www.openvswitch.org/support/ovscon2017/

Title: Dynamic batch-sizing for machine learning problems (Neo)

Who?

Name: Giovanni Neglia, Chuan Xu

Mail: giovanni.neglia@inria.fr, chuan.xu@lri.fr

Web page: http://www-sop.inria.fr/members/Giovanni.Neglia/, https://www.lri.fr/~chuan/

Where?

Place of the project: Inria

Address: 2004 route des Lucioles

Team: NEO

Web page: https://team.inria.fr/neo/

Description:

Typically, iterative distributed Machine Learning (ML) algorithms begin with a guess of an optimal vector of parameters and proceed through multiple iterations over the input data to improve the solution. The process evolves in a data-parallel manner: the input data is divided among worker threads, each of which iterates over its data subset and determines solution adjustments based on its local view of the latest parameter values.

In particular, many distributed ML computing frameworks, like DistBelief, Project Adam, and MXNet, rely on the parameter-server paradigm to scale training across multiple machines. This paradigm comprises workers, which perform the bulk of the computation, and a stateful parameter server that maintains the current version of the model parameters and may itself be distributed across several machines. Workers can use stale versions of the model to compute "delta" updates of the parameters, which are then aggregated by the parameter server and combined with its current state.

In most cases, the model updates are based on a mini-batch approach: each worker randomly collects a small subset of samples (a batch) and uses them to compute a noisy estimate of the gradient. The size of the mini-batch is a fundamental parameter determining the convergence speed of the algorithm. It is usually tuned empirically through trial-and-error, but, recently, dynamic batch-sizing algorithms have been proposed.

The goal of this project is to 1) understand the effect of the batch size on convergence speed, 2) survey the existing literature on dynamic batch-sizing (starting from the papers listed below), and 3) implement some of the algorithms and test them.
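As a toy illustration of the effect to be studied, the sketch below (our own minimal example, not taken from the references) runs SGD on a one-dimensional least-squares problem and doubles the batch size at every epoch, in the spirit of "don't decay the learning rate, increase the batch size":

```python
import random

random.seed(0)
data = [random.gauss(3.0, 1.0) for _ in range(4096)]  # optimum near w = 3

def sgd_growing_batch(data, lr=0.1, batch0=8, epochs=6):
    """SGD on f(w) = E[(w - x)^2]; the batch size doubles each epoch,
    reducing gradient noise instead of decaying the learning rate."""
    w, batch = 0.0, batch0
    for _ in range(epochs):
        for start in range(0, len(data), batch):
            batch_data = data[start:start + batch]
            # noisy gradient estimate from the current mini-batch
            grad = sum(2 * (w - x) for x in batch_data) / len(batch_data)
            w -= lr * grad
        batch = min(batch * 2, len(data))  # the dynamic batch-sizing step
    return w

w_final = sgd_growing_batch(data)
```

With a fixed learning rate, the growing batch shrinks the variance of the iterates as training progresses, which is exactly the trade-off the project will quantify.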

Pre-requisites:

The student should have good programming and analytical skills (probability, algorithms). Some background on stochastic gradient methods is provided in the first lessons of the course "Distributed optimization and games"; attending the course is therefore a plus for this project.

Useful Information:

This subject is research oriented and can be continued with an internship.

References

- Scaling Distributed Machine Learning with the Parameter Server, Mu Li, Jun Woo Park, Alexander J. Smola, et al, OSDI 14

- Optimization Methods for Large-Scale Machine Learning, Leon Bottou, Frank E. Curtis, Jorge Nocedal, SIAM Rev., 60(2), 223–311

- Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent, Matteo Pirotta, Marcello Restelli, NIPS 2016

- Don't decay the learning rate, increase the batch size, Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le, ICLR 2018

- Coupling Adaptive Batch Sizes with Learning Rates, Lukas Balles, Javier Romero, Philipp Hennig, UAI 2017

- Train longer, generalize better: closing the generalization gap in large batch training of neural networks, Elad Hoffer, Itay Hubara, Daniel Soudry, NIPS 2017

- Adaptive Stream Processing using Dynamic Batch Sizing, Tathagata Das, Yuan Zhong, Ion Stoica, Scott Shenker, SOCC 2014

Title: Modeling WebRTC Data Channels (for computer simulations) (Comred)

Who?

Name: Olivier DALLE

Mail: olivier.dalle@univ-cotedazur.fr

Telephone: +33 (0)603-92-19-14

Web page: https://www.olivier-dalle.fr/

Where?

Place of the project: Laboratoire I3S

Address: Batiment Algorithmes, 2000 route des Lucioles, Sophia Antipolis

Team: COMRED

Web page: https://www.i3s.unice.fr/

Pre-requisites if needed: Python, C/C++, node.js programming, socket programming

Description:

WebRTC[1] is a common project of the major web browser vendors (Google/Chrome, Mozilla/Firefox, Opera, ...) to implement real-time multimedia peer-to-peer communication within their browsers. The main applications of WebRTC include browser-to-browser direct video-conferencing and real-time data exchanges.

In this work, we are more specifically interested in the so-called Data Channels of WebRTC. As their name suggests, Data Channels are meant for exchanging any kind of (raw/binary) data.

The protocol stack chosen for implementing WebRTC Data Channels is composed of three layers, namely SCTP, DTLS, and ICE/UDP [2].

SCTP is a successor to TCP that provides the end-to-end transport service, DTLS provides an end-to-end encrypted secure channel, and ICE/UDP is a set of protocols and services designed for NAT traversal, using "hole punching" techniques and dedicated servers.

The main goal of this PFE is to build an empirical model of WebRTC Data Channels communications using simple benchmarking techniques.

This work will help to build a new simulator for our research on large-scale peer-to-peer communications using WebRTC Data Channels.

Put simply, this PFE should produce, empirically, by measuring communication delays of existing WebRTC Data Channel implementations, a model that predicts the communication time as a function of message size, the number of parallel data streams, and the communication context.

The first step of this work will consist in identifying candidate implementations of WebRTC Data Channels, both browser-embedded and stand-alone. One implementation of each kind will be used for each experiment.

Then a set of experiments will be devised to cover the broadest spectrum of communication scenarios and contexts using Data Channels (one stream one way, n streams one way, 2 streams both ways, ...).

Finally, after collecting and analyzing the results of experiments, an analytical model will be proposed and, if possible, its correctness assessed on test scenarios. If time permits, a performance comparison of various implementations can be proposed.
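As a first-cut hypothesis (to be confirmed or invalidated by the measurements), the delay of a single message can be modeled as an affine function of its size, delay ≈ a + b·size. The sketch below fits such a model by ordinary least squares on synthetic measurements; all numbers are made up for illustration:

```python
import random

random.seed(1)
# Hypothetical measurements: message size (bytes) vs. one-way delay (ms),
# synthesized from a 5 ms base delay and a 10 Mb/s throughput, plus noise.
sizes = [2**k for k in range(10, 21)]
delays = [5.0 + s * 8 / 10_000 + random.gauss(0, 0.3) for s in sizes]

def fit_affine(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

a, b = fit_affine(sizes, delays)
predict = lambda size: a + b * size  # candidate delay model, in ms
```

The real model will likely need extra terms (number of streams, SCTP congestion state), but this gives the shape of the deliverable: a fitted predictor plus its measured accuracy.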

Useful Information:

[1] https://webrtc.org/

[2] https://tools.ietf.org/html/draft-ietf-rtcweb-data-channel-13

Title: Modeling HyperLedger Fabric (HLF) transactions (Comred)

Who?

Name: Olivier DALLE

Mail: olivier.dalle@univ-cotedazur.fr

Telephone: +33 (0)603-92-19-14

Web page: https://www.olivier-dalle.fr/

Where?

Place of the project: Laboratoire I3S

Address: Batiment Algorithmes, 2000 route des Lucioles, Sophia Antipolis

Team: COMRED

Web page: https://www.i3s.unice.fr/

Pre-requisites if needed: Python, node.js programming, be familiar with docker containers

Description:

HyperLedger Fabric (HLF) is a permissioned, non-anonymous, distributed transaction framework (a kind of blockchain), initially proposed by IBM and widely supported by major companies and actors [1]. HLF transactions involve multiple parties: Orderer Peers, Ledger Peers, Clients, Cert Authority, and so on. Luckily, the HLF implementation comes with a handy docker-based testing framework, in which all the parties can be instantiated on a single machine, within separate docker containers, connected using the docker virtual networking functions. Programming HLF transactions can be achieved at various levels: either at the lowest level using the native smart-contract API in Go, or at a higher level using more developer-friendly frameworks such as HyperLedger Composer [2].

The goal of this project is to build a communication model of HLF transactions, in order to predict the volume of traffic generated by each party depending on the number of transactions and peers involved. To build this model, a set of simple transaction scenarios will be designed, and the corresponding traffic will be observed on the docker virtual network. Based on these observations, the typical sequence and volume of interactions between the parties will be made explicit.
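As a starting point, a deliberately naive traffic model (our own assumption, to be calibrated and very likely corrected by the packet captures) could count one transaction-sized message per endorser and per ledger peer:

```python
def naive_hlf_traffic(n_tx, tx_bytes, n_endorsers, n_ledger_peers):
    """Naive first-cut model (hypothetical, to be calibrated against
    captures on the docker virtual network): each transaction is sent
    to every endorser, then the ordered result is disseminated to
    every ledger peer. Ignores headers, gossip and block batching."""
    endorsement = n_tx * tx_bytes * n_endorsers       # client -> endorsers
    dissemination = n_tx * tx_bytes * n_ledger_peers  # orderer -> peers
    return endorsement + dissemination  # total bytes on the wire

total = naive_hlf_traffic(n_tx=1000, tx_bytes=2048,
                          n_endorsers=3, n_ledger_peers=5)
```

The measurement campaign will show which of these per-party terms are linear in practice and which (e.g. block dissemination) follow a different law.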

Finally, a testing configuration using real peer nodes distributed across the network will be used to assess the correctness of the model and to identify potential discrepancies and limitations.

This work will help our current research on simulating large scale HLF scenarios to support the design of new HLF-based services.

Useful Information:

[1] https://www.hyperledger.org/projects/fabric

[2] https://hyperledger.github.io/composer/latest/

Title: Dynamic cloud network control in video delivery service chain (SigNet)

Who?

Name: Ramon Aparicio-Pardo and Lucile Sassatelli

Mail: {first.last}@unice.fr

Telephone: +33 4 92 94 27 72

Web page: http://www.i3s.unice.fr/~sassatelli/

http://www.i3s.unice.fr/~raparicio/

Where?

Place of the project: I3S

Address: Les Algorithmes - Euclide B, 2000, route des Lucioles, BP 121, 06903 Sophia-Antipolis Cedex, France

Team: Signet

Web page: http://signet.i3s.unice.fr/

Number of Students : 2-3 students recommended

Pre-requisites if needed:

Languages:

C++ language, absolutely

CPLEX library, appreciated

Theory:

communication network modeling, recommendable

linear programming, appreciated

convex optimization, appreciated

Description:

According to Cisco’s report, global IP video traffic will grow from 73% of all consumer Internet traffic in 2016 to 82% in 2021 [1]. This growth will be particularly pushed by two traffic types: live streaming and virtual/augmented reality, which will increase 15-fold and 20-fold between 2016 and 2021, respectively [1]. Such a traffic increase will not only imply a higher traffic volume, i.e. in terms of bits, but also a larger number of traffic flows, since the rise of live streaming is driven by social and crowdsourced live video platforms like Twitch, Periscope, Facebook Live or YouTube Live.

The sequence of functions and operations required from video content production up to video content consumption is the so-called Video Delivery Service Chain (VDSC). The communication network infrastructure devoted to carrying out this service chain constitutes a Content Delivery Network (CDN): a set of geographically distributed data centers (DC) connected by networking routers and transmission links. The basic functions in such a service chain are:

1. Content ingestion and preparation, which is the encoding and storage of the original content at the source node (the content producer).

2. Content caching, which is the possible storage of video versions with potentially different qualities in the DC nodes, usually closer to the users than the source node.

3. Content transcoding, which is the possible transcoding, in the DC nodes, of the original video to generate new quality versions that better adapt to the viewers’ effective bandwidth.

4. Content forwarding, which is the selection of an output link of the networking routers to send the video packets; the sequence of these selected router ports is the route that packets follow.

5. Content consumption, which is simply the reception and playing of the video by the viewers.

Service chain functions 2, 3 and 4 can be “virtualized”, that is, executed by virtual machines allocated in any of the DC nodes under the paradigm of Network Function Virtualization (NFV).

The CDN decisions about how the aforementioned video service chain should be allocated over the infrastructure constitute a challenging network control problem, which fits the cloud network control model proposed by Llorca and Tulino in [3]-[5] and the green control proposed in [6]. First, at the middle of the chain (functions 2, 3 and 4), the allocation of storage, computing and transmission resources is not solvable in polynomial time: (a) the placement of VMs is an instance of bin packing, a classical NP-complete problem; and (b) the bandwidth and flow allocation problem when the bandwidth is split into discrete modules and/or single-path routing is required (our case) is also NP-complete [2]. Secondly, at the ends of the chain (functions 1 and 5), the rise of social live streaming, the deployment of 5G mobile technologies and the introduction of new applications imply an increase in the volume and heterogeneity of produced video content, along with an increased variety of access, devices, roles and interests of the viewers.
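For intuition on the VM-placement side of the problem, a standard first-fit-decreasing bin-packing heuristic (shown in Python for brevity, although the project itself targets C++/CPLEX) gives a quick feasible baseline against which optimal solver solutions can be compared:

```python
def first_fit_decreasing(vm_demands, dc_capacity):
    """First-fit-decreasing heuristic for the VM-placement bin-packing
    subproblem: place each VM (largest demand first) in the first data
    center that still has room; open a new one otherwise. Known to use
    at most roughly 11/9 of the optimal number of bins."""
    bins = []        # remaining capacity of each opened data center
    placement = {}   # vm -> index of the data center hosting it
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(bins):
            if demand <= free:
                bins[i] -= demand
                placement[vm] = i
                break
        else:
            bins.append(dc_capacity - demand)
            placement[vm] = len(bins) - 1
    return placement, len(bins)

placement, n_dcs = first_fit_decreasing(
    {"vm1": 6, "vm2": 5, "vm3": 4, "vm4": 3, "vm5": 2}, dc_capacity=10)
```

The CPLEX-based ILP in the project replaces this heuristic with exact solutions, at the cost of solving an NP-complete problem.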

Objective:

In this project, we aim to re-implement the algorithms proposed for the dynamic control problem in the unicast traffic case [4] and the generalized mixed-cast case [5]. These algorithms constitute a basic benchmark for the development of more sophisticated control algorithms based on Machine Learning.

Tasks scheduling:

Phase 1 : Getting familiar with the network cloud control problem and proposed algorithms in [3]-[5] and the existing code for the transmission resource allocation of the algorithms proposed in [4]. To do before DoW, Nov. 16th

Phase 2: Integration of computing resource allocation decisions (such as in [4]) on the existing code. To do in two first full-work weeks before Dec. 1st

Phase 3: Extension to the multicast case (such as in [5]) on the existing code. To do in two last full-work weeks before Dec. 15th

Useful Information:

Technical tools:

C++, CPLEX Optimization Solver

Bibliography:

[1] Cisco Visual Networking Index: Forecast and Methodology, 2016–2021, August 8, 2018

[2] M. Pióro, D. Medhi., “Ch. 4 Network Design Problem Modeling,” Routing, flow, and capacity design in communication and computer networks. Elsevier, 2004.

[3] M. Barcelo, J. Llorca, A. M. Tulino, N. Raman, “The cloud service distribution problem in distributed cloud networks,” in Proc. IEEE International Conference on Communications (ICC), June 2015, pp. 344-350.

[4] H. Feng, J. Llorca, A. M. Tulino, and A. F. Molisch, “Optimal dynamic cloud network control,” in Proc. IEEE International Conference on Communications (ICC), 2016, pp. 1-7.

[5] J. Zhang, A. Sinha, J. Llorca, A. M. Tulino, E. Modiano, “Optimal Control of Distributed Computing Networks with Mixed-Cast Traffic Flows,” in Proc. IEEE INFOCOM, April 2018.

[6] R. Aparicio-Pardo and L. Sassatelli, “A Green Video Control Plane with Fixed-Mobile Convergence and Cloud-RAN,” International Teletraffic Congress, Sep. 2018.

[7] CPLEX: https://www.ibm.com/analytics/cplex-optimizer

Title: Middleware tools to bring AI in an e-Learning platform (Sparks)

Who?

Name: Catherine FARON ZUCKER

Mail: faron@unice.fr

Telephone:

Web page: http://www.i3s.unice.fr/~faron/

Name: Géraud FOKOU PELAP

Mail: geraud.fokou-pelap@inria.fr

Telephone:

Web page: http://www.i3s.unice.fr/~fokou/

Where?

Place of the project: I3S lab, Campus SophiaTech, Building Templiers 1, 4th floor

Address: Templier

Team: I3S/SPARKS

Web page: http://wimmics.inria.fr

Pre-requisites if needed: skills in Web programming, interest in Semantic Web

Description:

The goal of this project is to develop a middleware enabling an existing e-Learning platform to exploit a knowledge graph stored in an RDF dataset through Web services, in order to offer intelligent services to users.

A set of Java Web services has already been implemented. The mission of the trainee is to develop a middleware that acts as a client of these Java Web services and as a server for the learning platform:

- it will query the RDF knowledge graph through SPARQL queries and get answers in JSON;

- it will answer HTTP queries from the learning platform by producing and sending JSON-API answers.
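A minimal sketch of these two middleware steps, using only the Python standard library and assuming a hypothetical SPARQL endpoint URL (the real service would add caching, error handling and the platform's actual JSON-API schema):

```python
import json
from urllib import parse, request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

def sparql_select(query, endpoint=ENDPOINT):
    """POST a SELECT query and return the standard SPARQL JSON results."""
    req = request.Request(
        endpoint,
        data=parse.urlencode({"query": query}).encode(),
        headers={"Accept": "application/sparql-results+json"})
    with request.urlopen(req) as resp:
        return json.load(resp)

def to_json_api(results, resource_type):
    """Reshape SPARQL JSON results into a JSON-API document for the
    learning platform: one resource object per solution binding."""
    return {"data": [
        {"type": resource_type,
         "id": str(i),
         "attributes": {var: b[var]["value"] for var in b}}
        for i, b in enumerate(results["results"]["bindings"])]}
```

The first function covers the SPARQL client side, the second the JSON-API server side; the middleware essentially composes the two per incoming HTTP request.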

Useful Information:

This PFE takes place in the context of the EDUMICS project, a joint lab between the WIMMICS research team and the Educlever company.

The trainee will have the opportunity to work on cutting-edge technologies: Java Rest services, Web development (PHP, Javascript),

software interoperability (XML, JSON, JSON-API) and Semantic Web technologies (RDF, SPARQL).

The following scientific paper presents the results of our previous work showing that a Semantic Web based solution can be a great alternative to a classical RDB based solution:

https://hal.archives-ouvertes.fr/hal-01870950

Title: Range extension through Diversity Techniques in LoRa (Low Power Wide Area) Networks (Diana)

Who?

Name: Thierry Turletti & Walid Dabbous

Mail: thierry.turletti@inria.fr & walid.dabbous@inria.fr

Telephone: 0492387879 & 0492387718

Web page: https://team.inria.fr/diana/team-members/thierry-turletti/ &

https://team.inria.fr/diana/team-members/walid-dabbous/

Where?

Place of the project: Inria

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: Diana project-team

Web page: https://www.inria.fr/equipes/diana

Pre-requisites if needed: Signal Processing, RF communication, Matlab.

Description: The Internet of Things (IoT) is playing an increasingly

important role today and more than half of major new business systems

are expected to incorporate IoT elements by 2020.

LoRa [1,2] is an emerging communication technology for Low Power Wide Area

Network (LPWAN) which is known to be particularly efficient for long

range communication links (several kilometers) at very low cost.

A measurement campaign done by Ubinet students last year [3] has started characterizing LoRa transmissions in a campus-like environment. The results show that the transmission range of single-antenna LoRa devices is limited to 1.5 km on a straight flat road.

MIMO techniques have proven efficient in WiFi and other wireless technologies. Applying these techniques to LoRa would help extend the transmission range.

In this PFE, we will first extend the measurement campaign mentioned

above and then investigate the use of multiple antennas together with

diversity techniques to increase the uplink transmission range (i.e.,

node to gateway communication). The measurement campaign will be done

using Software Defined Radio equipment (USRP, LimeSDR, Pothos,

GnuRadio).

Work plan:

The student will start by a state-of-the-art review on LoRa physical

layer and diversity techniques [4].

Then she/he will perform a new indoor/outdoor measurement campaign on

the campus to evaluate the impact of frequency (868MHz or 2.4GHz) and

the impact of different obstacles on the transmission range. Indoor

tests will be performed both in the R2lab anechoic chamber [5] and in a typical office environment. Outdoor measurements will use the I-WIN project infrastructure deployed jointly by Inria and LEAT.

Then, the student will compare the performance of different diversity

techniques to increase the uplink range. Multiple antennas at the

gateway will be used to combine the different signals received, with

the goal of improving the quality of the signal. Possible diversity

techniques include Maximum Ratio Combining (MRC), Switched Diversity

Combining (SDC), Equal Gain Combining (EGC) and Selection Combining

(SC).
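To fix ideas, the expected gains of two of these schemes can be simulated with the textbook formulas: under Rayleigh fading, Selection Combining keeps the best of the N branch SNRs, while MRC sums them. A simplified sketch (independent branches, no antenna correlation):

```python
import random

random.seed(2)

def branch_snrs(n_antennas, mean_snr=1.0):
    """Per-antenna instantaneous SNRs under Rayleigh fading
    (exponentially distributed around the mean SNR)."""
    return [random.expovariate(1 / mean_snr) for _ in range(n_antennas)]

def combine(snrs, technique):
    """Output SNR of two classic combining schemes:
    SC keeps the best branch, MRC adds all branch SNRs coherently."""
    if technique == "SC":
        return max(snrs)
    if technique == "MRC":
        return sum(snrs)
    raise ValueError(technique)

trials = [branch_snrs(4) for _ in range(5000)]
avg_sc = sum(combine(s, "SC") for s in trials) / len(trials)
avg_mrc = sum(combine(s, "MRC") for s in trials) / len(trials)
```

With 4 antennas the average MRC gain approaches 4x the single-branch SNR, while SC approaches the harmonic sum 1 + 1/2 + 1/3 + 1/4; the measurement campaign will show how close real LoRa gateways come to these idealized gains.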

This PFE is related to the PFE on Beamforming for LoRa Low Power Wide Area Networks.

This PFE study may be continued in an internship and a PhD for

excellent students.

References:

[1] N. Sornin, M. Luis, T. Eirich, T. Kramp, O. Hersent, “LoRa Specification 1.0,” LoRa Alliance standard specification, 2016. https://www.lora-alliance.org/

[2] Augustin, A., Yi, J., Clausen, T., & Townsley, W. M. (2016). A

study of LoRa: Long range & low power networks for the internet of

things. Sensors, 16(9),

1466. http://www.mdpi.com/1424-8220/16/9/1466/pdf

[3] LoRa: Characterization and Range Extension in campus environment.

Gayatri Sivadoss, Ubinet Internship report. August 2018.

[4] Maximum Ratio Combining, Wireless Communications: Principles And

Practice, second edition, Rappaport Theodore S., Pearson Education,

2010.

[5] https://r2lab.inria.fr/index.md

Title: Beamforming for LoRa Low Power Wide Area Networks (Diana)

Who?

Name: Walid Dabbous & Thierry Turletti

Mail: walid.dabbous@inria.fr & thierry.turletti@inria.fr

Telephone: 0492387718 & 0492387879

Web page: https://team.inria.fr/diana/team-members/walid-dabbous/

& https://team.inria.fr/diana/team-members/thierry-turletti/

Where?

Place of the project: Inria

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: Diana project-team

Web page: https://www.inria.fr/equipes/diana

Pre-requisites if needed: Signal Processing, RF communication, Matlab.

Description: The Internet of Things (IoT) is playing an increasingly

important role today and more than half of major new business systems

are expected to incorporate IoT elements by 2020.

LoRa [1,2] is an emerging communication technology for Low Power Wide

Area Network (LPWAN) which is known to be particularly efficient for

long range communication links (several kilometers) at very low cost.

A measurement campaign done by Ubinet students last year [3] has started characterizing LoRa transmissions in a campus-like environment. The results show that the transmission range of single-antenna LoRa devices is limited to 1.5 km on a straight flat road.

In multiple gateways deployments, it is important to limit the

interferences between nearby gateways. Beamforming provides diversity

and array gain via coherent combining of the multiple signal paths

[1]. When the angular position of the node is known, beamforming can

be done by the gateway to target this specific node.
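The array gain behind this idea can be illustrated with a uniform linear array model (a simplified sketch, ignoring multipath): when the gateway steers its beam exactly towards the node's angular position, coherent combining yields an N-fold power gain, and a mis-steered beam can null the signal entirely.

```python
import cmath
import math

def array_gain(n_antennas, target_angle, steer_angle, spacing=0.5):
    """Power gain of a uniform linear array (element spacing in
    wavelengths) steered towards steer_angle, for a signal arriving
    from target_angle. When the two angles coincide, the n phase-
    aligned contributions add coherently for a gain of n_antennas."""
    phase = lambda k, a: 2 * math.pi * spacing * k * math.sin(a)
    s = sum(cmath.exp(1j * (phase(k, target_angle) - phase(k, steer_angle)))
            for k in range(n_antennas))
    return abs(s) ** 2 / n_antennas  # normalized so single antenna -> 1

g_on = array_gain(4, math.radians(30), math.radians(30))    # beam on target
g_off = array_gain(4, math.radians(30), math.radians(-30))  # mis-steered
```

In downlink terms, this 4x gain is what allows the transmission power reduction that the measurement campaign will quantify at different distances.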

In this PFE, we will first extend the measurement campaign mentioned

above and then investigate the benefits of beamforming at the downlink

in both indoor and outdoor environments. In particular, we will

measure transmission power reduction for different distances. The

measurement campaign will be done using Software Defined Radio

equipment (USRP, LimeSDR, Pothos, GnuRadio).

Work plan:

The student will start by a state-of-the-art review on LoRa physical

layer and beamforming techniques [4].

Then she/he will perform a new indoor/outdoor measurement campaign on

the campus to evaluate the impact of frequency (868MHz or 2.4GHz) and

the impact of different obstacles on the transmission range. Indoor

tests will be performed both in the R2lab anechoic chamber [5] and in a typical office environment. Outdoor measurements will use the I-WIN project infrastructure deployed jointly by Inria and LEAT.

Then she/he will evaluate in a new indoor/outdoor measurement campaign

the gain in SNR and goodput of the beamforming techniques already

implemented in GnuRadio by a previous student.

This PFE is related to the PFE on Range extension through Diversity Techniques in LoRa

Low Power Wide Area Networks.

This PFE study may be continued in an internship and a PhD for

excellent students.

References:

[1] Wireless Communications, Andrea Goldsmith, Cambridge University

Press, 2005.

[2] N. Sornin, M. Luis, T. Eirich, T. Kramp, O. Hersent, “LoRa Specification 1.0,” LoRa Alliance standard specification, 2016. https://www.lora-alliance.org/

[3] Augustin, A., Yi, J., Clausen, T., & Townsley, W. M. (2016). A

study of LoRa: Long range & low power networks for the internet of

things. Sensors, 16(9),

1466. http://www.mdpi.com/1424-8220/16/9/1466/pdf

[4] Geolocation for LoRa Low Power Wide Area Network, Othmane Bensouda

Korachi, Ubinet Internship report. August 2018.

[5] https://r2lab.inria.fr/index.md

Title: Large and rich training sets for coflow classifiers (Diana)

Who?

Name: Thierry Turletti, Damien Saucez

Mail:

Thierry.Turletti@inria.fr, Damien.Saucez@inria.fr

Telephone: +33 4 89 73 24 18

Web page:

https://team.inria.fr/diana/team-members/thierry-turletti/,

https://team.inria.fr/diana/team-members/damien-saucez/

Where?

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles, 06560 Valbonne

Team: DIANA

Web page:

https://team.inria.fr/diana/

Pre-requisites if needed: Fair knowledge of Linux and Network management. Fair level of programming in Java or Python.

Description: Parallel distributed computation algorithms such as Map-Reduce [1] produce massive amounts of traffic that can stress the data-center networking infrastructure. By nature, the communication scheme of such algorithms is distributed and structured: multiple workers process data in parallel and send their results over the network. As a result, as opposed to Internet traffic where one can consider that flows are independent (to some extent), in data-centers the network flows are often not independent and can be grouped into what are called coflows [2]. Put simply, a coflow is "a collection of flows between two groups of machines that are bound together by application-specific semantics".

In order to optimize networking and scheduling decisions, it is essential to automatically detect the flows and the coflow each of them belongs to, so that consistent optimisation decisions can be performed on the flows of a same coflow. Machine learning techniques are starting to be proposed to classify the flows of coflows [3-6]. Alas, a general issue when proposing classifiers for such systems is the difficulty of feeding the learning algorithms with a large enough training set.

For that reason, and in order to help build coflow classifiers, this PFE aims at investigating how to construct an automated experimentation system to generate large and reproducible training sets for coflow classifiers. The idea is to emulate large and varied data-centers in Grid5000 and to run a broad range of benchmarks on them. From the execution of the benchmarks it will then be possible to produce a large and rich variety of training sets for coflow classifiers. Tools such as Grid5000 [7], Distem [8], and DiG [9] will be investigated to help in the construction of this essential brick of data-center optimization.
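As an illustration of what such a training set could look like, the sketch below synthesizes labeled flow records sharing the per-coflow regularities (common start epoch, common destination group) that a classifier would exploit; the real training sets will of course come from the Grid5000 benchmark traces, and all field names here are hypothetical:

```python
import random

random.seed(3)

def synthesize_training_set(n_coflows, flows_per_coflow):
    """Synthetic labeled flows for a coflow classifier: flows of the
    same coflow share a start epoch and a destination group, mimicking
    the shuffle stage of a Map-Reduce job."""
    records = []
    for c in range(n_coflows):
        epoch = random.uniform(0, 100)    # stage start time (seconds)
        dst_group = random.randrange(10)  # e.g. the reducers' rack
        for _ in range(flows_per_coflow):
            records.append({
                "start": epoch + random.uniform(0, 0.5),
                "size": random.lognormvariate(12, 1),  # bytes
                "dst_group": dst_group,
                "coflow_id": c,  # ground-truth label to learn
            })
    return records

train = synthesize_training_set(n_coflows=20, flows_per_coflow=8)
```

The experimentation system to be built replaces this generator with real benchmark runs, keeping the same output shape: per-flow features plus a ground-truth coflow label.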

[1] J. Dean, and S. Ghemawat, "MapReduce: simplified data processing on large clusters", in Communications of the ACM, 2008, vol. 51, no 1, p. 107-113.

[2] Chowdhury, M. and Stoica, I., “Coflow: A networking abstraction for cluster applications,” in Proceedings of the 11th ACM Workshop on Hot Topics in Networks, ser. HotNets-XI.

[3] M. Chowdhury et al., “Efficient coflow scheduling with VARYS,” SIGCOMM Comput. Commun. Rev., vol. 44, no. 4, pp. 443–454, Aug. 2014.

[4] H. Zhang et al., “CODA: Toward automatically identifying and scheduling coflows in the dark,” in Proc. Of ACM SIGCOMM 2016.

[5] L. Chen, et al., "Optimizing coflow completion times with utility max-min fairness," in Proc. of IEEE INFOCOM, 2016.

[6] Y. Zhao et al. "RAPIER: Integrating routing and scheduling for coflow-aware data center networks." In Proc. of IEEE INFOCOM, 2015.

[7] Grid5000, https://www.grid5000.fr/mediawiki/index.php/Grid5000:Home

[8] L. Sarzyniec, T. Buchert, E. Jeanvoine and L. Nussbaum, "Design and Evaluation of a Virtual Experimental Environment for Distributed Systems”, 21st Conference on Parallel, Distributed and Network-Based Processing (PDP 2013).

[9] H. Soni, D. Saucez and T. Turletti, "DiG: Data-centers in the Grid," 2015 IEEE NFV-SDN conference, 2015.

Useful Information: 1 student

Title: Evolution over time of the structure of social graphs (Coati)

Who?

Advisor: Frédéric Giroire

Emails: frederic.giroire@inria.fr

Laboratory: COATI project - INRIA (2004, route des Lucioles – Sophia Antipolis)

Web Site:

http://www-sop.inria.fr/members/Frederic.Giroire/

Pre-requisites if any:

Knowledge of and/or taste for graph algorithms, big data, and network analysis

Description:

The goal of the project is to develop methods to analyse the evolution over time of a social network. As an example, we will consider the graph of scientific collaborations, since it can be crawled freely.

The project will have two phases:

- Data collection. In the first phase, the student will use the available bibliographic research tools (SCOPUS, Web of Science, Patstat) to create data sets: one corresponding to the current situation and others corresponding to past moments. The data sets will mainly be networks (annotated graphs) of scientific collaborations.

- Data analysis. In the second phase, the student will analyse these data. First, they will focus on simple metrics (number of publications, number of patent applications, ...) and compare their evolution over time. Then, if there is time, they will start studying the evolution of the structure of the network and examine whether its clustering evolves due to the emergence of new collaborations.
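One of the structural metrics relevant here, the average clustering coefficient, can be computed directly from an adjacency structure; comparing its value on successive snapshots of the collaboration graph is a simple way to quantify the structural evolution. A minimal sketch on a toy graph:

```python
def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph
    given as {node: set(neighbours)}: for each node, the fraction of
    its neighbour pairs that are themselves connected."""
    coeffs = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        # count edges among the neighbours of v (each pair once)
        links = sum(1 for u in nbrs for w in nbrs
                    if u < w and w in adj[u])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

# a triangle (1-2-3) with a pendant node 4 attached to node 3
triangle_plus_tail = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
c = clustering_coefficient(triangle_plus_tail)
```

On the real data sets, running this on each snapshot yields a time series whose trend indicates whether new collaborations densify existing communities or bridge distant ones.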

The PFE will be part of a larger project on the evaluation of the impact of funding on scientific research. The project involves researchers in economics, sociology, and computer science.

The PFE can also be done in a group of two students.

The PFE may be followed by an internship for interested students.

Title: Graphical and textual editors for hierarchical automata (Kairos)

Who?

Name: Eric Madelaine & Julien Deantoni

Mail: eric.madelaine@inria.fr, julien.deantoni@inria.fr

Telephone:

Web page:

Where?

Place of the project: Inria

Address: Sophia-Antipolis

Team: KAIROS

Web page: https://team.inria.fr/kairos/

Pre-requisites if needed:

Java, Eclipse,

Affinity for formal/rigorous activities

Useful information:

This subject is primarily planned for two students, but can be adapted for one single student.

This work can potentially be a prequel to a summer internship for a good student, in the same context.

Description:

Context:

We are developing a platform called VerCors for the design and the analysis of distributed

systems. At the core of the platform is a new model expressing the behavior of such

systems, in the form of hierarchical automata [1]. This behavior model is able to express the

semantics of various distributed or parallel languages, and serves as an input formalism

for various analysis tools. One important such analysis technique is model-checking, which consists in checking the validity of requirements, expressed as temporal logic formulas, on the states of the model.

Objective:

As a starting point to ease the reading, capture and manipulation of such hierarchical

automata, we specified its abstract syntax by using the Eclipse Modeling Framework

(https://www.eclipse.org/modeling/emf).

This framework provides a Java API for manipulating the AST. This is sufficient for some manipulations, but reading and writing such automata is still very tedious. Based on the existing abstract syntax, we would like to obtain advanced textual and graphical editors for defining both the system model and the requirement formulas. The two formalisms have a common part expressing the (symbolic) actions and predicates from which the behavior is built.

Approach (techniques and tools to use):

For that purpose we want to rely on the Xtext framework (http://www.eclipse.org/Xtext/), which

enables the automatic generation of advanced textual editors based on a specification of

the language grammar. For the graphical editors, we want to rely on the Sirius framework

(http://www.eclipse.org/sirius), enabling the automatic generation of a graphical editor based on a specification of the representation of each concept in the abstract syntax. The idea is to take advantage of both representations depending on the user, the complexity of the automata, and even the part of the automata under development. For instance, graphical manipulation is convenient for structuring the automata, but textual editing is usually preferable for specifying guards and transition actions.

In any case, the students working on this project will be free to make any proposition to help both the reading and capture of such hierarchical automata.

The students must feel comfortable with the Java programming language and curious to

manipulate exciting powerful frameworks.

Schedule and sharing out of the work:

The first weeks (1/2 day per week) will be targeted to understand the context and explore

the documentation about the technology proposed.

Then the students will independently develop the textual and the graphical approaches,

the intersection being that the tool, in the end, will produce the same EMF objects.

Depending on the results obtained by the students, it could be interesting to have mixed editing, both graphical and textual, since the Xtext and Sirius frameworks can be integrated together [2]. While this part is optional, we

believe that good students could provide impressive results mixing both graphical and

textual representations.

References:

[1] Ludovic Henrio, Oleksandra Kulankhina, Siqi Li, Eric Madelaine. Integrated

environment for verifying and running distributed

components -Extended version. [Research Report] RR-8841, INRIA Sophia-Antipolis.

2015, pp.24. <hal-01252323>

[2] https://www.eclipsecon.org/france2014/sites/default/files/slides/Xtext_Sirius.pdf

and https://www.infoq.com/presentations/sirius-xtext

Useful information: topic for 2 students

Title: Optimization of drones trajectory for optimal sensor coverage and data collection (Coati)

Who?

Name: Christelle Caillouet

Mail: christelle.caillouet@unice.fr

Telephone: +33 4 92 38 79 29

Web page: http://www-sop.inria.fr/members/Christelle.Molle-Caillouet/

Where?

Place of the project: COATI, joint project team between Inria and I3S lab

Address: Inria, 2004 route des lucioles, Sophia Antipolis

Team: COATI

Web page: https://team.inria.fr/coati/

Pre-requisites if needed: Linear programming, Algorithmic, Wireless Networks

Description: Recent technological advances have led to the development of flying drones that act as wireless base stations to track objects lying on the ground. These robots (also called Unmanned Aerial Vehicles, or UAVs) can be used in a variety of applications such as vehicle tracking, traffic management and fire detection.

Deploying these Unmanned Aerial Vehicles to cover targets is a complex problem since each

target should be covered, UAVs should form a connected backbone with a base station in order

to collect and send information to the targets, while minimizing several parameters such that

deployment cost, UAV's altitudes to ensure good communication quality, energy consumed,

UAV's move, ...

The goal of the project is to provide efficient and reliable drone placement and scheduling, adjusting the drones' positions to ensure the surveillance of all targets over time. Theoretically, this problem is related to the set covering problem (and its dynamic version) and to the 3D packing problem.

The outline of the proposed project is the following:

* Bibliographic analysis and understanding of papers [1] and [2]

* Development of a linear model extending [1] with trajectory modelling and scheduling constraints

* Implementation and analysis of obtained solutions
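To illustrate the link with set covering mentioned above, here is a minimal greedy sketch of the static placement problem. All names, the coverage radius and the toy instance are our own illustrative assumptions; the project itself targets exact linear models rather than this heuristic.

```python
from itertools import product

def greedy_cover(targets, candidates, radius):
    """Greedy set cover: repeatedly pick the candidate UAV position
    covering the most still-uncovered targets (classical log-factor
    approximation of the set covering problem)."""
    def covers(c, t):
        return (c[0] - t[0]) ** 2 + (c[1] - t[1]) ** 2 <= radius ** 2
    uncovered = set(targets)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda c: sum(covers(c, t) for t in uncovered))
        newly = {t for t in uncovered if covers(best, t)}
        if not newly:          # remaining targets cannot be covered at all
            break
        chosen.append(best)
        uncovered -= newly
    return chosen, uncovered

# toy instance: ground targets on a line, candidate UAV positions on a grid
targets = [(0, 0), (1, 0), (5, 0), (6, 0)]
candidates = list(product(range(8), [0]))
positions, left = greedy_cover(targets, candidates, radius=1.0)
```

The dynamic version of the problem would re-run (or incrementally update) such a placement as the targets move, which is where the scheduling constraints of the project come in.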

Useful Information:

[1] C. Caillouet, F. Giroire, T. Razafindralambo, "Optimization of mobile sensor coverage with UAVs", in WiSARN@INFOCOM, Apr. 2018.

[2] L. Di Puglia Pugliese, F. Guerriero, D. Zorbas, T. Razafindralambo, "Modelling the mobile target covering problem using flying drones", Optimization Letters, Springer Verlag, volume 10(5), pages 1021–1052, June 2016.

This project can be followed by an internship.

Title: Optimal planning of LoRa networks (Coati)

Who?

Name: Christelle Caillouet

Mail: christelle.caillouet@unice.fr

Telephone: +33 4 92 38 79 29

Web page: http://www-sop.inria.fr/members/Christelle.Molle-Caillouet/

Where?

Place of the project: COATI, joint project team between Inria and I3S lab

Address: Inria, 2004 route des lucioles, Sophia Antipolis

Team: COATI

Web page: https://team.inria.fr/coati/

Pre-requisites if needed: Linear programming, Algorithmic, Wireless Networks

Description: LoRa networks enable long-range communications at low power and low cost for Internet of Things (IoT) applications. The performance of such networks depends on various parameters such as the location of the gateways, the energy consumption of end devices, and the radio configuration of the communications. The goal is to deploy the network gateways so as to ensure the coverage of the end devices while limiting congestion and interference. Additionally, the network capacity can be improved by properly allocating radio resources such as channel bandwidth, spreading factor, coding rate, and transmission power.

In order to maximize the LoRa network capacity, cross-layer optimization approaches have to be investigated. The goal of this project is to review recent work on planning LoRa networks, and to analyze the various parameters to consider for an accurate cross-layer model that guarantees good network performance with a large number of gateways and end devices.

The outline of the proposed project is the following:

* Bibliographic analysis and understanding of papers [1] and [2]

* Reflection on the parameters to consider to optimize the LoRa network capacity

* Development of a linear model or algorithm

* Implementation and analysis of obtained solutions
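To see why the spreading factor is a key radio resource, the sketch below computes the LoRa time on air using the standard formula from the Semtech transceiver datasheets. The default bandwidth, preamble length and coding rate are assumptions chosen for illustration.

```python
import math

def lora_time_on_air(sf, payload_bytes, bw_hz=125000, preamble_symbols=8,
                     coding_rate=1, explicit_header=True, crc=True):
    """Approximate LoRa packet time on air (seconds), Semtech formula.
    coding_rate=1 means CR 4/5; low data rate optimization is enabled
    for SF11/SF12 at 125 kHz, as LoRaWAN mandates."""
    t_sym = (2 ** sf) / bw_hz                      # symbol duration
    de = 1 if (sf >= 11 and bw_hz == 125000) else 0
    ih = 0 if explicit_header else 1
    crc_bits = 16 if crc else 0
    num = 8 * payload_bytes - 4 * sf + 28 + crc_bits - 20 * ih
    n_payload = 8 + max(math.ceil(num / (4 * (sf - 2 * de))) * (coding_rate + 4), 0)
    t_preamble = (preamble_symbols + 4.25) * t_sym
    return t_preamble + n_payload * t_sym

# a 10-byte payload takes ~41 ms at SF7 but ~991 ms at SF12 (125 kHz):
toa_sf7 = lora_time_on_air(7, 10)
toa_sf12 = lora_time_on_air(12, 10)
```

The roughly 24x gap between SF7 and SF12 airtimes is precisely what makes the joint allocation of spreading factors and transmission powers a capacity (and congestion) problem, as studied in [2].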

Useful Information:

[1] M. Cesana, A. Redondi, J. Ortin, "A Framework for Planning LoRaWAN Networks", IEEE PIMRC, Sep. 2018.

[2] D. Zorbas, G. Papadopoulos, P. Maille, N. Montavont, C. Douligeris, "Improving LoRa Network Capacity Using Multiple Spreading Factor Configurations", ICT 2018.

This project can be followed by an internship.

Title: Design and tests of new user’s attention guidance techniques for Virtual Reality streaming (SigNet)

Who?

Advisors: Lucile Sassatelli, Marco Winckler, Anne-Marie Pinna-Déry. Emails: {first.last}@unice.fr

Laboratory: I3S, SigNet and S3 groups (2000, route des Lucioles – Sophia Antipolis)

Description:

VR is growing fast, with companies rolling out cheap and not-so-cheap head-mounted displays, from dedicated headsets like the Oculus Rift and HTC Vive down to smartphone-based headsets (e.g., Samsung Gear VR, Google Cardboard and the like). VR represents a tremendous revolution in the user's experience, but it is also a significant challenge for streaming over the Internet (that is, YouTube-like playback, without prior download). The bit rates entailed by 360° videos are indeed much higher than for conventional videos.

This PFE lies in the context of the ACTIVATE project (UCA Académie 1, under the coordination of Lucile Sassatelli). In this research project, we are designing innovative streaming strategies for 360° videos, meant to both decrease the required bandwidth and improve the user experience. A possible way to do so is to transmit in high quality only the parts of the scene that are effectively explored by the users. This however entails two difficulties. First, anticipating the user's head position to decide what to send in high quality is difficult. Second, video directors must be able to interrupt exploration and bring the viewer's attention back to the plot.

We suggest that user guidance helps solve both problems: i) bring users’ attention to the plot so that they don’t miss the narrative presented by video producers; ii) reduce network traffic by transferring only the parts that users actually watch. In our previous work [1,2], we have proposed a guiding technique for VR videos called snap-changes.

In this PFE, we want to investigate two new interface/video manipulation techniques called virtual walls and slow-downs. They have already been partly implemented in our Android VR app, but they remain to be refined, interfaced with the video quality selection module for streaming, and tested on users.

- 1st phase: Getting familiar with the problem and the available implementations:

- the considered principles for streaming 360°-videos

- the testbed, made of 2 Android applications, and the multimedia toolboxes [5,6]

- running user tests with the current implementation (comparing with a reference implementation with neither virtual walls nor slow-downs).

- 2nd phase: Completion of the techniques, interfacing with the streaming module, user experiments:

- proper handling of audio for slow-downs

- select the quality over time as a function of the head position

- run tests to properly assess the gains in terms of user’s experience and bandwidth
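The quality-selection step above can be sketched as a simple policy: tiles close to the current head position stream in high quality, the rest in lower quality. The tile layout, viewport width and three-level policy below are our own illustrative assumptions; the real module must also account for the available bit-rate budget.

```python
def select_tile_qualities(head_yaw_deg, tile_centers_deg, viewport_deg=100.0,
                          levels=("high", "medium", "low")):
    """Toy quality selection for equirectangular tiles: tiles inside the
    viewport get high quality, adjacent tiles medium, the rest low."""
    qualities = {}
    for center in tile_centers_deg:
        # smallest angular distance on the 360-degree circle
        d = abs((center - head_yaw_deg + 180) % 360 - 180)
        if d <= viewport_deg / 2:
            qualities[center] = levels[0]
        elif d <= viewport_deg:
            qualities[center] = levels[1]
        else:
            qualities[center] = levels[2]
    return qualities

# six tiles of 60 degrees each, user looking at yaw = 30 degrees
tiles = [0, 60, 120, 180, 240, 300]
q = select_tile_qualities(head_yaw_deg=30, tile_centers_deg=tiles)
```

In the actual streaming setting this decision is re-evaluated per video segment, using the predicted (rather than current) head position.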

Technical tools:

Android, Samsung Gear VR framework (Adobe Premiere)

References:

[1] S. Dambra, G. Samela, L. Sassatelli, R. Pighetti, R. Aparicio-Pardo and A.-M. Pinna-Déry. Film Editing: New Levers to Improve Virtual Reality Streaming. ACM International Conference on Multimedia Systems

(MMSys), Amsterdam, The Netherlands, June 2018.

[2] L. Sassatelli, A.-M. Pinna-Déry, M. Winckler, S. Dambra, G. Samela, R. Pighetti and R. Aparicio-Pardo. Snap-changes: a Dynamic Editing Strategy for Directing Viewer's Attention in Streaming Virtual Reality Videos. ACM International Conference on Advanced Visual Interfaces, Grosseto, Italy, May 2018.

[3] Bo Begole. Why The Internet Pipes Will Burst When Virtual Reality Takes Off. Forbes, Feb. 2016.

[4] O. A. Niamut, E. Thomas, L. D'Acunto, C. Concolato, F. Denoual, and S. Y. Lim, "MPEG DASH SRD: spatial relationship description," ACM Int. Conf. on Multimedia Systems (MMSys), May 2016.

[5] FFMPEG. Available: https://ffmpeg.org/

[6] MP4box. Available: https://gpac.wp.mines-telecom.fr/mp4box/

Title: Estimating Content Popularity in Cache Networks (Neo)

Who?

Name: Giovanni Neglia Mail: giovanni.neglia@inria.fr

Name: Sara Alouf Mail: sara.alouf@inria.fr

Web page: http://www-sop.inria.fr/members/Giovanni.Neglia/, http://www-sop.inria.fr/members/Sara.Alouf/

Where?

Inria Sophia-Antipolis Méditerranée

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: Neo, https://team.inria.fr/neo/

Description

This internship is in the framework of Neo’s research cooperation with Akamai Technologies, the world leader in the field of Content Delivery Networks.

Caching policies try, implicitly or explicitly, to estimate the popularities of the different contents, in order to store those more likely to be requested in the near future. [1] advocates that popularity estimation will play a fundamental role in future cellular networks, while [2] stresses the importance, in such a scenario, of performing the estimation at the right level of the cache hierarchy.

Efficient estimation of popularities can be done with counting extensions [3] of Bloom filters. The specific variant in [4] is conceived to quantify request rates through an auto-regressive filter that can also track time-varying popularities. [5] suggests that the counting error floor (due to false positives) only allows the popularity to be evaluated correctly for the m most popular contents, where m is the number of counters used. A similar remark on how memory affects estimation quality is in [6]. In [7], the request rate for content i is estimated simply as r_i = 1/T_i, where T_i is the most recent time interval between two consecutive requests for that content. [8] proposes a new caching policy relying on more sophisticated estimation techniques. [9] suggests a novel approach to implicitly estimate popularities that does not require additional memory. [10] presents an interesting framework to compare estimation techniques by looking at both the learning rate and the learning accuracy.
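As an example, the inter-arrival estimator of [7] fits in a few lines; this is our own illustrative sketch of the r_i = 1/T_i idea, not code from the cited paper.

```python
class InterArrivalEstimator:
    """Popularity estimator in the spirit of [7]: the request rate of a
    content is estimated as 1/T_i, where T_i is the most recent interval
    between two consecutive requests for that content."""
    def __init__(self):
        self.last_request = {}   # content -> time of its previous request
        self.rate = {}           # content -> estimated request rate

    def on_request(self, content, t):
        if content in self.last_request:
            interval = t - self.last_request[content]
            if interval > 0:
                self.rate[content] = 1.0 / interval
        self.last_request[content] = t

# toy trace: (time, content) pairs
est = InterArrivalEstimator()
for t, c in [(0, "a"), (2, "a"), (3, "b"), (5, "b"), (6, "a")]:
    est.on_request(c, t)
# last interval for "a" is 6-2=4, for "b" is 5-3=2
```

Because only the most recent interval is kept, the estimator reacts quickly to popularity changes but is very noisy, which is exactly the learning rate vs. accuracy trade-off discussed in [10].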

The student will read the papers below and compare the popularity estimation techniques they propose, in terms of both their algorithmic complexity and their caching performance (e.g., the hit rate they achieve). The latter aspect will be carried out through simulations, using synthetic traffic traces but also real ones provided by Akamai Technologies.

Pre-requisites:

The student should have good programming and analytical skills (probability, algorithms).

Other information:

This subject is research oriented and can be continued with a longer internship. In particular, some preliminary results from a Ubinet internship during the previous academic year have questioned the usefulness of neural networks for caching, when very precise popularity information is available [11]. The results could be different in the case when memory constraints require the use of popularity estimators with small memory footprint.

References:

[1] E. Zeydan, E. Bastug, M. Bennis, M. Abdel Kader, I. Alper Karatepe, A. Salih Er, and M. Debbah, “Big Data Caching for Networking: Moving from Cloud to Edge,” IEEE Communications Magazine, Volume: 54 Issue: 9, 2016

[2] M. Leconte, G. Paschos, L. Gkatzikis, M. Draief, S. Vassilaras, S. Chouvardas, “Placing Dynamic Content in Caches with Small Population,” in Proc. of IEEE INFOCOM 2016, San Francisco, USA

[3] A. Broder and M. Mitzenmacher, “Network applications of bloom filters: A survey,” Internet Math., vol. 1, no. 4, pp. 485–509, 2003. [Online]. Available: http://projecteuclid.org/euclid.im/1109191032

[4] G. Bianchi, N. d’Heureuse, and S. Niccolini, “On-demand time-decaying bloom filters for telemarketer detection,” Computer Communication Review, vol. 41, no. 5, pp. 5–12, 2011.

[5] G. Bianchi, K. Duffy, D. J. Leith, and V. Shneer, “Modeling conservative updates in multi-hash approximate count sketches,” in 24th International Teletraffic Congress, ITC 2012, Krakow, Poland, September 4-7, 2012, 2012, pp. 1–8.

[6] G. Neglia, D. Carra, P. Michiardi, Cache Policies for Linear Utility Maximization, Proc. of INFOCOM 2017, Atlanta, GA, USA, 1-4 May 2017

[7] M. Dehghan, L. Massoulie, D. Towsley, D. Menasche, and Y. Tay, “A Utility Optimization Approach to Network Cache Design,” in Proc. of IEEE INFOCOM 2016, San Francisco, USA.

[8] S. Li, J. Xu, M. van der Schaar, W. Li, “Popularity-Driven Content Caching,” in Proc. of IEEE INFOCOM 2016, San Francisco, USA

[9] G. Neglia, D. Carra, M. D. Feng, V. Janardhan, P. Michiardi, and D. Tsigkari, “Access-time aware cache algorithms,” Proceedings of ITC 28, Würzburg, September 2016. BEST PAPER AWARD

[10] J. Li, S. Shakkottai, J. Lui, V. Subramanian, “Accurate Learning or Fast Mixing? Dynamic Adaptability of Caching Algorithms,” CoRR abs/1701.02214 (2017)

[11] V. Fedchenko, G. Neglia, B. Ribeiro, "Feedforward Neural Networks for Caching: Enough or Too Much?," under submission, available upon request.

Title: Achieve Web Search Privacy by Obfuscation (Neo/Epione)

Who?

Name: Giovanni Neglia (giovanni.neglia@inria.fr), Charles Bouveyron (charles.bouveyron@unice.fr)

Web pages: http://www-sop.inria.fr/members/Giovanni.Neglia/, https://math.unice.fr/~cbouveyr/

Where?

Inria Sophia-Antipolis Méditerranée

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Teams:

Neo, https://team.inria.fr/neo/

Epione, https://team.inria.fr/epione/

Description:

The recent Cambridge Analytica scandal has once more raised attention on the concrete risk of misuse of the data collected by information technology companies.

We believe that the digital citizen should be empowered to directly defend his/her own data. To this purpose, we want to study how users can defend their privacy against profiling attempts from search engines, by obfuscating their personal data amidst a stream of fake information.

There have been a few prototype attempts to pursue this direction. The oldest example we are aware of is TrackMeNot [How09], a Firefox browser extension designed to hide user search engine queries by adding a stream of programmatically generated decoy searches. After the US Senate vote to eliminate privacy rules in March 2017, a new wave of indignation spurred a number of programmers to develop similar plugins/scripts to generate fake web visits. Some notable examples are ISP Data Pollution [Smi17], RuinMyHistory [Adk17] and Noiszy [Noi17]. These programs either use a static (but tunable) list of websites to which they send requests, or autonomously adapt this list by picking random links from queries to search engines. While commendable in their intent, it is not clear whether these programs are indeed effective. Criticism of the quality of the noise generated by TrackMeNot was expressed by Bruce Schneier, Chief Technology Officer of IBM Resilient [Sch06], and a few years later [Ped10] showed experimentally that a search engine, equipped with only a short-term history of a user's search queries, could break the privacy guarantees of TrackMeNot using only off-the-shelf machine learning classifiers. More recently, the Electronic Frontier Foundation Senior Staff Technologist Jeremy Gillula has expressed his skepticism about the current solutions: "I'd love to be proven wrong about this. I'd want to see solid research showing how well such a noise-creation system works on a large scale before I trust it" [Bro17]. Our current research aims to provide solid theoretical conclusions about this approach. A first attempt in this direction is [Ye09], which quantifies the mutual information between the aggregate data flow (real plus fake) and the original one as a function of the noise added, but implicitly assumes that fake data is indistinguishable from real data.
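The mutual-information criterion of [Ye09] can be illustrated on a toy model (our own illustration, not taken from the paper): the user's true topic X is uniform over k topics, and each observed query matches X with probability 1-eps, or is a uniform decoy otherwise. The information the observer gains about X from one observed query is then I(X;Y):

```python
import math

def mutual_information_bits(k, eps):
    """I(X;Y) in bits for a toy obfuscation model: true topic X uniform
    over k topics; observed topic Y equals X with probability 1-eps and
    is a uniform decoy with probability eps. Assumes, as [Ye09] does
    implicitly, that decoys are indistinguishable from real queries."""
    px = 1.0 / k
    mi = 0.0
    for x in range(k):
        for y in range(k):
            p_y_given_x = (1 - eps) * (y == x) + eps / k
            p_y = 1.0 / k                    # uniform by symmetry
            if p_y_given_x > 0:
                mi += px * p_y_given_x * math.log2(p_y_given_x / p_y)
    return mi

mi_clean = mutual_information_bits(4, 0.0)   # no obfuscation: log2(4) = 2 bits
mi_full = mutual_information_bits(4, 1.0)    # pure noise: 0 bits leaked
```

The model shows the leakage decreasing smoothly with the noise rate; the interesting research question is precisely what happens when the indistinguishability assumption fails and the observer can classify queries as real or fake.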

Starting from the original observations in [Sze14], several recent works have shown that state-of-the-art classifiers are vulnerable to worst-case (i.e., adversarial) perturbations of the data points in the training dataset. [Faw16] shows that the same classifiers are relatively robust to random noise in high-dimensional problems, but much less so if a worst-case perturbation can be found, even within a subspace of the dataset. These results, which hold for a static learning setting, drive our search for good noise generators to jeopardize online learning.

The purpose of this project is to study the related literature on robustness of classifiers to adversarial and random perturbations of the dataset. From this analysis the student should draw conclusions about which approaches are more promising to generate fake queries.

Pre-requisites:

We are looking for two possible profiles: 1) candidates with a strong background in probability and statistics, interested in understanding the theoretical limits of query obfuscation using tools from probability and statistical learning; 2) candidates with good programming skills, interested in experimentally evaluating the effect of noise on the performance of state-of-the-art machine learning classifiers.

Other info:

This subject is research oriented and can be followed by an internship.

References:

[Adk17] J. Adkins, RuinMyHistory, https://github.com/FascinatedBox/RuinMyHistory

[Bro17] J. Brodkin, After vote to kill privacy rules, users try to “pollute” their Web history, https://arstechnica.com/?post_type=post&p=1070315

[Bou18] C. Bouveyron, P. Latouche and R. Zreik, The Stochastic Topic Block Model for the Clustering of Networks with Textual Edges, Statistics and Computing, Volume 28, Issue 1, pp 11–31, 2018.

[Che15] M. Chessa, J. Grossklags, P. Loiseau, A game-theoretic study on non-monetary incentives in data analytics projects with privacy implications. In: Proceedings of the 2015 IEEE 28th Computer Security Foundations Symposium (CSF), pp. 90-104, 2015

[Dwo06] C. Dwork, “Differential privacy,” in Automata, Languages and Programming,

ser. Lecture Notes in Computer Science, M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, Eds. Springer Berlin Heidelberg, 2006, vol. 4052, pp. 1–12

[Faw16] A. Fawzi, S. M. Moosavi-Dezfooli, P. Frossard, Robustness of classifiers: from adversarial to random noise, Advances in Neural Information Processing Systems, 1632-1640, 2016

[Fbtrex] Facebook tracking exposed, https://facebook.tracking.exposed/

[Fie98] S. E. Fienberg, U. E. Makov, and R. J. Steele, Disclosure Limitation Using Perturbation and Related Methods for Categorical Data, Journal of Official Statistics, Vol. 14, No. 4, 1998, pp. 485-502

[Gre18] The Cambridge Analytica files: the story so far, https://www.theguardian.com/news/2018/mar/26/the-cambridge-analytica-files-the-story-so-far

[How09] Howe, D.C., Nissenbaum, H.: TrackMeNot: Resisting surveillance in web search. In: Kerr, I., Lucock, C., Steeves, V. (eds.) Lessons from the Identity Trail: Privacy, Anonymity and Identity in a Networked Society. Oxford University Press, Oxford (2009)

[Hun12] A. Hundepool, J. Domingo-Ferrer, L. Franconi, S. Giessing, E. Schulte Nordholt, K. Spicer, P. de Wolf, Statistical Disclosure Control, John Wiley & Sons, Jul 5, 2012

[Kle01] J. Kleinberg, C. H. Papadimitriou, and P. Raghavan. On the value of private information. In Proceedings of TARK, pages 249–257, 2001

[Noi17] https://noiszy.com

[Ped10] S T Peddinti, N Saxena, On the Privacy of Web Search Based on Query Obfuscation: A Case Study of TrackMeNot, International Symposium on Privacy Enhancing Technologies (PETS 2010), pp 19-37

[Rub93] D. B. Rubin, Discussion: statistical disclosure limitation, Journal of Official Statistics, 1993

[Sch06] B. Schneier, TrackMeNot, https://www.schneier.com/blog/archives/2006/08/trackmenot_1.html

[Smi17] S. Smith, ISP Data Pollution script, https://github.com/essandess/isp-data-pollution

[Sze14] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, International Conference on Learning Representations, 2014

[Ye09] S. Ye, F. Wu, R. Pandey and H. Chen, "Noise Injection for Search Privacy Protection," International Conference on Computational Science and Engineering, Vancouver, BC, 2009

Title: Capturing characteristics of partitioned graphs (Scale)

Who?

Name: Fabrice Huet

Mail: Fabrice.huet@unice.fr

Telephone: 04 92 94 26 91

Web page: https://sites.google.com/site/fabricehuet/

Where?

Place of the project: Laboratoire I3S

Address: 2000 route des Lucioles

Team: Scale (COMRED team)

Web page: http://www.i3s.unice.fr/fr/comred

Pre-requisites if needed: Basic knowledge of graphs. Knowledge of Java or Python

Description:

A lot of data is represented as graphs, which are often too large to be loaded and processed on a single machine. A common approach to performing computations on large graphs is to partition them into smaller sub-graphs. These sub-graphs (or partitions) can then be processed by different machines. Depending on the algorithm (PageRank, Shortest Path...), some communication will take place during execution. Partitioning the graph is a problem in itself, with many existing solutions. They can be classified into two approaches, depending on whether we place the edges (vertex-cut based) or the vertices (edge-cut based) on the different machines. The partitioning can have a major impact on the execution time of an algorithm, so using the best one is of paramount importance. However, there is no single best partitioner: it depends on many parameters, including the graph considered and the algorithm executed.

In a previous work (https://hal.inria.fr/hal-01401309), we investigated which metrics were the most significant for choosing the best partitioning using a linear model. More recently, we have conducted similar work in the context of machine learning. An insight from this work is that the common metrics used to describe graphs or partitions fail to capture significant information, making the modeling inaccurate.

The goal of this PFE is to study the large body of published papers on graph partitioning in order to gather and analyze all the metrics used by the authors. We aim to build a catalogue of known metrics. In a second step, these metrics will have to be implemented in the GraphX framework of Spark (https://spark.apache.org/graphx/).
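Two of the best-known such metrics can be computed directly from a partition assignment. The sketch below (plain Python for illustration; the PFE would implement this on GraphX) shows the edge-cut fraction and the replication factor; the toy graph and assignment are our own examples.

```python
def partition_metrics(edges, assignment):
    """Two classical partition-quality metrics:
    - edge-cut fraction (edge-cut view): share of edges whose endpoints
      live on different machines;
    - replication factor (vertex-cut view): average number of partitions
      in which a vertex is replicated, when each edge is assigned to one
      partition and a vertex is copied to every partition holding one of
      its edges."""
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    edge_cut_fraction = cut / len(edges)

    # vertex-cut view: here each edge goes to the partition of its first
    # endpoint (one simple strategy among many)
    replicas = {}
    for u, v in edges:
        part = assignment[u]
        replicas.setdefault(u, set()).add(part)
        replicas.setdefault(v, set()).add(part)
    replication_factor = sum(len(p) for p in replicas.values()) / len(replicas)
    return edge_cut_fraction, replication_factor

# toy 4-cycle split across 2 machines
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
assignment = {0: 0, 1: 0, 2: 1, 3: 1}
ecf, rf = partition_metrics(edges, assignment)
```

Both metrics proxy for communication cost, but in different execution models, which is one reason no single metric predicts the best partitioner for every algorithm.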

Title: A Privacy-Preserving Social Networking Application (Sparks)

Who?

Name: Karima Boudaoud

Mail: karima.boudaoud@unice.fr

Name: Philipp Hoschka (W3C/ERCIM)

Mail: hoschka@w3.org

Where?

Place of the project:

Address: Polytech’Nice Sophia

Team: SPARKS

Description: Today’s commercially successful social networking applications like Facebook, Twitter and Instagram pose potentially serious issues in the area of privacy due to their centralized architecture: all user data is kept at a central location by the operator of the social network.

A solution to this problem is to use a decentralized architecture where data is kept by individual users and shared under their control. To this end, Tim Berners-Lee (the inventor of the Web) has developed SOLID, an architecture that supports decentralized data sharing (see https://medium.com/@timberners_lee/one-small-step-for-the-web-87f92217d085).

The goal of this “stage” is to design and implement an example decentralized social networking application (sharing restaurant and hotel recommendations) using Berners-Lee’s SOLID system. The work includes developing a web version (usable on mobile) as well as at least one “mobile app” version (Android or iPhone).

Useful Information: project for 2 students