Internship List 16-17

ElectroSmart, exploring exposure to microwaves

Name: Arnaud Legout

Mail: arnaud.legout@inria.fr

Telephone: 04 92 38 78 15

Web page: http://www-sop.inria.fr/members/Arnaud.Legout/

Place of the project: Inria

Address: 2004 route des Lucioles, Sophia Antipolis

Team: DIANA

Web page: https://team.inria.fr/diana/

Pre-requisites if needed: Android programming

Description:

The goal of the ElectroSmart project is to develop the instrument, methods, and models to compute the exposure of the general public to microwave electromagnetic fields (EMF) used by wireless protocols and infrastructures such as Wi-Fi, Bluetooth, or cellular. The Internet and new devices such as smartphones have fundamentally changed the way people communicate, but this technological revolution comes at the price of a higher exposure of the general population to microwave EMF. This exposure is a concern for a broad spectrum of actors, such as health agencies and epidemiologists who want to understand its impact on health, or cellular operators and regulation authorities who want to improve cellular coverage while limiting exposure.

The student will work on improving the Android application on which the ElectroSmart project is based (see http://es.inria.fr). The main goal of the internship is to design and implement a statistical test in the application that assesses a user's level of exposure relative to that of the other users.
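As a rough illustration of what comparing one user with the others could involve, the sketch below computes an empirical percentile rank. It is only a possible building block, not the test the internship is expected to design; the function name and the sample readings are hypothetical.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(own_value, population):
    """Share (0..100) of population values below own_value,
    counting ties as half (mid-rank convention)."""
    if not population:
        raise ValueError("population must not be empty")
    ordered = sorted(population)
    below = bisect_left(ordered, own_value)
    ties = bisect_right(ordered, own_value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)

# Hypothetical exposure readings from other users (arbitrary units).
others = [1.0, 2.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0]
rank = percentile_rank(5.0, others)
print(f"Your exposure is at percentile {rank:.1f}")
```

A real test would also have to account for when and where each reading was taken, which this sketch ignores.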

The student must be fluent in Java and have some experience in Android programming.

If the student is highly motivated and has the required competencies, we offer the possibility to continue with a Ph.D. thesis.

Measuring and modeling the dependency of Web Quality of Experience on network performance

Name: Chadi Barakat

Mail: Chadi.Barakat@inria.fr

Telephone: +33492387596

Web page: http://team.inria.fr/diana/chadi/

Place of the project: Inria Sophia Antipolis – Diana team

Address: 2004, route des lucioles, 06902 Sophia Antipolis, France

Team: Diana

Web page: http://team.inria.fr/diana/

Pre-requisites if any: Network programming skills and good knowledge of data analysis tools

Detailed description:

This internship fits within our ACQUA project on measuring, modeling and predicting end user Quality of Experience (QoE). The purpose is to establish models linking end user Quality of Experience for the main services and applications to network conditions (bandwidth, delay, etc.). On the one hand, these models should shed light on the dependency of Quality of Experience on network performance. On the other hand, they can be used to predict the Quality of Experience of end users for services and applications, even before the applications and services themselves are launched. Such prediction improves network transparency and can be used to optimize network management, for example in caching and routing.

In previous work, we proved the feasibility of the approach with the Skype use case. We ran extensive controlled experiments in which network configurations were artificially modified and the resulting Skype call quality was recorded. The data was then analyzed using statistical tools and machine learning techniques; the result showed that Skype call quality can be predicted with good precision when network-level measurements are available (using Decision Trees, for instance).

The purpose of this internship is to go one step further and cover the Web use case, which is one of the main usages of today's Internet. The candidate will first have to survey the state of the art on Web Quality of Experience and define both the experiments to be carried out and the measurements to be made, for the network and for Web Quality of Experience (for instance, Page Load Time as a model of Web QoE). After this study, our existing platform, tuned for Skype and video streaming, has to be customized to the Web use case, and a rich dataset has to be collected. Reducing the complexity of the experimentation phase is one of the criteria at this level. In parallel to data collection, the candidate will perform statistical analysis of the collected datasets and determine whether Web Quality of Experience can be predicted. By the end of the internship, we expect a model of acceptable accuracy, preferably in the form of a Decision Tree, linking network performance to Web QoE.
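To make the idea of a tree linking network measurements to Web QoE concrete, here is a minimal sketch of a one-level regression tree (a decision stump) that predicts Page Load Time from a single network feature. The samples are made up for illustration, and the function names are hypothetical; a real model would use many features and a full tree learner.

```python
def fit_stump(xs, ys):
    """Find the threshold on xs that minimises the summed squared error
    when ys is predicted by the mean on each side of the split.
    Returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = (sum((y - mean_l) ** 2 for y in left)
               + sum((y - mean_r) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, mean_l, mean_r)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def predict(stump, x):
    threshold, mean_l, mean_r = stump
    return mean_l if x <= threshold else mean_r

# Made-up samples: round-trip time (ms) -> page load time (s).
rtt_ms = [10, 20, 30, 150, 200, 250]
plt_s = [1.0, 1.2, 1.1, 4.0, 4.5, 5.0]
stump = fit_stump(rtt_ms, plt_s)
print(predict(stump, 15), predict(stump, 300))
```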

The ACQUA project is part of the ANR National Project BottleNet and the Inria Project Lab BetterNet. The candidate might need to interact with members of these projects, especially members of the ACQUA project in the Diana team.

The internship can be continued in a PhD thesis if funding is secured and the candidate shows excellent research capacity.

References: set of bibliographical references (article, books, white papers, etc) to be read by the student before starting to work on this subject

- Thierry Spetebroot, Salim Afra, Nicolas Aguilera, Damien Saucez, Chadi Barakat, “From network-level measurements to expected Quality of Experience: the Skype use case“, in proceedings of the IEEE International Workshop on Measurement and Networking (M&N), Coimbra, Portugal, October 2015.

- Athula Balachandran, Vaneet Aggarwal, Emir Halepovic, Jeffrey Pang, Srinivasan Seshan, Shobha Venkataraman, and He Yan. 2014. Modeling web quality-of-experience on cellular networks. In Proceedings of the 20th annual international conference on Mobile computing and networking (MobiCom '14).

- EYEORG: A Platform For Crowdsourcing Web Quality Of Experience Measurements, Matteo Varvello, Jeremy Blackburn, David Naylor, and Konstantina Papagiannaki, in ACM CONEXT 2016.

- ACQUA: Application for prediCting Quality of User experience at Internet Access. URL: http://project.inria.fr/acqua/

- ANR BottleNet: Understanding and diagnosing end-to-end communication bottlenecks of the Internet. URL: http://project.inria.fr/bottlenet/

- IPL BetterNet: An observatory to Measure and Improve Internet Service Access from User Experience. URL: http://project.inria.fr/betternet/

Robust Service Function Chains in OpenStack

Name: Damien Saucez

Mail: damien.saucez@inria.fr

Telephone: +33 4 89 73 24 18

Web page: https://team.inria.fr/diana/team-members/damien-saucez/

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: DIANA

Web page: https://team.inria.fr/diana/

Pre-requisites if needed: Strong knowledge in network protocols and concepts (IP, TCP, routing,…). Excellent programming skills in Java or Python. Expertise in NAGIOS.

Description:

With the advent of the Cloud, networks are massively virtualized and run on top of commodity hardware shared with other tenants. In this work, we will extend OpenStack to provide robust Service Function Chains (SFC). More specifically, we will develop an online placement algorithm that guarantees that if a chain is deployed, it will respect the preset Service Level Agreement (SLA) in terms of both performance and availability (e.g., latency < 10 ms and 99.99% availability). The algorithm will be implemented in OpenStack by leveraging existing plugins, such as those in Telemetry (https://wiki.openstack.org/wiki/Telemetry), so that the orchestrator can place chains with availability guarantees.
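The kind of check such a placement algorithm has to perform can be sketched as follows. This is only an illustration under simplifying assumptions that are not part of the project description: failures are independent, every function of the chain must be up, and each function may be replicated. All names and numbers are hypothetical, and no OpenStack API is used.

```python
from math import prod

def chain_availability(function_availabilities, replicas=1):
    """Availability of a chain whose functions must all be up.
    Each function is deployed `replicas` times and is up if any
    replica is up. Failures are assumed independent."""
    return prod(1 - (1 - a) ** replicas for a in function_availabilities)

def meets_sla(function_availabilities, latencies_ms,
              max_latency_ms, min_availability, replicas=1):
    """True if the summed latency and the chain availability respect the SLA."""
    total_latency = sum(latencies_ms)
    availability = chain_availability(function_availabilities, replicas)
    return total_latency < max_latency_ms and availability >= min_availability

# Hypothetical three-function chain, SLA: latency < 10 ms, 99.99% availability.
availabilities = [0.999, 0.999, 0.999]
latencies_ms = [2.0, 3.0, 4.0]
single = meets_sla(availabilities, latencies_ms, 10.0, 0.9999)
doubled = meets_sla(availabilities, latencies_ms, 10.0, 0.9999, replicas=2)
print(single, doubled)
```

With these numbers a single instance per function misses the availability target while two replicas per function reach it, which is the kind of trade-off an online placement algorithm has to resolve.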

Useful Information: The work is performed in the collaborative project ANR REFLEXION (http://anr-reflexion.telecom-paristech.fr/). The student will thus work closely with the team members involved in the project, namely Ghada Moualla and Thierry Turletti. The DIANA team members are subject to FSD authorization.

2-mode itinerary computation

Name: David Coudert and Nicolas Nisse

Mail: david.coudert@inria.fr and nicolas.nisse@inria.fr

Telephone: 04 92 38 79 81

Web page: http://www-sop.inria.fr/members/David.Coudert/

http://www-sop.inria.fr/members/Nicolas.Nisse/

Place of the project: Inria Sophia Antipolis Méditerranée

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: COATI

Web page: http://team.inria.fr/coati/

Pre-requisites if any: Graph theory, algorithms, data-structure

Detailed description:

Context: Mobility is an important aspect of smart cities. Consequently, there is a growing demand for services offering efficient itinerary planning. Typically, travelers want to be informed, with a simple query, of the best ways to reach their destination using any combination of the available means of transportation (buses, tram, metro, bicycles, etc.). The main difficulty of such multi-modal itinerary computation, apart from the number of possible modes of transportation that have to be combined, is to propose realistic itineraries. Indeed, if the announced travel-time of an itinerary is 15 min but the real travel-time is 25 min, the traveler is right to be unhappy.

Nowadays, even in major French cities where real-time data are available on all channels, itinerary calculations are still based on theoretical timetables (e.g., in Paris). Therefore, the proposed itinerary does not take into account the actual state of the network (delay of a bus, traffic jam, unavailability of a bicycle, etc.), and the announced travel-time is often underestimated. In medium-scale cities (e.g., Nice), better solutions are now proposed. For instance, SMEs like Instant-System integrate and continuously refresh the positions of all buses, subways, trams, etc. on the network and use them in the itinerary calculations. Nonetheless, the proposed solution is not scalable and many improvements are necessary.

Objectives: This internship aims at studying and developing algorithms for computing itineraries combining walking and cycling (e.g., vélo bleu, vélib’). The main objective is to better estimate the overall travel-time, taking into account: the cycling speed of the user, the probability that a bicycle is available at a station and that a slot will be free to return it, the number of red lights along the path, the slope, etc.

The first task is to compare existing algorithms both theoretically and experimentally. Then, we will work on the design of new algorithms offering better tradeoffs between pre-processing time, query-time, flexibility to handle events (blocked street, etc.), quality of the proposed itinerary, gap with real travel-time, etc. The case of electric bicycles might also be considered.
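A baseline for walk-plus-bicycle itineraries can be sketched as a shortest-path search over states (node, riding). The version below is a plain Dijkstra on such a layered graph, given only as an illustration: the toy network, the fixed per-station switching delay, and the function name are assumptions, and none of the speed-up techniques from the references are used.

```python
import heapq

def two_mode_travel_time(walk, cycle, stations, source, target):
    """Dijkstra over states (node, riding). `walk` and `cycle` map a node
    to a list of (neighbour, minutes). `stations` maps a station node to
    the minutes needed to pick up or return a bike there. The trip starts
    and ends on foot. Returns the travel time, or None if unreachable."""
    best = {(source, False): 0.0}
    queue = [(0.0, source, False)]
    while queue:
        cost, node, riding = heapq.heappop(queue)
        if cost > best.get((node, riding), float("inf")):
            continue  # stale queue entry
        if node == target and not riding:
            return cost
        edges = cycle if riding else walk
        moves = [(nxt, riding, minutes) for nxt, minutes in edges.get(node, [])]
        if node in stations:
            # Switch mode at a station: pick up or return a bike.
            moves.append((node, not riding, stations[node]))
        for nxt, mode, minutes in moves:
            new_cost = cost + minutes
            if new_cost < best.get((nxt, mode), float("inf")):
                best[(nxt, mode)] = new_cost
                heapq.heappush(queue, (new_cost, nxt, mode))
    return None

# Toy network (minutes): walking A -> B directly, or via stations S1 and S2.
walk = {"A": [("S1", 5), ("B", 30)], "S2": [("B", 3)]}
cycle = {"S1": [("S2", 6)]}
stations = {"S1": 1, "S2": 1}
print(two_mode_travel_time(walk, cycle, stations, "A", "B"))
```

The fixed switching delay is the natural place to plug in an expected waiting time derived from bicycle or slot availability probabilities, though how to do that well is part of the internship topic.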

This internship will be done in the context of a collaboration with the SMEs Instant-System (http://www.instant-system.com) and Benomad (http://www.benomad.com).

References: set of bibliographical references (article, books, white papers, etc) to be read by the student before starting to work on this subject

• H. Bast et al.: Route Planning in Transportation Networks. https://arxiv.org/pdf/1504.05140, 2015.

• S. Storandt: Route Planning for Bicycles - Exact Constrained Shortest Paths made Practical via Contraction Hierarchy. ICAPS 2012.

• R. Geisberger, C. Vetter: Efficient Routing in Road Networks with Turn Costs. SEA’11.

• D. Delling, A. Goldberg, T. Pajor, R. Werneck: Customizable Route Planning. SEA’11. Preprint of the journal version available here: http://www.cs.princeton.edu/ rwerneck/papers/DGPW14-CRP-journal.pdf

• M. Baum, J. Dibbelt, T. Pajor, D. Wagner: Energy-Optimal Routes for Electric Vehicles. ACM SIGSPATIAL 2013.

• M. Baum, J. Dibbelt, L. Huebschle-Schneider, T. Pajor, and D. Wagner: Speed-Consumption Tradeoff for Electric Vehicle Route Planning. ATMOS’14.

Statistical Physics Methods for Distributed Machine Learning

Name: Konstantin Avrachenkov

Mail: K.Avrachenkov@inria.fr

Telephone: 04 92 38 77 51

Web page: http://www-sop.inria.fr/members/Konstantin.Avratchenkov/me.html

Place of the project: Inria Sophia Antipolis

Address: Bat. Lagrange, 2004 Route des Lucioles, Sophia Antipolis

Team: Maestro

Web page: https://team.inria.fr/maestro/

Pre-requisites if any: Good knowledge of Physics and Mathematics

Detailed description: Over the last few years, research in computer science has shifted focus to machine learning methods for the analysis of increasingly large amounts of user data. While the research community has sought to optimize these methods for sparse and high-dimensional data, new problems that had so far remained on the periphery have emerged more recently, particularly from a networking perspective. These new directions go beyond sparsity of data and concern the distributed nature of data sources as well as of the computation itself.

We feel that statistical physics methods such as Gibbs sampling [3] and the Generalized Potts Model [2,4] are particularly well suited to designing low-complexity, distributed machine learning methods for the tasks of unsupervised and semi-supervised learning [1].
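To give a flavour of why these methods distribute well, the sketch below shows one Gibbs sweep for a standard Potts model on a graph: each node resamples its label using only the labels of its neighbours. This is a textbook formulation given for illustration, not the specific algorithms of [2,4]; the graph, the inverse temperature `beta`, and the function names are assumptions.

```python
import math
import random

def label_probabilities(node, labels, neighbours, num_labels, beta):
    """Conditional distribution of one node's label in a Potts model:
    P(label = k) is proportional to exp(beta * #neighbours labelled k)."""
    counts = [0] * num_labels
    for other in neighbours[node]:
        counts[labels[other]] += 1
    weights = [math.exp(beta * c) for c in counts]
    total = sum(weights)
    return [w / total for w in weights]

def gibbs_sweep(labels, neighbours, num_labels, beta, rng):
    """Resample every node's label once, in place, from its conditional."""
    for node in neighbours:
        probs = label_probabilities(node, labels, neighbours, num_labels, beta)
        labels[node] = rng.choices(range(num_labels), weights=probs)[0]
    return labels

# Two triangles joined by one edge (nodes 0-2 and 3-5).
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
              3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
rng = random.Random(0)
labels = {node: rng.randrange(2) for node in neighbours}
for _ in range(20):
    gibbs_sweep(labels, neighbours, num_labels=2, beta=1.5, rng=rng)
print(labels)
```

Because each update reads only neighbouring labels, nodes that are not adjacent can in principle be updated in parallel.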

The student is expected to work on both theoretical and practical aspects of the topic. We intend to employ mean-field methods [5] for the analysis of the statistical physics based machine learning algorithms. Ideally, this internship results in a publication and is followed by a PhD thesis. A PhD scholarship funded by Bell Labs is available for this topic.

References:

[1] Avrachenkov, K., Goncalves, P., Mishenin, A., and Sokol, M.

Generalized optimization framework for graph-based semi-supervised learning.

In Proceedings of SDM 2012.

[2] Blatt, M., Wiseman, S. and Domany, E.

Clustering data through an analogy to the Potts model.

Advances in Neural Information Processing Systems, pp.416-422, 1996.

[3] Bremaud, P. Markov chains: Gibbs fields, Monte Carlo simulation, and queues.

Springer, 2009.

[4] Eaton, E. and Mansbach, R.

A Spin-Glass Model for Semi-Supervised Community Detection.

In Proceedings of AAAI 2012.

[5] Nishimori, H.

Statistical physics of spin glasses and information processing: An introduction.

Clarendon Press, 2001.

Web tracking via invisible Web beacons

Name: Nataliia Bielova and Arnaud Legout (Inria, INDES and DIANA teams)

Mail: nataliia.bielova@inria.fr

Telephone: 04 92 38 77 87

Web page: http://www-sop.inria.fr/members/Nataliia.Bielova/

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles

Team: INDES

Web page: https://team.inria.fr/indes/

Pre-requisites if any: Web technologies: JavaScript and Web browser extensions

Detailed description:

The web has become an essential part of our society and is currently the main medium of information delivery. As the users browse the Web, their online choices, habits and preferences are continuously monitored by tracking companies. This can be very lucrative for advertising companies, yet very intrusive for the privacy of users.

Recent research has shown that third-party advertising networks and data brokers use a wide range of techniques in order to track users across the web. These techniques are used to reconstruct browsing sessions and to create profiles of users, inferring, among others, their hobbies, health status, political inclinations, and level of wealth. This information can be used not only to deliver better targeted advertisements to users, but also to discriminate against users, for example by providing customized prices for products based on a user’s willingness and ability to pay.

This internship aims at analysing the new Web tracking technologies based on “Web beacon”, or “pixel image” tracking. This tracking technology uses an invisible 1x1 pixel image that is used to send information to third-party trackers, while being invisible to the user. Web beacon tracking is particularly invasive because it cannot be blocked by Private browsing mode, AdBlock or Ghostery extensions, and not even by disabling JavaScript.
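A first, naive way to spot such beacons in a page is to look for images declared as 1x1 pixels. The sketch below does this with Python's standard HTML parser; it is only a heuristic given for illustration, since real beacons may be sized through CSS or injected by scripts, and the tracker URL in the example is made up.

```python
from html.parser import HTMLParser

class BeaconFinder(HTMLParser):
    """Collect the src of every <img> declared as 1x1 pixels."""

    def __init__(self):
        super().__init__()
        self.beacons = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if attributes.get("width") == "1" and attributes.get("height") == "1":
            self.beacons.append(attributes.get("src"))

def find_beacons(html):
    finder = BeaconFinder()
    finder.feed(html)
    return finder.beacons

# Hypothetical page with one invisible pixel and one ordinary image.
page = (
    '<p>Hello</p>'
    '<img src="https://tracker.example/p.gif?uid=42" width="1" height="1">'
    '<img src="logo.png" width="200" height="80">'
)
print(find_beacons(page))
```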

The candidate will have to run automated Web experiments and perform large-scale measurements of Web beacon tracking on the Web. The ultimate goal is to detect which companies use Web beacons, understand how this technology works, and propose solutions that allow companies to gather statistics without infringing users' privacy. The expected outcome of the internship is a publication at a top-tier security and privacy conference, such as IEEE Security and Privacy.

Useful Information:

This internship can be continued with a Ph.D. thesis for excellent candidates. The Ph.D. thesis could be co-supervised and co-located between Inria and Columbia University (USA) if the candidate is excellent and willing to spend one year at Columbia University.

References:

1. F. Roesner, T. Kohno, and D. Wetherall. Detecting and Defending Against Third-Party Tracking on the Web. In WWW, 2012.

2. G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, C. Diaz. The Web Never Forgets: Persistent Tracking Mechanisms in the Wild. In Proceedings of CCS 2014, Nov. 2014.

3. G. Aggarwal, E. Bursztein, C. Jackson, and D. Boneh. An analysis of private browsing modes in modern browsers. In Usenix Security Symposium, 2010.

Random graph models for Directed Social Graphs like Twitter

Contact: Stéphane Pérennes and Frédéric Giroire.

Emails: stephane.perennes@cnrs.fr, frederic.giroire@cnrs.fr

Phone: 04 92 38 50 98

Laboratory: INRIA Sophia Antipolis, COATI team-project, https://team.inria.fr/coati/

Context

Twitter is the most popular micro-blogging service in the world. It allows its users to exchange short messages (tweets) that are limited to 140 characters. It was created to enable people to find out what is currently happening with people and organizations they are interested in. The relation between users on Twitter is different from classical social networks like Facebook. Instead of bidirectional friendship links that are initiated by one user and accepted by another, Twitter uses the concept of following. Users can follow other users they are interested in, which means they subscribe to all the messages those users send. So, the links on Twitter are unidirectional: if someone follows you, you don't need to follow back. Twitter is a very interesting object of study because the unidirectional model of relationship can represent more patterns of real-life communication, and thus has a huge societal impact.

Directed social networks are growing in importance because they are the main source of information for many people, but also because the information exchanged in these networks feeds traditional media (such as TV or newspapers). Also, social networks are growing in size: at the end of 2014, Twitter had more than 1 billion accounts.

The direct study of social graphs is hard for several reasons. Most of the information is private in a large number of online social networks (OSNs), e.g. Facebook. Moreover, even for public social networks like Twitter, it is hard (if not impossible) to collect a significant part of the content. Finally, even when collection is possible, it takes a lot of time to analyze such a huge amount of information. Using a random graph model may be a good solution to some of these limitations. The idea is to model the OSN’s most important properties, such as structures and degree distributions. Then, the random graph model can be used, for example, to test algorithms or methods on graphs similar to an OSN, to predict the evolution of the OSN, or to examine the relations between users over time.
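As an example of what a random model of a directed follower graph can look like, the sketch below grows a graph by directed preferential attachment and reports its in-degree distribution. This classical model is used here only for illustration and is not claimed to fit Twitter; the parameters and function names are assumptions.

```python
import random
from collections import Counter

def directed_preferential_attachment(num_users, follows_per_user, rng):
    """Grow a follower graph: each new user follows `follows_per_user`
    distinct earlier users, chosen with probability proportional to
    (in-degree + 1). Returns a list of (follower, followed) arcs."""
    arcs = []
    targets = [0]  # user u appears (in-degree(u) + 1) times in this list
    for user in range(1, num_users):
        wanted = min(follows_per_user, user)
        chosen = set()
        while len(chosen) < wanted:
            chosen.add(rng.choice(targets))
        for followed in sorted(chosen):
            arcs.append((user, followed))
            targets.append(followed)
        targets.append(user)
    return arcs

def in_degree_distribution(arcs):
    """Map each in-degree value to the number of users having it
    (users without followers are omitted)."""
    followers = Counter(followed for _, followed in arcs)
    return Counter(followers.values())

arcs = directed_preferential_attachment(50, 2, random.Random(1))
for degree, users in sorted(in_degree_distribution(arcs).items()):
    print(degree, users)
```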

Objective

The goal is to focus on the analysis of directed social graphs. In particular, we will explore random graphs to model more closely directed online social graphs.

Requirements: taste for graph theory and random graphs

This internship is research oriented.

Optimizing jointly Data Center Servers and Network

Contact: Stéphane Pérennes and Frédéric Giroire.

Emails: stephane.perennes@cnrs.fr, frederic.giroire@cnrs.fr

Phone: 04 92 38 50 98

Laboratory: INRIA Sophia Antipolis, COATI team-project, https://team.inria.fr/coati/

Context

Software-Defined or Software-Driven Networking (SDN) is a new networking paradigm that enables innovation and centralized network management, and prevents the so-called ossification of the Internet. SDN decouples the control plane from the data plane in network equipment, which means that a switch or a router is transformed into a simple forwarding device that applies rules sent by a remote controller using a standardized protocol. This simple approach gives network administrators better control over the traffic in their network; e.g., Google has recently presented an SDN-based re-design of its core backbone in which it is able to reach nearly 100% utilization of links under stringent QoS constraints [1]. SDN also enables the academic community to experiment with flexible as well as high-performance equipment to test new or existing protocols. The OpenFlow protocol is the leading instantiation of the SDN concept at the moment and is supported by major manufacturers, e.g., HP, Juniper, IBM, as well as by open-source virtual switches like Open vSwitch [2], which is at the heart of cloud management solutions like OpenStack [3]. In particular, SDN technology makes it possible to dynamically optimize both the placement of tasks, which are executed by virtual machines on servers, and the routes in the network.

Objective

We consider the problem of optimizing jointly the servers and network of a datacenter. A data center has a set of tasks to be executed by the servers (backup, computations, gaming, video streaming). Some of these tasks (backup, video streaming, computation with map reduce) generate some traffic which has to be routed throughout the data center networks.

We consider the problem of jointly assigning tasks to servers and network demands to routes, in a network with limited capacity, in order to minimize the time needed to carry out all the tasks.
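The server side of this problem, taken alone, is a classical makespan minimization. The sketch below applies the longest-processing-time-first heuristic to it; it deliberately ignores the network demands and link capacities that make the joint problem harder, and the task durations are made up.

```python
import heapq

def assign_tasks(durations, num_servers):
    """Longest-processing-time-first heuristic: take tasks by decreasing
    duration and give each to the currently least loaded server.
    Returns (makespan, assignment), where assignment[i] is the server of task i."""
    loads = [(0.0, server) for server in range(num_servers)]
    heapq.heapify(loads)
    assignment = [None] * len(durations)
    order = sorted(range(len(durations)),
                   key=lambda i: durations[i], reverse=True)
    for task in order:
        load, server = heapq.heappop(loads)
        assignment[task] = server
        heapq.heappush(loads, (load + durations[task], server))
    return max(load for load, _ in loads), assignment

# Hypothetical task durations (minutes) on two servers.
makespan, assignment = assign_tasks([7, 5, 4, 3, 3, 2], 2)
print(makespan, assignment)
```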

Requirements: taste for algorithmics and network optimization.

This internship is research oriented.

References

[1] JAIN, S. et al, B4: experience with a globally-deployed software defined wan. In Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM (SIGCOMM '13). ACM, New York, NY, USA, 3-14.

[2] http://openvswitch.org/

[3] http://www.openstack.org/

Internet of Things traffic orchestration with Software Defined Networks

Names:

Dino Lopez & Guillaume Urvoy-Keller

dino.lopez@unice.fr, urvoy@i3s.unice.fr

http://www.i3s.unice.fr/~lopezpac

http://www.i3s.unice.fr/~urvoy

Place of the project:

Laboratoire I3S

2000, route des Lucioles

Les Algorithmes

Bât. Euclide B - BP 121

06903 Sophia Antipolis Cedex

Tel. +33 (0)4.92.94.27.64

Web : http://signet.i3s.unice.fr/

Pre-requisites if any:

* Good knowledge of Linux-based systems

* Python, C and Java programming

Detailed description:

=== Introduction ===

Current network technologies continue evolving to offer improved quality of service and better networking support, leading also to a high diversification of services. Just as an example, the deployment of 4G systems to increase the downloading rate of mobile devices has increased the popularity of video streaming, catalyzed the introduction of new online games and increased the number of smart homes- and smart cities-oriented services, etc [1].

During this internship, we intend to study how Software Defined Networking (SDN) can support the emerging Internet of Things (IoT).

Indeed, on the one hand, in the specific area of smart homes and smart cities, the market currently offers a wide variety of objects with Internet access, leading to what is known today as the Internet of Things (IoT). According to some reports, the IoT is in constant expansion and will continue growing in number of connected elements [2]. However, to ensure good QoS, data security, etc., IoT traffic will require finer-grained management that is able to satisfy these different requirements.

On the other hand, Software Defined Networking (SDN) holds the promise of better network administration. Indeed, while current network devices (e.g., routers, switches, …) use their embedded programs to process network traffic, in SDN, the intelligence that takes decisions about how to process network traffic (the Control Plane) is physically separated from the network forwarding devices (the Data Plane). Hence, SDN has a wider view of the network topology and the network state, and is able to provide a better network management strategy.
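The separation between the two planes can be pictured with a toy model: a switch that only matches packets against installed rules, and a controller that decides which rules to install. This is a conceptual sketch, not the OpenFlow message format or any controller's API; the class names, the `tag` field, and the action strings are all hypothetical.

```python
class Switch:
    """Data plane: forwards packets using only the rules it was given."""

    def __init__(self):
        self.rules = []  # list of (priority, match_fields, action)

    def install(self, priority, match, action):
        self.rules.append((priority, match, action))
        self.rules.sort(key=lambda rule: -rule[0])  # highest priority first

    def forward(self, packet):
        for _, match, action in self.rules:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        return "send_to_controller"  # table miss


class Controller:
    """Control plane: decides the policy and pushes rules to switches."""

    def __init__(self, switches):
        self.switches = switches

    def separate_iot_traffic(self, iot_tag, iot_port, default_port):
        for switch in self.switches:
            switch.install(10, {"tag": iot_tag}, f"output:{iot_port}")
            switch.install(0, {}, f"output:{default_port}")  # catch-all rule


switch = Switch()
before = switch.forward({"tag": "iot"})
Controller([switch]).separate_iot_traffic("iot", iot_port=7, default_port=1)
print(before, switch.forward({"tag": "iot"}), switch.forward({"tag": "laptop"}))
```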

SDN-IoT is gaining momentum [3,4]. As a small example, the SigNet and Scale teams of the I3S Lab are members of a European project that will start in January 2017, whose main subject of study is the development of an SDN-IoT architecture in cellular-based networks.

=== Internship Objectives ===

In this internship, the student will be required to

* Survey the state of the art (SOTA) on current proposals to manage IoT traffic, using either legacy network infrastructure or SDN devices.

* Devise a number of scenarios involving IoT and SDN, esp. (i) the case of a large number of IoT devices that need to be organized in different logical (SDN) networks for security and traffic management purposes and (ii) a more advanced scenario where IoT devices need to be organized into networks and need to communicate with each other and with nearby or distant servers (the former case corresponds to fog computing while the latter corresponds to cloud computing).

* Investigate some key metrics (scalability, performance) of the previous scenarios in Mininet[5]/Opennet[6] with Wifi/4G networks.

References: set of bibliographical references (article, books, white papers, etc) to be read by the student before starting to work on this subject

[1] http://www.adlittle.com/downloads/tx_adlreports/ADL_UK_Business_Benefits_01.pdf

[2] http://www.cisco.com/c/en/us/about/security-center/secure-iot-proposed-framework.html

[3] https://www.ietf.org/proceedings/91/slides/slides-91-sdnrg-3.pdf

[4] P. Thubert, M. R. Palattella and T. Engel, "6TiSCH centralized scheduling: When SDN meet IoT," Standards for Communications and Networking (CSCN), 2015 IEEE Conference on, Tokyo, 2015, pp. 42-47.

[5] http://mininet.org/

[6] Chan, Min-Cheng, et al. "Opennet: A simulator for software-defined wireless local area network." 2014 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2014.

Service Availability Protection through Virtualization

Names:

Dino Lopez & Guillaume Urvoy-Keller

dino.lopez@unice.fr, urvoy@i3s.unice.fr

http://www.i3s.unice.fr/~lopezpac

http://www.i3s.unice.fr/~urvoy

Place of the project:

Laboratoire I3S /groupe SigNet

2000, route des Lucioles

Les Algorithmes

Bât. Euclide B - BP 121

06903 Sophia Antipolis Cedex

Tel. +33 (0)4.92.94.27.64

Web : http://signet.i3s.unice.fr/

Pre-requisites if any:

* Good knowledge of Linux-based systems

* Good knowledge of C and Python programming

Detailed description:

=== Introduction ===

Nowadays, virtualization is being massively adopted by ICT professionals. Indeed, virtualization is currently the cornerstone of modern Data Centers (DCs), improving network efficiency through a finer and more flexible management of resources.

The improved management capabilities provided by virtualization rely on the fact that virtual appliances are not tied to hardware: a virtual appliance aims at being independent from the host hardware and can be “freely moved” between different physical machines.

Virtualization opens up numerous possibilities. For example, researchers and computer science engineers have used virtualization to provide Infrastructure, Platform, or Networking as a Service. Energy efficiency can be improved by means of energy-aware server consolidation. Quality of Service can also be improved with QoS-aware VM placement strategies.

For this reason, virtualization performance and migration capabilities have been studied theoretically and experimentally. As a result, several studies show the impact of CPU load or network traffic on the performance seen by virtual machines. Regarding migration capabilities, several studies have tried to assess the impact of the downtime on the QoS of applications [4,5].

In the SigNet project of the I3S Lab, we have studied the implications of virtualization on the system and assessed its performance [1-3]. More recently, we have also focused on using virtualization capabilities to protect service availability when power optimization strategies for clouds might hurt the QoS perceived by customers.

=== Internship tasks and objectives ===

In this internship, the student is required to survey the state of the art (SOTA) and analyze the current proposals that leverage virtualization capabilities to provide service availability (as is done, for instance, in [6]). More specifically, the student must analyze and list the proposals that enable fault tolerance or solve mobility issues with virtualization and cloudlets.

As a second step, the student must continue the development of the prototype currently being designed by the SigNet team to provide service availability in Cloud scenarios featuring energy-efficiency solutions.

Finally, the student will study a possible integration of this service availability solution within OpenStack.

References: set of bibliographical references (article, books, white papers, etc) to be read by the student before starting to work on this subject

[1] Son-Hai Ha, Dino Lopez Pacheco, and Guillaume Urvoy-Keller. "Impact of Virtualization on Network Performance: The TCP Case". In Cloud Networking (CLOUDNET), 2013 IEEE International Conference on, Nov 2013.

[2] Adrian Arsene, Dino Lopez-Pacheco, and Guillaume Urvoy-Keller. "Understanding the network level performance of virtualization solutions". In Cloud Networking (CLOUDNET), 2012 IEEE 1st International Conference on, 1-5, Nov 2012.

[3] Sergio Livi, Quentin Jacquemart, Dino Lopez Pacheco, and Guillaume Urvoy-Keller. "Container-based Service Chaining: a Performance Perspective". In CloudNet 2016, 5th IEEE International Conference on Cloud Networking.

[4] J. Hwang, K. K. Ramakrishnan and T. Wood, "NetVM: High Performance and Flexible Networking Using Virtualization on Commodity Platforms," in IEEE Transactions on Network and Service Management, vol. 12, no. 1, pp. 34-47, March 2015.

[5] Jiaqiang Liu, Yong Li, and Depeng Jin. 2014. SDN-based live VM migration across datacenters. SIGCOMM Comput. Commun. Rev. 44, 4 (August 2014), 583-584.

[6] W. Li, A. Kanso and A. Gherbi, "Leveraging Linux Containers to Achieve High Availability for Cloud Services," Cloud Engineering (IC2E), 2015 IEEE International Conference on, Tempe, AZ, 2015, pp. 76-83.

Algorithms and tools for checking symbolic equivalences of processes

Name: Eric Madelaine

Mail: eric.madelaine@inria.fr

Telephone: +33 6 87 47 99 80

Web page: http://www-sop.inria.fr/members/Eric.Madelaine/

Place of the project: INRIA Sophia-Antipolis

Address: 2004 rte des Lucioles

Team: AOSTE

Web page: https://www.inria.fr/equipes/aoste

Pre-requisites if any:

Some background with programming distributed systems, and with language semantics, will be appreciated.

Java and Eclipse programming experience will be a plus.

Detailed description:

We are developing a theoretical and methodological framework for the verification of properties of distributed systems, based on symbolic representations of their behaviour. The goal of this project is to enable model-checking and equivalence checking of properties of complex systems, whether very large, data-sensitive, or with parameterized architectures, without requiring the user to specify a finite instantiation of the system, but rather by describing the system using symbolic predicates.

In this context we have already defined a finite semantic representation using automata with symbolic transitions. The next step is to define relations between these automata, either equivalences (typically bisimulation equivalences) or refinement preorders, and to design algorithms to check such relations. We foresee that such algorithms will combine a classical part, based on a specific partition refinement algorithm, with a symbolic part handled by a SAT solver or an SMT (satisfiability modulo theories) solver.
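For readers unfamiliar with the classical part, the sketch below shows a naive partition refinement computing strong bisimulation classes on a finite, non-symbolic labelled transition system. It is a textbook baseline given for illustration only: it handles neither symbolic transitions nor the SMT-based reasoning mentioned above, and the example processes are made up.

```python
def bisimulation_classes(states, transitions):
    """Naive partition refinement for strong bisimulation on a finite
    labelled transition system. `transitions` is a set of
    (source, label, target) triples. Returns the set of equivalence
    classes, each as a frozenset of states."""
    block_of = {s: 0 for s in states}
    while True:
        # Signature of a state: its current block plus the set of
        # (label, target block) pairs it can reach in one step.
        signature = {
            s: (block_of[s],
                frozenset((a, block_of[t])
                          for (p, a, t) in transitions if p == s))
            for s in states
        }
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(ids) == len(set(block_of.values())):
            break  # no block was split: the partition is stable
        block_of = refined
    classes = {}
    for s, b in block_of.items():
        classes.setdefault(b, set()).add(s)
    return {frozenset(c) for c in classes.values()}

# Three small processes: P and Q loop on a.b, R does a single a and stops.
states = ["p0", "p1", "q0", "q1", "q2", "r0", "r1"]
transitions = {
    ("p0", "a", "p1"), ("p1", "b", "p0"),
    ("q0", "a", "q1"), ("q1", "b", "q2"), ("q2", "a", "q1"),
    ("r0", "a", "r1"),
}
classes = bisimulation_classes(states, transitions)
print(sorted(sorted(c) for c in classes))
```

Two processes are equivalent in this sense exactly when their initial states end up in the same class, as p0 and q0 do here while r0 does not.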

The student will have to get acquainted with our symbolic semantics framework, and then to:

- design an algorithm to check whether a given relation between behavioural models (i.e. between processes) is an equivalence relation

- study the properties (convergence, complexity) of the algorithm

- build a prototype implementation, using the Microsoft Z3 SMT solver as a logical engine.
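The classical part mentioned above can be illustrated on a plain finite labelled transition system. The sketch below is a naive partition refinement for strong bisimulation, written only to show the idea; it ignores the symbolic transitions and the SMT side of the project, and the function name and the input encoding (triples of source, label, target) are assumptions made for this example.

```python
from collections import defaultdict


def bisimulation_classes(states, transitions):
    """Naive partition refinement for strong bisimulation on a finite LTS.

    states: list of state names.
    transitions: iterable of (source, label, target) triples.
    Returns a dict mapping each state to the id of its equivalence class.
    """
    succ = defaultdict(set)
    for src, label, dst in transitions:
        succ[src].add((label, dst))

    block = {s: 0 for s in states}  # start with every state in one block
    while True:
        # A state's signature: its current block plus the set of
        # (label, block of target) pairs it can reach in one step.
        signature = {
            s: (block[s], frozenset((a, block[t]) for a, t in succ[s]))
            for s in states
        }
        ids = {}
        new_block = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        # Blocks are only ever split, so an unchanged count means a fixpoint.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


# Two processes that loop on a.b, and one that stops after a.
states = ["p0", "p1", "q0", "q1", "q2", "r0", "r1"]
transitions = [
    ("p0", "a", "p1"), ("p1", "b", "p0"),
    ("q0", "a", "q1"), ("q1", "b", "q2"), ("q2", "a", "q1"),
    ("r0", "a", "r1"),
]
classes = bisimulation_classes(states, transitions)
print(classes["p0"] == classes["q0"], classes["p0"] == classes["r0"])
```

Each round costs time proportional to the number of transitions, and there are at most as many rounds as states; more efficient refinement strategies exist and would be part of the complexity study.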

References (to be read by the student before starting to work on this subject):

- A theory for the composition of concurrent processes, E. Madelaine, L. Henrio, M. Zhang, FORTE'16, Heraklion, June 2016. Extended version: https://hal.inria.fr/hal-01299562v1

- pNets: an expressive model for parameterised networks of processes, PDP'15, https://hal.inria.fr/hal-01055091v2

Algorithm and optimization of covering problems using flying drones

Name: Christelle Caillouet

Mail: christelle.caillouet@unice.fr

Telephone: +33 4 92 38 79 29

Web page: http://www-sop.inria.fr/members/Christelle.Molle-Caillouet/

Place of the project: COATI, joint project team between Inria and I3S lab

Address: Inria, 2004 route des lucioles, Sophia Antipolis

Team: COATI

Web page: https://team.inria.fr/coati/

Pre-requisites: Graph theory, Approximation algorithms

Description: Recent advances in technology have led to the development of flying drones that act as wireless base stations to track objects lying on the ground. This kind of robot (also called an Unmanned Aerial Vehicle or UAV) can be used in a variety of applications such as vehicle tracking, traffic management and fire detection.

The goal is to investigate the optimal 3D deployment of multiple UAVs in order to cover a set of mobile targets. Each UAV has limited energy, and its coverage performance depends on its altitude and on its connectivity with the other UAVs.

Theoretically, this problem is related to the set covering problem (and its dynamic version) and to the 3D packing problem.

In this internship, we want to extend existing work on efficient and reliable drone placement and scheduling by adjusting the drones' positions so as to ensure the surveillance of all the targets (mobile or fixed) at the same time.

More precisely, the goal of the internship is to provide:

* a model (linear program) extending previous works and solving the covering problem of mobile targets while ensuring connectivity among the drones;

* approximation algorithms in order to solve efficiently real instances.
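To make the link with set covering concrete, here is a minimal sketch of the classical greedy heuristic for set cover applied to a toy drone placement. The coverage model (a drone at altitude h sees a ground disc of radius h·tan(half_angle)), the candidate positions, and all names are assumptions made for illustration; connectivity between drones, energy and target mobility are not modelled.

```python
import math


def covered_targets(position, targets, half_angle):
    """Targets inside the ground disc seen from a drone at (x, y, altitude).

    Assumed cone model: coverage radius = altitude * tan(half_angle).
    """
    x, y, altitude = position
    radius = altitude * math.tan(half_angle)
    return {t for t in targets if math.hypot(t[0] - x, t[1] - y) <= radius}


def greedy_cover(targets, candidates, half_angle):
    """Greedy set cover: repeatedly pick the candidate position that
    covers the largest number of still-uncovered targets."""
    coverage = {c: covered_targets(c, targets, half_angle) for c in candidates}
    uncovered = set(targets)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda c: len(coverage[c] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            raise ValueError("some targets cannot be covered by any candidate")
        chosen.append(best)
        uncovered -= gained
    return chosen


targets = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
candidates = [(0.5, 0.0, 1.0), (10.5, 0.0, 1.0), (5.0, 0.0, 1.0)]
print(greedy_cover(targets, candidates, math.radians(45)))
```

The greedy rule is known to give a logarithmic approximation factor for the static set cover problem; whether a comparable guarantee survives once connectivity and mobility constraints are added is precisely the kind of question the internship would examine.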

Useful Information:

[1] M. Mozaffari, W. Saad, M. Bennis and M. Debbah, "Efficient Deployment of Multiple Unmanned Aerial Vehicles for Optimal Wireless Coverage," in IEEE Communications Letters, vol. 20, no. 8, pp. 1647-1650, Aug. 2016.

[2] L. Di Puglia Pugliese, F. Guerriero, D. Zorbas, T. Razafindralambo. Modelling the mobile target covering problem using flying drones. Optimization Letters, Springer Verlag, 2015, pp.29.

Program performance optimisation of multi-threaded applications on multi-core processors

Name: Sid TOUATI

Mail: Sid.Touati@inria.fr

Web page: http://www-sop.inria.fr/members/Sid.Touati/

Place of the project: INRIA-Sophia

Team: AOSTE

Web page: https://www.inria.fr/equipes/aoste

Pre-requisites if any: Operating systems, compilers, threads.

Detailed description: The aim of this internship is to:

- analyse the performance of a C++ application using OpenMP: hotspot functions, bottlenecks, etc.

- optimise the performance of the application by using advanced compiler options

- optimise the performance of the application through code modifications or thread affinity strategies

- carry out a final rigorous statistical study (validation step).
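The listing does not say which statistical method the validation step should use. As one possible illustration, the sketch below applies a one-sided permutation test to the difference of median run times before and after an optimisation. The choice of test, the function name and the timing values are assumptions made for this example; the timings are invented, not measurements.

```python
import random
from statistics import median


def permutation_test(before, after, n_perm=10000, seed=0):
    """One-sided permutation test on median(before) - median(after).

    Returns the observed difference and an estimated p-value for the null
    hypothesis that both samples come from the same distribution.
    """
    rng = random.Random(seed)
    observed = median(before) - median(after)
    pooled = list(before) + list(after)
    n = len(before)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = median(pooled[:n]) - median(pooled[n:])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Invented run times in seconds, for illustration only.
before = [10.2, 10.4, 10.1, 10.3, 10.5, 10.2, 10.4, 10.3]
after = [9.1, 9.3, 9.0, 9.2, 9.4, 9.1, 9.2, 9.3]
gain, p_value = permutation_test(before, after)
print(f"median gain: {gain:.2f} s, p-value: {p_value:.4f}")
```

Using the median rather than the mean makes the comparison less sensitive to the occasional very slow run caused by system noise, which is common when timing multi-threaded code.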

For more details, please email me.

Abstracting OpenMP program control structure into Model of Computation form

Supervisor: Robert de Simone (Robert.de_Simone@inria.fr)

Location: Aoste team, Inria Sophia-Méditerranée & University Nice Côte d'Azur (UMR I3S)

Web site: https://team.inria.fr/aoste/

Salary: "gratification", as requested by Ubinet rules

Description:

So-called "Programming Models" for parallel programming, such as OpenMP, MPI, OpenCL and others, generally consist of additional structuring information added around a conventional general-purpose, sequential programming language (such as C/C++, Fortran, Java,...).

This information can be presented in an imperative fashion, as in MPI, where the description of communicating processes and interconnect topologies is (partially) specified, or in a declarative fashion, as in OpenMP, where potential parallelism of several forms can be given as annotations on an otherwise sequentially structured program. Since a huge collection of programs consists of nested loops and regular numerical algorithms, there is hope to extract and represent all this potential parallelism information in the form of Process Networks and, more generally, formal models of computation.

Lexical analysis of OpenMP programs can be performed using existing compilers. The topic of the internship is to consider the extraction of Process Network (PN) information, adjusting the representation either by extending PN expressiveness or by abstracting some of the concrete features, mainly recursive tasks or data-dependent switching instructions. The practical approach to generating the Abstract Syntax Tree of an OpenMP application written in C/C++ will use tools such as clang-llvm, based on the LLVM modular C compiler. Combining this with the additional pragma information, which mostly describes data and task parallelism, will result in a task graph, possibly including iterative tasks as well.
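As a rough illustration of what such a task graph might look like, the sketch below derives dependence edges from per-task read and write sets, in the spirit of the in/out lists of OpenMP task `depend` clauses. It works on a hand-written list of tasks rather than on a real Clang AST, and the data layout and function names are assumptions made for this example, not the project's representation.

```python
def build_task_graph(tasks):
    """tasks: list of (name, reads, writes) in program order, unique names.

    Returns a dict mapping each task to the set of tasks it must wait for.
    """
    last_writer = {}          # variable -> last task that wrote it
    readers_since_write = {}  # variable -> tasks that read it since then
    preds = {name: set() for name, _, _ in tasks}
    for name, reads, writes in tasks:
        for var in reads:
            if var in last_writer:
                preds[name].add(last_writer[var])      # read after write
            readers_since_write.setdefault(var, []).append(name)
        for var in writes:
            if var in last_writer:
                preds[name].add(last_writer[var])      # write after write
            for reader in readers_since_write.get(var, []):
                if reader != name:
                    preds[name].add(reader)            # write after read
            last_writer[var] = name
            readers_since_write[var] = []
    return preds


def parallel_stages(preds):
    """Group tasks into stages; tasks in the same stage can run concurrently."""
    remaining = {task: set(p) for task, p in preds.items()}
    stages = []
    while remaining:
        ready = sorted(t for t, p in remaining.items() if not p)
        if not ready:
            raise ValueError("dependency cycle")
        stages.append(ready)
        for t in ready:
            del remaining[t]
        for p in remaining.values():
            p.difference_update(ready)
    return stages


tasks = [
    ("init_a", [], ["a"]),
    ("init_b", [], ["b"]),
    ("add", ["a", "b"], ["c"]),
    ("scale", ["c"], ["c"]),
    ("print_a", ["a"], []),
]
print(parallel_stages(build_task_graph(tasks)))
# [['init_a', 'init_b'], ['add', 'print_a'], ['scale']]
```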

The purpose of the internship is to study the transformation patterns and provide a first prototype version with conservative assumptions on the program (nested loops with static bounds, for instance). It may lead to a PhD topic aiming to cover a more ambitious, less static class of parallel program descriptions.

Experiments have been conducted in the reverse direction (producing OpenMP programs from Polyhedral Process Networks). There are also existing efforts in the description of Internal Representation (IR) formats for static control OpenMP programs, which could under conditions be enhanced to the status of formal models in the Process Network family.

references:

- http://polly.llvm.org/

- http://www.cse.iitm.ac.in/~raghesh/raghesh-a-masters-thesis.pdf

- http://impact.gforge.inria.fr/impact2015/papers/impact2015-chatarasi.pdf

- http://pluto-compiler.sourceforge.net/

- https://hal.inria.fr/hal-00752626/en

- http://clang.llvm.org/doxygen/group__CINDEX.html

Coevolving Networks

Name: Giovanni Neglia (Inria), Alain Jean-Marie (Inria), Daniel Figueiredo (UFRJ)

Mail: giovanni.neglia@inria.fr, alain.jean-marie@inria.fr, daniel@land.ufrj.br

Webpages: http://www-sop.inria.fr/members/Giovanni.Neglia, http://www-sop.inria.fr/members/Alain.Jean-Marie/, http://www.land.ufrj.br/~daniel/

Where?

Place of the project: Inria Sophia-Antipolis Méditerranée

Address: 2004 route des Lucioles, Sophia Antipolis

Team: Maestro

Web page: https://team.inria.fr/maestro/

What?

Title: Coevolving Networks

Pre-requisites: Solid knowledge of the contents of the Ubinet courses “Graph algorithms” and “Performance Evaluation of Networks.” Good programming skills. Contact G. Neglia if you have doubts about your match with the topic.

Description:

Multiplex networks have recently emerged as “one of the newest and hottest themes in the statistical physics of complex networks” [Lee15] (see also [Dag14] for an overview). They originate from the observation that many complex systems, ranging from living organisms to critical infrastructures, operate through multiple layers of distinct interactions among their constituents. For example, in human society a single individual belongs to different personal, professional, hobbyist, etc., communities in the offline world, but might also have multiple accounts in so-called online social systems (Facebook, Twitter, etc.). In most cases it is not possible to simply stack the many layers into a single graph: diseases can only spread through physical contacts, but information about the spread of the disease and how to counteract it can propagate through both networks at different speeds. In the case of smart grids, electricity flows through a network of cables, transformers and generators, but a faster network of sensors and actuators communicating through different transmission technologies can control the underlying network. Moreover, the evolution of a network in terms of its nodes and edges over time is likely to be correlated with the evolution of other networks: for example, online social networks obviously depend on the relationships existing in the physical world (but they also contribute to reshaping them). Modeling and predicting the behavior of a network is bound to require a more holistic view where information on many networks is used. It has been shown that multiplex networks can exhibit very different properties depending on i) the structural coupling of the different layers, i.e. how nodes are linked through different layers, but also on ii) the type of coupling, i.e. how the function of one layer affects that of another. These interactions can lead to a strong departure from the phenomena observable in a single-layer network, like the discontinuous transition in the size of the giant mutual component at the critical fraction of random node removals, observed in [Bul10] for their percolation model of cascade failures in coupled ICT/power networks, or the intricate and correlated interactions over multiple kinds of relationships across a given set of individuals [Sze10].

The focus of this internship is on the co-evolution of networks. There are a few models for coevolving multiplex growth [Kim13,Nic13], essentially generalizing the Barabási-Albert preferential attachment model. These models intrinsically assume that i) the different layers evolve at the same rate and ii) their mutual influence is symmetric. This is not necessarily the case in real networks: for example, online social networks may evolve much faster than their offline counterparts, and it is simpler to add communication infrastructure to the power grid than to add new power lines. In a similar way, the development of online social networks is mainly driven by the underlying offline social network (see footnote 1), and the sensor-actuator network in a power grid follows the evolution of the underlying energy distribution network. We plan to investigate these aspects. In particular, we want to provide specific statistical tools that allow researchers to identify the relations among different layers in real situations. For example, the co-authorship network (two authors are connected if they wrote a paper together) and the citation network (paper A is connected to paper B if A cites B) are clearly co-evolving networks, but can we identify a main driver of their co-evolution? Is it a previous collaboration that leads a researcher to know better, and cite more, the papers of a colleague, or is it knowing a colleague's work well that finally leads to co-authoring a paper together? This activity will benefit from our previous work on random walk dynamics and dynamic graphs [Fig12] and on co-evolving random walks and network growth [Amo16,FigXX]. We will also rely on the large literature on the link prediction problem [Lib03], including the recent approaches based on machine learning starting from [Lic10].
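For readers unfamiliar with the single-layer model that the multiplex growth models generalize, here is a minimal sketch of Barabási-Albert preferential attachment: each new node connects to m existing nodes chosen with probability proportional to their degree. Starting from a clique on m+1 nodes is one common convention, chosen here as an assumption; the function name is also hypothetical.

```python
import random


def preferential_attachment(n, m, seed=0):
    """Grow an undirected graph with n nodes; each new node attaches to
    m distinct existing nodes picked proportionally to their degree.

    Returns the list of edges as (new_node, older_node) pairs.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Seed graph: a clique on the first m + 1 nodes (assumed convention).
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in this list once per incident edge, so a uniform
    # draw from it is a degree-proportional draw over nodes.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_attachment(n=1000, m=2)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(len(edges), max(degree.values()))
```

A two-layer variant in which the layers grow at different rates, or in which only one layer influences the other, would relax exactly the two assumptions questioned above.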

The student will start by carrying out the following specific tasks:

- retrieve the publication archive of the American Physical Society (journals.aps.org)

- sanitize the dataset by applying state-of-the-art techniques to resolve author ambiguity

- build the citation and co-authorship graphs

- study the basic properties of such graphs (diameter, degree distribution, etc.)

- evaluate empirically to what extent the appearance of a link in one graph allows one to predict the appearance of the corresponding link in the other graph

- formalize a test to assess the statistical significance of the conclusion above

- propose a model for the co-evolution of the two networks
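The graph-building and precedence steps above could start from something like the following sketch. It assumes a hypothetical record format (paper id, year, author list, cited ids) and, so that links in the two graphs can be compared, it projects citations onto author pairs; both choices are assumptions made for illustration and are not prescribed by the project. The records are invented toy data.

```python
from itertools import combinations


def first_link_years(papers):
    """Return two dicts mapping an author pair to the first year in which
    the pair co-authored a paper, and the first year in which one of them
    cited a paper by the other."""
    by_id = {p["id"]: p for p in papers}
    coauthor, citation = {}, {}
    for p in sorted(papers, key=lambda p: p["year"]):
        for pair in combinations(sorted(set(p["authors"])), 2):
            coauthor.setdefault(pair, p["year"])
        for cited_id in p["cites"]:
            cited = by_id.get(cited_id)
            if cited is None:
                continue  # cited paper is outside the archive
            for a in p["authors"]:
                for b in cited["authors"]:
                    if a != b:
                        citation.setdefault(tuple(sorted((a, b))), p["year"])
    return coauthor, citation


def precedence_counts(coauthor, citation):
    """Count pairs whose first citation precedes their first co-authored
    paper, and pairs for which the order is reversed."""
    both = coauthor.keys() & citation.keys()
    cite_first = sum(1 for pair in both if citation[pair] < coauthor[pair])
    coauthor_first = sum(1 for pair in both if coauthor[pair] < citation[pair])
    return cite_first, coauthor_first


papers = [  # invented toy records
    {"id": "p1", "year": 2000, "authors": ["alice"], "cites": []},
    {"id": "p2", "year": 2001, "authors": ["bob"], "cites": ["p1"]},
    {"id": "p3", "year": 2003, "authors": ["alice", "bob"], "cites": []},
    {"id": "p4", "year": 2002, "authors": ["carol", "dave"], "cites": []},
    {"id": "p5", "year": 2003, "authors": ["dave"], "cites": []},
    {"id": "p6", "year": 2004, "authors": ["carol"], "cites": ["p5"]},
]
print(precedence_counts(*first_link_years(papers)))  # (1, 1)
```

On real data, the two counts would then be compared against a null model to decide whether one ordering is significantly more frequent than the other.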

Footnote 1: This was definitely true at the beginning of the development of online social networks, when only people one had physically met were added as online connections. Nowadays the situation is probably more complex; still, the network of offline social contacts is probably only weakly influenced by the evolution of online contacts.

[Amo16] B. Amorim, D. Figueiredo, G. Iacobelli, G. Neglia, “Growing Networks Driven by Random Walks,” the 7th Workshop on Complex Networks (CompleNet 2016), March 23-25, 2016, Dijon, France

[Bul10] S.V. Buldyrev, R. Parshani, G. Paul, H.E. Stanley, S. Havlin, Catastrophic cascade of failures in interdependent networks, Nature 464, 1025 (2010)

[Dag14] “Networks of networks: The last frontier of complexity,” edited by G. D’Agostino, A. Scala (Springer, 2014)

[Fig12] D. Figueiredo, P. Nain, B. Ribeiro, E. de Souza e Silva, and D. Towsley, Characterizing Continuous Time Random Walks on Time-varying Graphs, SIGMETRICS 2012.

[FigXX] D. Figueiredo, G. Iacobelli, G. Neglia, Graph Builder Random Walk, under submission.

[Kim13] J.Y. Kim, K.-I. Goh, Coevolution and Correlated Multiplexity in Multiplex Networks, Phys. Rev. Lett. 111, 058702 (2013)

[Lee15] K. M. Lee, B. Min, KI Goh, Towards real-world complexity: an introduction to multiplex networks, The European Physical Journal B (2015)

[Lib03] D. Liben-Nowell and J. Kleinberg, The link prediction problem for social networks. In Proceedings of the twelfth international conference on Information and knowledge management (CIKM '03). ACM, New York, NY, USA, 556-559.

[Lic10] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla. New perspectives and methods in link prediction. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '10). ACM, New York, NY, USA, 243-252.

[Nic13] V. Nicosia, G. Bianconi, V. Latora, M. Barthélemy, Growing Multiplex Networks, Phys. Rev. Lett. 111, 058701 (2013)

[Sze10] Szell, M., Lambiotte, R., Thurner, S., Multirelational organization of large-scale social networks in an online world, Proc. Nat. Acad. Sciences, 107(31), (2010).

Improving Caching Policies for Spark

Name: Giovanni Neglia (Inria), Fabrice Huet (I3S), Hlib Mykhailenko (Inria/I3S), Pietro Michiardi (Eurecom)

Mail: giovanni.neglia@inria.fr, fabrice.huet@unice.fr, hlib.mykhailenko@inria.fr, pietro.michiardi@eurecom.fr

Webpages: http://www-sop.inria.fr/members/Giovanni.Neglia, https://sites.google.com/site/fabricehuet/, http://www-sop.inria.fr/members/Hlib.Mykhailenko/, http://www.eurecom.fr/~michiard/

Place of the project: Inria Sophia-Antipolis Méditerranée

Address: 2004 route des Lucioles, Sophia Antipolis

Team: Maestro

Web page: https://team.inria.fr/maestro/

Pre-requisites: Good knowledge of Java and a hands-on approach to distributed systems. Knowledge of the Scala language would definitely be an important plus. Good analytical skills.

Description:

Spark [1] is an in-memory distributed processing framework that is experiencing rapid growth and adoption due to its use in big data analytics. A crucial reason for its success is its ability to persist intermediate data in memory between computation tasks, which eliminates a significant amount of disk I/O and reduces data processing time.

Crucial to Spark performance is then the capability to maintain in cache the "right data," i.e. the information that is likely to be requested again in the near future.

To this end, it has recently been proposed to take advantage of the application execution flow to determine which content to cache [2,3,4]. In particular, Spark represents the execution flow as a Directed Acyclic Graph (DAG), which captures data dependencies across multiple stages.
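One way to picture how DAG information can drive eviction is a reference-count rule: a cached dataset that no pending computation still reads is a good candidate for eviction. The sketch below is loosely inspired by the reference-count idea behind [3], but it is neither the authors' algorithm nor Spark code; the DAG encoding, the dataset names and both function names are assumptions made for this example.

```python
def reference_counts(dag, done):
    """dag: dict mapping each dataset to the list of datasets it is computed
    from (every parent is also a key). done: set of datasets already computed.

    Returns, for each dataset, how many not-yet-computed children read it.
    """
    counts = {dataset: 0 for dataset in dag}
    for child, parents in dag.items():
        if child not in done:
            for parent in parents:
                counts[parent] += 1
    return counts


def choose_victim(cached, dag, done):
    """Pick the cached dataset with the fewest pending readers
    (ties broken alphabetically so the result is deterministic)."""
    counts = reference_counts(dag, done)
    return min(sorted(cached), key=lambda dataset: counts[dataset])


# Hypothetical job: raw -> clean -> {features, stats}; features -> model;
# report needs both stats and model.
dag = {
    "raw": [],
    "clean": ["raw"],
    "features": ["clean"],
    "stats": ["clean"],
    "model": ["features"],
    "report": ["stats", "model"],
}
done = {"raw", "clean", "features"}
print(choose_victim({"raw", "clean", "features"}, dag, done))  # raw
```

A recency-based policy such as LRU could instead evict `clean` here if it was touched least recently, even though `stats` still needs it, which is the kind of mistake DAG-aware policies try to avoid.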

During the internship the student should

- study the relevant papers indicated below

- propose other caching policies exploiting the DAG information

- implement them in Spark and evaluate their performances

We think this research topic has the potential to lead to a publication by the end of the internship.

[1] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, Spark: Cluster computing with working sets. In Proc. 2nd USENIX HotCloud, 2010.

[2] L. Xu, M. Li, L. Zhang, A. R. Butt, Y. Wang, and Z. Z. Hu, MEMTUNE: Dynamic memory management for in-memory data analytic platforms, in Proc. IEEE International Parallel and Distrib. Process. Symposium and Design (IPDPS), 2016.

[3] Y. Yu, W. Wang, J. Zhang, and K. B. Letaief, LRC: DAG-Aware Cache Management in Distributed Data Analytics Systems, IEEE Infocom, 2017

[4] Mingxing Duan, Kenli Li, Zhuo Tang, Guoqing Xiao, and Keqin Li, "Selection and replacement algorithms for memory performance improvement in Spark," Concurrency and Computation: Practice and Experience (Special Issue on WPBA 2014), vol. 28, no. 8, pp. 2473-2486, June 2016.

Overlay routing in CDNs for Quality of Experience optimization

Main Advisor : Ramon Aparicio-Pardo

Secondary Advisor: Lucile Sassatelli

Emails : {raparicio,sassatelli}@i3s.unice.fr

Laboratory : I3S, SigNet group (2000, route des Lucioles – Sophia Antipolis)

Webpage: http://signet.i3s.unice.fr

Description :

IP video traffic is expected to grow fourfold from 2015 to 2020, reaching 82 percent of all IP traffic (business and consumer) by 2020. While a large variety of Video on Demand (VoD) and video streaming services have emerged in the past years, the field continues to evolve rapidly. The way people watch video is constantly changing and in recent years has mainly been driven by mobile usage. For instance, live streaming embedded in social media platforms is a relatively new phenomenon, but this technology is growing and rapidly evolving with services such as Facebook Live or Periscope.

With the explosion of streaming services that deliver Internet video to the TV and other device endpoints, content delivery networks (CDNs) have prevailed as the dominant technology to deliver such content. Globally, 72 percent of Internet video traffic will cross CDNs by 2019.

CDNs employ overlay routing to relay traffic to the edge server assigned to the end client. There exist solutions to express and solve the routing problem when the objective is a simple function of certain Quality of Service (QoS) metrics (such as delay, throughput, loss rate, jitter). It is however known that these raw metrics are poor predictors for the perceived Quality of Experience (QoE – re-bufferings, definition, etc.) by the video client [1].

The goal of the internship is to express and study possible formulations of the overlay routing problem where the objective is a function representing the Quality of Experience (QoE) from the QoS metrics.

Phase 1: Some models connecting QoE with QoS for video are available from the literature [2-8]. From these available models, explicit functions amenable to network optimization will be selected or redesigned. In particular, different types of regression techniques will be considered.

Phase 2: Implementation of the optimization model within a solver such as ILOG CPLEX (available in Python, Matlab, C++, Java). Obtaining results on real-world topologies and traffic traces.

Phase 3: Investigation of the possible decompositions.
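The difference between optimizing a raw QoS metric and optimizing a QoE function can be seen on a toy overlay. The sketch below enumerates the simple paths of a small graph, aggregates delay (additive) and loss (multiplicative along the path), and scores each path with an invented QoE mapping. The mapping, its constants, the topology and all names are assumptions made for illustration; none of them comes from the models in [2-8], and brute-force enumeration stands in for the optimization model of Phase 2.

```python
import math


def simple_paths(graph, src, dst, path=None):
    """Yield every loop-free path from src to dst in a small overlay graph."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in graph.get(src, {}):
        if nxt not in path:
            yield from simple_paths(graph, nxt, dst, path)


def path_qos(graph, path):
    """End-to-end delay (sum of link delays) and loss rate (1 - product of
    per-link delivery probabilities)."""
    links = list(zip(path, path[1:]))
    delay = sum(graph[u][v]["delay_ms"] for u, v in links)
    delivered = math.prod(1.0 - graph[u][v]["loss"] for u, v in links)
    return delay, 1.0 - delivered


def qoe(delay_ms, loss):
    """Hypothetical QoE score in [1, 5]; the shape and constants are made up."""
    return 1.0 + 4.0 * math.exp(-delay_ms / 200.0) * math.exp(-20.0 * loss)


def best_path(graph, src, dst):
    return max(simple_paths(graph, src, dst),
               key=lambda p: qoe(*path_qos(graph, p)))


overlay = {
    "origin": {"relayA": {"delay_ms": 20, "loss": 0.02},
               "relayB": {"delay_ms": 40, "loss": 0.0}},
    "relayA": {"edge": {"delay_ms": 20, "loss": 0.02}},
    "relayB": {"edge": {"delay_ms": 40, "loss": 0.0}},
    "edge": {},
}
print(best_path(overlay, "origin", "edge"))  # ['origin', 'relayB', 'edge']
```

The path through relayA has the lowest delay, yet the lossless path through relayB scores higher under this mapping. Because such a QoE function is not additive over links, shortest-path algorithms do not apply directly, which is what makes the formulation non-trivial.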

Pre-requisites:

Knowledge of networking and routing

Knowledge of video streaming distribution

Programming skills (Matlab or Python)

Basic knowledge of optimization

Additional information:

This internship is part of a collaboration with the Mathematical and Algorithmic Sciences Lab at Huawei’s French Research Center (FRC) in Paris.

Salary: Gratification (550 euros)

References :

[1] H. Nam, K.-H. Kim and H. Schulzrinne. QoE Matters More Than QoS: Why People Stop Watching Cat Videos. In Proc. Infocom, 2016.

[2] X. Liu et al. A Case for a Coordinated Internet Video Control Plane. In Proc. SIGCOMM, 2012.

[3] F. Dobrian, V. Sekar, A. Awan, I. Stoica, D. A. Joseph, A. Ganjam, J. Zhan, and H. Zhang. Understanding the impact of video quality on user engagement. In Proc. SIGCOMM, 2011.

[4] R. K. P. Mok, E. W. W. Chan, X. Luo, and R. K. C. Chang. Inferring the QoE of HTTP Video Streaming from User-Viewing Activities. In Proc. SIGCOMM W-MUST, 2011.

[5] H. H. Song, Z. Ge, A. Mahimkar, J. Wang, J. Yates, Y. Zhang, A. Basso, and M. Chen. Q-score: Proactive Service Quality Assessment in a Large IPTV System. In Proc. IMC, 2011.

[6] A. Balachandran, V. Sekar, A. Akella, S. Seshan, I. Stoica, and H. Zhang. Developing a predictive model of quality of experience for internet video. In ACM SIGCOMM ’13.

[7] T. Spetebroot, S. Afra, N. Aguilera, D. Saucez, C. Barakat. From network-level measurements to expected Quality of Experience: the Skype use case. In Proc. of the IEEE International Workshop on Measurement and Networking (M&N), Coimbra, Portugal, October 2015.

[8] F. Zhang, W. Lin, Z. Chen and K. N. Ngan. Additive Log-Logistic Model for Networked Video Quality Assessment. In IEEE Trans. on Image Processing, vol. 22, no. 4, pp. 1536-1547, April 2013.

Deep Learning and Best Practices

Name: Frédéric Precioso

Mail: precioso@i3s.unice.fr

Telephone: +33 (0)4 92 96 51 43

Web page: http://www.i3s.unice.fr/~precioso/

Place of the project: Laboratoire I3S

Address: Polytech’Nice-Sophia, 930 route des Colles, BP 145, 06903 Sophia Antipolis CEDEX, FRANCE

Team: SPARKS

Web page: https://sparks.i3s.unice.fr

Pre-requisites if any: Knowledge in Artificial Neural Networks

Detailed description (context, expectations, outcome):

Context

“Deep learning is a branch of machine learning, employing numerous similar, yet distinct, deep neural network architectures to solve various problems”. When it comes to applying deep learning to a given problem, one does not know beforehand the network topology (number of neurons, layers, etc.) that will give the best results. While it has been shown numerous times that deep learning can solve some really complex problems very efficiently, it is not yet completely understood how such results are obtained. As such, anyone wishing to solve their own deep learning problem has to try out numerous possibilities in order to find one suitable for their needs. As it is challenging to develop, debug and scale up deep learning algorithms, many sophisticated approaches have been proposed to help solve deep learning problems [11, 12]. However, the efficiency of these methods depends on the problem to solve, the capacity of the available machines and what one wants to do with the results. In this context, one can wonder whether it is possible to automate these practices in order to produce deep learning workflows appropriate for a given problem.

This project will be part of the ROCKFlows project. ROCKFlows (Request your Own Convenient Knowledge Flows) is an exploratory project aiming to help users create their own Machine Learning workflows. Given the user's dataset and objectives, the platform aims to generate the workflow most suitable for the problem to be solved.

Objectives

The purpose of the internship is twofold:

- Experiment with deep learning in order to better understand the internal mechanisms of complex neural networks, and how the input data and the hyperparameters relate to the performance of the algorithms. This involves finding appropriate case studies and studying state-of-the-art techniques in order to apply and compare them.

- Incorporate the knowledge gained into the ROCKFlows platform [13]. This involves working with the engineering team to understand how deep learning can be represented and exposed to a non-expert user, and how to automatically generate code that will suit her.
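The relation between hyperparameters and performance is often explored with random search, the subject of reference [8]. The sketch below shows the bare mechanism on a toy objective that stands in for a validation loss; the objective, the search space and the function names are assumptions made for illustration, and no neural network is trained.

```python
import math
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample each hyperparameter independently and keep the best trial.

    space: dict mapping a hyperparameter name to a function that draws one
    value from a random.Random instance.
    """
    rng = random.Random(seed)
    best_params, best_score = None, math.inf
    for _ in range(n_trials):
        params = {name: draw(rng) for name, draw in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_loss(params):
    """Stand-in for training and evaluating a network; minimum at a
    learning rate of 1e-2 and 3 layers (purely illustrative)."""
    return ((math.log10(params["learning_rate"]) + 2.0) ** 2
            + 0.1 * (params["n_layers"] - 3) ** 2)


space = {
    # Log-uniform draw between 1e-5 and 1, a usual choice for learning rates.
    "learning_rate": lambda rng: 10 ** rng.uniform(-5, 0),
    "n_layers": lambda rng: rng.randint(1, 6),
}
params, loss = random_search(toy_validation_loss, space)
print(params, round(loss, 3))
```

Logging every trial, not only the best one, would provide the kind of data needed to study how hyperparameters and input characteristics relate to performance.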

We want to publish the results in a journal.

References (ref to read before starting work):

[1] Arel, I., Rose, D. C., & Karnowski, T. P. (2010). Deep machine learning-a new frontier in artificial intelligence research [research frontier]. IEEE Computational Intelligence Magazine, 5(4), 13-18.

[2] R Deep Learning Essentials - By Dr. Joshua F. Wiley

[3] Bengio, Y., Goodfellow, I. J., & Courville, A. (2015). Deep learning. An MIT Press book in preparation. Draft chapters available at http://www.iro.umontreal.ca/~bengioy/dlbook.

[4] Bergstra, J. S., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems (pp. 2546-2554).

[5] Deng, L., Hinton, G., & Kingsbury, B. (2013, May). New types of deep neural network learning for speech recognition and related applications: An overview. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 8599-8603). IEEE.

[6] Kang, K. C., Sugumaran, V., & Park, S. (Eds.). (2009). Applied software product line engineering. CRC press.

[7] Bengio, Y. (2012). Deep Learning of Representations for Unsupervised and Transfer Learning. ICML Unsupervised and Transfer Learning, 27, 17-36.

[8] Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb), 281-305.

[9] Larochelle, H., Erhan, D., Courville, A., Bergstra, J., & Bengio, Y. (2007, June). An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th international conference on Machine learning (pp. 473-480). ACM.

[10] Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems (pp. 2951-2959).

[11] Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade (pp. 437-478). Springer Berlin Heidelberg.

[12] Orr, G. B., & Müller, K. R. (Eds.). (2003). Neural networks: tricks of the trade. Springer.

[13] Camillieri C, Parisi L, Blay-Fornarino M, Precioso F, Riveill M, Vaz JC (2016) Towards a Software Product Line for Machine Learning Workflows: Focus on Supporting Evolution. Proc. 10th Work. Model. Evol. co-located with ACM/IEEE 19th Int. Conf. Model Driven Eng. Lang. Syst. (MODELS 2016), Saint-Malo, Fr. Oct. 2, 2016. pp 65–70

[14] How to Choose a Neural Network. (2016). Retrieved from: https://deeplearning4j.org/neuralnetworktable

[15] Parisi L, Camillieri C, Blay-Fornarino M, Precioso F, Riveill M, Comparison of Workflows: a Step Further, 20 pages, submitted to ECML-PKDD 2017

Learning from experiments on ML Workflows

Name: Mireille Blay Fornarino

Mail: blay@i3s.unice.fr

Telephone: 04 92 96 51 61

Web page: http://mireilleblayfornarino.i3s.unice.fr/

Place of the project: Laboratoire I3S

Address: Polytech’Nice-Sophia, 930 route des Colles, BP 145, 06903 Sophia Antipolis CEDEX, FRANCE

Team: SPARKS

Web page: https://sparks.i3s.unice.fr

Pre-requisites if any: ML

Detailed description (context, expectations, outcome):

Context

Constructing a Machine Learning (ML) workflow depends at least on (i) the collected dataset and (ii) what you want to do with the results. This task is highly complex because of the increasing number of available algorithms and the difficulty of choosing the right algorithms, their combinations and their parametrization. Moreover, to decide which algorithm to choose, scientists often need to use analysis tools and to try and compare algorithms to find out which one performs best.

The ROCKFlows project aims to help users create their own Machine Learning workflows. Given the user's dataset and objectives, the platform aims to generate the workflow most suitable for the problem to be solved [2].

We base our knowledge on ML experiments that we have run on more than 100 datasets, 60 algorithms and 10 pre-processing techniques [3][4]. Based on the results, we rank the ML workflows (pre-processing + algorithm) in order to recommend to the user the best workflows for her needs.

Objectives

This internship consists in studying the experiment results more thoroughly, using machine learning techniques. The purpose is to find links between the input data of a machine learning problem (number of elements, correlation between attributes, etc.) and the results of the workflows according to several criteria such as accuracy, execution time or memory usage.

A first step, currently being addressed through a PFE (final-year project), is to define a Machine Learning workflow that is capable, given a set of workflow experiments, to:

- find dataset patterns, e.g. groupings of datasets with some common features for which algorithms behave similarly, thus finding links between input data and performance.

- provide a way to express these results in a “logical” manner, for instance through constraints.
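As a very small illustration of the first step, the sketch below computes a few dataset meta-features (size, number of attributes, mean absolute correlation between attributes) and groups datasets by the workflow that performed best on them. The choice of meta-features, the grouping rule, the function names and the numbers are all assumptions made for this example; the scores are invented, not ROCKFlows results.

```python
from collections import defaultdict
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def meta_features(columns):
    """columns: non-empty list of numeric attribute columns of equal length."""
    pairs = [(i, j) for i in range(len(columns)) for j in range(i)]
    corr = mean(abs(pearson(columns[i], columns[j])) for i, j in pairs) if pairs else 0.0
    return {"n_rows": len(columns[0]), "n_attrs": len(columns), "mean_abs_corr": corr}


def group_by_best_workflow(results):
    """results: dict dataset -> dict workflow -> accuracy.
    Groups datasets by the workflow with the highest accuracy."""
    groups = defaultdict(list)
    for dataset, scores in results.items():
        groups[max(scores, key=scores.get)].append(dataset)
    return dict(groups)


# Invented accuracies for three hypothetical datasets and two workflows.
results = {
    "d1": {"wf_a": 0.95, "wf_b": 0.93},
    "d2": {"wf_a": 0.97, "wf_b": 0.90},
    "d3": {"wf_a": 0.80, "wf_b": 0.86},
}
print(group_by_best_workflow(results))
print(meta_features([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]))
```

Comparing the meta-features of the datasets inside each group is one way to look for patterns that could later be expressed as constraints.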

As part of the study, the student is expected to propose new representations or metrics for the input data that would help in obtaining better results. It is also possible that she will have to work with openly available experiment data, such as OpenML [5].

This internship is in line with the PFE. It aims to extend this work by taking into account more parameters in the construction of the patterns and by generalizing the approach so that it can easily be extended, either in terms of parameters of the datasets or of properties of the results. Scaling up the results so that they can be "easily" taken into account in a product line approach could also be studied. This work aims at (i) producing analysis workflows to answer the general problem of algorithm selection [7], (ii) positioning this research in the context of algorithm portfolios, which we extend to workflows [6,8], and (iii) publishing these results.

References (ref to read before starting work):

[1] S. Apel and C. Kästner. An overview of feature-oriented software development. Journal of Object Technology (JOT), 8(5):49–84, July/August 2009.

[2] Camillieri C, Parisi L, Blay-Fornarino M, Precioso F, Riveill M, Vaz JC (2016) Towards a Software Product Line for Machine Learning Workflows: Focus on Supporting Evolution. Proc. 10th Work. Model. Evol. co-located with ACM/IEEE 19th Int. Conf. Model Driven Eng. Lang. Syst. (MODELS 2016), Saint-Malo, Fr. Oct. 2, 2016. pp 65–70

[3] Parisi L, Camillieri C, Blay-Fornarino M, Precioso F, Riveill M, Comparison of Workflows: a Step Further, 20 pages, submitted to Journal ECML-PKDD 2017

[4] Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? J Mach Learn Res 15:3133–3181.

[5] Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. OpenML: networked science in machine learning. SIGKDD Explorations 15(2), pp 49-60, 2013.

[6] Leyton-Brown K, Nudelman E, Andrew G, McFadden J, Shoham Y (2003) A Portfolio Approach to Algorithm Selection. Int. Jt. Conf. Artif. Intell.

[7] Rice JR (1976) The Algorithm Selection Problem. Adv Comput 15:65–118.

[8] Vanschoren, J., Blockeel, H., Pfahringer, B. & Holmes, G. (2012). Experiment databases: A new way to share, organize and learn from experiments. Machine Learning, 87(2), 127-158.

Autonomic Big Data Processing

Name: Fabrice Huet

Mail: fabrice.huet@unice.fr

Telephone: 04 92 38 79 77

Web page: https://sites.google.com/site/fabricehuet/home

Place of the project: I3S

Address: Sophia Antipolis

Team: Comred/Scale

Web page: https://team.inria.fr/scale/

Pre-requisites if any: Java required; experience with Scala would be a plus.

Detailed description:

The advent of Big Data has given birth to a large number of models and environments to process large amounts of data, such as MapReduce[1] and its implementation[2]. More recently, a lot of work has focused on processing data streams[3,4] or a mix of batch and streams[5] in so-called Lambda architectures[6].

Most of these works assume a static environment and only deal with dynamic resources in case of failures, i.e. restarting failed nodes or recomputing lost results. However, nothing is really static in practice. Data can vary during a computation (complexity or size of intermediate results), forcing a user to estimate how many resources he/she will need to complete a job. In a previous work[7], we have shown that it is indeed possible, albeit not trivial. Moreover, some external constraints such as energy consumption or dedicated hardware can impact the number of resources available at runtime for a long-running job. Overall, both data and resources introduce some dynamicity.

The goal of this internship is to study the dynamic allocation of resources in the Spark[5] framework and to propose some autonomic mechanisms.

First, the candidate will survey existing solutions and experiment with the basic mechanisms implemented in these two platforms. Second, the candidate will devise scenarios highlighting the benefits of dynamic resource allocation. Finally, new mechanisms will be proposed and implemented to allow the framework to request/release resources based on its current load.
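As an illustration of what a load-based request/release decision could look like, here is a minimal sketch of a threshold policy. It is not Spark's own mechanism and uses no Spark API; the class name, the `tasks_per_executor` target and the bounds are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical threshold policy; every default value is illustrative."""
    tasks_per_executor: int = 4   # assumed target load per executor
    min_executors: int = 1
    max_executors: int = 20

    def target_executors(self, pending_tasks: int, running_tasks: int) -> int:
        """Return the executor count that would absorb the current load."""
        load = pending_tasks + running_tasks
        # Ceiling division: enough executors to hold every task.
        needed = -(-load // self.tasks_per_executor)
        # Clamp to the configured bounds.
        return max(self.min_executors, min(self.max_executors, needed))

    def decide(self, current: int, pending_tasks: int, running_tasks: int) -> int:
        """Positive result: executors to request; negative: executors to release."""
        return self.target_executors(pending_tasks, running_tasks) - current


policy = ScalingPolicy()
# A backlog of pending tasks leads to a request for more executors.
print(policy.decide(current=2, pending_tasks=30, running_tasks=8))
# A mostly idle job leads to a release.
print(policy.decide(current=10, pending_tasks=0, running_tasks=6))
```

A real autonomic mechanism would also have to account for the cost of acquiring executors and for data locality, which this sketch ignores.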

References:

[1] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113, January 2008.

[2] Hadoop. https://hadoop.apache.org/.

[3] Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, and Dmitriy Ryaboy. Storm@twitter. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD ’14, pages 147–156, New York, NY, USA, 2014. ACM.

[4] Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 239-250. DOI=http://dx.doi.org/10.1145/2723372.2742788

[5] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, pages 10–10, Berkeley, CA, USA, 2010. USENIX Association.

[6] Lambda architecture. http://lambda-architecture.net/.

[7] Ge Song, Zide Meng, Fabrice Huet, Frederic Magoules, Lei Yu, and Xuelian Lin. A Hadoop MapReduce Performance Prediction Method. In HPCC 2013, pages 820–825, Zhangjiajie, China, November 2013.

Virtual Reality: Impact of Content and Transmission Strategies on Network

Advisors : Lucile Sassatelli, Ramon Aparicio-Pardo, Anne-Marie Pinna-Déry, Diane Lingrand

Emails : {first.last}@unice.fr

Laboratory : I3S, SigNet and S3 groups (2000, route des Lucioles – Sophia Antipolis)

Description :

VR is growing fast, with different companies rolling out cheap and not-so-cheap head-mounted sets in early 2016, from dedicated headsets like Oculus Rift and HTC Vive down to smartphone-dependent headsets (e.g., Samsung Gear VR, Google Cardboard and the like, which let the user watch the phone screen an inch away from the eyes through magnifying lenses). VR platforms are also on the rise, such as YouTube 360 (YT 360) for distribution or Daydream, presented at the Google I/O conference in May 2016.

VR represents a tremendous revolution in the user’s experience, but it also entails a daunting challenge for streaming transmission over the Internet (that is, YouTube-like, without download). The bit rates entailed by 360° videos (even H.265-compressed) are indeed much higher than for conventional videos (immersive smartphone apps [1] require about 28 Mbps). Such network speeds are hardly available on home access links (of ADSL type), forcing providers to offer a download option to avoid interruptions and low definition. The problem is exacerbated with self-contained headsets able to render 4K/360° videos, possibly in 3D [2].

To tackle the challenge of streaming VR, a simple approach is to send some portions of the scene with priority. These portions can be pre-selected using the Spatial Relationship Description (SRD) amendment to the MPEG DASH standard [4,5].

The goal of the internship is to show that similar network strategies can be used to reduce the bandwidth requirements without degrading the Quality of Experience (QoE) of the VR user.
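To make the idea of prioritising portions of the scene concrete, the sketch below selects the parts of a 360° panorama that overlap the user's field of view and estimates the resulting bit rate. It is only an illustration under stated assumptions: the panorama is cut into equal vertical strips, and the function names, the 90° field of view and the per-tile bit rates are hypothetical (only the 28 Mbps total comes from the text above). It does not use the SRD syntax itself.

```python
from typing import List


def visible_tiles(yaw_deg: float, fov_deg: float, n_tiles: int) -> List[int]:
    """Indices of the vertical strips that overlap the field of view.

    The panorama is assumed to be cut into n_tiles equal strips, strip i
    covering yaw angles [i * w, (i + 1) * w) with w = 360 / n_tiles.
    """
    width = 360.0 / n_tiles
    half_fov = fov_deg / 2.0
    tiles = []
    for i in range(n_tiles):
        centre = (i + 0.5) * width
        # Smallest angular distance between strip centre and viewing direction.
        diff = abs((centre - yaw_deg + 180.0) % 360.0 - 180.0)
        # The strip overlaps the view if its centre lies closer than half the
        # field of view plus half the strip width.
        if diff < half_fov + width / 2.0:
            tiles.append(i)
    return tiles


def stream_bitrate(yaw_deg: float, fov_deg: float, n_tiles: int,
                   high_mbps: float, low_mbps: float) -> float:
    """Total bit rate when visible strips get high quality and the rest low."""
    n_visible = len(visible_tiles(yaw_deg, fov_deg, n_tiles))
    return n_visible * high_mbps + (n_tiles - n_visible) * low_mbps


# 28 Mbps spread over 8 strips gives 3.5 Mbps per strip at full quality;
# the 0.5 Mbps low-quality rate is an assumed value.
print(visible_tiles(yaw_deg=0.0, fov_deg=90.0, n_tiles=8))
print(stream_bitrate(0.0, 90.0, 8, high_mbps=3.5, low_mbps=0.5))
```

Whether such savings are acceptable depends on how quickly the user turns their head and how the low-quality regions are perceived, which is what the user experiments are meant to assess.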

Phase 1: Mastering and possibly completing the platform to run the user experiments. This platform will be based on the ffmpeg and MP4Box tools [6,7] and will feature the tracking of the user’s field of view by the inertial measurement unit.

Phase 2: Selection of the set of contents, transmission strategies and editing choices to build the database. Expression of the expected results in terms of network resource consumption.

Phase 3: User experiments and analysis of results.

Phase 4: Study of different content goals: identifying the possible bandwidth savings depending on the type of video. This step shall involve sociologist colleagues.

Pre-requisites:

Knowledge of video streaming distribution

Programming skills (Java, Android, Matlab or Python)

Additional information:

This internship is part of a new Idex project on User-Centric Approaches for Streaming VR. Apart from R. Aparicio and L. Sassatelli working on video streaming and network optimization, this project also involves people from Human Computer Interface and modeling (A.-M. Pinna-Déry), from Machine learning (D. Lingrand), and a company making VR products (Adastra) with whom the intern shall interact. The intern shall also interact with other people hired on the project.

Salary: Gratification (550 euros)

References :

[1] Within application. Available: http://with.in/

[2] CNET. Everyone wanted a piece of virtual reality at this year's CES. CES 2016. Available: http://tinyurl.com/jr9cz7h

[3] Bo Begole. Why The Internet Pipes Will Burst When Virtual Reality Takes Off. Forbes, Feb. 2016.

[4] ISO/IEC 23009-1:2014/Amd 2:2015, "Spatial relationship description, generalized URL parameters and other extensions".

[5] O. A. Niamut, E. Thomas, L. D'Acunto, C. Concolato, F. Denoual, and S. Y. Lim, "MPEG DASH SRD: spatial relationship description," ACM Int. Conf. on Multimedia Systems (MMSys), May 2016.

[6] FFMPEG. Available: https://ffmpeg.org/

[7] MP4box. Available: https://gpac.wp.mines-telecom.fr/mp4box/

Quality of Experience in Virtual Reality: User Experiments and Modeling

Advisors: Diane Lingrand, Ramon Aparicio-Pardo, Anne-Marie Pinna-Déry, Lucile Sassatelli

Emails : {first.last}@unice.fr

Laboratory: I3S, SigNet and S3 groups (2000, route des Lucioles – Sophia Antipolis)

Description:

VR is growing fast, with different companies rolling out cheap and not-so-cheap head-mounted sets in early 2016, from dedicated headsets like Oculus Rift and HTC Vive down to smartphone-dependent headsets (e.g., Samsung Gear VR, Google Cardboard and the like, which let the user watch the phone screen an inch away from the eyes through magnifying lenses). VR platforms are also on the rise, such as YouTube 360 (YT 360) for distribution or Daydream, presented at the Google I/O conference in May 2016.

On the one hand, VR represents a tremendous revolution in the user’s experience, but it also entails a daunting challenge for streaming transmission over the Internet (that is, YouTube-like, without download). The bit rates entailed by 360° videos (even H.265-compressed) are indeed much higher than for conventional videos (immersive smartphone apps [1] require about 28 Mbps). Such network speeds are hardly available on home access links (of ADSL type), forcing providers to offer a download option to avoid interruptions and low definition. The problem is exacerbated with self-contained headsets able to render 4K/360° videos, possibly in 3D [2].

To tackle the challenge of streaming VR, portions of the scene to be sent with priority can, for example, be pre-selected using the Spatial Relationship Description (SRD) amendment to the MPEG DASH standard [4,5]. This selection must however be based on the user’s Quality of Experience (QoE) criteria. Recent models and recommendations to assess video quality are provided in ITU-T Rec. P.1201.2 [6]. Moreover, since VR aims to immerse the user in a virtual world, the user’s experience changes radically from video by involving more psychological and physiological parameters (motion sickness, stress, fear, discomfort, fatigue, enjoyment, etc.). These aspects should also be considered in the QoE model.

On the other hand, a lot is yet to be invented and studied for VR filmmaking, in particular for the editing phase. The network design (content preparation, transmission and rendering) and the VR content creation are strongly coupled by QoE, which is yet to be identified.

The goal of the internship is to investigate and identify a QoE model for streamed VR.

Phase 1: Mastering and possibly completing the platform to run the user experiments. This platform will be based on the ffmpeg and MP4Box tools [7,8] and will feature the tracking of the user’s field of view by the inertial measurement unit.

Phase 2: Design of the input and output metrics.

The design of the input space will be prepared through a study of the possible features determining QoE in VR. The network strategies to test will be determined and implemented.

The design of the questionnaire, determination of the outputs and scalarization into a QoE score will be informed by the literature in video quality assessment, in particular by the newly (March 2016) released guidelines for 3D video (ITU-T Rec. P.916 [9]).

Design of the plan of experiments. Carrying out the experiments with a headset (Samsung Gear VR) and a population of users.

Phase 3: Analysis of the QoE data obtained from the experiments. This phase will involve machine learning tools to extract the different types of factors (network strategies, content type) determining the VR users’ QoE.

Phase 4: If time permits, extension of the above study to (i) several scenes, then (ii) to more diverse content.
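One common way to scalarize questionnaire answers, mentioned here only as an assumption about how Phase 2 might proceed, is a mean opinion score (MOS) per test condition with a confidence interval. The sketch below uses hypothetical ratings and condition names, and a normal approximation for the interval.

```python
import math
import statistics
from typing import Dict, List, Tuple


def mean_opinion_score(ratings: List[int]) -> Tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        # A spread cannot be estimated from a single rating.
        return mos, float("nan")
    # Normal approximation; a Student-t quantile would suit small panels better.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings on a 1 (bad) to 5 (excellent) scale, per test condition.
ratings_by_condition: Dict[str, List[int]] = {
    "full_quality": [5, 4, 5, 4, 4],
    "tiled_priority": [4, 4, 3, 4, 5],
}
for condition, ratings in ratings_by_condition.items():
    mos, ci = mean_opinion_score(ratings)
    print(f"{condition}: MOS = {mos:.2f} +/- {ci:.2f}")
```

A single score of this kind leaves out the psychological and physiological factors discussed above (motion sickness, fatigue, etc.), so a fuller QoE model would likely combine several such outputs.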

Pre-requisites:

Knowledge of video streaming distribution

Programming skills (Java, Android, Matlab or Python)

Statistics

Knowledge of statistical machine learning is a plus.

Additional information:

This internship is part of a new Idex project on User-Centric Approaches for Streaming VR. Apart from R. Aparicio and L. Sassatelli working on video streaming and network optimization, this project also involves people from Human Computer Interface and modeling (A.-M. Pinna-Déry), from Machine learning (D. Lingrand), and a company making VR products (Adastra) with whom the intern shall interact. The intern shall also interact with other people hired on the project.

Salary: Gratification (550 euros)

References :

[1] Within application. Available: http://with.in/

[2] CNET. Everyone wanted a piece of virtual reality at this year's CES. CES 2016. Available: http://tinyurl.com/jr9cz7h

[3] Bo Begole. Why The Internet Pipes Will Burst When Virtual Reality Takes Off. Forbes, Feb. 2016.

[4] ISO/IEC 23009-1:2014/Amd 2:2015, "Spatial relationship description, generalized URL parameters and other extensions".

[5] O. A. Niamut, E. Thomas, L. D'Acunto, C. Concolato, F. Denoual, and S. Y. Lim, "MPEG DASH SRD: spatial relationship description," ACM Int. Conf. on Multimedia Systems (MMSys), May 2016.

[6] Marios C. Angelides, Harry Agius (Eds). The Handbook of MPEG Applications: Standards in Practice. Wiley, Nov. 2010. ISBN: 978-0-470-75007-0

[7] FFMPEG. Available: https://ffmpeg.org/

[8] MP4box. Available: https://gpac.wp.mines-telecom.fr/mp4box/

[9] ITU-T, Recommendation P.916. Information and guidelines for assessing and minimizing visual discomfort and visual fatigue from 3D video. March 2016.

Anechoic Chamber Characterisation for Wi-Fi Meshed Scenarios

Name: Walid Dabbous

Mail: walid.dabbous@inria.fr

Telephone: +33 4 92 387 718

Web page: https://team.inria.fr/diana/team-members/walid-dabbous/

Name: Thierry Turletti

Mail: thierry.turletti@inria.fr

Telephone: +33 4 92 3867 879

Web page: https://team.inria.fr/diana/team-members/thierry-turletti/

Name: Mohamed Naoufal Mahfoudi

Mail: mohamed-naoufal.mahfoudi@inria.fr

Telephone: +33 4 92 387 770

Place of the project: Inria Sophia Antipolis

Address: 2004 route des Lucioles, 06902 Sophia Antipolis

Team: DIANA

Web page: https://team.inria.fr/diana/

Background: With the constant evolution of wireless networking, it becomes difficult to survey the performance of protocols in realistic settings, especially because signal propagation in the wireless domain is environment-dependent, hardly controllable, and widely variable. Hence, there is a need for accurate evaluation to test the validity and the reproducibility [1] of wireless experiment results. Anechoic chambers offer a controlled environment where a fair assessment of these network protocols is possible. The Inria R2lab chamber [2] is equipped with both RF absorbers, which reduce signal propagation phenomena such as electromagnetic reflections and scattering, and a Faraday cage, which blocks external radio wave interference. A controlled environment makes it possible to induce signal propagation phenomena artificially, which helps in assessing network protocols in different types of environments without suffering from the randomness of real-life wireless testbeds. Wireless mesh scenarios require the presence of mesh access points with non-overlapping coverage areas. It is therefore important to find out which wireless nodes in R2lab can serve as access points in target mesh scenarios.

Proposition: The objective of this internship is to assess and characterise the wireless propagation environment of the R2lab testbed by running controlled experiments and analysing the results for wireless mesh scenarios. For instance, a simple experimental scenario would consist in setting up a Wi-Fi transmission between mesh access points and multiple clients, measuring metrics related to the received signals, such as RSSI and the channel state information (CSI) available on modern Wi-Fi cards, then analysing these metrics with regard to existing state-of-the-art models. We will vary both the transmission power and the physical transmission rate of the mesh access points and study the impact at each receiver. We will derive different wireless mesh settings and deploy the BATMAN Open Mesh tool [3] to verify the choice made for the settings.
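As an example of comparing measured RSSI against a propagation model, the sketch below fits the log-distance path-loss model RSSI(d) = p0 - 10 n log10(d / 1 m) by least squares. The choice of this model, the function name and the synthetic measurements are assumptions made for illustration; the internship may well rely on other models or on CSI rather than RSSI.

```python
import math
from typing import List, Tuple


def fit_log_distance(distances_m: List[float],
                     rssi_dbm: List[float]) -> Tuple[float, float]:
    """Least-squares fit of RSSI(d) = p0 - 10 * n * log10(d / 1 m).

    Returns (p0, n): the RSSI at the 1 m reference distance and the
    path-loss exponent.
    """
    x = [10.0 * math.log10(d) for d in distances_m]
    mean_x = sum(x) / len(x)
    mean_y = sum(rssi_dbm) / len(rssi_dbm)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, rssi_dbm))
    slope = sxy / sxx              # equals -n in the model above
    p0 = mean_y - slope * mean_x
    return p0, -slope


# Synthetic measurements generated with p0 = -40 dBm and n = 2,
# the exponent expected for free-space propagation.
distances = [1.0, 2.0, 4.0, 8.0]
rssi = [-40.0 - 20.0 * math.log10(d) for d in distances]
p0, n = fit_log_distance(distances, rssi)
print(f"p0 = {p0:.1f} dBm, path-loss exponent n = {n:.2f}")
```

An exponent close to 2 on real measurements would suggest near free-space behaviour, while a noticeably different value would point to residual reflections or to hardware effects worth investigating.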

The student will use the nepi-ng tool [4] to easily describe the scenarios of interest and automate the execution, control and results collection of experiments. The experiments will be carried out in R2lab (Reproducible Research Lab), an anechoic chamber based in INRIA Sophia-Antipolis.

Prerequisites: Knowledge of channel propagation models, statistical analysis, programming in Python.

References:

[1] Young-Hwan Kim, Alina Quereilhac, Mohamed Amine Larabi, Julien Tribino, Thierry Parmentelat, Thierry Turletti and Walid Dabbous. Enabling Iterative Development and Reproducible Evaluation of Network Protocols. Computer Networks, Special Issue on Future Internet Testbeds. Volume 63, 22 April 2014, pp. 238–250.

[2] R2lab: See URL: http://r2lab.inria.fr/overview.md

[3] Open Mesh BATMAN, URL: https://www.open-mesh.org/projects/open-mesh/wiki

[4] nepi-ng Experiment control tool, URL: http://r2lab.inria.fr/tuto-300-nepi-ng-install.md