Internship List 15-16

Experimenting and modeling YouTube Quality of Experience

Name: Chadi Barakat

Damien Saucez

Mail: Chadi.Barakat@inria.fr

Damien.Saucez@inria.fr

Telephone: +33492387594

Web page: http://team.inria.fr/diana/chadi/

https://team.inria.fr/diana/damien-saucez/

Place of the project: Diana Team, Inria Sophia Antipolis

Address: 2004, route des lucioles

06902 Sophia Antipolis

Team: DIANA

Web page: http://team.inria.fr/diana/

Pre-requisites if any: Knowledge in statistics, data analysis and machine learning.

Programming skills in C and Java.

Detailed description: indicate the context of the work, what is

expected from the intern, what will be the outcome (software,

publication, ...).

Context of the internship:

This proposal fits within our project aiming at the estimation of the Quality of Experience for main Internet applications departing from network-level measurements. Our project, called ACQUA [1], leverages measurements done at the network and device level as done today (bandwidth, delay, loss rate, signal strength, type of wireless connection, etc), and applies over them well calibrated models to estimate/predict the quality of experience for main applications even before launching them. ACQUA is an extendible solution in terms of the applications it can track. It allows a fine-grained profiling of the Internet access at the level of application quality of experience. Models in ACQUA are obtained via extensive experiments in a controlled environment, where network conditions are artificially changed and the corresponding Quality of Experience is written down. A data analysis and learning phase then follows to link the network conditions to the Quality of Experience and produce models that can be later plugged in our application. The analysis is also supposed to provide insights on the traffic characteristics of the application under study.

In a recent work [2], we have proved the feasibility of the approach via the Skype use case. We then started looking at the popular YouTube case with the main objective to establish links between network conditions and the Quality of Experience as perceived by the end user during video streaming in YouTube. A platform prototype has been set up to automatically playout YouTube videos while changing network conditions, and dumping data both on the network and the video streaming traffic.

Objectives of the internship:

The purpose of this internship is to build upon the above work and produce a thorough analysis of the obtained data using statistical and machine learning techniques. The internship will start by a review of the literature on YouTube Quality of Experience (QoE) and the work that has been done on ACQUA, then will move into the analysis of the data from different angels, with the main objective to provide insights on how YouTube Quality of Experience (e.g., the number of interruptions, the duration of interruptions, the buffering time) varies with the network conditions. It is expected that this analysis phase provides insights on the traffic characteristics of YouTube traffic and the interactions of this traffic with network conditions. It is also expected that this analysis produces an efficient model for YouTube Quality of Experience (QoE) that can be used in ACQUA and that allows QoE prediction from network measurement. This model should account for the different kinds of YouTube videos and for the different players and streaming strategies that are widely used in YouTube streaming. For this purpose, the platform will have to be enhanced to integrate these extensions, more data should be collected, and a model should be developed and validated.

The internship will be funded by the French National Projet ANR Bottlenet that will also fund a PhD thesis building on this work and addressing the general problem of Quality of Experience measurement, modeling and troubleshooting.

References: set of bibliographical references (article, books, white papers, etc) to be read by the student before starting to work on this subject

[1] ACQUA: Application for prediCting Quality of User experience at Internet Access. URL: http://planete.inria.fr/acqua/

[2] Thierry Spetebroot, Salim Afra, Nicolas Aguilera, Damien Saucez, Chadi Barakat. From network-level measurements to expected Quality of Experience: the Skype use case. IEEE International Workshop on Measurement and Networking (M&N), Oct 2015, Coimbra, Portugal.

[3] Athula Balachandran, Vyas Sekar, Aditya Akella, Srinivasan Seshan, Ion Stoica, and Hui Zhang. 2013. Developing a predictive model of quality of experience for internet video. SIGCOMM Comput. Commun. Rev. 43, 4 (August 2013), 339-350.

[4] Schatz, Hossfeld, Janowski, and Egger. From Packets to People: Quality of Experience as New Measurement Challenge. In: Data Traffic Monitoring and Analysis. Springer LNCS, 2013.

Efficient algorithms for the Subset Interconnection Design Problem

Christelle Caillouet & David Coudert

Tel: +33 (0)4 92 38 79 81

E-mail: christelle.caillouet@unice.fr

david.coudert@inria.fr

Context: The subset interconnection design problem [2] ask for a minimum cardinality subset of edges

of a given graph G = (V, E) such that, for each given subsets of nodes Vi

of G is connected. This problem has many applications in overlay networks, social networks, and even

in inter-molecular protein networks [1]. This problem is known to be NP-hard and both approximation

and heuristic algorithms have been proposed.

During this internship, we will consider a variant of the problem arising from structural biology. In this

variant, each vertex belongs to a certain type T and we are not aware of the exact composition of each

subset but rather of the number of vertices of each type that is composed by. In other words, the graph G

may have several nodes of the same type, and a subset Vi expresses the requirement of a certain number

of nodes of certain types. So apart from having to find the edge set E of minimum cardinality, we also

have to find the exact nodes in each subset.

Objectives: The main goal of this internship is to propose exact and approximate algorithms for instances

involving hundreds of proteins (currently unreachable with existing solutions). More precisely, a first

step is to define proper models of the problem which respect specific constraints from structural biology.

The models will be formulated as mixed integer linear programs (MILP), implemented using Sagemath,

and tested on biological datasets extracted from several types of experiments. Since the number of valid

solutions reported by the MILP might be huge and that their enumeration might require a considerable

amount of time, we will have to investigate further directions:

• Search for efficient combinatorial algorithms reporting consensus solutions, i.e., solutions which

are most likely to reflect the real connections for the big instances;

• Improve the MILP efficiency: find specific constraints (cuts, valid inequalities, etc.), or transform

the formulation to use sophisticated methods such as branch&price;

• Exploit the 3D shape of proteins to reduce the size of the search space and report only geometri-

cally feasible solutions.

Required background: Basic notions of graph algorithms, programming languages C/C++ and Python.

References:

1. D. Agarwal, J. Araujo, C. Caillouet, F. Cazals, D. Coudert, and S. Pérennes. Connectivity infer-

ence in mass spectrometry based structure determination. In European Symposium on Algorithms

(LNCS 8125), pages 289-300, Sophia-Antipolis, France, 2013. Springer.

2. D.-Z. Du and Z. Miller. Matroids and subset interconnection design. SIAM Journal on Discrete

Mathematics, 1988

Content distribution over Software Defined Network

Contacts: Thierry Turletti, Damien Saucez, Walid Dabbous

e-mail: {thierry.turletti,damien.saucez,walid.dabbous}@inria.fr

The Team: DIANA, INRIA Sophia Antipolis, URL: http://team.inria.fr/diana/

The Diana research team at INRIA conducts two main research directions: (1) designing and deploying a measurement plane to provide network service transparency and (2) defining an open architecture allowing easy programmability of the underlying network infrastructure. Our team is also highly involved in the development of realistic network evaluation tools and platforms (ns-3, DCE, nepi, OneLab) to validate the new network services, applications and protocols.

Context:

Software-Defined Networking (SDN) [1] has been proposed recently as a way to programmatically control networks, facilitating the deployment of new network core functions, applications and services, as well as tuning network policy and performance. SDN's premise is to decouple network control from the data plane, which leads to improved network programmability and consequently the network's ability to evolve and adapt to the requirements of new applications and services). In a nutshell, a centralized controller implements all the intelligence of the system while the other network elements implement elementary functions without any form of intelligence.

Work description:

The objective of the internship is to study how SDN can be used to deploy dynamic data plane resource allocation mechanisms, both for networking and computational point of view. For instance, this means the possibility to allocate dynamically CPU cores according to the applications' requirements, or to place cache functionality for popular content at strategical locations of the network. Such functionalities are usually provided through a NaaS (Network as a Service) API (Application Programming Interface) that abstracts the network and devices operations to simplify their management.

In this internship, we will focus on a particular application: content distribution over a SDN network. The work will consist in designing and implementing a network API (Application Programming Interface), built upon the Representational State Transfer (REST) [2] paradigm and validating it for a content distribution scenario in a real testbed including OpenFlow (OF) [3] routers and an OF controller such as OpenDayLight [4]. More specifically, the API will be used to represent the entire network and to place a copy of the most popular content in the caching router close to the client once the controller detects that there is congestion between the server and the client.

This internship work will be done in the context of an ANR project with partners at Thales, 6Wind and ENSL.

Pre-requisites:

- Programming language: Python, Java, shell scripting.

- Strong knowledge in Internet technologies and distributed systems (TCP/IP, routing, HTTP, JSON…)

- Technical ability to configure and manage GNU/Linux machines

References

[1] B. Astuto, M. Mendonca, X.-N. Nguyen, K. Obraczka, and T. Turletti, "Software-Defined Networking Enabled Capacity Sharing in User Centric Networks," IEEE Surveys & Tutorials, vol. 16, no. 3, pp. 1617-1634, August 2014. [Online]. Available: http://hal.inria.fr/hal-00825087

[2] Areeb Alowisheq and David E. Millard. 2009. EXPRESS: EXPressing REstful Semantic Services. In Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03 (WI-IAT '09), Vol. 3. IEEE Computer Society, Washington, DC, USA, 453-456. DOI=http://dx.doi.org/10.1109/WI-IAT.2009.324

[3] Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, and Jonathan Turner. 2008. OpenFlow: enabling innovation in campus networks. SIGCOMM Comput. Commun. Rev. 38, 2 (March 2008), 69-74. DOI=http://dx.doi.org/10.1145/1355734.1355746

[4] Jan Medved, Robert Varga, Anton Tkacik, Ken Gray, "OpenDaylight: Towards a Model-Driven SDN Controller architecture", WOWMOM, 2014, 2014 IEEE 15th International Symposium on "A World of Wireless, Mobile and Multimedia Networks" (WoWMoM), 2014 IEEE 15th International Symposium on "A World of Wireless, Mobile and Multimedia Networks" (WoWMoM) 2014, pp. 1-6

Platform-based design and abstract compilation onto manycore architectures

Robert de Simone

Place of the project: Inria Sophia Antipolis

Team: AOSTE

To be detailed during the presentation of the internship.

Exploring chain of trust on Twitter

Name: Arnaud Legout and Frédéric Giroire

Mail: arnaud.legout@inria.fr, frederic.giroire@cnrs.fr

Telephone: 04 92 38 78 15

Web page:

http://www-sop.inria.fr/members/Arnaud.Legout/

http://www-sop.inria.fr/members/Frederic.Giroire/

Place of the project: Inria

Address: 2004 route des Lucioles, Sophia Antipolis

Team: DIANA and COATI

Web page:

https://team.inria.fr/diana/

https://team.inria.fr/coati/

Pre-requisites if needed: fluent in at least one programming language

Description:

Twitter is the most popular micro-blogging service in the world. It

allows its users to exchange short messages (tweets) that are limited

to 140 characters. It was created to enable people to find out what is

currently happening with people and organizations they are interested

in. The relation between users on Twitter is different from classical

social networks like Facebook. Instead of bidirectional friendship

link that are initiated by one user and accepted by another, Twitter

uses the concept of following. Users can follow other users they are

interested in, which means they subscribe for all the messages they

sent. So, the links on Twitter are unidirectional, if someone follows

you, you don't need to follow back. Twitter is a very interesting

object of study because the unidirectional model of relationship is

the closest to real-life communications, thus a huge societal impact.

The goal of this intership is to study how to influence users on

Twitter by exploiting a transitive relationship called chain of

trust. The student will have the opportunity to work on a unique datasets

we collected in 2012 [1]. This dataset represents the entire Twitter

social graph with more than 500 million accounts and 24 billion

links. We propose to explore how social links are created and how this

knowledge can be exploited by a malicious user to infringe privacy and

influence information propagation.

This internship is research oriented and can be continued with

a Ph.D. thesis for excellent students.

[1] http://www-sop.inria.fr/members/Arnaud.Legout/Projects/sotweet.html

Measuring in a virtualized environment

Name: Chadi Barakat and Guillaume Urvoy-Keller

Mail: Chadi Barakat <Chadi.Barakat@inria.fr> and Guillaume Urvoy-Keller <urvoy@unice.fr>

Telephone:

Web page: https://team.inria.fr/diana/chadi/ and http://www.i3s.unice.fr/~urvoy/

Place of the project: INRIA

Address: 2004, route des lucioles

06902 Sophia Antipolis, France

Team: Diana Team

Web page: https://team.inria.fr/diana/

Pre-requisites if needed: Networking and Measurements skills

Programming skills in C/C++ and

Java

Description:

Context

***********

A heavy trend in companies is to deploy IaaS (Infrastructure as a Service)

solutions like Openstack to manage both (virtual) servers and (virtual)

networks. Such a setup entails using virtual switches. The default choice

is most of the time Open vSwtich [1] that are managed by the network

control component of Openstack, Neutron [2]. While the global machinery

relieves the system/network administrator from the pain of manual

deployment, the mix of virtual and physical components increases the

complexity of troubleshooting performance issues when they occur. In this

proposal, we are interested in assessing how traffic measurement can be

done in a typical IaaS set-up.

Objectives

**********

The objective of this internship is threefold:

1) Assess the measurement capabilities offered by Open vSwitch, i.e. the

traffic measurement tools offered, like sFlow [3] or Netflow. Next, we

would like to assess the performance of those measurement tools, i.e. to

know at which point the measurement task can affect the fast-path (normal

delivery of packets) of the component. It will be important to evaluate

their performance both in terms of the overhead they incur on the

infrastructure and the accuracy they provide. Several research studies

have proposed to use these tools to perform various performance monitoring

tasks. It will be also important to review those works, e.g. [4][5].

2) Another heavy trend in cloud computing is the use of Network Function

Virtualization [6]. The basic idea of NFV is to pack a typical network

function like firewalling or load balancing into a VM so as to offer a

scalable service and also to deploy the function at the right place in the

network. Traffic measurements nicely fits in this new NFV paradigm and we

would like to assess if, instead of relying on the measurement

capabilities of Open vSwitch, we could deploy a NVF (network virtual

function - the actual VM featuring an NFV) close to switches to capture

their traffic. A comparison of the in-switch (objective 1) and

out-of-switch (objective 2) solutions would constitute the main

contribution of this part.

3) Given the previous two objectives which are supposed to provide a solid

understanding of the capacity/performance of each measurement method, the

purpose of this objective 3) is to build upon this understanding and

propose a network-wide solution to monitor traffic across the network by

combining the two methods. After a quick classification of measurement

functions, this solution is supposed to pinpoint the best way to balance

the measurement load between Open vSwitches and measurement-orient NVF

distributed across the network. We would like that each measurement

function is carried out with the best accuracy and the minimum overhead. A

validation of this solution by experimentation/emulation in different

deployment scenarios (entreprise, datacenter, etc) is expected.

This internship is directly related to a PhD position with a secured funding

through a Labex UCN@SOPHIA. Admission to the PhD is clearly conditioned by the

performance of the student during the internship and her/his ranking at

the Master Ubinet.

References

***********

[1] http://openvswitch.org/

[2] http://docs.openstack.org/icehouse/install-guide/install/apt/content/basic

s-networking-neutron.html

[3] http://www.sflow.org/

[4] Junho Suh, Kwon, T.T., Dixon, C., Felter, W., Carter, J, “OpenSample:

A Low-Latency, Sampling-¬Based Measurement Platform for Commodity SDN”,

Distributed Computing Systems (ICDCS), 2014 IEEE 34th International

Conference on , vol., no., pp.228,237, June 30 2014¬-July 3 2014.

[5] Minlan Yu, Lavanya Jose, and Rui Miao, “Software defined traffic

measurement with OpenSketch ”. In Proceedings of the 10th USENIX

conference on Networked Systems Design and Implementation (nsdi'13),

USENIX Association, Berkeley, CA, USA, 2942, 2013.

[6] Joao Martins, Mohamed Ahmed, Costin Raiciu, Vladimir Olteanu, Michio

Honda, Roberto Bifulco, and Felipe Huici, “ClickOS and the art of network

function virtualization”. In Proceedings of the 11th USENIX Conference on

Networked Systems Design and Implementation(NSDI'14). USENIX Association,

Berkeley, CA, USA, 459-473.

Exploring the Micro-Structure of Social Graphs

Contact: Frédéric Giroire et Arnaud Legout.

Mail: arnaud.legout@inria.fr, frederic.giroire@cnrs.fr

Telephone: 04 92 38 78 15

Web page:

http://www-sop.inria.fr/members/Arnaud.Legout/

http://www-sop.inria.fr/members/Frederic.Giroire/

Place of the project: Inria

Address: 2004 route des Lucioles, Sophia Antipolis

Team: DIANA and COATI

Web page:

https://team.inria.fr/diana/

https://team.inria.fr/coati/

Context

Twitter is the most popular micro-blogging service in the world. It allows its users to exchange short messages (tweets) that are limited to 140 characters. It was created to enable people to find out what is currently happening with people and organizations they are interested in. The relation between users on Twitter is different from classical social networks like Facebook. Instead of bidirectional friendship links that are initiated by one user and accepted by another, Twitter uses the concept of following. Users can follow other users they are interested in, which means they subscribe to all the messages they sent. So, the links on Twitter are unidirectional, if someone follows you, you don't need to follow back. Twitter is a very interesting object of study because the unidirectional model of relationship can represent more schemas of real-life communications, thus it has a huge societal impact.

Directed social networks are growing in importance because they are the main source of information for many people, but also because the information exchanged in these networks feed traditional media (such as TV or newspaper). Also, social networks are growing in size: end of 2014, Twitter had more than 1 billion accounts.

However, little is known about the inner structure of the directed social graphs. Indeed, high level statistics such as node degree, diameter, number of triangles, etc. are interesting, but do not give any information on the inner properties of the graph. Saying that such graphs are small words is also most of the time wrong even at a first level approximation. We exhibited a macrostructure of the Twitter social graph [p1], but in this macrostructure we have the largest strongly connected component gathering together 50% of all accounts. Exploring the properties of this component that represents the regular Twitter activity is important but complex because there is no appropriate theoretical tool to analyze this component.

Objective

The goal is to focus on the analysis of directed social graphs. In particular, we want to invent new graph analysis tools and metrics exhibiting specific social properties of the graph. Indeed, the biggest challenge is not to find a new network metric, but to link this metric to a social property.

The study will be based on a unique dataset, collected by the Inria team DIANA in July 2012, the full Twitter social graph [1].

In this context, the student will focus on two metrics evaluating the nature of a social link. The first one, embeddedness [2], gives a measure of a link's strength. The second one, dispersion [3], was proposed to recognize romantic partners among a user's friends.

An important point is that the two metrics have been defined on undirected graphs like Facebook. The student will see how to define them adequatly for directed graphs like Twitter. He will then investigate their meaning and efficiency on the Twitter dataset. Do they keep an interesting social meaning? This will then lead us to consider if they are more appropriate metrics for directed graphs.

A second point of study is that the dispersion metric was defined on Facebook graph to recognize romantic partners. Would this metric be efficient on Twitter graph which has a different structure of links? For example, are social partners connected on Twitter? More generally, the ratio of links of different natures (family, friendship, thematic interest,...) differs in Twitter. For example, there is a higher ratio of links corresponding to thematic interest. What is the impact on the proposed metrics? Is it possible to easily classify the links of a directed graph? What are the consequences on the graph structure?

This internship is research oriented and can be continued with a Ph.D. for excellent students.

References

[1] Maksym Gabielkov, Ashwin Rao, and Arnaud Legout. Studying Social Networks at Scale: Macroscopic Anatomy of the Twitter Social Graph. In Proc. of ACM SIGMETRICS'14, June 16--20, 2014, Austin, Texas, USA.

[2] Marsden, P. V., and Campbell, K. E. Measuring tie strength. Social Forces 63, 3 (Dec.1984), 482-501

[3] Lars Backstrom and Jon Kleinberg. 2014 Romantic partnerships and the dispersion of social ties: A network analysis of relationship status on Facebook. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing (CSCW '14). ACM, New York, NY, USA, 831-841. DOI=10.1145/2531602.2531642 http://doi.acm.org/10.1145/2531602.2531642

Study of P2P Systems for Video Live Streaming

Contact: Stéphane Pérennes and Nicolas Huin.

Emails: stephane.perennes@cnrs, nicolas.huin@cnrs.fr.

Phone: 04 92 38 50 98

Webpages: http://www-sop.inria.fr/members/Stephane.Perennes/, http://www-sop.inria.fr/members/Nicolas.Huin/

Laboratory: INRIA Sophia Antipolis, COATI team-project, https://team.inria.fr/coati/

The goal of this internship is to study and propose models to describe P2P systems. These systems are cheap to operate and scale well, but their highly distributed nature raises questions about reliability, durability, availability, confidentiality, and routing of the data. In particular, the focus of this internship will be on streaming in peer to peer networks. In such system peers are relaying packets or video frames, and the goal is to play the video the nodes ask for. This is possible only if some delay in the playback is allowed, and there exist trade-offs be- tween the number of customers nodes and the delay that the systems can support. Moreover some scheduling protocols must decide which packets are exchanged and who send them in order to achieve the best possible behavior. The analysis of P2P systems involves a wide range of techniques between practice and theory, from mathematical analysis to simulations and deployment of experiments. This will allow us to adapt the orientation of the project to the taste of the interested student.

Objective: During the internship, we will study and simulate a live streaming system for video. We will study its efficiency when there are frequent arrivals and departures of peers. The metric considered will be the delay and bandwidth usage.

The internship can be followed by an phd thesis.

[1] M. Castro, P. Druschel, A. Kermarrec, A. Nandi, A. Rowstron, and A. Singh, “SplitStream: high-bandwidth multicast in cooperative environ- ments,” in Proceedings of the nineteenth ACM symposium on Operating systems principles. ACM, 2003, p. 313.

[2] X. Zhang, J. Liu, B. Li, and T. Yum, “CoolStreaming/DONet: A data- driven overlay network for efficient live media streaming,” in proceedings of IEEE Infocom, vol. 3. Citeseer, 2005, pp. 13–17.

[3] X. Hei, C. Liang, J. Liang et al., “Insights into PPLive: A Measurement Study of a Large-Scale P2P IPTV System,” in International Word Wide Web Conference. IPTV Workshop, 2006.

[4] L. Massoulié and A. Twigg, “Rate-optimal schemes for Peer-to-Peer live streaming,” Performance Evaluation, vol. 65, no. 11-12, pp. 804–822, 2008.

[5] T. Bonald, L. Massoulié, F. Mathieu, D. Perino, and A. Twigg, “Epidemic live streaming: optimal performance trade-offs,” in Proceedings of the 2008 ACM SIGMETRICS international conference

[6] Giroire, F., Modrzejewski, R., Nisse, N., and Pérennes, S. (2013). "Maintaining Balanced Trees For Structured Distributed Streaming Systems". In Structural Information and Communication Complexity (pp. 177-188). Springer International Publishing.

A tool for static analysis and type-checking of distributed system control-flow graphs

Eric Madelaine

Telephone: 04 92 38 78 07 / 06 87 47 99 80

Web page: hhtp://www.inria.fr/eric.madelaine

Place of the project: INRIA

Team: Aoste (common team INRIA-I3S)

Web page: https://team.inria.fr/aoste/

Pre-requisites: Java, Distributed systems; Some experience with Eclipse Modeling Tools, UML, type-checking and/or static analysis will be appreciated

Description:

The modeling and verification platform VerCors, developed by SCALE team, includes modeling tools for specifying asynchronous distributed systems, tools for generating behavior models (automata) and executable Java code, and translation tools providing connections with external toolsets for model-checking. An important module of VerCors platform is a UML State Machine graphical designer allowing the user to specify the control-flow graph of a distributed process. The State Machines are then transformed into a structure given as an input to the model-checker and into executable Java code.

In order to help the user to construct State Machines appropriate for the further transformations, we would like to extend VerCors platform with a module for State Machine static correctness check. In particular, it is necessary to validate correct typing, proper declaration and usage of variables, coherency between the control-flow graphs and the rest of the designed models (e.g. system architecture). Such garanties are required (though not sufficient) to ensure that code generated from the model will compile and behave correctly.

In the course of the internship, the student will:

- analyse the set of existing graphical formalisms for distributed systems specification in Vercors in

order to extract a set of static semantics rules for control-flow graphs and UML state-machines,

- specify formally a set of properties checkable by static analysis and type-checking ,

- design an algorithm to check the formalized constraints, and implement it in the VerCors platform

- test the implementation on our use-case examples

References:

- VerCors: https://team.inria.fr/scale/software/vercors/

Optimizing the transport network of a large metropolitan area to improve citizen accessibility

Contact: Nicolas Nisse.

Emails: nicolas.nisse@inria.fr

Phone: 04 97 15 53 28

Webpages: http://www-sop.inria.fr/members/Nicolas.Nisse/

Laboratory: INRIA Sophia Antipolis, COATI team-project, https://team.inria.fr/coati/

The first part of the internship is to compute a good metric to measure the accessibility of inhabitants of a large metropolitan area to basic services such as school, work, health, commercial buildings. We will consider the case study of the city of Santiago, Chile, as our group has access to numerous data about the city: transportation system, education system, health system... The second part of the internship will be to propose solution to improve this accessibility with two possible direction: (i) building a better transportation system (ii) proposing good locations to implement new services such as, e.g., a new school.

Challenges will be multiple: algorithmic, data analysis. The size of the data makes is impossible to compute and keep into memory the matrix of distances of the transportation system. Advanced methods will have to be used to overcome the difficulty. In particular, the student is required to have a good background in graph algorithms and in their analysis.

The student must have good skills in Python (knowing C or C++ would be a plus).

The internship can be followed by an phd thesis.

Modeling Caches with Heterogeneous Contents

Name: Giovanni Neglia

Mail: giovanni.neglia@inria.fr

Web page: http://www-sop.inria.fr/members/Giovanni.Neglia/

Place of the project: Inria

Teams: Maestro (Inria)

Pre-requisites if needed: good knowledge of stochastic processes (at the level of Performance Evaluation of Networks), good knowledge of algorithms, interested in applying mathematics to understand the world

Description:

Studying the performance of a single cache (e.g. predicting its hit rate) is a surprisingly difficult task.

In 2002 Che et al. proposed an approximation that greatly simplifies this analysis by decoupling the dynamics of different contents.

Theoretical justifications for Che's approximation-as it is usually referred to-have been given in [Fri12].

Thanks to its high level of accuracy, this approach has become a key element to study networks of caches that are at the core of content centric architectures.

Nevertheless, Che's approximation assumes that all the contents stored in the cache have the same size,

but this is in general not true and some preliminary experiments show that "size matters."

The purpose of this internship is to study how Che's approximation can be extended to take into account different content sizes.

This research activity is in the framework of our research cooperation with Akamai Technologies-

the world leader in the field of Content Delivery Networks-that has provided us some real-world traces

we can use in our experiments.

Further information:

Please do not hesitate to contact me to discuss the topic.

Queueing models for renewable-powered data centers

Name: Giovanni Neglia, Sara Alouf

Mail: giovanni.neglia@inria.fr sara.alouf@inria.fr

Web page: http://www-sop.inria.fr/members/Giovanni.Neglia/ http://www-sop.inria.fr/members/Sara.Alouf/,

Place of the project: Inria Sophia Antipolis

Team: Maestro

Web page: http://www-sop.inria.fr/maestro/

Pre-requisites if needed: Good understanding of queueing theory (at the level of the course Performance evaluation of networks),

interested in applying mathematics to understand the world

Description:

The purpose of this project is to study the potential advantage to federate many micro-datacenters powered by renewable energy sources (like photovoltaics or wind turbines). Indeed, if computing tasks can be dispatched among the datacenters, it is possible to take advantage of available free energy at different locations.The student will first review the existing literature on load assignment to renewable-powered datacenters, starting from the references below. Then he/she will study some queueing models that can help to quantify for what number and size of the datacenters' and load characteristics a federation can be advantageous.

- Minimizing Electricity Cost: Optimization of Distributed Internet Data Centers in a Multi-Electricity-Market Environment.

Lei Rao, Xue Liu, Le Xie, Wenyu Liu, Infocom 2010.

- Online Algorithms for Geographical Load Balancing.

Minghong Lin, Zhenhua Liu, Adam Wierman, Lachlan L. H. Andrew, International Green Computing Conference (IGCC), 2012.

- Energy and performance management of green data centers: A profit maximization approach.

M Ghamkhari, H Mohsenian-Rad, IEEE Transactions on Smart Grid, 4 (2), 1017-1025, 2013.

Further information:

Please do not hesitate to contact us to discuss the topic.

Dynamic Big Data Processing

Name: Fabrice Huet

Mail: fabrice.huet@unice.fr

Telephone: 04 92 38 79 77

Web page: https://sites.google.com/site/fabricehuet/home

Place of the project: I3S

Address: Sophia Antipolis

Team: Scale

Web page: https://team.inria.fr/scale/

Title: Dynamic Big Data Processing

Pre-requisites if any: Java required, an experience with Scala would be great.

Detailed description:

The advent of Big Data has given birth to a large number of models and environment to process

large amount of data, such as MapReduce[1] and its implementation[2]. More recently, a lot

of work has been focused on processing data streams[3,4] or a mix of batch and streams[5] in

so called Lambda architectures[6].

Most of theses works assume a static environment and only deal with dynamic resources in

case of failures, i.e. restarting failed nodes or recomputing lost results. However, nothing

is really static in practice. Data can vary during a computation (complexity or size of intermediate results),

forcing a user to estimate how much resources he/she will need to complete a job. In a previous work[7], we

have shown that it is indeed possible albeit not trivial. Moreover, some external constraints such as energy

consumption or dedicated hardware can impact the number of resources available at runtime for a long running

job. Overall, both data and resources introduce some dynamicity.

The goal of this internship is to study the dynamic allocation of resources in two recent Big Data

processing framework, Storm[3] and Spark[5]. First, the candidate will survey existing solutions and experiment

with the basic mechanisms implemented in these two platforms. Second, the candidate will devise scenarios

highlighting the benefits of dynamic resources allocations. Finally, some new mechanisms will be proposed

and implemented to support the addition/removal of computational nodes during the execution of a job.

References: set of bibliographical references (article, books, white papers, etc) to be read by the student before starting to work on this subject

[1] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113, January 2008.

[2] Hadoop. https://hadoop.apache.org/.

[3] Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, and Dmitriy Ryaboy. Storm@twitter. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD ’14, pages 147–156, New York, NY, USA, 2014. ACM.

[4] Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 239-250. DOI=http://dx.doi.org/10.1145/2723372.2742788

[5] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proceed- ings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, pages 10–10, Berkeley, CA, USA, 2010. USENIX Association.

[6] Lambda architecture. http://lambda-architecture.net/.

[7] Ge Song, Zide Meng, Fabrice Huet, Frederic Magoules, Lei Yu, and Xuelian Lin. A Hadoop MapReduce Performance Prediction Method. In HPCC 2013, pages 820–825, Zhangjiajie, China, November 2013.

A Study of the verification and code generation for the Akka programming language

Ludovic Henrio

Telephone: 04 92 38 71 64

Web page: http://www-sop.inria.fr/members/Ludovic.Henrio/

Place of the project: I3S

Team: Scale

Web page: https://team.inria.fr/scale/

Pre-requisites: Distributed systems; Some experience with different programming languages

Description:

This internship is in the field of programming languages, especially languages for programming distributed applications. The main objective is to contribute to solutions enabling scalable modeling and formal verification of concurrent systems. We aim to provide developer techniques and tools for the safety analysis of concurrent and scalable applications. More precisely we will focus on one of the frameworks for actor programming that is quite widely adopted in an industrial setting: AKKA.

AKKA is an actor-based programming paradigm where entities interact by asynchronous calls to actors. This programming model is extremely efficient in distributed settings, non-blocking and event-driven. Its purpose is to enable the implementation of lightweight actors that can be instantiated in large numbers. AKKA is gaining meaningful importance and consequent adoption in the industry, also because of its original and easy-to-use support for fault-tolerance realized by a hierarchy of supervisors that deal with the crashes of the other actors.

The goal of this internship is to facilitate the systematic design of complex and heterogeneous systems that are both highly safe and correct. More precisely, the internship will have the following objectives:

- To study the approach used in the Vercors platform developed in the Scale team and the GCM/ProActive programming model

- To study the Akka programming model and identify the main differences between Akka and ProActive

- To provide solutions for the adaptation of the Vercors platform to the verification and generation of Akka programs.

The student will show the adequacy of his proposal by providing the first steps of either the theoretical foundations of the new platform or the implementation of some crucial elements of the new verifier and code generator.

This subject could be a good preliminary study before a PhD subject if successful.

ElectroSmart, exploring the exposition to microwaves.

Name: Arnaud Legout

Mail: arnaud.legout@inria.fr

Telephone: 04 92 38 78 15

Web page: http://www-sop.inria.fr/members/Arnaud.Legout/

Place of the project: Inria

Address: 2004 route des Lucioles, Sophia Antipolis

Team: DIANA

Web page: https://team.inria.fr/diana/

Pre-requisites if needed: Android programming

Description:

The goal of the ElectroSmart project is to develop the instrument,

methods, and models to compute the exposition of the general public to

microwave electromagnetic fields used by wireless protocols and

infrastructures such as Wi-Fi, Bluetooth, or cellular. Indeed, the

Internet and new devices such as smartphones have fundamentally

changed the way people communicate, but this technological revolution

comes at the price of a higher exposition of the general population to

microwave electromagnetic fields (EMF). This exposition is a concern

for a broad spectrum of actors such as health agencies and

epidemiologists who want to understand the impact of such an

exposition on health, or cellular operators and regulation authorities

who want to improve the cellular coverage while limiting the

exposition.

The student will have to work on the improvement of the

Android application on which the ElectroSmart project

is based (see http://es.inria.fr).

The student must be fluent Java and has some experience in

Android programming .

If the student is highly motivated and has the required

competencies, we offer the possibility to continue with a

Ph.D. thesis.

Optimal design and management of green fixed-mobile convergent networks

Name: Ramon APARICIO-PARDO

Mail: raparicio@i3s.unice.fr

Telephone: -

Web page: http://www.i3s.unice.fr/~raparicio

Place of the project: I3S laboratory

Address: Les Algorithmes - Euclide B

2000, route des Lucioles

BP 121

06903 Sophia-Antipolis Cedex

France

Team: Signet

Web page: http://signet.i3s.unice.fr/

Pre-requisites:

Basic notions of mathematical optimisation (particularly, integer linear programming), proficiency in at least one programming language and a background in telecommunication networks would be desirable.

Detailed description:

In the last years, users access more and more to multimedia contents via mobile devices yielding to remarkable traffic growth in cellular access networks. Of course, this explosion of cellular access traffic is accompanied by a growth in the power consumption motivating the need of improving the energy-efficiency at the cellular access.

One promising solution is the Fixed-Mobile Convergence (FMC) paradigm. The idea behind FMC is to

manage jointly the heterogeneous access technologies (e.g., optical, WiFi, 4G) in a unified fixed-mobile convergent network. That would significantly reduce costs and power consumption at access and aggregation networks by allowing a mutualized usage of their physical resources. For this reason, the Network Operators (NOs) and Internet Service Providers (ISPs) are particularly interested in this paradigm.

One methodological approach to have a comprehensive view of a network system is to model network network design and management as centralized optimization problems. In the FMC paradigm, this approach could be particularly beneficial to exploit all the synergies and interplays between mobile and fixed parts of the access networks, yielding to more energy-efficient schemas. For example, a resource allocation problem in a optical/5G-mobile convergent network could be modelled as an optimisation problem where we aim to minimize the overall power consumption of the devices by allocating devices in such way that they share cooling equipment.

This internship is research oriented. Its goal is to explore novel energy-efficiency schemes in the FMC paradigm by first identifying some promising proposals from the literature; secondly, modelling them as optimisation problems; third, addressing them with a suitable optimisation technique; and finally, comparing the optimal solutions obtained in each scheme. A short research paper (e.g a conference publication) would be a desirable outcome of the study.

References:

[1] M. Feknous, B. Le Guyader, P. Varga, A. Gravey, S. Gosselin, et al.. Multi- Criteria Comparison Between Legacy and Next Generation Point of Presence Broadband Network Architectures. Advances in computer science : an international journal (ACSIJ), 2015, 4 (3), pp.126 - 140

[2] P. Chanclou, A Pizzinat, F. Le Clech and al, “Optical Fiber Solution for Mobile Fronthaul to Achieve Cloud Radio Access”, In Future Network and Mobile Summit, July 2013, pp, 1–11.

[3] Gupta, A.; Jha, R.K., "A Survey of 5G Network: Architecture and Emerging Technologies," in Access, IEEE , vol.3, no., pp.1206-1232, 2015.

[4] Carapellese, N.; Tornatore, M.; Pattavina, A., "Energy-Efficient Baseband Unit Placement in a Fixed/Mobile Converged WDM Aggregation Network," in Selected Areas in Communications, IEEE Journal on , vol.32, no.8, pp.1542-1551, Aug. 2014

[5] Xinbo Wang; Thota, S.; Tornatore, M.; Sang-Soo Lee; Han-Hyub Lee; Soomyung Park; Mukherjee, B., "Green Virtual Base Station in optical-access- enabled Cloud-RAN," in Communications (ICC), 2015 IEEE International Conference on , vol., no., pp.5002-5006, 8-12 June 2015.

[6] Buttaboni, A.; De Andrade, M.; Tornatore, M.; Pattavina, A., “Virtual PON assignment for fixed-mobile convergent access-aggregation networks," in Optical Network Design and Modeling (ONDM), 2015 International Conference on , vol., no., pp.198-203, 11-14 May 2015, doi: 10.1109/ONDM.2015.7127298.

Evolution in software Product Line for Knowledge Flows

Supervisors:

Prof. Mireille Blay-‐Fornarino (I3S),

Prof. Frederic Precioso (I3S)