Algorithms and Systems for MapReduce and Beyond (BeyondMR) is a workshop for research at the frontier of large-scale computations, with topics ranging from algorithms to computational models to the systems themselves. BeyondMR will be held in conjunction with SIGMOD/PODS 2018 in Houston, TX, USA, on Friday, June 15, 2018.

The BeyondMR workshop aims to explore algorithms, computational models, architectures, languages, and interfaces for systems that require large-scale parallelization, as well as systems designed to support efficient parallelization and fault tolerance. These include specialized programming and data-management systems based on MapReduce and its extensions, graph processing systems, and data-intensive workflow and dataflow systems.

We invite submissions on topics such as:

  • Cost Models: Formal models that evaluate the efficiency of algorithms in large-scale parallel systems, taking into account the architectural properties and parameters of such systems.

  • Task Scheduling, Load Balancing, and Fault Tolerance: Methods and algorithms that avoid data and computational skew in large-scale parallel systems. Design of scheduling algorithms for balanced task distribution. Techniques for supporting fault tolerance.

  • Algorithms and Applications: Algorithmic design for specific data processing tasks in large-scale parallel systems, including query processing and graph processing tasks, iterative and recursive computational tasks, machine learning, and general data analytics. Applications built using large-scale parallel systems.

  • New Parallel Architectures: Novel large-scale parallel architectures and systems that support various types of data processing tasks, such as graph processing, log processing, data analytics, and machine learning. Extensions of current systems to provide additional functionality, improve performance, and support the processing of more complex tasks.

Keynotes:
  • Rajat Monga, Google Inc., TensorFlow and systems for machine learning

    Abstract: 
    The rapid growth in machine learning over the last few years has led to a number of significant changes in the systems used for training and deploying machine learning models. These changes are driven by changes in the algorithms, the growing need for compute, and speedups from hardware accelerators like GPUs and TPUs. This talk will discuss some of the challenges addressed by current systems like TensorFlow, and provide some thoughts on where these systems are headed.

    Short bio: Rajat Monga leads TensorFlow at the Google Brain team, powering machine learning research and products worldwide. As a founding member of the team, he has been involved in co-designing and co-implementing DistBelief and, more recently, TensorFlow, an open-source machine learning system. Prior to this role, he led teams in AdWords; built out the engineering teams and co-designed web-scale crawling and content-matching infrastructure at Attributor; co-implemented and scaled eBay’s search engine; and designed and implemented complex server systems across a number of startups. Rajat received a B.Tech. in Electrical Engineering from the Indian Institute of Technology, Delhi.

  • Kym Hines, Google Inc., The evolution of abstraction in large scale parallel computation

    Abstract: 
    MapReduce was revolutionary not because it provided a general model of distributed computing, but because it provided a useful abstraction for a common distributed computing use case and allowed the necessary control knobs to be expressed at the same level of abstraction. Finding the right abstractions can significantly improve the productivity of big-data analysts and bring large-scale distributed computing to a wider audience. On the other hand, new layers of abstraction impose new responsibilities on tool providers. This talk will discuss advances in distributed computing abstractions, the responsibilities these advances impose on tool providers, and approaches to addressing these new responsibilities. (A minimal word-count sketch of the MapReduce abstraction appears after the keynote listings.)

    Short bio: Kym Hines is a Staff Engineer and the tech lead/manager of the Flume project at Google. Flume is Google’s answer to general-purpose, large-scale parallel computation, and it has been made available outside of Google as Dataflow/Apache Beam. She received her Ph.D. in Computer Science from the University of Washington, where she was advised by Gaetano Borriello.
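
To make the abstraction discussed in the second keynote concrete, here is a minimal, single-process sketch of the classic word-count computation written in the MapReduce style. It is plain Python for exposition only; the names map_fn, reduce_fn, and map_reduce are illustrative rather than taken from any particular system, and production systems run the same user-supplied map and reduce phases distributed across many machines with fault tolerance.

    from collections import defaultdict

    # Minimal single-process sketch of the MapReduce abstraction (word count).
    # Illustrative only: real systems distribute these phases across machines.

    def map_fn(document):
        # Map phase: emit a (word, 1) pair for every word occurrence.
        for word in document.split():
            yield (word.lower(), 1)

    def reduce_fn(key, values):
        # Reduce phase: combine all values that share a key.
        return (key, sum(values))

    def map_reduce(documents):
        # Shuffle phase: group intermediate values by key, then reduce each group.
        groups = defaultdict(list)
        for doc in documents:
            for key, value in map_fn(doc):
                groups[key].append(value)
        return [reduce_fn(k, vs) for k, vs in sorted(groups.items())]

    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(map_reduce(docs))
    # [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]

The point of the abstraction is that only map_fn and reduce_fn are written by the analyst; partitioning, shuffling, and recovery from machine failures are the system's responsibility.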

Prior editions of BeyondMR were held together with EDBT/ICDT in 2014 and 2015, and together with SIGMOD/PODS in 2016 and 2017.