Blockus: Big Data Computation on Small Machines

Today’s big data computing world is built on map-reduce frameworks running on clusters of processor-memory-disk slices, and it scales reliably to petabyte data sizes. Its two major weaknesses are cost and programming inflexibility. In Blockus, we are exploring the use of new storage-class memories, combined with intelligent storage-hierarchy management, to achieve dramatically better cost-performance and programming flexibility for big data computations.

We are working with researchers at HP Labs, who have created an extension of the R programming system, called Presto, that supports scale-out parallelism in an extended R language. In Presto, users express computation on (possibly sparse) matrix partitions, and the system takes care of distributing the data and the computation. Our work at Chicago builds on the Presto programming model and engine, enhancing it to scale vertically: that is, creating a cost-effective, flexible, and easy-to-program system that handles big data using secondary storage.
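To make the vertical-scaling idea concrete, here is a toy Python sketch (illustrative only, not the Presto API or the Blockus implementation): a matrix is partitioned into row blocks on secondary storage, and a computation streams over the blocks so that only one partition is resident in memory at a time. All file and function names here are hypothetical.

```python
# Toy sketch of block-wise, out-of-core computation: partitions live on
# disk, and only the active block is held in memory at any moment.
import os
import pickle
import tempfile

def write_blocks(matrix, block_rows, dirpath):
    """Partition `matrix` into row blocks and spill each block to disk."""
    paths = []
    for start in range(0, len(matrix), block_rows):
        path = os.path.join(dirpath, "block_%d.pkl" % start)
        with open(path, "wb") as f:
            pickle.dump(matrix[start:start + block_rows], f)
        paths.append(path)
    return paths

def column_sums(paths, ncols):
    """Stream blocks from disk one at a time, accumulating column sums."""
    sums = [0.0] * ncols
    for path in paths:  # only one block is resident at a time
        with open(path, "rb") as f:
            block = pickle.load(f)
        for row in block:
            for j, v in enumerate(row):
                sums[j] += v
    return sums

with tempfile.TemporaryDirectory() as d:
    # An 8x4 matrix with entry (i, j) = 4*i + j, split into 2-row blocks.
    matrix = [[float(i * 4 + j) for j in range(4)] for i in range(8)]
    paths = write_blocks(matrix, block_rows=2, dirpath=d)
    result = column_sums(paths, ncols=4)
```

In Presto proper, the analogous step would be expressed on distributed-array partitions with the runtime choosing their placement; the sketch above just swaps "remote machine" for "secondary storage," which is the essence of scaling vertically.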


People: Erik Bodzsar, Andrew A. Chien (UChicago), Indrajit Roy, Rob Schreiber, Partha Ranganathan (HP Labs)

We gratefully acknowledge support from Hewlett-Packard for the Blockus project.

The LSSG is part of the Systems Group in the University of Chicago's Department of Computer Science, and is also affiliated with Chicago's Computation Institute and Argonne National Laboratory's Mathematics and Computer Science Division.
Andrew A. Chien, Aug 21, 2013