The Blockus project is creating new understanding and technologies to enable large-scale graph computations. Efforts include, understanding the behavior of graph algorithms and graph processing systems, scaling out and vertically (to use SSD's and other novel non-volatile memories), and exploring graph computations on with realistic application data usage restrictions.
- Understanding the behavior of graph computations and algorithms - how diverse as a function of algorithm and graph? how can we systematically explore the performance space?
- Vertical scalability using new storage-class memories -- can we achieve intelligent storage-hierarchy management to achieve dramatically better
cost-performance and programming flexibility for big data computations?
- Exploring realistic data sharing and privacy contraints in graph computations -- can we maintain good scalability and performance in the face of such constraints?
We are currently working with GraphX and Graphlab systems. Our early work was with researchers at HP Labs, who have created an extension of the R programming system, called Presto, that supports scale-out parallelism in an extended R language. In Presto, users express computation on (possibly sparse) matrix partitions, and the system takes care of the distribution of data and computation. The work at Chicago focuses on building on the Presto programming model and engine, enhancing it to scale vertically: that is creating a cost-effective, flexible and easy to program system that handles big data using secondary storage.
- Fan Yang and Andrew A. Chien, "Understanding Graph Computation Behavior to Enable Robust Benchmarking", in IEEE Conference on High Performance Distributed Computing Conference (HPDC), June 2015.
- Fan Yang, "Systematic Understanding Graph Computation Behavior to Enable Robust Benchmarking", Master's Thesis, Department of Computer Science, University of Chicago, April 2015.
- Fan Yang, T. Hu, C. Imes, R. Suminto, R. Tchoua, H. Zhang, Z. Zhou, and A. Chien. ”Horizontal and Vertical Scalability in Large-scale Graph Processing Systems”, Technical Report, June 2014
- Limitations of Data Reuse in Streaming Iterative Algorithms, Erik Bodzsar MS Thesis, May 2013.
- S. Venkataraman, E. Bodzsar, I. Roy, A. AuYoung, and R. Schreiber, Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices, Eurosys '13, April 2013; and some more HP Labs background.
People: Fan Yang, Andrew A. Chien (UChicago) Alumni: Erik Bodzsar, Indrajit Roy, Rob Schreiber, Partha Ranganathan (HP Labs)
We gratefully acknowledge support from Hewlett-Packard for the Blockus project.