
Data-Intensive Readings

Sheng Li, Ke Chen, Ming-yu Hsieh, Naveen Muralimanohar, Chad D. Kersey, Jay B. Brockman, Arun F. Rodrigues, Norman P. Jouppi: System implications of memory reliability in exascale computing. SC 2011: 46

CloudScale: Elastic Resource Scaling for Multi-tenant Cloud Systems, Z. Shen, S. Subbiah, X. Gu, and J. Wilkes, ACM SOCC '11.

Orleans: Cloud Computing for Everyone, S. Bykov, A. Geller, G. Kliot, J. Larus, R. Pandya, and J. Thelin, ACM SOCC '11.

DOT: A Matrix Model for Analyzing, Optimizing and Deploying Software for Big Data Analytics in Distributed Systems, Yin Huai, Rubao Lee, Simon Zhang, Cathy Xia, and Xiaodong Zhang, ACM SOCC '11.

Sprint: Speculative Prefetching of Remote Data, Arun Raman, Greta Yorsh, Martin Vechev, Eran Yahav

Failure Trends in a Large Disk Drive Population, Eduardo Pinheiro, Wolf-Dietrich Weber, Luiz Andre Barroso


Franck Cappello, Al Geist, Bill Gropp, Laxmikant Kale, Bill Kramer, Marc Snir, Toward Exascale Resilience, Technical Report of the Illinois-INRIA Joint Laboratory on PetaScale Computing, TR-JLPC-09-01

Al Geist, Resilience: the Monster in the Closet, 2011 DOE Exascale Architecture I Workshop

Exascale Hardware Report, 2007.

Programming Models & Interfaces

MPI – single program, multiple data (SPMD)

Global Arrays

CnC (Concurrent Collections)

Michael Sullivan, Doe Hyun Yoon, and Mattan Erez. Containment Domains: A Full-System Approach to Computational Resiliency. Technical report TR-LPH-2011-001, LPH Group, Department of Electrical and Computer Engineering, The University of Texas at Austin, January 2011.

Livermore – persistent variables

Jacob Lidman, Daniel J. Quinlan, Chunhua Liao, Sally A. McKee, "ROSE::FTTransform – A Source-to-Source Translation Framework for Exascale Fault-Tolerance Research", accepted by Fault-Tolerance for HPC at Extreme Scale (FTXS 2012), Boston, June 25-28, 2012.

Hoemmen and Heroux, “Fault Tolerant Methods via Selective Reliability”

Runtime & Checkpointing

Sean Hogan, Jeff R. Hammond, Andrew Chien, An Evaluation of Difference and Threshold Techniques for Efficient Checkpoints, accepted by Fault-Tolerance for HPC at Extreme Scale (FTXS 2012), Boston, June 25-28, 2012.

Andrew Chien, Translational Architecture Opportunities for Exascale Systems, Exascale Research Conference, April 16–18, 2012, Portland, Oregon.

Leonardo Bautista-Gomez, Seiji Tsuboi, Dimitri Komatitsch, Franck Cappello, Naoya Maruyama, and Satoshi Matsuoka. 2011. FTI: high performance fault tolerance interface for hybrid systems. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11). ACM, New York, NY, USA, Article 32, 32 pages. DOI=10.1145/2063384.2063427

Keeping checkpoint/restart viable for exascale systems, Sandia Report

Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John Ousterhout, and Mendel Rosenblum. 2011. Fast crash recovery in RAMCloud. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP '11). ACM, New York, NY, USA, 29-41. DOI=10.1145/2043556.2043560

Ikhwan Lee, Mehmet Basoglu, Michael Sullivan, Doe Hyun Yoon, Larry Kaplan, and Mattan Erez. Survey of Error and Fault Detection Mechanisms. Technical report TR-LPH-2011-002, LPH Group, Department of Electrical and Computer Engineering, The University of Texas at Austin, April 2011.

André DeHon, Nick Carter, and Heather Quinn, Final Report for CCC Cross-Layer Reliability Visioning Study, March 3, 2011.

A. T. Moody, G. Bronevetsky, K. M. Mohror, B. R. de Supinski, Detailed Modeling, Design, and Evaluation of a Scalable Multi-level Checkpointing System, Lawrence Livermore National Laboratory Technical Report, LLNL-TR-440491, July 2010

Compressed differences: An algorithm for fast incremental checkpointing, J. S. Plank, J. Xu, R. Netzer, Technical report, University of Tennessee, 1995.

Benchmarks, workloads

Mehul Shah, Parthasarathy Ranganathan, Jichuan Chang, Niraj Tolia, David Roberts, Trevor Mudge, Data Dwarfs: Motivating a Coverage Set for Future Large Data Center Workloads, HP Laboratories, HPL-2010-115.

Ramanathan Narayanan, Berkin Özisikyilmaz, Joseph Zambreno, Gokhan Memik, Alok N. Choudhary: MineBench: A Benchmark Suite for Data Mining Workloads. IISWC 2006: 182-188

Performance Scalability of Data-Mining Workloads in Bioinformatics, Y. Chen, Q. Diao, C. Dulong, C. Lai, W. Hu, E. Li, W. Li, T. Wang, Y. Zhang

R-MAT: A Recursive Model for Graph Mining, Deepayan Chakrabarti, Yiping Zhan, Christos Faloutsos

BioPerf: A Benchmark Suite to Evaluate High-Performance Computer Architecture on Bioinformatics Applications, David A. Bader, Yue Li, Tao Li, Vipin Sachdeva


Molly A. O'Neil, Martin Burtscher: Floating-point data compression at 75 Gb/s on a GPU. GPGPU 2011: 7

Martin Burtscher, Paruj Ratanaworabhan: FPC: A High-Speed Compressor for Double-Precision Floating-Point Data. IEEE Trans. Computers 58(1): 18-31 (2009)

Eric R. Schendel, Saurabh V. Pendse, John Jenkins, David A. Boyuka, II, Zhenhuan Gong, Sriram Lakshminarasimhan, Qing Liu, Hemanth Kolla, Jackie Chen, Scott Klasky, Robert B. Ross, Nagiza F. Samatova: ISOBAR hybrid compression-I/O interleaving for large-scale parallel I/O optimization. HPDC 2012: 61-72

E. Schendel, Y. Jin, N. Shah, J. Chen, C.S. Chang, S.-H. Ku, S. Ethier, S. Klasky, R. Latham, R. Ross, N. Samatova, "ISOBAR Preconditioner for Effective and High-throughput Lossless Data Compression", The 28th annual IEEE International Conference on Data Engineering (ICDE) 2012.

Sriram Lakshminarasimhan, Neil Shah, Stéphane Ethier, Scott Klasky, Robert Latham, Robert B. Ross, Nagiza F. Samatova: Compressing the Incompressible with ISABELA: In-situ Reduction of Spatio-temporal Data. Euro-Par (1) 2011: 366-379


BOOM Analytics: Exploring Data-Centric Declarative Programming for the Cloud, Alvaro, Condie, Conway, Elmeleegy, Hellerstein, and Sears, EuroSys 2010.

BOOM: Data-centric Programming in the Data Center, Alvaro, Condie, Conway, Elmeleegy, Hellerstein, and Sears, Berkeley TR 2009-98.

Dedalus: Datalog in Time and Space, Alvaro, Marczak, Conway, Hellerstein, Maier, and Sears, Berkeley TR 2009-173.

What you Always Wanted to Know about Datalog (And Never Dared Ask), Ceri, Gottlob, and Tanca, IEEE Transactions on Knowledge and Data Engineering, Vol 1, No 1, March 1989.

Datalog, Wikipedia Entry

MapReduce basics, extensions, modifications

MapReduce: Simplified Data Processing on Large Clusters, J. Dean and S. Ghemawat, Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI), 2004.

Hadoop, O'Reilly, 2nd edition

MapReduce Online, Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy, Russell Sears, NSDI 2010.

Incoop: MapReduce for Incremental Computations, P. Bhatotia, A. Wieder, R. Rodrigues, U. Acar, and R. Pasquini, ACM SOCC '11.

PrIter: A Distributed Framework for Prioritized Iterative Computations, Yanfeng Zhang, Qixin Gao, Lixin Gao, Cuirong Wang, ACM SOCC '11.

Yandong Wang, Xinyu Que, Weikuan Yu, Dror Goldenberg, and Dhiraj Sehgal. 2011. Hadoop acceleration through network levitated merge. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11). ACM, New York, NY, USA

Out-of-core Algorithms

A Computational Study of External-Memory BFS Algorithms, Deepak Ajwani, Roman Dementiev, Ulrich Meyer

I/O-Efficient Techniques for Computing Pagerank, Yen-Yu Chen, Qingqing Gan, Torsten Suel, CIKM '02 Proceedings of the eleventh international conference on Information and knowledge management

A Survey of Out-of-Core Algorithms in Numerical Linear Algebra, Sivan Toledo

Out-of-Core SVD and QR Decompositions, Eran Rabani, Sivan Toledo

A Parallel Block Lanczos Algorithm and its Implementation for the Evaluation of Some Eigenvalues of Large Sparse Symmetric Matrices on Multicomputers, Mario Guarracino, Francesca Perla, Paolo Zanetti


Christopher J. Rossbach, Jon Currey, Mark Silberstein, Baishakhi Ray, Emmett Witchel: PTask: operating system abstractions to manage GPUs as compute devices. SOSP 2011: 233-248

Key-value stores

"SILT: A Memory-Efficient, High-Performance Key-Value Store" by Hyeontaek Lim, Bin Fan, David G. Andersen, and Michael Kaminsky. In Proc. 23rd ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, Oct. 2011.

Fast Crash Recovery in RAMCloud, SOSP 2011

Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

Distributed frameworks (non-MapReduce)

Scaling the Mobile Millennium System in the Cloud, Hunter et al., ACM SOCC '11.

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica. Technical Report UCB/EECS-2011-82. July 2011.

Spark: Cluster Computing with Working Sets. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. USENIX HotCloud 2010. June 2010.

GraphLab: A Distributed Framework for Machine Learning in the Cloud, Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin

Pregel: a system for large-scale graph processing, G. Malewicz et al.

CIEL: a universal execution engine for distributed data-flow computing, Derek G. Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Madhavapeddy, Steven Hand

Distributed and Parallel Filesystem Papers and Links

Wittawat Tantisiroj, Swapnil Patil, Garth Gibson, Seung Son, Sam Lang, Rob Ross, "On the Duality of Data-intensive File System Design: Reconciling HDFS and PVFS," In Proceedings of the ACM Supercomputing Conference (SC 2011).

The Hadoop Distributed File System, Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler

The Google File System, Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Mainstream Tech Media Articles (general background):

Software Links

Hajime Fujita,
Jun 15, 2012, 8:28 AM