Sun, Oct 18th


Mon, Oct 19th



DFM'15 - Full Day - Empire
SocHPC - Full Day - Cypress
SWStack - Full Day - Tudor B
DEHPC - Full Day - Walnut
AlgSpec - Afternoon - Tudor A
rCUDA - Afternoon - Tudor C

(Empire) 08:00-08:30 Welcome and Opening Remarks Marc Snir, Costin Iancu and Kathy Yelick
Session 1A
08:30-09:00 Phase Aware Warp Scheduling: Mitigating the Effects of Phase Behavior in GPGPU Applications Mihir Awatramani, Xian Zhu, Diane Rover and Joseph Zambreno. 
09:00-09:30 NVMMU: A Non-Volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures Jie Zhang, David Donofrio, John Shalf, Mahmut Kandemir and Myoungsoo Jung. 
09:30-10:00 Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance Rachata Ausavarungnirun, Onur Kayiran, Saugata Ghose, Gabriel Loh, Chita Das, Mahmut Kandemir and Onur Mutlu. 
Session 1B

08:30-09:00 Scalable SIMD-Efficient Graph Processing on GPUs Farzad Khorasani, Rajiv Gupta and Laxmi N. Bhuyan. 
09:00-09:30 Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures Adam McLaughlin, Duane Merrill, Michael Garland and David A. Bader. 
09:30-10:00 Stadium Hashing: Scalable and Flexible Hashing on GPUs Farzad Khorasani, Mehmet E. Belviranli, Rajiv Gupta and Laxmi N. Bhuyan.

Coffee Break 10:00-10:30


Session 2A
10:30-11:00 TSXProf: Profiling Hardware Transactions Yujie Liu, Justin Gottschlich, Gilles Pokam and Michael Spear. 
11:00-11:30 ALEA: Fine-grain Energy Profiling with Basic Block Sampling Lev Mukhanov, Dimitrios Nikolopoulos and Bronis De Supinski.

Session 2B
10:30-11:00 Towards General-Purpose Neural Network Computing Schuyler Eldridge, Amos Waterland, Margo Seltzer, Jonathan Appavoo and Ajay Joshi. 
11:00-11:30 Practical Near-Data Processing for In-memory Analytics Frameworks Mingyu Gao, Grant Ayers and Christos Kozyrakis. 

Lunch 11:30-13:00


Session 3A
Language & Compilation
13:30-14:00 Scalable Task Scheduling and Synchronization Using Hierarchical Effects Stephen Heumann, Alexandros Tzannes and Vikram Adve. 
14:00-14:30 PENCIL: a Platform-Neutral Compute Intermediate Language for Accelerator Programming Riyadh Baghdadi, Ulysse Beaugnon, Albert Cohen, Tobias Grosser, Michael Kruse, Chandan Reddy, Sven Verdoolaege, Mohammed Javed Absar, Sven Van Haastregt, Alexey Kravets, Anton Lokhmotov, Róbert Dávid, Elnar Hajiyev, Adam Betts, Alastair Donaldson and Jeroen Ketema. 
14:30-15:00 Communication Avoiding Algorithms: Analysis and Code Generation for Parallel Systems Karthik Murthy and John Mellor-Crummey. 
Session 3B

13:30-14:00 Exploiting Program Semantics to Place Data in Hybrid Memory Wei Wei, Dejun Jiang, Sally A. McKee, Jin Xiong and Mingyu Chen.
14:00-14:30 Decoupled Direct Memory Access: Isolating CPU & IO Traffic by Leveraging a Dual-Port DRAM Donghyuk Lee, Lavanya Subramanian, Rachata Ausavarungnirun, Jongmoo Choi and Onur Mutlu.
14:30-15:00 A Software-managed Approach to Die-Stacked DRAM Mark Oskin and Gabriel Loh.

Coffee Break 15:00-15:30


Session 4
Best papers 
15:30-16:00 An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms Harshvardhan, Adam Fidel, Nancy Amato and Lawrence Rauchwerger. 
16:00-16:30 Polyhedral Optimizations of Explicitly Parallel Programs Prasanth Chatarasi, Jun Shirako and Vivek Sarkar. 
16:30-17:00 TARDIS: Timestamp based Coherence Algorithm for Distributed Shared Memory Xiangyao Yu and Srini Devadas.
17:00-17:30 BSSync: Hardware Support for ML Workloads with Bounded Staleness Consistency Models Joo Hwan Lee, Jaewoong Sim and Hyesoon Kim. 

18:30 - Conference Reception
Starlight Room

Tue, Oct 20th

Brain-inspired Computing Dharmendra S. Modha. 
Session 5

I will describe a decade-long, multi-disciplinary, multi-institutional effort spanning neuroscience, supercomputing, and nanotechnology to build and demonstrate a brain-inspired computer, and describe its architecture, programming model, and applications. I will also describe future efforts to build, literally, a “brain-in-a-box”. For more information, see:

Bio: Dr. Dharmendra S. Modha is an IBM Fellow and IBM Chief Scientist for Brain-inspired Computing. He is a Cognitive Computing pioneer who envisioned and now leads a highly successful effort to develop Brain-inspired Computers. The project has received ~$58 million in research funding from DARPA (under the SyNAPSE Program), the US Department of Defense, and the US Department of Energy. The ground-breaking project is multi-disciplinary, multi-institutional, and multi-national and has a world-wide scientific impact. The resulting architecture, technology, and ecosystem break with the prevailing von Neumann architecture (circa 1946) and constitute a foundation for energy-efficient, scalable neuromorphic systems. Dr. Modha's work has been featured in many thousands of media articles including The Economist, Science, New York Times, Wall Street Journal, The Washington Post, BBC, CNN, PBS, Discover, MIT Technology Review, Associated Press, Communications of the ACM, IEEE Spectrum, Forbes, Fortune, and Time, among many others. Dr. Modha has made significant contributions to IBM Businesses via innovations in caching algorithms for storage controllers, clustering algorithms for services, and coding theory for disk drives. Author of over 60 papers and inventor of over 100 patent disclosures, he has won ACM's Gordon Bell Prize; USENIX/FAST Test of Time Award; Best Paper Awards at ASYNC and IDEMI; First Place, Science/NSF International Science & Engineering Visualization Contest; IIT Bombay Distinguished Alumni Award; and is a Fellow of the IEEE and the World Technology Network. In 2013 and 2014, he was named the Best of IBM. On its 40th anniversary, EE Times named him among 10 Electronic Visionaries to watch.

Coffee Break 09:30-10:00

Session 6A
10:00-10:30 Runtime Value Numbering: A Profiling Technique to Pinpoint Redundant Computations Shasha Wen, Xu Liu and Milind Chabbi. 
10:30-11:00 Tracking and Reducing Uncertainty in Dataflow Analysis-Based Dynamic Parallel Monitoring Michelle Goodstein, Phillip Gibbons, Michael Kozuch and Todd Mowry. 
11:00-11:30 Compiler Assisted Load Balancing on Large Clusters Vinit Deodhar, Hrushit Parikh, Ada Gavrilovska and Santosh Pande. 

Session 6B
10:00-10:30 RC3: Consistency directed cache coherence for x86-64 with RC extensions Marco Elver and Vijay Nagarajan.
10:30-11:00 Fine Grain Cache Partitioning using Per-Instruction Working Blocks Jason Jong Kyu Park, Yongjun Park and Scott Mahlke. 
11:00-11:30 An Efficient, Self-Contained, On-Chip Directory: DIR1-SISD Mahdad Davari, Alberto Ros, Erik Hagersten and Stefanos Kaxiras.

Lunch 11:30-13:00


Session 7A
Resilience & Compilation
13:30-14:00 Dealing with the Unknown: Resilience to Prediction Errors Subrata Mitra, Greg Bronevetsky, Suhas Javagal and Saurabh Bagchi. 
14:00-14:30 Exploiting Staleness for Approximating Loads on CMPs Prasanna Venkatesh Rengasamy, Anand Sivasubramaniam, Mahmut Kandemir and Chita Das. 

14:30-15:00 Orchestrating Multiple Data-Parallel Kernels on Multiple Devices Janghaeng Lee, Mehrzad Samadi and Scott Mahlke.
Session 7B
13:30-14:00 A-REP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance Muneeb Khan, Michael Laurenzano, Jason Mars, Erik Hagersten and David Black-Schaffer. 
14:00-14:30 Runtime-Guided Management of Scratchpad Memories in Multicore Architectures Lluc Alvarez, Miquel Moreto, Marc Casas, Emilio Castillo, Xavier Martorell, Jesus Labarta, Eduard Ayguade and Mateo Valero. 
14:30-15:00 OSPREY: Implementation of Memory Consistency Models for Cache Coherence Protocols involving Invalidation-Free Data Access George Kurian, Qingchuan Shi, Srini Devadas and Omer Khan. 
Poster Session
16:00-18:00 ACM SRC and Conference Posters
18:00- Conference Gala

Wed, Oct 21st

Cosmology and Computers: HACCing the Universe Salman Habib
Session 8

Deep and wide surveys of the sky have led to a remarkable set of discoveries in cosmology. As the survey volumes become so large that statistical uncertainties almost disappear, cosmological modeling must reach unprecedented levels of scale and accuracy to properly interpret observational results. I will describe the key scientific problems and issues involved and then present the HACC (Hardware/Hybrid Accelerated Cosmology Code) framework, designed around a portable particle-based simulation model for the required, very high dynamic range applications. I will briefly cover the key features of HACC and plans for its future development, focusing on computational, algorithmic, and physics advances, in-situ analysis, and resilience features, while emphasizing the associated computer science needs.

Bio: Salman Habib is a Senior Physicist and Computational Scientist with a joint appointment in Argonne National Laboratory's High Energy Physics and Mathematics and Computer Science Divisions. He is a Senior Member of the Kavli Institute for Cosmological Physics at the University of Chicago and a Senior Fellow in the Argonne/UChicago Computation Institute. Habib's research interests have spanned a broad range of topics and he has been active in the application of large-scale parallel computing as a powerful scientific resource in several fields. His research in precision cosmology has focused on understanding the dynamics of structure formation in the Universe in order to investigate properties of dark energy and dark matter, measure neutrino masses, and study primordial density fluctuations.

Coffee Break 09:30-09:40


Coffee Break 11:00-11:10


Session 9A
11:10-11:40 Vector Parallelism in JavaScript: Language and compiler support for SIMD Ivan Jibaja, Peter Jensen, John McCutchan, Dan Gohman, Ningxin Hu, Mohammad Haghighat, Steve Blackburn and Kathryn McKinley.
11:40-12:10 Compiling and Optimizing Java 8 Programs for GPU execution Kazuaki Ishizaki, Akihiro Hayashi, Gita Koblents and Vivek Sarkar. 
12:10-12:40 Throttling Automatic Vectorization: When Less Is More Vasileios Porpodas and Timothy Jones. 

Session 9B
11:10-11:40 Evaluating the Cost of Atomic Operations on Modern Architectures Hermann Schweizer, Maciej Besta and Torsten Hoefler. 
11:40-12:10 MeToo: Stochastic Modeling of Memory Traffic Timing Behavior Yipeng Wang and Yan Solihin. 
12:10-12:40 Using Compiler Techniques to Improve Automatic Performance Modeling Arnamoy Bhattacharyya, Grzegorz Kwasniewski and Torsten Hoefler.