13:30 - 13:40 Welcome Message
13:40 - 14:30 Keynote: Automating Performance Optimization of Data Flow Within HPC Workflows
— Nathan R. Tallent (Pacific Northwest National Laboratory)
14:30 - 15:00 Paper Talk: EmuCSD: A Scalable Framework For Emulating Computational Storage Devices
— Saleh AlSaleh, Wahid Uz Zaman, Mahmut Taylan Kandemir (The Pennsylvania State University)
15:00 - 15:30 Coffee Break
15:30 - 16:10 Invited Talk: Toward Data-Centric Computing: Data-Movement Analysis, Concurrency-Aware Optimization, and Memory-Centric Architecture
— Xiaoyang Lu (Illinois Institute of Technology)
16:10 - 16:40 Paper Talk: NZFS: A Null File System with Zero-Copy I/O Support for Application Benchmarking
— Shingo Hattori, Osamu Tatebe (University of Tsukuba)
16:40 - 17:10 Paper Talk: Chasing the Rabbit: A Systematic Exploration of Software-Defined HPC Storage
— Hariharan Devarajan (LLNL), Brian Behlendorf (LLNL), Blake Devcich (HPE), Dean Roehrich (HPE),
17:10 - 17:20 Closing Remarks
Scientific workflows that require HPC resources are critical in many areas of scientific exploration. Because these workflows tend to be data intensive, severe bottlenecks emerge in storage systems and I/O networks. Although there has been much prior work on coordination of workflows, scheduling algorithms, and HPC storage systems, there are no comprehensive workflow performance diagnosis suites that can automatically identify and alleviate dataflow bottlenecks.
This talk will present DataFlowDrs, a new comprehensive suite of tools for performance optimization of HPC workflows that focuses especially on data flow and storage. Our suite introduces (a) lightweight high-resolution measurement and visualization tools for workflow profiling and tracing; (b) rapid modeling and analysis that reduces analysis data by compressing common repeated coordination patterns; (c) novel methods for predicting data flow scaling using automatically generated interpretable models of data flow; (d) effective performance analysis and bottleneck detection that can automatically quantify and rank bottlenecks for different combinations of task parallelism and storage resources; and (e) actionable performance optimization in the form of new schedules and resource assignments.
Nathan Tallent is a chief computer scientist in the Future Computing Technologies Group within the Advanced Computing, Mathematics, and Data Division at Pacific Northwest National Laboratory. He joined PNNL in 2011.
Dr. Tallent is an internationally recognized expert in extreme performance. He understands all levels of performance, from massively scalable computing to chip pipelines; all system components, including interconnects, storage, memory, and processors; and workloads ranging from AI/ML and data analytics/graphs to HPC. His research is motivated by emerging challenges in distributed systems, scientific workflows, machine learning, and data management. He leads activities in continuum computing and the Performance Lab for EXtreme Computing and daTa, where his contributions have spanned the challenges of performance measurement, modeling, bottleneck diagnosis, and optimization, with special attention to bottlenecks in networks, storage, and memory. He has made notable contributions to performance tools, both for performance modeling and for parallel performance analysis. He has more than 80 peer-reviewed publications, serves on several reviewing committees, and received a DOE Early Career award. He is one of the original developers of HPCToolkit, a widely used suite of performance tools on supercomputers. He received a Ph.D. in 2010 from Rice University.
Modern AI, scientific computing, and data-intensive applications are increasingly constrained by the cost of moving data across deep and heterogeneous memory hierarchies. As a result, system performance depends not only on computational capability, but also on where data resides, how efficiently it moves, and whether data access latency can be reduced, overlapped, or hidden. This talk presents a data-centric computing perspective for addressing these challenges, showing how analytical modeling of data movement, concurrency-aware memory optimization, overlap between data movement and computation, and memory-centric architecture design can work together to improve performance for memory-bound AI and HPC workloads. The broader goal is to motivate future systems in which architecture, runtime, compiler, and emerging memory/storage technologies are co-designed around data movement as a first-class optimization target.
Xiaoyang Lu is a Research Assistant Professor in the Department of Computer Science at Illinois Institute of Technology. His research interests include memory performance modeling, memory-centric computer architecture, data movement optimization, cache and prefetching systems, processing-in-memory architectures, and hardware/software co-design for AI and data-intensive workloads.