CALAS: Contention And Locality Aware Scheduling for Multi-chiplet GPUs
Tanmay Anand, Akanksha Chaudhari, Grayson Elias, Ethan Kowalski, Brandon Tran
Optimizing gRPC-Based In-Memory KVStores: A Global Load Balancer Approach
Aditya Das Sarma, Keren Chen, Yurun Yuan, Omid Rostamabadi, Chengpo Yan
CXL for LLM Workloads: Analytical Model and Practice
Alex Smith, Farheen Asif, Marco Kurzynski, Wentao Hou
Approximating NoCs for Massively Parallel Error-Resilient Machine Learning Applications
Shilpa Mysore Srinivasa Murthy, Alish Kanani, Mingcong Cao, Deepak Vasudevan
Temporal Coding for ML-Focused Heterogeneous Computing Systems
Harshal Pandit, Brian Mhatre, Adithya Pillai Ramesh, Rohith Mysore Kariyappa, Sam Katerov, Zhengyuan Zhang
Fine-Grained Coherence Protocol Enhancement for Directory-Based Cache Coherence
Rajesh Srivatsav Suresh, Kiranmay Sekar, Pritheshwar Thirugnanasambantham, Harshitha Naravaram, Sri Harsha Bandaru
DTL-TP: Delegation Ticket Lock for Task-Based Parallelization
Ashish Sunkara, Harish Anand Vijayakumar, Shreya Malkurthi, Harshil Oza, Aditya Kunchur
CPCoh++: Efficient Memory Coherence for Multi-Chiplet Module GPUs
Aatman Borda, Daniel Mu, Manu Maheshwari, Neeraj Surawar, Shashwatha Mitra G B, Rishi Velicherla
Just-In-Time Memory Access for Streaming Architectures
Jerry Xu, Yu Xia