Shuai Che

I work for Microsoft, where I help improve the Brainwave FPGA platform for accelerating DNN applications. I was previously employed by AMD Research, where I was involved in the U.S. Department of Energy’s FastForward and PathForward exascale computing projects. I also worked in Alibaba's machine learning group on systems research and development. I was the lead developer of the Rodinia benchmark suite for heterogeneous computing. Rodinia has been included in SPEC ACCEL V1.0 and SPECwpc V1.0 as standard accelerator benchmarks. I graduated from the University of Virginia in August 2012 with a Ph.D. in Computer Engineering.

Something I did for fun in my spare time

Research Interests

Computer architecture, parallel computing and GPGPU, machine learning and graph processing, memory systems

Selected Papers and Reports (Google Scholar/DBLP)

S. Che and J. Yin. Northup: Divide-and-Conquer Programming for Systems with Heterogeneous Memories and Processors. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), May 2019.

● H. Yin, G. Chen, Y. Li, S. Che, W. Zhang and N. K. Jha. Hardware-Guided Symbolic Training for Compact, Accurate, yet Execution-Efficient LSTMs. https://arxiv.org/abs/1901.10997 .

● Y. Yu, Y. Li, S. Che, W. Zhang and N. K. Jha. Software-defined Design Space Exploration for Efficient AI Accelerator Architecture. https://arxiv.org/abs/1903.07676 .

● J. Yin, Y. Eckert, S. Che, M. Oskin, and G. Loh. Toward More Efficient NoC Arbitration: A Deep Reinforcement Learning Approach. In the International Workshop on AI-assisted Design for Architecture, in conjunction with ISCA, June 2018.

S. Che, B. M. Beckmann, and S. K. Reinhardt. Programming GPGPU Graph Applications with Linear Algebra Building Blocks. To appear: International Journal of Parallel Programming (IJPP), 2017.

● M. Orr, S. Che, B. Beckmann, M. Oskin, S. K. Reinhardt, and D. Wood. Gravel: Efficient Fine-grain GPU-initiated Network Messaging. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2017.

S. Che, M. Orr, and J. Gallmeier. Work Stealing in a Shared Virtual Memory Heterogeneous Environment. In Proceedings of the ACM International Conference on Computing Frontiers (CF), May 2017.

● K. Hou, W. Feng, and S. Che. Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), May 2017.

● N. Malaya, S. Che, J. Greathouse, R. Oostrum, and M. Schulte. Accelerating Matrix Processing with GPUs. In Proceedings of the IEEE Symposium on Computer Arithmetic (ARITH), invited paper, July 2017.

S. Che, A. Basu, and J. Gallmeier. Challenges of Programming a System with Heterogeneous Memories and Heterogeneous Processors. In Proceedings of the International Symposium on Memory Systems, Oct 2016.

● A. Basu, S. Puthoor, S. Che, and B. Beckmann. Software Assisted Hardware Cache Coherence for Heterogeneous Architectures. In Proceedings of the International Symposium on Memory Systems, Oct 2016.

S. Che, M. Orr, G. Rodgers, and J. Gallmeier. Betweenness Centrality in an HSA-enabled System. In the 1st High Performance Graph Processing workshop (HPGP), May 2016.

● S. Puthoor, A. Aji, S. Che, M. Daga, W. Wu, B. M. Beckmann, and G. Rodgers. Implementing Directed Acyclic Graphs with the Heterogeneous System Architecture. In the 9th Workshop on General Purpose Processing on Graphics Processing Units, Mar 2016.

S. Che, G. Rodgers, B. M. Beckmann, and S. K. Reinhardt. Graph Coloring on the GPU and Some Techniques to Improve Load Imbalance. In Proceedings of IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), May 2015.

● M. S. Orr, S. Che, A. Yilmazer, B. M. Beckmann, M. D. Hill, and D. A. Wood. Synchronization Using Remote-Scope Promotion. In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Mar 2015. (pdf)

● G. Juckeland, W. Brantley, S. Chandrasekaran, B. Chapman, S. Che, M. Colgrove, H. Feng, A. Grund, R. Henschel, W-M. Hwu, H. Li, M. S. Muller, M. Perminov, P. Shelepugin, K. Skadron, J. Stratton, A. Titov, K. Wang, M. Waveren, B. Whitney, S. Wienke, R. Xu, and K. Kumaran. SPEC ACCEL - A Standard Application Suite for Measuring Hardware Accelerator Performance. In Proceedings of 5th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), Nov 2014. (pdf)

S. Che, B. M. Beckmann, and S. K. Reinhardt. BelRed: Constructing GPGPU Graph Applications with Software Building Blocks. In Proceedings of IEEE High Performance Extreme Computing Conference (HPEC), Sept 2014. (pdf)

S. Che. GasCL: A Vertex-Centric Graph Model for GPUs. In Proceedings of the IEEE High Performance Extreme Computing Conference, Sept 2014. (pdf)

S. Che, J. Meng and K. Skadron. Dymaxion++: a Directive-Based API to Optimize Data Layout and Memory Mapping for Heterogeneous Systems. The 4th International Workshop on Accelerators and Hybrid Exascale Systems, May 2014. (pdf)

● B. A. Hechtman, S. Che, D. R. Hower, Y. Tian, B. M. Beckmann, M. D. Hill, S. K. Reinhardt, and D. A. Wood. QuickRelease: A Throughput-oriented Approach to Release Consistency on GPUs. In Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture (HPCA), Feb 2014. (pdf)

S. Che, B. M. Beckmann, S. K. Reinhardt and K. Skadron. Accelerating and Evaluating OpenCL Graph Applications. AMD Developer Summit (APU), Nov 2013. (pdf)

S. Che and K. Skadron. BenchFriend: Correlating the Performance of GPU Benchmarks. International Journal of High-Performance Computing Applications (IJHPCA), Oct 2013. (pdf)

S. Che, B. M. Beckmann, S. K. Reinhardt and K. Skadron. Pannotia: Understanding Irregular GPGPU Graph Applications. In Proceedings of 2013 IEEE International Symposium on Workload Characterization (IISWC), Sept 2013. (pdf)

● M. Boyer, K. Skadron, S. Che, and N. Jayasena. Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability. In Proceedings of the 10th Conference on Computing Frontiers (CF), May 2013. (pdf)

● W. Heirman, T. E. Carlson, S. Che, K. Skadron, and L. Eeckhout. Using Cycle Stacks to Understand Scaling Bottlenecks in Multi-Threaded Workloads. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), Nov. 2011. (pdf)

S. Che, J. W. Sheaffer, and K. Skadron. Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), Nov. 2011. (pdf)

S. Che, J. W. Sheaffer, M. Boyer, L. G. Szafaryn, L. Wang, and K. Skadron. A Characterization of the Rodinia Benchmark Suite with Comparison to Contemporary CMP Workloads. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), Dec. 2010. (pdf)

S. Che, M. Boyer, J. Meng, D. Tarjan, S. Lee, J. W. Sheaffer, and K. Skadron. Rodinia: A Benchmark Suite for Heterogeneous Computing. In Proceedings of IEEE International Symposium on Workload Characterization (IISWC), Oct 2009. (pdf)

S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron. A Performance Study of General Purpose Applications on Graphics Processors using CUDA. Journal of Parallel and Distributed Computing (JPDC), 68(10):1370-1380, Jun 2008. (pdf)

S. Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach. Accelerating Compute Intensive Applications with GPUs and FPGAs. In Proceedings of the IEEE Symposium on Application Specific Processors (SASP), Jun 2008. (pdf)

S. Che, J. Meng, J. W. Sheaffer, and K. Skadron. A Performance Study of General Purpose Applications on Graphics Processors. First Workshop on General Purpose Processing on Graphics Processing Units, Oct 2007. (pdf)

● J. Meng, S. Che, J. W. Sheaffer, J. Li, J. Huang and K. Skadron. Hierarchical Domain Partitioning For Hierarchical Architectures. Tech. Report CS-2008-08, Univ. of Virginia Dept. of Computer Science, Jun 2008. (pdf)

● J. Meng, S. R. Tarapore, S. Che, J. Huang, J. W. Sheaffer, and K. Skadron. Programming with Relaxed Streams. Tech. Report CS-2007-17, Univ. of Virginia Dept. of Computer Science, Dec 2007. (pdf)

Professional Services

Program Committee member. The 1st International Workshop on the Intersection of High Performance Computing and Machine Learning, 2019

External Review Committee. IEEE International Symposium on High Performance Computer Architecture (HPCA), 2018

Board of Distinguished Reviewers (Program Committee member). ACM Trans. on Architecture and Code Optimization, 2018

Program Committee member. IEEE International Conference on Computer Design (ICCD), 2018

Program Committee member. IEEE International Conference on Cloud and Big Data Computing, 2018

Program Committee member. The 1st International Workshop on Large-Scale Deep Learning on Modern Heterogeneous Supercomputers in ICS 2018

Program Committee member. Workshop on Representative Applications (WRAp) in IEEE Cluster, 2018

Program Committee member. IEEE International Conference on Computer Design (ICCD), 2017

Program Committee member. IEEE International Conference on Cloud and Big Data Computing, 2017

Board of Distinguished Reviewers (Program Committee member). ACM Trans. on Architecture and Code Optimization, 2017

Program Committee member. Workshop on Representative Applications (WRAp) in IEEE Cluster, 2017

Program Committee member. IEEE International Conference on Computer Design (ICCD), 2016

Distinguished Reviewers Board (Program Committee member). ACM Transactions on Architecture and Code Optimization (TACO), 2016

Program Committee member. IEEE International Conference on Cloud and Big Data Computing, 2016

Program Committee member. IEEE International Conference on Computer Design (ICCD), 2015

Distinguished Reviewers Board (Program Committee member). ACM Transactions on Architecture and Code Optimization (TACO), 2015

Program Committee member. Workshop on Representative Applications (WRAp), 2015

External Review Committee. ACM/IEEE International Symposium on Computer Architecture (ISCA), 2015

External Review Committee. IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2015

Program Committee member. IEEE International Symposium on Workload Characterization (IISWC), 2014

Member of SPEC High Performance Group (HPG).

Review services for journals and conferences (e.g., ACM TECS, ACM TACO, IJHPCA, IJPP, J. of Supercomputing, TCAD, JPDC, IEEE CAL, ISCA, HPCA, ISLPED, EuroPar, ICPADS, AsHES).

Last update: Sept 2019