Jongsoo Park

  • Technical Lead and Manager at Facebook AI Systems Co-design

  • Ph.D. from Department of Electrical Engineering, Stanford University

  • B.S. from Department of Electrical Engineering, Seoul National University

  • Previously a research scientist at Intel Parallel Computing Lab, an intern at VMware, and a software engineer at Penta Security Systems

  • Contact: JONGSOO "dot" park AT gmail "dot" com

Open Source Projects

  • SkimCaffe: sparse convolutional neural network

  • SpMP: SParse Matrix Pre-processing library. Fast sparse triangular solver and matrix reorderings such as BFS and reverse Cuthill-McKee

  • Sparso: Julia package that automates high-level optimizations for sparse linear algebra, such as inspector-executor and reordering

  • SPLATT: sparse tensor factorization

  • SOI-FFT: segment-of-interest low-communication FFT algorithm

Publications: Google Scholar, github





  • Enabling Sparse Winograd Convolution by Native Pruning, with Sheng Li and Ping Tak Peter Tang

  • Faster CNNs with Direct Sparse Convolutions and Guided Pruning, Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, and Pradeep Dubey, International Conference on Learning Representations (ICLR), 2017, accepted for publication, github

  • Sparse Tensor Factorization on Many-Core Processors with High-Bandwidth Memory, Shaden Smith, Jongsoo Park and George Karypis, IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2017, accepted for publication

  • Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory, Jongsoo Park, invited to present at SIAM Conference on Computational Science and Engineering (CSE'17), slides




  • Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and Its Application to Unstructured Matrices, Jongsoo Park, Mikhail Smelyanskiy, Karthikeyan Vaidyanathan, Alexander Heinecke, Dhiraj D. Kalamkar, Xing Liu, Md. Mostofa Ali Patwary, Yutong Lu, and Pradeep Dubey, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2014, pdf. HPCG is a new sparse linear system solver benchmark that complements HPL, which focuses on dense matrix operations. This paper describes our implementation, which took top positions on the first HPCG list (press release). For more recent results, please refer to the slides and BoF presentation.

  • Sparsifying Synchronizations for High-Performance Shared-Memory Sparse Triangular Solver, Jongsoo Park, Mikhail Smelyanskiy, Narayanan Sundaram, and Pradeep Dubey, International Supercomputing Conference (ISC), 2014, pdf, included in Intel MKL Optimized Technology Preview, talk at ASCR HPCG workshop, open sourced at github

  • Versatile and Scalable Parallel Histogram Construction, Wookeun Jung, Jongsoo Park, and Jaejin Lee, International Conference on Parallel Architectures and Compilation Techniques (PACT), 2014, pdf, open sourced at github

  • Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, Muhammad Hassaan, Shubho Sengupta, Zhaoming Yin, and Pradeep Dubey, SIGMOD, 2014, pdf

  • Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters, Karthikeyan Vaidyanathan, Kiran Pamnany, Dhiraj D. Kalamkar, Alexander Heinecke, Mikhail Smelyanskiy, Jongsoo Park, Daehyun Kim, Aniruddha Shet G, Bharat Kaul, Bálint Joó, and Pradeep Dubey, IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2014, pdf

  • Distributed SociaLite: A Datalog-Based Language for Large-Scale Graph Analysis, Jiwon Seo, Jongsoo Park, Jaeho Shin, and Monica S. Lam, International Conference on Very Large Data Bases (VLDB), 2014, project homepage, github, pdf


  • Tera-Scale 1D FFT with Low-Communication Algorithm and Intel Xeon Phi Coprocessors, Jongsoo Park, Ganesh Bikshandi, Karthikeyan Vaidyanathan, Ping Tak Peter Tang, Pradeep Dubey, and Daehyun Kim, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2013, pdf

  • Location-Aware Cache Management for Many-Core Processors with Deep Cache Hierarchy, Jongsoo Park, Richard M. Yoo, Daya S. Khudia, Christopher J. Hughes, and Daehyun Kim, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2013, pdf


  • A Framework for Low-Communication 1-D FFT, Ping Tak Peter Tang, Jongsoo Park, Daehyun Kim, and Vladimir Petrov, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), Best paper, 2012, pdf, included in Intel Math Kernel Library, also published in Journal of Scientific Programming, Vol. 21

  • Efficient Backprojection-Based Synthetic Aperture Radar Computation with Many-Core Processors, Jongsoo Park, Ping Tak Peter Tang, Mikhail Smelyanskiy, Daehyun Kim, and Thomas Benson, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), Best paper finalist, 2012, pdf, also published in Journal of Scientific Programming, Vol. 21

  • Billion-Particle SIMD-Friendly Two-Point Correlation on Large-Scale HPC Cluster Systems, Jatin Chhugani, Changkyu Kim, Hemant Shukla, Jongsoo Park, Pradeep Dubey, John Shalf, and Horst D. Simon, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), Gordon Bell award finalist, 2012, pdf, open sourced with Lawrence Berkeley National Laboratory, github

  • CloudRAMSort: Fast and Efficient Large-scale Distributed RAM Sort on Shared-Nothing Cluster, Changkyu Kim, Jongsoo Park, Nadathur Satish, Hongrae Lee, Pradeep Dubey, and Jatin Chhugani, SIGMOD industrial session, 2012, pdf


  • Memory Optimizations of Embedded Applications for Energy Efficiency, Jongsoo Park, Stanford University Ph.D. Dissertation, 2011

  • Fine-grain Dynamic Instruction Placement for L0 Scratch-pad Memory, Jongsoo Park, James Balfour, and William J. Dally, International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES), 2010, pdf, talk

  • Buffer-space Efficient and Deadlock-free Scheduling of Stream Applications on Multi-core Architectures, Jongsoo Park and William J. Dally, Symposium on Parallelism in Algorithms and Architectures (SPAA), 2010, pdf, talk

  • Maximizing the Filter Rate of L0 Compiler-Managed Instruction Stores by Pinning, Jongsoo Park, James Balfour, and William J. Dally, Technical Report 126, Concurrent VLSI Architecture Group, Stanford University, 2009, pdf

  • A Practical Improvement to the Partial Redundancy Elimination in SSA Form, Jongsoo Park and Jaejin Lee, JCSE, 2008, Vol. 2, No. 3, pdf

  • Hierarchical Instruction Register Organization, David Black-Schaffer, James D. Balfour, William J. Dally, Vishal Parikh and Jongsoo Park, Computer Architecture Letters, 2008, Vol. 7, No. 2

  • Efficient Embedded Computing, William J. Dally, James D. Balfour, David Black-Schaffer, James Chen, R. Curtis Harting, Vishal Parikh, Jongsoo Park and David Sheffield, IEEE Computer, July 2008

  • An Energy-Efficient Processor Architecture for Embedded Systems, James D. Balfour, William J. Dally, David Black-Schaffer, Vishal Parikh and Jongsoo Park, Computer Architecture Letters, 2008, Vol. 7, No. 1

  • Register Pointer Architecture for Efficient Embedded Processors, Jongsoo Park, Sung-Boem Park, James D. Balfour, David Black-Schaffer, Christos Kozyrakis and William J. Dally, Proceedings of the Conference on Design Automation and Test in Europe (DATE), 2007, pdf

Hobbies

  • Photography

  • Writing computer games: these are *really* old games that run only in emulators

    • Galaxy Fighter: 1996, download

    • Sigmacraft: 1998, download