Technical Interest
High-performance, low-latency, scalable coherent fabric
ML architecture, design, and implementation
Energy-efficient compute-in-memory architectures for deep learning
Low-latency memory controller SoC for emerging memory technologies (e.g., MRAM, ReRAM, PCM)
Hardware/software co-design for accelerator-centric heterogeneous architectures
IP/Network-on-Chip/SoC design, integration, and FPGA prototyping
RTL2GDS design flow for SoC including power reduction techniques
Project
Current
Samsung Semiconductor
Technical lead for high-level and micro-architecture definition, RTL implementation, and SW interface of the Coherent Interconnect for Samsung mobile SoCs.
Experience with fabric protocols (e.g., CHI, ACE, AXI/APB), cache management, and last-level cache (LLC).
Analyze requirements from the GPU, CPU, SMC, and NPU teams for SoC architecture design and integration.
Define power architecture and low-power RTL implementation (P/Q channel, HW auto clock and power gating).
Design and integrate bandwidth compression IP for CPU workloads.
Responsible for collaboration with DV, DFT, and PD (verification plan, timing fixes, CDC, clock/reset network design).
Leading SCI silicon bring-up (debug, scan-dump)
Past
Micron
Design high-performance SoC for AI/ML inference
Design energy-efficient and reconfigurable in/near-memory computing for AI/ML inference
WDC
Open-source SoC for security (RTL, Lint, and UVM for IPs)
Mixed-signal 3D-stacked neuromorphic SoC tape-out (RTL2GDS)
Dynamic wear-leveling hardware to improve the endurance and security of NVMe memory systems (SW simulation, RTL, and UVM)
PCIe/NVMe System-on-Chip for DDR/ReRAM memory systems (WDC)
Architecture design and modeling of an NVDIMM-P controller for NVMe memory systems (WDC)
Recore Systems
Fault-tolerant Network-on-Chip for multi-core DSP systems (ESA, EU, 2015-2016)
Scalable sensor data processor (SSDP) for harsh environments (ESA, EU, 2015-2016)
Massive parallel processor breadboarding (MPPB-2.0) for space applications (ESA, EU, 2015-2016)
UChicago
10x10 – Systematic heterogeneous architecture (DARPA/NSF, USA, 2013-2015)
Accelerating the next era of measurement (Agilent, Keysight Technologies, USA, 2014-2015)
CTH
Tuned power gating under application control (VR, Sweden, 2010-2013)
FlexSoC - Flexible System-on-Chip platform for embedded systems (SSF, Sweden, 2007-2010)
IMEC
Virtual cell generation flow for rapid digital circuit design (IMEC, KU Leuven, 2010)
KU
Implantable cardiac pacemakers IC design (MEST, Korea, 2004-2007)
Employment
AI/ML Principal Engineer, Micron Technology, USA (07/2021-present)
R&D Engineer, Western Digital Research, USA (11/2016-06/2021)
Senior Hardware Design Engineer, Recore Systems BV, Enschede, The Netherlands (06/2015-12/2016)
Postdoctoral scholar, The University of Chicago, Illinois, USA (05/2013-05/2015)
Education
Ph.D. in Computer Engineering, Chalmers University of Technology, Sweden, 2013
Lic. in Computer Engineering, Chalmers University of Technology, Sweden, 2010
M.Sc. in Microelectronics, Korea University, South Korea, 2007
B.Eng. in Electronics and Telecommunication, Hanoi University of Technology, Vietnam, 2003
Skills
Programming languages: Verilog/VHDL/UVM, Tcl, Python, C/C++, Matlab
Embedded system: API development, software verification, and tool-chain
Machine learning: Caffe, TensorFlow, energy-efficient architecture for ML
Computer architecture simulators: gem5, MARSSx86, DRAMSim2, SimWattch, CACTI
CAD tools for IC Design: Synopsys (Virtual Platform, Processor Designer, Design Compiler, PrimeTime, PrimeTime PX, IC Compiler, VCS), Cadence (RTL Compiler, SoC Encounter, NCSim, Encounter Power System, Common Power Format, Encounter Library Characterizer), Mentor (ModelSim)
FPGA prototype: Xilinx ISE, Vivado HLS, SDSoC
Technology: CMOS 130 nm to 16 nm, chip tape-out
Patent
Granted (6): US11410727 US11397886 US11170290 US11081474 US11568228 US11568200
Application (13)
Effect of OTS Selector Reliabilities on NVM Crossbar-based Neuromorphic Training [IRPS'22]
A System for Validating Resistive Neural Network Prototypes [ICONS'21]
An In-Flash Binary Neural Network Accelerator with SLC NAND Flash Array [ISCAS'20]
Progress in Low-rank Gradient Descent: From Software to Hardware [IRCR'20]
A Data Layout Transformation (DLT) Accelerator: Architectural Support for Data Movement Optimization in Accelerated-centric Heterogeneous Systems [DATE'16]
Fast Support for Unstructured Data Processing: the Unified Automata Processor (UAP) [MICRO'15]
10x10: A Case Study in Highly-Programmable and Energy-Efficient Federated Heterogeneous Architecture [CAN'15]
Performance and Energy Limits of a Processor-integrated FFT Accelerator [HPEC'14]
Data-Width-Driven Power Gating of Integer Arithmetic Circuits [ISVLSI'12]
Power Gating Multiplier of Embedded Processor Datapath [PRIME'11, Silver Leaf Award]
Design High-Speed, Energy-Efficient 2-Cycle Multiply-Accumulate Architecture and Its Application to the Double-Throughput MAC Unit [TCAS-I'10]
Design Space Exploration for an Embedded Processor with Flexible Datapath [ASAP'10]
A Low Complexity, Low Power, Programmable QRS Detector Based On Wavelet Transform for Implantable Pacemaker IC [SoCC'06]
Grant and Award
Ericsson Research Foundation (2013)
Fellowship of European Research Consortium for Informatics and Mathematics (2013)
Silver Leaf Award in IEEE Conf. on Ph.D. Research in Microelectronics and Electronics (2011)
International Scholar Fellowship at IMEC via KU Leuven (2010)
Ph.D. Fellowship at Chalmers University of Technology (2007-2013)
Brain Korea 21 (BK21) Scholarship for Master study at Korea University, Korea (2004-2006)
Certificate of Merit for Excellent Undergraduate Student, Hanoi University of Technology, Vietnam (2003)
Professional Activity
Membership: IEEE/IEEE-CAS Member, ACM Member
TPC Member:
NORCAS'19-23, SET-CAS'17 (co-located with ICACCI'17), NORCAS'16, MWSCAS'16, SAI'16, FTC'16, PRIME'15
Journal reviewer
ACM Trans. on Reconfigurable Technology and Systems (2022), IEEE Journal of Solid-State Circuits (2020-2021), IEEE Trans. on VLSI (TVLSI, 2013-2021), IEEE Trans. on CAS-II (TCAS-II, 2019), Microprocessors and Microsystems (MICPRO, 2018, 2019, 2021), IEEE Trans. on CAS-I (TCAS-I, 2018), Integration, the VLSI Journal (2015-2017), EURASIP Journal on Advances in Signal Processing, Springer (2010)
Conference reviewer (*=external reviewer)
2023: NORCAS, ISCAS
2022: NORCAS
2021: ISSCC*, AICAS, NORCAS
2020: ISSCC*, VLSI Symp.*, AICAS, MWSCAS, NORCAS, ICCE
2019: NORCAS, MWSCAS
2018: ISCAS, MWSCAS
2017: NORCAS, MWSCAS, SET-CAS
2016: ISCAS, NORCAS, MWSCAS, ICECS, NEWCAS, ICCE
2015: NORCAS, ICECS, PRIME, ATC
2014: ICCE, ComManTel, ESSCIRC