Experience
Work Experience
Software Development Engineer, Amazon, New York, NY, April 2021 ~ Present
Worked on Annotation Hub, a platform that provides computer vision data for ML training
Worked on the AnalyzeID project, enabling a human workforce to identify and label government IDs
Senior Staff Engineer, Xilinx Research Labs, San Jose, CA, April 2020 ~ April 2021
Led 3 engineers on Zynq RFSoC board development and bring-up
Ported Vitis AI ML models, software libraries, and hardware designs onto embedded platforms
Developed a Pybind11-based C++ binding magic that lets users program C++ in Jupyter notebooks
Staff Design Engineer, Xilinx Research Labs, San Jose, CA, September 2015 ~ April 2020
Served as the top contributor to the PYNQ (Python Productivity for Zynq) open-source framework
Developed Bash scripts in a QEMU environment to build SD card images for embedded platforms
Built Python bindings (CFFI, Python C API, Pybind11, SWIG) for C/C++ code targeting multiple architectures
Summer Research Intern, Hitachi Global Storage Technologies, San Jose, CA, May 2013 ~ August 2013
Implemented a DDR3 memory controller in Verilog for DRAM and MRAM on an FPGA
Verified the design using Xilinx ChipScope and detected MRAM read/write bit errors
Summer Intern, Shanghai Jiao Tong University, Shanghai, China, July 2008 ~ August 2008
Designed and optimized a wireless frequency modulation (FM) transmitter and receiver
Research Projects
High-performance Packet / Traffic Classification on FPGA, May 2011 ~ Present
Led a research team developing a two-dimensional pipelined architecture in Verilog on FPGAs
Utilized logic cells and LUT-based distributed RAM to construct modular processing elements
Achieved 2x the throughput of prior work while supporting dynamic updates
Estimated post-simulation power using Vivado for classification / lookup engines on Virtex-5, Virtex-6, and Virtex-7 FPGAs
Explored the design space with respect to clock rate, resource utilization, and other design parameters
Multi-flow Regular Expression Matching (REM), March 2011 ~ April 2012
Implemented REM engines processing up to 128 packet flows concurrently at a 200 MHz clock rate
Parsed packet headers and Snort / Bro regular expression patterns in the backend
Automatically generated VHDL/Verilog files for large RTL designs using Perl scripts
Developed parameterizable designs and conducted design-space exploration using Tcl scripts
Optimized the RTL design using PlanAhead and register retiming to meet clock constraints
Education
University of Southern California, Ph.D., Computer Engineering, 2011 ~ 2015
University of Southern California, Master's, Electrical Engineering, 2009 ~ 2011
Shanghai Jiao Tong University, Bachelor's, Electrical Engineering, 2005 ~ 2009