Principal Researcher, Intel Labs, Intel Corporation, 1998 - present
Co-developed SIMD instructions shipped in billions of Intel processors since 2006 (Conroe, Nehalem, Sandy Bridge, Haswell, and Knights Corner processors)
Significance: substantial performance/power improvement
Impacts: SSSE3 in Core 2, AVX in 2nd generation Core, and gather instructions in 4th generation Core
Patents and publications: 22 granted patents and 1 ISCA paper (top computer architecture conference)
Directed the development of intelligent IoT middleware ($100M market potential by 2020)
Developed an end-to-end distributed video analytics framework
Significance: high efficiency in bandwidth, power, and computation
Impact: Public demonstration at the 2015 Intel Developer Forum executive keynote; (upcoming) reference design for Intel’s IoT customers in 2016
Developed an intelligent IoT programming and management platform
Significance: simple, resilient, and efficient large-scale IoT systems
Impact: (Upcoming) public demonstration at Computex 2016
Patents: 8 pending patent applications
Co-directed a large-scale collaboration between Intel and National Taiwan University (the first large-scale industry-university research program in Taiwan, with more than US$20M over 5 years, co-funded by Intel, the Taiwan government, and NTU)
Significance: fueling industrial research/development from academic research
Impacts: The first (1) frameworks for distributed video analytics and data mining, (2) unified framework for IoT programming, deployment, and management, (3) LTE-based M2M communication for energy-harvesting devices, and (4) comprehensive framework for visible light communication using regular LED lights and regular cameras
Publications: 13 papers
Shaped microprocessor designs by building a benchmark suite to evaluate instruction-, data-, and thread-level parallelism (ILP, DLP, and TLP).
Significance: Proper tradeoff among ILP, DLP, and TLP, which are indispensable for today’s processors.
Impacts: Internal SkyBench benchmark suite to drive Skylake and Knights Ferry processor designs. External ALPBench (with UIUC) and PARSEC (with Princeton University) to drive academic microarchitecture research
Patents and publications: 1 granted patent and 3 papers
Revamped emerging algorithm designs by developing parallelization and optimization methodology for emerging applications in speech recognition, data mining, & machine learning
Significance: applications must switch to parallel algorithms to take advantage of multicore system performance.
Impacts: 7 tutorials at IEEE conferences
Patents and publications: More than 30 publications, including one with a VLDB best paper award, and the first optimized H.264 software implementation.
Designed efficient cache/memory subsystems for multi-threaded and multi-core architectures running emerging applications, e.g., recognition, mining, and synthesis algorithms
Significance: the memory subsystem is one of the most critical components of a multi-core system.
Patents and publications: 10 granted patents and 6 papers, including one in IEEE Micro and one at ISCA 2011 (top computer architecture conference).
Co-optimized MPEG video codec performance for both speed and compression efficiency, improving MPEG video playback on personal computers
Significance: higher speed and lower bit rate
Impact: first real-time MPEG-4 Fine Granularity Scalability encoder in software
Patents and publications: 14 granted patents and 3 papers
Summer Researcher, Mitsubishi Electric, 7/1997 - 8/1997
Developed a new scheme for intra/inter mode selection in MPEG encoding.
Patent and publication: 1 granted patent and 1 paper