Experience
Hands-on Experience
*Computer Vision, AR*
SfM, visual SLAM and odometry, hand-eye calibration, MVS, disparity estimation in passive stereo, image-based relocalization, content insertion on planar regions, factorization-based reconstruction, image-based ground modeling, depth or motion layer segmentation, object detection and posture estimation (alignment), tracking with PF/mean shift/online learning (tracking-as-detection), indexing & search, scene/image classification.
*Machine (Deep) Learning, Data Mining*
MLP, Convolutional Network (R-CNN, fast R-CNN, faster R-CNN, ResidualNet, FCN, encoder-decoder network), Sparse coding & dictionary learning, offline/online SVM/Boosting, random forest, fast k-NN search, PCA/ICA/NNMF/LDA.
*Cloud computing and distributed deep learning*
AWS-based video transcoding with job scheduling, load balancing and auto scaling, distributed CNN learning with model/data parallelism on the Intel cluster server.
*Computational Photography, VR*
Panoramic view generation, spherical view rendering, volume rendering, depth-image-based rendering and refocusing, inpainting, matting & compositing, tone mapping for HDR, upscaling & super-resolution, retargeting and summarization.
*Image/Video Processing*
Denoising/deblur, deinterlacing/framerate upconversion, dehazing, detail/contrast enhancement, JND.
*ADAS and Autonomous Driving*
Perception with GPS/IMU, camera, radar and LiDAR sensors.
More than 30 academic papers, 17 granted US patents and about 20 US/Europe patents pending.
Professional Activities
Member of IEEE and Member of ACM.
Reviewer for IEEE Transactions on Image Processing, IEEE Trans. on Knowledge & Data Engineering, IEEE Trans. on Systems, Man & Cybernetics, IEEE Signal Processing Letters, IEEE Transactions on Circuits and Systems for Video Technology.
Reviewer for IEEE Int. Conf. on Computer Vision 2001, IEEE Int. Conf. on Multimodal Interfaces 2002, IEEE Int. Conf. on Multimedia and Expo 2007-08-09, European Signal Processing Conference 2008, 3DTV-CON 2009-10 and IEEE Int. Conf. on Image Processing 2008-09-10-11, GlobeCom'10, ICPR'12, ACM MM'13.
Secretary of Human-Centered Communication in IEEE Communications Society, June 2008 - June 2011.
Industrial Publication Committee, Asia-Pacific Signal and Information Processing Association (APSIPA), Feb. 2018 - now.
Tech. Talk "Deep learning for Autonomous Driving (Overview)" at Shanghai Institute for Advanced Communication & Data Science, invited by Prof. Shugong Xu (IEEE Fellow), Shanghai University, Mar. 16 2018.
Organizer of Innovation Forum "Towards Autonomous Driving", IEEE Int. Conference on Multimedia Information Processing and Retrieval (MIPR'19), March 28-30, 2019.
Tech. Talk "Technologies and Challenges of Autonomous Driving" at Shanghai Institute for Advanced Communication & Data Science, April 15 2019; nominated as adjunct professor of Shanghai University on the same occasion.
Nominated by Asia Pacific Signal and Information Processing Association (APSIPA) as Year 2020 Distinguished Industrial Leader (one of 4 award winners).
Computer Skills
Languages: Visual C/C++ (STL), MATLAB, Java, Python, Hadoop, CUDA.
Multimedia API Tools: FFmpeg, OpenCV, OpenGL, ROS, PCL, Caffe, TensorFlow, PyTorch.
Operating System: WindowsNT/Windows, Unix, Linux, DOS.
Work Experience
Chief Scientist and Global AI Technology Officer: 4/21-3/22, Stealth Startup.
Autonomous driving (visual perception, LiDAR, sensor fusion, localization & mapping, prediction, decision making & planning, control, simulation & testing, software architecture and system platform).
VP of Technology, Autonomous Driving Research: 3/20-4/21, Black Sesame Technology Inc., Santa Clara, California.
Autonomous driving research (perception, mapping & localization, prediction, planning & control, simulation).
Adjunct professor: Mar. 2019-Mar. 2022, Shanghai University, Shanghai, China.
Graduate student tutoring on autonomous driving and deep learning for computer vision.
President and Chief Scientist of Autonomous Driving: 1/18 - 3/20, Singulato USA, Santa Clara, California.
Autonomous Driving L2-L3-L4 (perception, mapping & localization, prediction, planning & control, simulation).
Senior Architect of Autonomous Driving: 8/16 - 1/18, Baidu USA, Sunnyvale, California.
Perception (GPS+IMU, LiDAR, Camera) in Autonomous Driving: LiDAR and camera-based visual odometry/SLAM, target-less sensor calibration, hand-eye calibration, stereo vision and early sensor fusion by machine learning (deep learning) etc.; online camera calibration for inverse perspective mapping, vanishing point detection, road lane detection and tracking, vehicle detection and tracking, vehicle orientation/distance estimation.
Senior Staff Architect: 6/14 - 6/16, Intel, Santa Clara, California.
Graphics Media Architect at VPG (6/15-6/16)
VR (panorama generation, spherical video rendering), Machine Learning (visual image search, object detection), compiler optimization (supervising an intern from Prof. HT Kung's group, Harvard U) and Deep Learning (data/model parallelism, feature learning and model fine-tuning, scene categorization in video summarization).
Technical evaluation of "Replay Technologies" (which Intel eventually acquired in March 2016 for $150M), and of VR companies such as Immersive Media, 3D4U/Voke (acquired by Intel in Dec. 2016), NextVR, JauntVR etc.
Computer Vision Architect at CCG (6/14-5/15)
Computer Vision (stereo, SfM, MVS, depth-based FG/BG segmentation), visual SLAM (mono, stereo, RGB-D), AR (visual + IMU fusion), Machine Learning (image-based relocalization) and Deep Learning (image denoising and SR).
Collaborated with UK startup "Seene" on camera-based scene reconstruction on mobile devices (iPhone). Note: "Seene" was acquired by Snap (Snapchat) in June 2016.
Implementation of pseudo-real-time stereo-based depth estimation on Intel Atom mobile products.
Senior Staff SW Development Engineer: 3/13 - 5/14, Harmonic Inc., San Jose, California.
Worked on video pre-processing to improve visual quality and compression performance: 1. Visual masking-based video denoising with non-local self-similarity and convolutional neural network; 2. Machine learning (SVM)-based visual quality classification of image patches for bitrate reduction in video compression; 3. Contour extraction by ensemble learning with clustered patch features; 4. Image decomposition-based detail and contrast enhancement.
Senior Staff Algorithm Engineer: 09/12 - 3/13, Real Communications Inc, San Jose, California.
Worked on motion estimation and motion compensation-based frame rate up conversion for TV chip product R&D.
Senior Staff Research Engineer: 06/11 - 9/12, Samsung Electronics US R&D Center, Algorithm Group, Digital Media Solutions Lab of Samsung Information Systems America, Irvine, California.
Worked on 3-D TV and Smart TV: video denoising, detail enhancement, upscaling & super-resolution by collaborative filtering, interactive object cutout (saliency map and grabcut), contour tracking-based rotoscoping with image segmentation, optic flow and GMM-based local classifiers, bag-of-visual-words-based image/scene classification by SVM/Naive Bayes classifier, visual object indexing & search through bags of words in kd-tree, inverted file/hashing and geometric consistency-based reranking.
Senior Researcher: 04/08 - 06/11, Futurewei Technology Inc. USA, Media Networking Lab of Core Network Product Line, Bridgewater, New Jersey.
Worked on Smart Media Manipulation for Adaptive Distribution and Service Enhancement. Mentored interns and led several projects in content processing and understanding, including video ad insertion into video scenes for augmented reality, cloud-based media processing using Amazon EC2, S3, CloudFront plus EBS, Video Retargeting and Video Abstraction (i.e. video skimming and storyboard), Clickable Videos and Video Annotation (object-based). Led several academic collaboration projects with Peking University, U. of Missouri at Columbia and State U. of New York at Buffalo, i.e. content-targeted (contextual) advertising, merchandise classification for annotation and sports player detection for highlighting.
Senior Member of Technical Staff (Researcher): 07/05 - 04/08, Thomson Multimedia Inc. (now Technicolor R&I). From March 2006, worked in the Thomson lab at Princeton, NJ, after moving from the Indianapolis lab, Indiana.
Worked on projects in image and video processing related areas, such as object detection and tracking, and panoramic view generation from video-based foreground segmentation. Also worked on stereo rectification, depth estimation and view interpolation for 3-D video coding (previously called Multiple View Video Coding, i.e. MVC), 3DTV and Free Viewpoint Video Rendering. During half a year in Indianapolis, worked on a "Tape Restoration" project (FM demodulation, NTSC decoding and noise removal).