Research Projects

Note: If a description is unclear, just click the corresponding video demo for that project.

8. MetaSeg (IROS 2020) Video Demo1 Video Demo2

This paper tackles the problem of video object segmentation. We are specifically concerned with the task of segmenting all pixels of a target object in all frames, given the annotation mask in the first frame. Even when such annotation is available, this remains a challenging problem because the appearance and shape of the object change over time. In this paper, we formulate the task as a meta-learning problem, where the base learner acquires semantic scene understanding for general object types, and the meta learner quickly adapts to the appearance of the target object from a few examples. Our proposed meta-learning method uses a closed-form optimizer, the so-called “ridge regression”, which has been shown to be conducive to fast and stable training convergence. Moreover, we propose a mechanism named “block splitting” to further speed up training and reduce the number of learnable parameters. Compared with state-of-the-art methods, our framework achieves a significant boost in processing speed while remaining very competitive with the best-performing methods on widely used datasets.
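For intuition, here is a minimal sketch of the closed-form ridge-regression step used as a fast learner; the variable names, shapes, and toy data are my own illustration, not code from the paper.

```python
import numpy as np

def ridge_regression(X, Y, lam=1.0):
    """Closed-form ridge regression: W = (X^T X + lam*I)^{-1} X^T Y.

    X: (n_samples, n_features) support-set features
    Y: (n_samples, n_outputs) per-sample targets (e.g. flattened mask scores)
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Toy usage: adapt a linear "segmentation head" to a few support examples.
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 64))   # 16 support pixels, 64-dim features
Y = rng.standard_normal((16, 1))    # per-pixel foreground scores
W = ridge_regression(X, Y, lam=0.1)
pred = X @ W                        # fast adaptation, no iterative SGD
```

Either way the adaptation is a single linear solve rather than iterative gradient descent; when the support set has fewer samples than feature dimensions, solving the dual system via the Woodbury identity is cheaper still.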

7. PALNet (ICRA 2020 and RA-L 2020) Video Demo

Semantic scene completion (SSC) refers to the task of inferring the 3D semantic segmentation of a scene while simultaneously completing the 3D shapes. We propose PALNet, a novel hybrid network for SSC based on a single depth image. PALNet uses a two-stream network to extract both 2D and 3D features at multiple stages from fine-grained depth information, efficiently capturing both the context and the geometric cues of the scene. Current methods for SSC treat all parts of the scene equally, devoting unnecessary attention to the interiors of objects. To address this problem, we propose a Position Aware Loss (PA-Loss), which weights positions by their importance during training. Specifically, PA-Loss considers Local Geometric Anisotropy to determine the importance of different positions within the scene, which helps recover key details such as object boundaries and scene corners. Comprehensive experiments on two benchmark datasets demonstrate the effectiveness and superior performance of the proposed method. A video demo can be found at https://youtu.be/j-LAMcMh0yg, and code is available at https://github.com/UniLauX/PALNet.
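As a rough illustration of how such position-aware weighting might look, the sketch below weights per-voxel cross-entropy by a local label-variation term. This is my own simplification of the idea, not the official PALNet loss, and `base_w`/`aniso_w` are made-up hyperparameters.

```python
import torch
import torch.nn.functional as F

def pa_loss_sketch(logits, target, base_w=1.0, aniso_w=0.5):
    """Position-aware voxel loss (illustrative simplification).

    logits: (B, C, D, H, W) per-voxel class scores
    target: (B, D, H, W) ground-truth labels
    Each voxel's cross-entropy is weighted by a local-anisotropy term:
    the number of its 6 axis neighbors whose ground-truth label differs,
    so boundaries and corners count more than flat object interiors.
    """
    lga = torch.zeros_like(target, dtype=torch.float32)
    for dim, shift in [(1, 1), (1, -1), (2, 1), (2, -1), (3, 1), (3, -1)]:
        # roll wraps at volume borders; a real implementation would pad instead
        neighbor = torch.roll(target, shifts=shift, dims=dim)
        lga += (neighbor != target).float()
    weight = base_w + aniso_w * lga
    ce = F.cross_entropy(logits, target, reduction="none")
    return (weight * ce).mean()
```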

6. DDR-SSC (CVPR 2019) Video Demo

RGB images differ from depth in that they carry detailed color and texture information, which can serve as a vital complement to depth for boosting the performance of 3D semantic scene completion (SSC). SSC is composed of 3D shape completion (SC) and semantic scene labeling, yet most existing approaches use depth as the sole input, which creates a performance bottleneck. Moreover, the state-of-the-art methods employ 3D CNNs, which are cumbersome networks with enormous numbers of parameters. We introduce a light-weight Dimensional Decomposition Residual network (DDR) for 3D dense prediction tasks. The novel factorized convolution layer effectively reduces the network parameters, and the proposed multi-scale fusion mechanism for depth and color images improves completion and segmentation accuracy simultaneously. Our method demonstrates excellent performance on two public datasets. Compared with the latest method, SSCNet, we achieve 5.9% gains in SC-IoU and 5.7% gains in SSC-IoU, while using only 21% of the network parameters and 16.6% of the FLOPs of SSCNet.
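To show what factorizing a 3D convolution buys, here is a minimal PyTorch sketch of a dimension-decomposed residual block; the class name, channel counts, and ordering of the three 1D convolutions are illustrative assumptions, not the exact DDR layer.

```python
import torch
import torch.nn as nn

class FactorizedConv3d(nn.Module):
    """Decompose a k x k x k 3D conv into three 1D convs along D, H, W.

    For equal in/out channels C, parameters drop from C*C*k^3 to 3*C*C*k,
    which is the core saving behind a dimensional-decomposition block.
    """
    def __init__(self, channels, k=3):
        super().__init__()
        p = k // 2
        self.conv_d = nn.Conv3d(channels, channels, (k, 1, 1), padding=(p, 0, 0))
        self.conv_h = nn.Conv3d(channels, channels, (1, k, 1), padding=(0, p, 0))
        self.conv_w = nn.Conv3d(channels, channels, (1, 1, k), padding=(0, 0, p))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv_d(x))
        out = self.relu(self.conv_h(out))
        out = self.conv_w(out)
        return self.relu(out + x)   # residual connection

x = torch.randn(1, 32, 16, 16, 16)  # (B, C, D, H, W) voxel features
y = FactorizedConv3d(32)(x)
```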


5. Multi-object Detection and Tracking Video Demo

This is an engineering project that aims to combine the output of Faster R-CNN (implemented in Python) with a JPDA tracking framework (implemented in Matlab); a minimal sketch of the Python-to-Matlab bridge follows the references below.

[1] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015.

[2] S. H. Rezatofighi, A. Milan, Z. Zhang, Q. Shi, A. Dick, and I. Reid. Joint Probabilistic Data Association Revisited. ICCV 2015.
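One simple way to hand detections from the Python side to the Matlab side is through a .mat file; the sketch below is a hypothetical bridge (the field name `detections`, the file name, and the box format are my own choices, not the project's actual interface).

```python
import numpy as np
from scipy.io import savemat

def export_detections(detections, out_path="detections.mat"):
    """Dump per-frame Faster R-CNN detections to a .mat file that a
    Matlab JPDA tracker could read with `load('detections.mat')`.

    detections: list over frames; each entry is an (N, 5) array of
                [x1, y1, x2, y2, score] boxes (N may vary per frame).
    """
    # An object-dtype array becomes a Matlab cell array, which keeps the
    # ragged per-frame box lists intact.
    cells = np.empty(len(detections), dtype=object)
    for i, det in enumerate(detections):
        cells[i] = np.asarray(det, dtype=np.float64)
    savemat(out_path, {"detections": cells})

# Toy usage with two frames of fake boxes:
export_detections([
    [[10, 20, 50, 80, 0.9]],
    [[12, 22, 52, 82, 0.8], [100, 40, 140, 90, 0.7]],
])
```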


4. Light Stage-based 3D Reconstruction Check Results Video Demo

Duration: Oct. 2013 to July 2015

Like the Light Stage devices built at UC Berkeley and USC ICT, our lab (the State Key Lab of CAD&CG, Zhejiang University) has a 7.0 m diameter Light Stage with 54 cameras and 816 white LED lights, accompanied by a suite of video capture software. Our research interests mainly focus on computer vision.

Our main task was to implement a system that reconstructs a 3D model of a person from the data captured by the Light Stage, i.e., motion capture. Our key idea is to take advantage of both temporal and spatial consistency. We obtained extrinsic and intrinsic camera parameters using a self-made LED calibration target, combined them with a foreground mask (automatically generated using a GMM), and applied visual hull computation to each frame separately. The core of the pipeline recovers robust depth maps using bundle optimization, with optical flow added as a constraint to produce the final results. We then used Poisson reconstruction to generate the final 3D model from the point cloud.
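For reference, a minimal voxel-carving visual hull looks roughly like the following; the function signature, grid representation, and threshold are illustrative assumptions rather than the lab's actual code.

```python
import numpy as np

def visual_hull(masks, projections, grid, thresh=0.5):
    """Voxel-carving visual hull (illustrative sketch).

    masks:       list of (H, W) binary silhouettes, one per calibrated camera
    projections: list of (3, 4) camera projection matrices P = K [R | t]
    grid:        (N, 3) candidate voxel centers in world coordinates
    Returns a boolean mask over grid: a voxel survives only if its
    projection lands inside every camera's silhouette.
    """
    hom = np.hstack([grid, np.ones((len(grid), 1))])    # (N, 4) homogeneous
    keep = np.ones(len(grid), dtype=bool)
    for mask, P in zip(masks, projections):
        uvw = hom @ P.T                                  # (N, 3) image coords
        u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
        v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)
        inside = (u >= 0) & (u < mask.shape[1]) & (v >= 0) & (v < mask.shape[0])
        hit = np.zeros(len(grid), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]] > thresh
        keep &= hit                                      # carve away misses
    return keep
```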

Software-related work: I completed almost all of the programming and debugging for the complete system. My work covers the following scope: apparatus construction, data capture, calibration, video segmentation, depth recovery, optical flow, and 3D model reconstruction.

Hardware-related work: As the lead of the Light Stage rebuild at Zhejiang University, carried out in cooperation with LUSTER LightTech Co., Ltd., I spent 6 months building a more stable system with a high-speed core recording box and higher-resolution imaging. I also spent 3 months on machine fault detection and handling, and am now thoroughly familiar with large multi-camera capture systems.

3. USC NDT Project (Light Field & 3D Holographic Display)

Duration: Aug. 2015 to Jan. 2016

At the beginning of August 2015, I joined the USC ICT Graphics Lab headed by Prof. Paul Debevec, working with research scientists Andrew Jones and Graham Fyffe on the NDT (New Dimensions in Testimony) project.

During the first three months, I thoroughly read several papers on 3D displays and became familiar with light field principles. In addition, I studied the working mechanisms of the “Interactive 360° Light Field Display”, which uses two mirrors at 45° angles, and of “An Autostereoscopic Projector Array Optimized for 3D body-size human Display”, which uses 216 projectors. Both combine holographic projection to achieve glasses-free 3D.

Inspired by Microsoft's free-viewpoint video paper published at SIGGRAPH 2015, we used Light Stage 6 (8 m in diameter) for data capture, mounting another 58 IR cameras in addition to the 60 Panasonic cameras. After obtaining the camera parameters automatically with a calibration cube, I implemented visual hull computation to obtain the coarse geometry. Meanwhile, because state-of-the-art cutout software could not handle scenes where the foreground and background have similar colors, I combined an HSV color-space representation with contour detection to obtain robust matting results automatically. As a next step, we planned to feed the real-time face reconstruction results from Light Stage X into this project, exploit the IR cameras for depth recovery, and combine them with mesh tracking techniques to reconstruct a 3D human body with consistent texture and clear details.
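The HSV-plus-contours matting step could look roughly like this OpenCV sketch; the threshold values, minimum contour area, file name, and the assumption that the background is keyed by a color range are all illustrative guesses.

```python
import cv2
import numpy as np

def hsv_matte(frame_bgr, lower_hsv, upper_hsv, min_area=500):
    """HSV thresholding plus contour filtering for matting (sketch).

    lower_hsv / upper_hsv: (H, S, V) bounds covering the background color.
    Returns a 0/255 foreground mask.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    bg = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    fg = cv2.bitwise_not(bg)
    # Keep only large connected contours to suppress speckle noise.
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    clean = np.zeros_like(fg)
    for c in contours:
        if cv2.contourArea(c) >= min_area:
            cv2.drawContours(clean, [c], -1, 255, thickness=cv2.FILLED)
    return clean

frame = cv2.imread("frame.png")                        # hypothetical input frame
mask = hsv_matte(frame, (35, 40, 40), (85, 255, 255))  # e.g. green backdrop
```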

2. Depth Recovery Check Results Video Demo

Duration: Jul. 2013 to Sep. 2013

This project mainly aims to build a system for recovering consistent depth maps from a video sequence. The algorithm is based on a paper by my supervisor, Guofeng Zhang: "Consistent Depth Maps Recovery from a Video Sequence" (TPAMI 2009). We first used structure-from-motion (SfM) to obtain the camera parameters, then applied a three-step optimization framework to recover the depth maps: we initialized the depth maps with mean-shift color segmentation to encourage smooth photo-consistency, refined the disparities by bundle optimization exploiting geometric coherence across multiple frames, and finished with two iterations of depth-expansion refinement. Together, these steps effectively handle image noise, occlusions, and outliers.
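To make the data term concrete, here is a toy sketch of a photo-consistency cost over disparity hypotheses; the warping via camera geometry is omitted, and the robust penalty and the parameter `sigma` are my own stand-ins for the paper's actual formulation.

```python
import numpy as np

def photo_consistency_cost(ref, neighbors, warps, sigma=10.0):
    """Photo-consistency data term over disparity hypotheses (sketch).

    ref:       (H, W, 3) reference frame
    neighbors: list of K nearby frames, each (H, W, 3)
    warps:     warps[d][k] is neighbor k warped into the reference view
               under disparity hypothesis d (warping itself omitted here)
    Returns the (D, H, W) cost volume and a winner-take-all disparity map.
    """
    D = len(warps)
    H, W, _ = ref.shape
    cost = np.zeros((D, H, W))
    for d in range(D):
        for k in range(len(neighbors)):
            diff = np.linalg.norm(ref.astype(float) - warps[d][k].astype(float), axis=2)
            # Bounded penalty limits the influence of occlusions/outliers.
            cost[d] += 1.0 - np.exp(-diff / sigma)
    best_disparity = cost.argmin(axis=0)  # initialization before refinement
    return cost, best_disparity
```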

I coded the entire depth-map pipeline, and my software works well, producing robust and smooth depth maps even on video examples with complex backgrounds.


1. Video Segmentation/Matting Video Demo

Duration: Oct. 2012 to Mar. 2013

This system is my bachelor's thesis work. It covers the design and implementation of an interactive video cutout system. First, the work analyzes in detail the principles of the GrabCut algorithm for still images and of Video SnapCut for video sequences. GrabCut is implemented to extract the foreground object from a still image, while the system as a whole is based on Video SnapCut, a popular video cutout algorithm that supports object tracking and dynamic motion estimation. Additionally, the system provides a set of interactive tools that let the user refine the result in real time, which led to satisfying results. Moreover, the system uses OpenMP multi-threading to improve responsiveness, which further enhances overall performance.
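For reference, OpenCV ships a GrabCut implementation that mirrors the still-image step; the sketch below uses it in place of the thesis code, with a made-up file name and initialization rectangle.

```python
import cv2
import numpy as np

img = cv2.imread("still_image.png")               # hypothetical input image
mask = np.zeros(img.shape[:2], dtype=np.uint8)
bgd_model = np.zeros((1, 65), dtype=np.float64)   # internal GMM state (background)
fgd_model = np.zeros((1, 65), dtype=np.float64)   # internal GMM state (foreground)
rect = (50, 50, 300, 400)                         # user-drawn box around the object

# Five iterations of GrabCut, initialized from the rectangle.
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Pixels marked definite or probable foreground form the cutout.
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)
cutout = cv2.bitwise_and(img, img, mask=fg)
```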

I wrote an 80-page thesis and implemented an interactive video cutout system that can cut out objects from still images and track people in video sequences. With this system, I won second prize in the College Students' Computer Works Contest in Chongqing province.