Publications

I aspire to be a researcher in an academic or industrial setting.

My current research interests lie in developing computer vision and machine learning techniques for large-scale video data analysis, event detection, medical image analysis, and biomedical and clinical applications. I am also open to a variety of other research directions.

Depth Estimation for Indoor Panoramas through Merging Multiple Perspective Monocular Depth Predictions    WACV Nov 2022 

We propose a method to estimate depths for indoor panoramas by merging multiple perspective depth maps produced by modern monocular depth estimation methods such as LeReS. The challenge is that the perspective depth maps, each covering a different subset of the panorama, tend to have different scales and shifts in their predicted depth values. Simply stitching them together leads to inconsistent depth values across the panorama, with visible seams. To address this challenge, we propose a novel approach that solves for a single depth map for the whole panorama, using information taken from each perspective depth estimate and a common panoramic depth map that serves as the reference for merging.
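The scale-and-shift mismatch described above can be illustrated with a minimal sketch. It is not the method from the paper: it only assumes that one perspective depth map differs from a reference by an unknown scale `s` and shift `t`, and recovers them by ordinary least squares. The function names `fit_scale_shift` and `apply_scale_shift` are hypothetical.

```python
def fit_scale_shift(pred, ref):
    """Return (s, t) minimizing sum((s * p + t - r) ** 2) over paired samples.

    pred: depth values from one perspective prediction (flattened).
    ref:  reference depth values at the same pixels (flattened).
    Illustrative assumption only, not the paper's merging procedure.
    """
    n = len(pred)
    if n != len(ref) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("predicted depths are constant; scale is undefined")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    s = cov / var_p              # closed-form least-squares slope
    t = mean_r - s * mean_p      # closed-form least-squares intercept
    return s, t


def apply_scale_shift(pred, s, t):
    """Map predicted depths into the reference's scale and shift."""
    return [s * p + t for p in pred]
```

Aligning every perspective map to the same reference in this way removes the per-map scale and shift, though it does not by itself guarantee seamless overlaps.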

Distortion Reduction for Off-Center Perspective Projection of Panoramas

Best Short Paper Award    NICOGRAPH Nov 2021 

In this paper, we propose modifications to the equirectangular-to-perspective (E2P) projection that significantly reduce distortions when the camera position is away from the origin. This enables users to not only "look around" but also "walk around" virtually in a single panorama with more convincing renderings. We compare with other techniques that aim to augment panoramas with 3D information, including 1) panoramas with depth information and 2) panoramas augmented with room layouts, and show that our approach provides more visually convincing results.
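For context, the standard origin-centered E2P mapping that the paper modifies can be sketched as follows. This is the textbook projection, not the proposed off-center variant. The axis convention (camera looking along +z, x to the right, y down) and the function name `perspective_to_equirect` are assumptions made for illustration.

```python
import math


def perspective_to_equirect(u, v, width, height, fov_deg, pano_w, pano_h):
    """Map a continuous perspective image coordinate (u, v) to panorama coordinates.

    Assumes a pinhole camera at the panorama origin with horizontal field of
    view fov_deg, looking along +z with no rotation.
    """
    # Focal length in pixels, derived from the horizontal field of view.
    f = 0.5 * width / math.tan(math.radians(fov_deg) / 2.0)
    # Ray direction through the image point (x right, y down, z forward).
    x = u - width / 2.0
    y = v - height / 2.0
    z = f
    # Longitude in [-pi, pi] and latitude in [-pi/2, pi/2] (up is positive).
    lon = math.atan2(x, z)
    lat = math.atan2(-y, math.hypot(x, z))
    # Equirectangular layout: longitude spans the width, latitude the height.
    px = (lon / (2.0 * math.pi) + 0.5) * pano_w
    py = (0.5 - lat / math.pi) * pano_h
    return px, py
```

Because this mapping depends only on the ray direction, it stays exact only while the camera sits at the origin, which is the limitation that motivates handling off-center camera positions.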

Synthetic 3D Data Generation Pipeline for Geometric Deep Learning in Architecture

 Project Page     Deep AI News April 2021 

With the growing interest in deep learning algorithms and computational design in the architectural field, the need for large, accessible, and diverse architectural datasets increases. We decided to tackle this problem by constructing a field-specific synthetic data generation pipeline that generates an arbitrary amount of 3D data along with the associated 2D and 3D annotations. The variety of annotations and the flexibility to customize the generated building and dataset parameters make this framework suitable for multiple deep learning tasks, including geometric deep learning that requires direct 3D supervision. In creating our building data generation pipeline, we leveraged architectural knowledge from experts to construct a framework that is modular and extendable and that provides a sufficient amount of class-balanced data samples.  [paper]

Enabling both time-domain and frequency-domain photoacoustic imaging by a fingertip laser diode system    

 Journal Optics Letters     February  2019 

In this paper, we report a custom-made fingertip laser diode system enabling both pulsed and continuous modulation modes, with a shortest pulse width of 40 ns, a largest driving current of 13 A, and a highest modulation frequency of 3 MHz, which makes it suitable for both time-domain and frequency-domain photoacoustic imaging. To the best of our knowledge, this may be the most compact laser source reported for photoacoustic imaging that enables both modulation modes. Owing to its super-compact size, the proposed LD system could pave the way to low-cost photoacoustic sensing and imaging devices, and even wearable photoacoustic biomedical sensors.  [paper]

A Noise Reduction Method for Photoacoustic Imaging In Vivo Based on EMD and Conditional Mutual Information 

 IEEE Photonics Journal     February 2019

In this paper, we propose a de-noising algorithm for photoacoustic tomography that combines empirical mode decomposition (EMD) with conditional mutual information, without sacrificing signal fidelity or imaging speed. In both simulations and experiments, the method significantly improves the signal-to-noise ratio of the photoacoustic signal and enhances the contrast of the reconstructed image. The simulation and experimental results show that EMD combined with mutual information improves the signal-to-noise ratio by at least 2 dB and 3 dB, respectively, compared with the traditional wavelet threshold method and band-pass filter. The contrast-to-noise ratio likewise improves by more than 2 dB and 3 dB, respectively, compared with the same two methods. [paper]
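To make the decibel figures concrete, here is a minimal sketch of one common way to compute a signal-to-noise ratio in dB when a clean reference signal is available, as in a simulation. The abstract does not state the exact definition used in the paper, so this formula and the function name `snr_db` are assumptions.

```python
import math


def snr_db(clean, estimate):
    """SNR in dB, treating (estimate - clean) as the residual noise.

    clean:    reference signal samples (e.g. the simulated ground truth).
    estimate: noisy or de-noised samples of the same length.
    Assumed definition: 10 * log10(signal power / residual noise power).
    """
    if len(clean) != len(estimate) or not clean:
        raise ValueError("signals must be non-empty and of equal length")
    signal_power = sum(c * c for c in clean)
    noise_power = sum((e - c) ** 2 for c, e in zip(clean, estimate))
    if noise_power == 0:
        return math.inf  # the estimate matches the reference exactly
    return 10.0 * math.log10(signal_power / noise_power)
```

Under this definition, a 3 dB gain corresponds to roughly halving the residual noise power.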

Photoacoustic Image Classification and Segmentation of Breast Cancer: A Feasibility Study

 Journal IEEE Access     December 2018

We used a pre-processing algorithm to enhance the quality and uniformity of input breast cancer images and a transfer learning method to achieve better classification performance. In addition, by comparing the area under the curve, sensitivity, and specificity of a support vector machine with those of AlexNet and GoogLeNet, we conclude that the combination of deep learning and photoacoustic imaging has the potential to have an important impact on clinical diagnostics. Finally, following the breast imaging reporting and data system levels, we divided breast cancer images into six grades and designed segmentation software for identifying the six grades of breast cancer. We then tested it on the MAMMOGRAPHYC IMAGES DATABASE FROM LAPIMO EESC/USP (Laboratory of Analysis and Processing of Medical and Dental Images) to verify the accuracy of our segmentation method, which showed satisfactory results.  [paper]
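Sensitivity and specificity, two of the metrics compared above, follow directly from the binary confusion counts. The sketch below uses the standard definitions; the function name `sensitivity_specificity` and the label encoding (1 for malignant, 0 for benign) are assumptions for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels in {0, 1}.

    sensitivity = TP / (TP + FN): fraction of positives correctly detected.
    specificity = TN / (TN + FP): fraction of negatives correctly rejected.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)
```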

Action detection based on tracklets with the two-stream CNN  

 Springer Multimedia Tools and Applications                   March 2017 

In this paper, we propose a novel action detection method that embeds multiple object tracking into the action detection process. First, we fine-tune an off-the-shelf Faster R-CNN model to detect people in frames. Then, a simple tracking-by-detection algorithm is adopted to obtain tracklets that preserve temporal consistency. After that, we apply a temporal multi-scale sliding window strategy to each tracklet to generate action proposals. Finally, each action proposal is fed into a fully connected neural network to complete the classification task. The features of each action proposal are obtained by the two-stream CNN.  [paper]
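The temporal multi-scale sliding window step can be sketched as below. The window sizes and the stride ratio are illustrative assumptions, not values from the paper, and the function name `temporal_proposals` is hypothetical.

```python
def temporal_proposals(tracklet_len, window_sizes=(16, 32, 64), stride_ratio=0.5):
    """Generate (start, end) frame ranges over one tracklet at several scales.

    tracklet_len: number of frames in the tracklet.
    window_sizes: temporal window lengths in frames (assumed values).
    stride_ratio: stride as a fraction of the window length (assumed value).
    Ranges are half-open: frames start .. end - 1.
    """
    proposals = []
    for w in window_sizes:
        if w > tracklet_len:
            continue  # window does not fit inside this tracklet
        stride = max(1, int(w * stride_ratio))
        for start in range(0, tracklet_len - w + 1, stride):
            proposals.append((start, start + w))
    return proposals


# Example: a 40-frame tracklet scanned with 16- and 32-frame windows.
print(temporal_proposals(40, window_sizes=(16, 32)))
```

Each resulting range would then be described by two-stream CNN features and passed to the classifier, as the abstract outlines.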