Research
Event Camera-based Computer Vision with Deep Learning
The event camera is a next-generation image sensor that mimics the human visual system: its pixels generate asynchronous event streams in response to changes in light intensity. This yields high temporal resolution, reduced motion blur, and low system latency, making the sensor well suited to capturing rapidly changing visual information. Power consumption is low when no light intensity changes are detected, and the high dynamic range (140 dB) preserves detail in both dark and bright areas, for example when moving through varying light conditions. Event cameras are widely used in computer vision, image processing, robotics, and AR/VR applications. They excel in tasks such as object detection, tracking, simultaneous localization and mapping (SLAM), and visual odometry, especially on resource-constrained platforms like drones and vehicles. Their high dynamic range also proves useful in surveillance and environmental monitoring, enabling reliable operation in low-light or rapidly changing lighting conditions. Additionally, event cameras are applied to depth estimation, optical flow, human pose estimation, and hand tracking, contributing to advances in user interaction, AR/VR devices, and human behavior analysis, including eye tracking.
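As a rough illustration of how intensity changes become events, the sketch below uses the commonly cited idealized model in which a pixel fires an event each time its log intensity moves by a fixed contrast threshold away from a stored reference level. The function name `generate_events`, the sample format, and the threshold value are assumptions made for this example and do not describe any particular sensor or the group's own pipeline.

```python
import math


def generate_events(samples, threshold=0.2):
    """Idealized single-pixel event generation (illustrative sketch only).

    samples   -- list of (timestamp, intensity) pairs with intensity > 0
    threshold -- assumed contrast threshold on log intensity
    Returns a list of (timestamp, polarity) events, polarity being +1 or -1.
    """
    events = []
    if not samples:
        return events
    # The reference level is the log intensity at the last emitted event.
    reference = math.log(samples[0][1])
    for timestamp, intensity in samples[1:]:
        diff = math.log(intensity) - reference
        # A large jump may cross the threshold several times,
        # so keep emitting events until the residual is below it.
        while abs(diff) >= threshold:
            polarity = 1 if diff > 0 else -1
            events.append((timestamp, polarity))
            reference += polarity * threshold
            diff = math.log(intensity) - reference
    return events


# A static pixel produces no events at all, which is why an event
# camera draws little power when nothing in the scene changes.
static_pixel = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)]
print(generate_events(static_pixel))
```

Working on log intensity rather than raw intensity is what lets this model respond to relative contrast, which is consistent with the high dynamic range described above.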
Image Processing and Computer Vision Algorithms based on Machine Learning for Augmented Reality (AR), 3D Displays, and Advanced Driver Assistance Systems (ADAS)
Eye pupil tracking is important for augmented reality (AR) three-dimensional (3D) head-up displays (HUDs). Accurate and fast eye tracking remains challenging because driving conditions vary and the eyes are often occluded, for example by sunglasses. In this paper, we propose a system for commercial use that can handle practical driving conditions. Our system classifies human faces into bare faces and sunglasses faces, which are treated differently. For bare faces, our eye tracker regresses the pupil area in a coarse-to-fine manner, based on a revised Supervised Descent Method-based eye-nose alignment. For sunglasses faces, because the eyes are occluded, our eye tracker uses whole-face alignment with a revised Practical Facial Landmark Detector for pupil center tracking. Furthermore, we propose a structural inference-based re-weight network to predict eye position from non-occluded areas, such as the nose and mouth. The proposed re-weight sub-network revises the importance of different feature map positions and predicts the occluded eye positions from the non-occluded parts. A tracker-checker makes the proposed eye tracker robust, and the model size is small. Experiments show that our method achieves high accuracy and speed, with errors of approximately 1.5 mm and 6.5 mm for bare and sunglasses faces, respectively, in less than 10 ms on a 2.0 GHz CPU. The evaluation dataset was captured indoors and outdoors to reflect multiple sunlight conditions. Our proposed method, combined with AR 3D HUDs, shows promising results for commercialization with low-crosstalk 3D images.
Machine Learning Algorithms for Computer-aided Diagnosis from Medical Images
Visual identification of coronary arterial lesions from three-dimensional coronary computed tomography angiography (CTA) remains challenging. We aimed to develop a robust automated algorithm for computer detection of coronary artery lesions using machine learning techniques. A structured learning technique is proposed to detect all coronary arterial lesions with stenosis ≥25%. Our algorithm consists of two stages: (1) two independent base decisions indicating the existence of lesions in each arterial segment and (2) the final decision made by combining the base decisions. One of the base decisions is a support vector machine (SVM) based learning algorithm, which divides each artery into small volume patches and uses an SVM to integrate several quantitative geometric and shape features for arterial lesions in each patch. The other base decision is a formula-based analytic method. In the second stage, the final decision applies SVM-based decision fusion to combine the two base decisions from the first stage. The proposed algorithm was applied to 42 CTA patient datasets, acquired with dual-source CT, where 21 datasets had 45 lesions with stenosis ≥25%. Visual identification of lesions with stenosis ≥25% by three expert readers, using consensus reading, was considered the reference standard. Our method performed with high sensitivity (93%), specificity (95%), and accuracy (94%), with a receiver operating characteristic area under the curve of 0.94. The proposed algorithm shows promising results in the automated detection of obstructive and nonobstructive lesions from CTA.
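For readers less familiar with the reported measures, the sketch below shows how sensitivity, specificity, and accuracy are conventionally computed from per-segment decisions compared against a reference standard. The function name `diagnostic_metrics` and the toy labels are assumptions made for this example; the labels are illustrative only and are not the study's data.

```python
def diagnostic_metrics(reference, predicted):
    """Compute sensitivity, specificity, and accuracy (illustrative sketch).

    reference -- sequence of 0/1 labels from the reference standard
    predicted -- sequence of 0/1 decisions from the algorithm
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # lesion found
    fn = sum(1 for r, p in pairs if r and not p)      # lesion missed
    tn = sum(1 for r, p in pairs if not r and not p)  # correctly clear
    fp = sum(1 for r, p in pairs if not r and p)      # false alarm
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy example: 4 segments with a lesion, 6 without (made-up labels).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(diagnostic_metrics(reference, predicted))
```

Sensitivity is the fraction of reference lesions the algorithm detects, and specificity is the fraction of lesion-free segments it correctly leaves unflagged, so the two together describe both missed lesions and false alarms.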