A Look into the Future: Predicting User Intention of Target Pointing and Selection
Freehand ray pointing is a common input interaction in extended reality (XR). Due to noisy input recognition and imprecise hand movements, input is often slow and error-prone. We present a novel computational framework for predicting user intention during point-and-select tasks based on a grid representation of the environment and trajectory.
We trained an ensemble model that combines outputs from unimodal models with either gaze or hand data and a multimodal model with both gaze and hand data. The grid representation-based ensemble model outperforms a raw trajectory-based baseline model, achieving 7% to 12.7% higher accuracy at different grid granularity levels.
Our novel approach can provide highly accurate predictions during point-and-select tasks across users, which can be used to enable selection facilitation techniques, thus improving performance and user experience during freehand pointing.
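The grid idea can be illustrated with a small sketch. It is not the model described above: the function names, the 2D normalized coordinates, and the plain averaging of model outputs are all assumptions made for illustration, since the summary does not say how the grid is built or how the ensemble combines its members.

```python
from typing import List, Sequence, Tuple

Bounds = Tuple[Tuple[float, float], Tuple[float, float]]


def to_grid_cells(trajectory: Sequence[Tuple[float, float]],
                  bounds: Bounds, n_cells: int) -> List[int]:
    """Map each (x, y) sample of a trajectory to a flat grid-cell index.

    Hypothetical helper: the grid is n_cells x n_cells over `bounds`.
    """
    (x_min, x_max), (y_min, y_max) = bounds
    cells = []
    for x, y in trajectory:
        col = int((x - x_min) / (x_max - x_min) * n_cells)
        row = int((y - y_min) / (y_max - y_min) * n_cells)
        # Clamp so points on or beyond the border stay inside the grid.
        col = max(0, min(col, n_cells - 1))
        row = max(0, min(row, n_cells - 1))
        cells.append(row * n_cells + col)
    return cells


def ensemble_average(prob_maps: Sequence[Sequence[float]]) -> List[float]:
    """Average per-cell probabilities from several models and renormalize.

    Assumption: simple averaging; the actual combination rule is not stated.
    """
    n = len(prob_maps[0])
    mean = [sum(p[i] for p in prob_maps) / len(prob_maps) for i in range(n)]
    total = sum(mean)
    return [m / total for m in mean]
```

A coarser grid (smaller `n_cells`) makes the prediction target easier but less specific, which is one way to read the reported range of accuracy gains across granularity levels.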
RIDS: Implicit Detection of a Selection Gesture Using Hand Motion Dynamics During Pointing in VR
To improve the detection of a selection gesture in VR during point-and-click interaction, we propose and investigate the use of the information contained in the hand motion dynamics that precede a selection gesture. We built two models that classify whether a user is likely to perform a selection gesture at the current moment in time.
A logistic regression classifier was trained using predefined hand motion features, and a temporal convolutional network (TCN) classifier was trained using raw hand motion data. The TCN model was found to improve the precision of a noisy selection gesture by 11.2% without sacrificing recall performance.
An initial analysis of the generalizability of the models demonstrated above-chance performance, suggesting that this approach could be scaled to other interaction tasks in the future.
Individualized 3D Brain Function Mapping
Proposed an individualized intracranial electrode positioning method based on multimodal medical image data fusion - Patent # CN103932796A
Developed an approach for individualized 3D dynamic visualization of neural information flow in large-scale cortical networks - Patent # CN103942424B
Built a brain function mapping system based on ECoG and multimodal medical image data fusion - Patent # CN103932701A
Decoding Working Memory Load from EEG with LSTM Networks
Developed a novel method for investigating the role of sequential information in the manipulation and storage of verbal information at various time scales, and for topographically localizing the sources of that sequential information based on decodability
Decoding accuracy increases with the length of the EEG time series given to the LSTM, for both ordered and temporally shuffled cases
According to the decoding weight maps, the frontal, temporal, and parietal areas are important sources of sequential-information-based decodability
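The ordered versus temporally shuffled comparison can be sketched as follows. This is only an illustration of the control condition, assuming an epoch is stored as a list of time samples (each a list of channel values) and that shuffling permutes whole time samples so that all channels move together; the helper names are hypothetical and the actual preprocessing is not described here.

```python
import random
from typing import List, Optional

Epoch = List[List[float]]  # time samples x channels


def truncate_epoch(epoch: Epoch, n_samples: int) -> Epoch:
    """Keep only the first n_samples time points (varies the input length)."""
    return epoch[:n_samples]


def shuffle_time(epoch: Epoch, seed: Optional[int] = None) -> Epoch:
    """Return a copy of the epoch with its time samples randomly permuted.

    Channel values within each time sample are left untouched, so only the
    sequential order is destroyed.
    """
    rng = random.Random(seed)
    shuffled = list(epoch)
    rng.shuffle(shuffled)
    return shuffled
```

Feeding both versions to the same decoder separates what is decodable from the signal values alone from what depends on their order in time.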
Object Detection and Image Captioning
For object detection in images, I used the DEtection TRansformer (DETR) model, an encoder-decoder transformer with a ResNet-50 backbone. It was introduced in the paper "End-to-End Object Detection with Transformers".
Image captioning is the process of generating a caption, i.e. a description, from an input image. It requires both natural language processing and computer vision. Here, I used the Vision Encoder Decoder architecture: a pre-trained transformer-based vision model (ViT) as the encoder and a pre-trained language model (GPT-2) as the decoder.
Refer to this Colab notebook to play with the Object Detection and Image Captioning
Text-to-Speech (TTS) based on SpeechT5 Model
The SpeechT5 model is pre-trained on text as well as speech inputs, with targets that are also a mix of text and speech.
By pre-training on text and speech at the same time, it learns unified representations for both, resulting in improved modeling capabilities.
SpeechT5 can be fine-tuned for different speech tasks. This space demonstrates the text-to-speech for the English language.
Refer to this Colab notebook to play with the SpeechT5 TTS model on your own dataset or language.
Text-to-Image Stable Diffusion Demo
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
The Stable-Diffusion-v1-5 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned on 595k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
Refer to this Colab notebook to try your own prompt.
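The two halves of classifier-free guidance mentioned above can be sketched in a few lines. This is a toy illustration on plain lists, not the Stable Diffusion training or sampling code: the function names are hypothetical, the empty string stands in for the unconditional prompt, and the default guidance scale of 7.5 is an assumed, commonly used value rather than something stated here.

```python
import random
from typing import List


def maybe_drop_prompt(prompt: str, drop_prob: float = 0.1, rng=random) -> str:
    """Training side: replace the prompt with an empty one with probability
    drop_prob, so the model also learns an unconditional prediction."""
    return "" if rng.random() < drop_prob else prompt


def guided_noise(eps_uncond: List[float], eps_cond: List[float],
                 guidance_scale: float = 7.5) -> List[float]:
    """Sampling side: extrapolate from the unconditional noise prediction
    toward the text-conditioned one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

With `guidance_scale=1.0` the result equals the conditional prediction; larger values push samples to follow the prompt more closely.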
Non-Invasive Brain Stimulation HD-tACS
The frontoparietal network was stimulated at theta frequency (4-8 Hz) during verbal working memory, and the aftereffects were analyzed
In-phase frontoparietal theta stimulation improved working memory performance in participants with higher working memory capacity
Enhanced behavioral performance was accompanied by enhanced frontoparietal theta synchrony
Enhanced frontoparietal theta synchronization was driven by enhanced frontal→parietal theta Granger causality
EEG Alpha Oscillation Modulation by Working Memory Load
We investigated the relationship between memory load modulation of alpha power and working memory capacity (WMC) in two verbal working memory experiments
Individuals with low WMC demonstrate stronger alpha power modulation by memory load, possibly reflecting an increased reliance on sensory gating to suppress task-irrelevant information, in contrast to their high-WMC counterparts, who rely more on frontal areas to perform this function
This negative association between memory load modulation of alpha oscillations and WMC is vulnerable to drug-related cognitive disruption
Brain-Inspired DNN to decode and map emotion processing in the human brain using fMRI
Aimed to develop an advanced DNN that simulates emotion processing in the human brain: observing an event, encoding and decoding features in the visual cortex, producing an emotion valence score through the amygdala, and returning feedback to the visual cortex via the re-entry mechanism.
Proposed EmotionNet to simulate the emotion recognition process. It has three building blocks. First, we used GIST to extract low-level features from the image. Second, we used pre-trained neural networks (VGG16, AlexNet, ResNet) to learn mid-level object features from the input images. Third, a semantic segmentation deep network was used to represent semantic, high-level features of the input images. All the encoded emotion features were then fed into a three-layer fully connected network to predict the final valence score.
We found that the valence scores predicted by EmotionNet were very similar to the real valence scores. This result indicates that our network was able to mimic, to some extent, how the brain processes emotions.
Characterizing the temporal dynamics of working memory (WM) representations with HD-EEG decoding
Applied multivariate pattern analysis (MVPA) to two high-density EEG datasets from healthy human volunteers performing verbal WM tasks with three different levels of memory load to examine the formation and development of WM representations in the brain
We observed evidence suggesting that WM could be maintained in the format of an activity-silent neural state via the activity-silent synaptic mechanisms. Using current source density (CSD) connectivity-based decoding, we could decode the neural representation about the contents in WM from the so-called activity-silent period. It is quite remarkable that the patterned hidden state in WM networks can be detected at the scalp level using whole-head EEG CSD functional connectivity.
Characterizing drug-induced cognitive impairments in linguistic, memory, and executive functions
Conducted research to help understand the functions that lead to cognitive impairment as a side effect of drug use. Specifically, we investigated Topiramate (TPM), an anti-epilepsy drug.
We showed for the first time that parameters of the rsEEG are associated with the severity of TPM-related working memory deficits.
We have identified a potential clinical risk factor, working memory capacity, which is associated with adverse cognitive events.
This work supports clinical efforts to mitigate the side effects of epilepsy treatment and provides a basis for informed decision-making by patients and clinicians.
Neural encoding of cortical representations of movement in the motor cortex
Conducted research to help understand how neurons represent stimuli or events with changes in their firing properties.
A peri-stimulus (peri-event) time histogram (PSTH) was used to visualize the timing and firing rate of neuronal spike discharges in relation to an external stimulus or event.
To capture how the average response of the neuron varies with a sensory or motor feature, and to identify the neuron's preferred direction, we generated a tuning curve that maps the feature value onto the average response of the neuron.
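The two analyses above can be sketched in plain Python. This is a minimal illustration rather than the analysis code used in the project: the function names, the default window and bin width, and the use of movement direction in degrees as the tuned feature are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def psth(spike_times: Sequence[float], event_times: Sequence[float],
         window: Tuple[float, float] = (-0.5, 1.0),
         bin_width: float = 0.1) -> List[float]:
    """Average firing rate (spikes/s) per time bin, aligned to each event."""
    start, stop = window
    n_bins = int(round((stop - start) / bin_width))
    counts = [0] * n_bins
    for event in event_times:
        for t in spike_times:
            rel = t - event  # spike time relative to this event
            if start <= rel < stop:
                idx = min(int((rel - start) / bin_width), n_bins - 1)
                counts[idx] += 1
    # Normalize by number of events and bin width to get a rate.
    return [c / (len(event_times) * bin_width) for c in counts]


def tuning_curve(directions: Sequence[float],
                 rates: Sequence[float]) -> Dict[float, float]:
    """Mean response for each feature value (here: movement direction)."""
    sums: Dict[float, float] = defaultdict(float)
    n: Dict[float, int] = defaultdict(int)
    for d, r in zip(directions, rates):
        sums[d] += r
        n[d] += 1
    return {d: sums[d] / n[d] for d in sums}


def preferred_direction(curve: Dict[float, float]) -> float:
    """Feature value with the highest mean response."""
    return max(curve, key=curve.get)
```

In practice the per-trial rates passed to `tuning_curve` would themselves come from spike counts in a fixed window around each movement.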
Brain-computer interface (BCI)
Built an EEG P300-based BCI communication device for individuals with severe neurological or muscular diseases.
Demonstrated that sensorimotor rhythms (SMR) can be voluntarily controlled by imagining movements. This ability can serve as a control signal for BCI systems. Thus, through BCI devices, we can translate a person's intent into rapid and accurate control of a one- or two-dimensional cursor.
BCI can therefore provide a new communication tool for patients with severe neurological or motor diseases, such as those with locked-in syndrome.
Automated system for lung cancer classification based on SVM
Lung cancer is one of the leading causes of cancer mortality worldwide, and non-small cell lung cancer (NSCLC) accounts for most cases. NSCLC can be further divided into adenocarcinoma (ACA) and squamous cell carcinoma (SCC). It is of great clinical value to distinguish these two subgroups.
We propose an integrated framework that consists of cell image preprocessing, cell segmentation, feature extraction, classification, and prediction. A majority voting algorithm is introduced to predict the class of a new cell image.
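The final voting step can be sketched as below. It assumes the votes are per-cell SVM predictions within one image, which the summary does not specify (they could equally come from several classifiers); the function name and the choice to return None on a tie are likewise illustrative.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(labels: Sequence[str]) -> Optional[str]:
    """Return the most frequent label, or None if the top count is tied."""
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    best_count = ranked[0][1]
    winners = [label for label, count in ranked if count == best_count]
    # A tie is reported explicitly instead of being broken arbitrarily.
    return winners[0] if len(winners) == 1 else None


# Example: per-cell predictions for one image, "ACA" or "SCC".
image_label = majority_vote(["ACA", "SCC", "ACA", "ACA", "SCC"])
```

Aggregating many per-cell decisions in this way makes the image-level call less sensitive to individual misclassified cells.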
Real-time 3D pose detection and pose classification
Pose detection (also known as pose estimation) is a widely used computer vision task that predicts human poses in images or videos by localizing key body joints (also referred to as landmarks), such as elbows, shoulders, and knees.
MediaPipe provides a robust solution capable of predicting 33 3D landmarks on a human body in real time with high accuracy, even on a CPU. It uses a two-step machine learning pipeline: a detector first localizes the person within the frame, and a pose landmark detector then predicts the landmarks within that region of interest.
Projects on quantitative investment strategies
Implemented a long/short equity strategy based on fundamental factors. The strategy used fundamental data as measures of value, quality and momentum, and then ranked all the stocks in the universe according to the factors.
Built and implemented an automatic moving linear regression channel trend following and mean-reversion trading strategy.
Implemented and backtested Kalman filter-based pairs trading strategy in ETFs.
Developed a refined short-term mean reversion trading strategy for futures.
Investigated the order patterns in futures time-series tick data.
Applied reinforcement learning to short-term stock trading.
Modeled high-frequency limit order book dynamics using machine learning.
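As an illustration of the Kalman filter-based pairs trading item above, the sketch below tracks a time-varying hedge ratio between two price series with a scalar Kalman filter. It is a simplified stand-in, not the implemented strategy: the model has no intercept, the hedge ratio is assumed to follow a random walk, and the function name and noise variances are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def kalman_hedge_ratio(x: Sequence[float], y: Sequence[float],
                       process_var: float = 1e-4,
                       obs_var: float = 1e-2) -> Tuple[List[float], List[float]]:
    """Estimate beta_t in y_t = beta_t * x_t + noise, one step at a time.

    Returns the filtered hedge ratios and the one-step-ahead spreads
    (innovations), which a pairs strategy could use as its signal.
    """
    beta, p = 0.0, 1.0  # initial state estimate and its variance
    betas, spreads = [], []
    for xi, yi in zip(x, y):
        p += process_var                # predict: random-walk state
        spread = yi - beta * xi         # innovation before the update
        s = xi * xi * p + obs_var       # innovation variance
        k = p * xi / s                  # Kalman gain
        beta += k * spread              # update the hedge ratio
        p *= (1.0 - k * xi)             # update the state variance
        betas.append(beta)
        spreads.append(spread)
    return betas, spreads
```

A typical use would go long or short the spread when it is large relative to the square root of the innovation variance `s`; threshold choice and transaction costs are left out here.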