Towards Hospital Metaverse – Intelligent Hospital Digital Twin
Intelligent Hospital Digital Twin Creation: Develop a first‑of‑its‑kind Intelligent Virtual Hospital that digitally replicates UF Health operating rooms and Neuro ICU suites. This environment will enable high‑fidelity simulation of clinical workflows, multidisciplinary decision‑making, and realistic patient scenarios. By mirroring real hospital operations, the platform will support training, research, and clinical optimization at unprecedented scale.
Real‑World Data Integration: Integrate real‑time clinical, waveform, device, and environmental sensor data from UF Health into NVIDIA’s Omniverse platform, creating a dynamic and continuously updating digital twin. This statewide testbed will allow researchers and clinicians to analyze system‑level behavior, evaluate intervention strategies, and study how environmental and operational factors influence patient outcomes and provider performance.
Advanced AI & VR Capabilities: Incorporate AI‑driven human proxies capable of simulating patients, clinicians, staff, and families, alongside immersive VR interfaces that support natural user interaction. These capabilities will enable deeply interactive training, cognitive workload assessment, and scenario testing, ultimately enhancing clinical preparedness, improving care delivery models, and expanding the Virtual Hospital’s role as a comprehensive platform for healthcare innovation.
MedTwin-LLM: Fine-Tuned LLMs for Intelligent Hospital Digital Twin
Domain‑Optimized Clinical LLM: Fine‑tune open‑source LLMs (GPT-OSS, Llama 3) using curated clinical datasets on UF’s HiPerGator multi‑GPU infrastructure, integrating multimodal Retrieval‑Augmented Generation (RAG) and NeMo Guardrails to ensure safe, reliable, and domain‑specific reasoning for critical care environments.
Multimodal Intelligence for Digital Twins: Build speech‑to‑text, text‑to‑speech, and interactive avatar capabilities so the LLM can serve as a natural, conversational interface within hospital digital twin simulations, enabling realistic clinician‑AI and patient‑AI interactions.
AI‑Enhanced Clinical Simulation & Discovery: Deploy the fine‑tuned model into hospital digital twin applications to accelerate medical research, enrich clinical workflow simulation, and provide human‑centered, trustworthy AI support for care teams and life sciences innovation.
CDMAgent: An Agentic Multimodal AI Framework for Autonomous Clinical Decision Making
CDMAgent is a commercialization‑ready agentic multimodal AI framework designed to advance autonomous clinical decision making. The system integrates large language models with structured clinical data—including history, laboratory values, and imaging metadata—to support iterative reasoning, differential diagnosis refinement, and guideline‑aligned recommendations.
CDMAgent has been validated using retrospective, de‑identified MIMIC-IV datasets, demonstrating feasibility in real‑world diagnostic workflows. The platform incorporates multi‑agent orchestration, a scalable SaaS architecture with security and compliance features, and initial interoperability with EHR systems, including preparation for Epic integration.
This project seeks to accelerate early commercialization by finalizing a clinician‑oriented interface, supporting integration for pilot deployments, and engaging clinical partners to evaluate impact on decision quality, workflow efficiency, and care consistency.
We developed a comprehensive social determinants of health (SDoH) database to address limitations in existing SDoH resources, which often lack completeness and integration. By centralizing diverse SDoH indicators and improving usability, the platform supports more personalized and equitable healthcare through streamlined linkage with electronic health records (EHRs).
The system features a scalable architecture with robust spatial relationships, optimized query performance, and an automated linkage tool that enables efficient, reproducible connection of SDoH data to EHRs. A user‑friendly web interface supports researchers, clinicians, and public health professionals, with planned expansions to include additional indicators and longitudinal trend data.
Early deployment demonstrates efficient EHR integration, strong scalability, and accurate spatial data handling. This unified resource strengthens the incorporation of contextual SDoH insights into clinical and population health research, advancing informed and equitable healthcare.
Explainable, Fair, Reproducible and Collaborative Surgical Artificial Intelligence: Integrating data, algorithms and clinical reasoning for surgical risk assessment
This project advances a new framework for Explainable, Fair, Reproducible, and Collaborative Medical AI to improve surgical care at scale. Building on prior success—including a real-time surgical risk assessment system deployed at UF—it aims to overcome major barriers such as limited datasets, lack of interpretability, and challenges in cross‑institutional model sharing.
Leveraging the OneFlorida consortium of 22 hospitals and 10 million patients, the project validates dynamic and equitable surgical risk algorithms, develops a full-stack explainable AI platform, and implements privacy-preserving federated learning for collaborative model training.
By creating the first FAIR surgical AI-ready multimodal dataset and advancing interpretable, actionable real-time risk surveillance, this work ultimately seeks to reduce complications, improve patient outcomes, and lower healthcare costs nationwide.
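The collaborative training step can be pictured with federated averaging (FedAvg), in which each hospital trains locally and only model parameters are combined. This is a minimal sketch under stated assumptions: the source does not say which aggregation scheme the project uses, and the function name, the flat parameter lists, and the site sizes are hypothetical.

```python
from typing import List, Sequence


def federated_average(site_params: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> List[float]:
    """Combine per-site model parameters, weighting each site by its sample count.

    Assumption: every site reports a flat list of parameters of equal length.
    Raw patient records never leave a site; only these parameters are shared.
    """
    if len(site_params) != len(site_sizes) or not site_params:
        raise ValueError("need one non-empty parameter list per site size")
    total = sum(site_sizes)
    n_params = len(site_params[0])
    return [
        sum(params[i] * n for params, n in zip(site_params, site_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical sites: the larger site pulls the average toward its values.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Weighting by sample count keeps a small site from dominating the shared model; other weightings are possible.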
A Look into the Future: Predicting User Intention of Target Pointing and Selection
Freehand ray pointing is a common input interaction in extended reality (XR). Due to noisy input recognition and imprecise hand movements, input is often slow and error-prone. We present a novel computational framework for predicting user intention during point-and-select tasks based on a grid representation of the environment and trajectory.
We trained an ensemble model that combines outputs from unimodal models with either gaze or hand data and a multimodal model with both gaze and hand data. The grid representation-based ensemble model outperforms a raw trajectory-based baseline model, achieving 7% to 12.7% higher accuracy at different grid granularity levels.
Our novel approach can provide highly accurate predictions during point-and-select tasks across users, which can be used to enable selection facilitation techniques, thus improving performance and user experience during freehand pointing.
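The grid-based ensemble can be sketched as soft voting over grid cells. This is an illustrative sketch only: it assumes each model (gaze-only, hand-only, gaze plus hand) outputs a probability distribution over grid cells and that the ensemble takes a weighted average. The function names, the grid mapping, and the equal weights are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def point_to_cell(x: float, y: float, width: float, height: float,
                  rows: int, cols: int) -> int:
    """Map a 2D point inside a width x height area to a row-major grid cell index."""
    col = max(0, min(int(x / width * cols), cols - 1))
    row = max(0, min(int(y / height * rows), rows - 1))
    return row * cols + col


def ensemble_predict(model_probs: Sequence[Sequence[float]],
                     weights: Optional[Sequence[float]] = None
                     ) -> Tuple[int, List[float]]:
    """Average per-cell probabilities across models and return the most likely cell."""
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # equal weighting is an assumption
    n_cells = len(model_probs[0])
    combined = [
        sum(w * probs[i] for w, probs in zip(weights, model_probs))
        for i in range(n_cells)
    ]
    best_cell = max(range(n_cells), key=combined.__getitem__)
    return best_cell, combined


# Hypothetical outputs over a 2 x 2 grid from the three models.
gaze_only = [0.1, 0.6, 0.2, 0.1]
hand_only = [0.3, 0.3, 0.3, 0.1]
gaze_and_hand = [0.05, 0.7, 0.15, 0.1]
target_cell, cell_probs = ensemble_predict([gaze_only, hand_only, gaze_and_hand])
```

A coarser grid has fewer cells to tell apart, while a finer grid localizes the target more precisely. This is the granularity trade-off the reported accuracy range refers to.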
RIDS: Implicit Detection of a Selection Gesture Using Hand Motion Dynamics During Pointing in VR
To improve the detection of a selection gesture in VR during point-and-click interaction, we propose and investigate using the information contained in the hand motion dynamics that precede a selection gesture. We built two models that classify whether a user is likely to perform a selection gesture at the current moment in time.
A logistic regression classifier was trained using predefined hand motion features, and a temporal convolutional network (TCN) classifier was trained using raw hand motion data. The TCN model was found to improve the precision of a noisy selection gesture by 11.2% without sacrificing recall performance.
An initial analysis of the generalizability of the models demonstrated above-chance performance, suggesting that this approach could be scaled to other interaction tasks in the future.
Individualized 3D Brain Function Mapping
Proposed an individualized intracranial electrode positioning method based on multimodal medical image data fusion - Patent # CN103932796A
Developed an approach for individualized 3D dynamic visualization of neural information flow in large-scale cortical networks - Patent # CN103942424B
Built a brain function mapping system based on ECoG and multimodal medical image data fusion - Patent # CN103932701A
Decoding Working Memory Load from EEG with LSTM Networks
Developed a novel method for investigating the role of sequential information in the manipulation and storage of verbal information at various time scales, and for topographically localizing the sources of that sequential information based on decodability
Decoding accuracy increases with an increase in the length of the EEG time series given to the LSTM for both ordered and temporally shuffled cases
According to the decoding weight maps, the frontal, temporal, and parietal areas are important sources of sequential-information-based decodability
Object Detection and Image Captioning
For object detection in images, I used the DEtection TRansformer (DETR) model, an encoder-decoder transformer with a ResNet-50 backbone. It was introduced in the paper "End-to-End Object Detection with Transformers".
Image captioning is the process of generating a caption, i.e. a description, from an input image. It requires both natural language processing and computer vision. Here, I used the Vision Encoder Decoder architecture: a pre-trained transformer-based vision model (ViT) as the encoder and a pre-trained language model (GPT-2) as the decoder.
Refer to this Colab notebook to play with the Object Detection and Image Captioning
Text-to-Speech (TTS) based on SpeechT5 Model
The SpeechT5 model is pre-trained on text as well as speech inputs, with targets that are also a mix of text and speech.
By pre-training on text and speech at the same time, it learns unified representations for both, resulting in improved modeling capabilities.
SpeechT5 can be fine-tuned for different speech tasks. This space demonstrates text-to-speech for English.
Refer to this Colab notebook to play with the SpeechT5 TTS model on your own dataset or language.
Text-to-Image Stable Diffusion Demo
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
The Stable-Diffusion-v1-5 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned for 595k steps at resolution 512x512 on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
Refer to this Colab notebook to try your own prompt.
Non-Invasive Brain Stimulation HD-tACS
The frontoparietal network was stimulated at theta frequency (4-8 Hz) during verbal working memory, and the aftereffects were analyzed
In-phase frontoparietal theta stimulation improved working memory performance in participants with higher working memory capacity
Enhanced behavioral performance was accompanied by enhanced frontoparietal theta synchrony
Enhanced frontoparietal theta synchronization was driven by enhanced frontal→parietal theta Granger causality
EEG Alpha Oscillation Modulation by Working Memory Load
We investigated the relationship between memory load modulation of alpha power and working memory capacity (WMC) in two verbal working memory experiments
Individuals with low WMC demonstrate stronger alpha power modulation by memory load, possibly reflecting an increased reliance on sensory gating to suppress task-irrelevant information, in contrast to their high-WMC counterparts, who rely more on frontal areas to perform this function
This negative association between memory load modulation of alpha oscillations and WMC is vulnerable to drug-related cognitive disruption
Brain-Inspired DNN to decode and map emotion processing in the human brain using fMRI
We aimed to develop an advanced DNN that simulates emotion processing in the human brain, which includes observing an event, encoding and decoding the features in visual cortex, producing an emotion valence score through the amygdala, and returning feedback to visual cortex via the re-entry mechanism.
We proposed EmotionNet to simulate the emotion recognition process. It has three building blocks. First, we used GIST to extract low-level features from the image. Second, we used pre-trained neural networks (VGG16, AlexNet, ResNet) to learn mid-level object features from the input images. Third, a semantic segmentation deep network was used to represent the semantic, high-level features of the input images. All the encoded emotion features were then fed into a three-layer fully connected network to predict the final valence score.
We found that the valence scores predicted by EmotionNet were very similar to the real valence scores. This result indicates that our network can mimic, to some extent, how the brain processes emotions.
Characterizing the temporal dynamics of working memory (WM) representations with HD-EEG decoding
Applied multivariate pattern analysis (MVPA) to two high-density EEG datasets from healthy human volunteers performing verbal WM tasks with three different levels of memory load to examine the formation and development of WM representations in the brain
We observed evidence suggesting that WM could be maintained in an activity-silent neural state via activity-silent synaptic mechanisms. Using current source density (CSD) connectivity-based decoding, we could decode the neural representation of the contents of WM from the so-called activity-silent period. It is quite remarkable that the patterned hidden state in WM networks can be detected at the scalp level using whole-head EEG CSD functional connectivity.
Characterizing drug-induced cognitive impairments in linguistic, memory, and executive functions
Conducted research to help understand the functions that lead to cognitive impairment as a side effect of drug use. Specifically, we investigated Topiramate (TPM), an anti-epilepsy drug.
We showed for the first time that resting-state EEG (rsEEG) parameters are associated with the severity of TPM-related working memory deficits.
We have identified a potential clinical risk factor, working memory capacity, which is associated with adverse cognitive events.
This work supports clinical efforts to mitigate the side effects of epilepsy treatment and provides a basis for informed decision-making by patients and clinicians.
Neural encoding of cortical representations of movement in the motor cortex
Conducted research to help understand how neurons represent stimuli or events with changes in their firing properties.
A peri-stimulus time histogram (PSTH) was used to visualize the timing and firing rate of neuronal spike discharges in relation to an external stimulus or event.
To capture how the average response of a neuron varies with a sensory or motor feature, and to identify the neuron’s preferred direction, we generated a tuning curve that maps the feature value onto the average response of the neuron.
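Both analyses reduce to simple counting and averaging. The sketch below shows one way to compute them, assuming spike and event times in seconds and one mean firing rate per trial. The function names, window, and bin width are illustrative choices, not the parameters used in the original analysis.

```python
from typing import Dict, List, Sequence, Tuple


def psth(spike_times: Sequence[float], event_times: Sequence[float],
         window: Tuple[float, float] = (-0.5, 1.0),
         bin_width: float = 0.1) -> List[float]:
    """Return the trial-averaged firing rate (spikes/s) in each time bin around events."""
    start, stop = window
    n_bins = int(round((stop - start) / bin_width))
    counts = [0] * n_bins
    for event in event_times:
        for spike in spike_times:
            rel = spike - event  # spike time relative to this event
            if start <= rel < stop:
                idx = min(int((rel - start) / bin_width), n_bins - 1)
                counts[idx] += 1
    n_trials = len(event_times)
    return [c / (n_trials * bin_width) for c in counts]


def tuning_curve(feature_values: Sequence[float],
                 responses: Sequence[float]) -> Tuple[Dict[float, float], float]:
    """Average the response per feature value and return the curve and its peak."""
    grouped: Dict[float, List[float]] = {}
    for feature, response in zip(feature_values, responses):
        grouped.setdefault(feature, []).append(response)
    curve = {f: sum(r) / len(r) for f, r in sorted(grouped.items())}
    preferred = max(curve, key=curve.get)
    return curve, preferred


# Hypothetical data: movement direction in degrees and mean firing rate per trial.
directions = [0, 0, 90, 90, 180]
rates = [10.0, 14.0, 30.0, 34.0, 5.0]
curve, preferred_direction = tuning_curve(directions, rates)
```

Here the preferred direction is simply the feature value with the highest mean response. Fitting a smooth function such as a cosine to the curve is a common refinement that this sketch leaves out.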
Brain-computer interface (BCI)
Built an EEG P300-based BCI communication device for individuals with severe neurological or muscular diseases.
Demonstrated that individuals can voluntarily control sensorimotor rhythms (SMR) by imagining movements. This ability can serve as a control signal for BCI systems. Thus, through BCI devices, we can translate a person's intent into rapid and accurate control of a one- or two-dimensional cursor.
BCI can therefore provide a new communication tool for patients with severe neurological or motor diseases, such as those with locked-in syndrome.
Automated system for lung cancer classification based on SVM
Lung cancer is one of the leading causes of cancer mortality worldwide, and non–small cell lung cancer (NSCLC) accounts for most cases. NSCLC can be further divided into adenocarcinoma (ACA) and squamous cell carcinoma (SCC). Distinguishing these two subgroups is of great clinical value.
We propose an integrated framework that consists of cell image preprocessing, cell segmentation, feature extraction, classification, and prediction. A majority voting algorithm is introduced to predict the class of a new cell image.
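The voting step can be sketched as follows, assuming the classifier assigns a label to each segmented cell and the image takes the most frequent label. The source does not describe the voting unit or the tie-breaking rule, so both are assumptions here, and the function name is hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_vote(cell_labels: Sequence[str]) -> str:
    """Return the most frequent per-cell label as the image-level prediction."""
    if not cell_labels:
        raise ValueError("at least one cell label is required")
    counts = Counter(cell_labels)
    top = max(counts.values())
    # Ties are broken alphabetically; this rule is an arbitrary assumption.
    return sorted(label for label, n in counts.items() if n == top)[0]


# Hypothetical per-cell SVM outputs for one new image.
image_label = majority_vote(["ACA", "SCC", "ACA", "ACA", "SCC"])
```

Aggregating over many cells makes the image-level call less sensitive to a few misclassified or poorly segmented cells.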
Real-time 3D pose detection and pose classification
Pose detection (also known as pose estimation) is a widely used computer vision task that predicts human poses in images or videos by localizing key body joints (also referred to as landmarks), such as elbows, shoulders, and knees.
MediaPipe provides a robust solution capable of predicting 33 3D landmarks on a human body in real time with high accuracy, even on a CPU. It uses a two-step machine learning pipeline: a detector first localizes the person within the frame, and a pose landmark model then predicts the landmarks within that region of interest.
Projects on quantitative investment strategies
Implemented a long/short equity strategy based on fundamental factors. The strategy used fundamental data as measures of value, quality and momentum, and then ranked all the stocks in the universe according to the factors.
Built and implemented an automated trend-following and mean-reversion trading strategy based on a moving linear regression channel.
Implemented and backtested a Kalman filter-based pairs trading strategy in ETFs.
Developed a refined short-term mean reversion trading strategy for futures.
Investigated the order patterns in futures time-series tick data.
Applied reinforcement learning to short-term stock trading.
Modeled high-frequency limit order book dynamics using machine learning.
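As an illustration of the Kalman filter-based pairs trading item above, the sketch below tracks a time-varying hedge ratio between two price series. It assumes a scalar random-walk state with the observation model y_t = beta_t * x_t + noise. The source does not give the state model or noise settings, so the function name and all parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple


def kalman_hedge_ratio(x: Sequence[float], y: Sequence[float],
                       process_var: float = 1e-4, obs_var: float = 1e-2,
                       beta0: float = 0.0, p0: float = 1.0
                       ) -> Tuple[List[float], List[float]]:
    """Estimate a time-varying hedge ratio beta_t with a scalar Kalman filter.

    Returns the filtered betas and the one-step-ahead prediction errors,
    which can serve as the spread signal in a pairs trade.
    """
    beta, p = beta0, p0
    betas: List[float] = []
    spreads: List[float] = []
    for xt, yt in zip(x, y):
        p = p + process_var              # predict: beta follows a random walk
        error = yt - beta * xt           # innovation before seeing yt
        s = xt * xt * p + obs_var        # innovation variance
        gain = p * xt / s                # Kalman gain
        beta = beta + gain * error       # update the state estimate
        p = (1.0 - gain * xt) * p        # update the state variance
        betas.append(beta)
        spreads.append(error)
    return betas, spreads


# Synthetic, noise-free prices in which y is exactly twice x.
prices_x = [10.0 + 0.1 * i for i in range(200)]
prices_y = [2.0 * v for v in prices_x]
betas, spreads = kalman_hedge_ratio(prices_x, prices_y)
```

In a backtest, entry and exit rules would typically compare the spread with a band derived from the innovation variance. That logic, along with transaction costs, is omitted here.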