Large language models (LLMs) and large vision models (LVMs) have advanced NLP and computer vision, enabling multimodal large language models (MLLMs) to integrate diverse data types. This survey examines MLLMs in radiology, focusing on radiology report generation (RRG) and radiology visual question answering (RVQA). We outline MLLM evolution, key applications, datasets, evaluation metrics, and leading models. Challenges such as dataset scarcity, privacy, bias, hallucinations, and evaluation limitations are discussed. Finally, we propose future research directions to help AI researchers and radiologists advance MLLMs in radiology.
Conventional Applicant Tracking Systems (ATSs) struggle with unstructured data and complex qualifications. We introduce Resume2Vec, a transformer-based approach using encoders (BERT, RoBERTa, DistilBERT) and decoders (GPT, Gemini, Llama) to embed resumes and job descriptions, evaluated via cosine similarity. Combining quantitative embedding analysis with human assessment, Resume2Vec outperforms traditional ATSs, improving nDCG by 15.85% and RBO by 15.94%, especially in mechanical engineering and health and fitness. While ATSs slightly excel in some fields, Resume2Vec aligns better with human judgment overall. This study highlights transformer-based models’ potential to enhance recruiting by improving resume-job matching accuracy.
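As a minimal sketch of the core matching step, assuming an off-the-shelf sentence-transformers encoder in place of the study's BERT/RoBERTa/DistilBERT and GPT/Gemini/Llama models, with illustrative texts:

```python
# Minimal sketch of the Resume2Vec ranking step: embed resumes and a job
# description with a transformer encoder, then rank by cosine similarity.
# Model name and texts are illustrative, not the study's exact setup.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for BERT/RoBERTa/DistilBERT

job_description = "Mechanical engineer with 5+ years of CAD and FEA experience."
resumes = [
    "Senior mechanical engineer; SolidWorks, ANSYS FEA, 7 years in automotive.",
    "Certified personal trainer specializing in strength and conditioning.",
]

job_vec = model.encode([job_description])   # shape (1, d)
resume_vecs = model.encode(resumes)         # shape (n, d)

scores = cosine_similarity(job_vec, resume_vecs)[0]
for score, text in sorted(zip(scores, resumes), reverse=True):
    print(f"{score:.3f}  {text[:60]}")
```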
Wildfire spread prediction is vital as wildfires destroy millions of acres and cause billions of dollars in losses annually. Increasing wildfire severity, duration, and frequency highlight two key challenges: limited high-quality meteorological data and inaccurate models. This research addresses these issues by creating a comprehensive dataset from years of granular data, standardized to daily intervals and 1 km² grid cells. Using scientifically validated features from high-quality public datasets, we developed a robust pipeline for wildfire prediction. The dataset supports a CNN-LSTM model, aiding environmental agencies and insurers in mitigating wildfire impacts. Our work aims to improve prediction accuracy, enhancing disaster preparedness and response.
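A minimal sketch of a CNN-LSTM of this kind, assuming a 7-day window of 32×32-cell patches with 8 feature channels; shapes and layer sizes are illustrative, not the project's exact architecture:

```python
# CNN-LSTM sketch for next-day fire-spread prediction on daily 1 km^2 grids.
import tensorflow as tf
from tensorflow.keras import layers, models

T, H, W, C = 7, 32, 32, 8  # 7 daily frames, 32x32 patch, 8 meteorological channels

model = models.Sequential([
    layers.Input(shape=(T, H, W, C)),
    # CNN extracts per-day spatial features from each frame independently
    layers.TimeDistributed(layers.Conv2D(16, 3, padding="same", activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),
    # LSTM models the temporal evolution across the daily sequence
    layers.LSTM(64),
    # Binary output: probability the cell burns the next day
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.summary()
```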
Deploying fair and transparent machine learning models remains a significant challenge. This study examines bias in transformer-based language models of varying sizes, using movie reviews and the Word Embedding Association Test (WEAT). Results show that scaling models reduces bias, with larger models achieving up to a 29% reduction. Additionally, prompt-based learning proves effective, cutting genre bias in reviews by over 37% on average. These findings emphasize the potential of structured prompt engineering to enhance fairness and integrate ethical AI practices into transformer models, paving the way for more equitable and responsible AI systems.
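For reference, the WEAT effect size compares the mean differential association of two target word sets with two attribute sets. A minimal sketch, with placeholder random vectors standing in for real model embeddings and illustrative word lists:

```python
# Minimal WEAT effect-size computation (Caliskan et al., 2017).
import numpy as np

rng = np.random.default_rng(0)
embed = lambda w: rng.standard_normal(50)  # stand-in for a model's word embedding

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):
    """s(w, A, B): mean cosine similarity to attribute set A minus set B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    s_x = [assoc(x, A, B) for x in X]
    s_y = [assoc(y, A, B) for y in Y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1)

X = [embed(w) for w in ["action", "thriller"]]  # target set 1 (e.g., genres)
Y = [embed(w) for w in ["romance", "drama"]]    # target set 2
A = [embed(w) for w in ["he", "man"]]           # attribute set 1
B = [embed(w) for w in ["she", "woman"]]        # attribute set 2
print(f"WEAT effect size: {weat_effect_size(X, Y, A, B):.3f}")
```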
Fine particulate matter (PM2.5) poses significant health and environmental risks, with complex spatiotemporal dynamics in urban areas. Low-cost sensor (LCS) networks enable high-resolution data collection, providing detailed insights for policymaking and health assessment. This study employs classic and spatial Markov chains to analyze PM2.5 seasonality and intra-daily trends using LCS data. Results show better air quality in summer and midday, with poorer conditions during winter and evening rush hours, emphasizing the need for targeted pollution control. Rural sites maintain better air quality, while temporal scale impacts spatial Markov analysis by increasing stability and reducing variation. This work highlights the value of high-resolution data and Markov chains for understanding air pollution.
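A minimal sketch of the classic Markov-chain step, with synthetic readings and illustrative state thresholds standing in for the LCS data:

```python
# Discretize PM2.5 readings into air-quality states, estimate the transition
# matrix by counting, and compute the stationary distribution.
import numpy as np

rng = np.random.default_rng(1)
pm25 = rng.gamma(shape=2.0, scale=10.0, size=2000)  # stand-in for sensor readings

# States: 0 = good (<12), 1 = moderate (12-35), 2 = unhealthy (>35 ug/m^3)
states = np.digitize(pm25, bins=[12.0, 35.0])

n = 3
counts = np.zeros((n, n))
for s, s_next in zip(states[:-1], states[1:]):
    counts[s, s_next] += 1
P = counts / counts.sum(axis=1, keepdims=True)  # row-stochastic transition matrix

# Stationary distribution: left eigenvector of P for eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi /= pi.sum()
print("Transition matrix:\n", P.round(3))
print("Stationary distribution (good, moderate, unhealthy):", pi.round(3))
```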
Stock2Vec applies Word2Vec to stock price fluctuations, embedding each stock as a high-dimensional vector, and then uses principal component analysis (PCA) to compress those vectors to two dimensions for visualizing the similarity between stocks. After this compression, similar stocks occupy nearby locations on the plot. Furthermore, we created a 4-dimensional vector representation of companies that can be used to predict which of 11 sectors of the economy a company belongs to. By refining this model, we can begin to perform stock-risk prediction with these embeddings.
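One plausible construction (the project's exact corpus construction may differ): treat each trading day as a "sentence" of tickers ordered by that day's return, so co-moving stocks share contexts; Word2Vec then embeds tickers, and PCA projects to 2-D. A minimal sketch on synthetic returns:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
tickers = ["AAPL", "MSFT", "XOM", "CVX", "JPM", "GS"]
returns = rng.standard_normal((250, len(tickers)))  # 250 synthetic trading days
returns[:, 2] += returns[:, 3]                      # make XOM and CVX co-move

# One "sentence" per day: tickers sorted by that day's return
sentences = [[tickers[i] for i in np.argsort(day)] for day in returns]

w2v = Word2Vec(sentences, vector_size=32, window=2, min_count=1, epochs=50, seed=2)
vecs = np.array([w2v.wv[t] for t in tickers])

coords = PCA(n_components=2).fit_transform(vecs)    # compress to 2-D for plotting
for t, (x, y) in zip(tickers, coords):
    print(f"{t}: ({x:+.2f}, {y:+.2f})")
```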
For small clinical samples, a summary metric can be more useful and reliable for assessment. In this project, we applied both principal component analysis (PCA) and a deep autoencoder to derive a summary metric from 8 standard clinical scores measuring the improvement in mobility for 9 lower-limb amputees using two different prosthetic devices: a CPU-controlled knee and a mechanical knee. The summary metrics from both methods demonstrate high significance and are more reliable than the individual scores; the autoencoder metric captured 83% of the variance, while PCA captured only 67%. The autoencoder composite score represents a single-valued, succinct summary that can be useful for holistic assessment of highly variable individual scores in limited clinical data sets.
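A minimal sketch of the comparison, with random standardized data standing in for the 8 clinical scores and an illustrative 1-unit-bottleneck autoencoder:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from tensorflow.keras import layers, models

rng = np.random.default_rng(3)
X = StandardScaler().fit_transform(rng.standard_normal((18, 8)))  # 9 subjects x 2 devices

# PCA summary: first principal component and its explained variance
pca = PCA(n_components=1).fit(X)
print("PCA variance captured:", round(pca.explained_variance_ratio_[0], 3))

# Autoencoder with a 1-unit bottleneck: the bottleneck value is the summary metric
inp = layers.Input(shape=(8,))
code = layers.Dense(1)(layers.Dense(4, activation="tanh")(inp))
out = layers.Dense(8)(layers.Dense(4, activation="tanh")(code))
ae = models.Model(inp, out)
ae.compile(optimizer="adam", loss="mse")
ae.fit(X, X, epochs=500, verbose=0)

# Variance captured = 1 - reconstruction MSE / data variance (data is standardized)
mse = float(ae.evaluate(X, X, verbose=0))
print("Autoencoder variance captured:", round(1 - mse / X.var(), 3))

encoder = models.Model(inp, code)
summary = encoder.predict(X, verbose=0)  # one composite score per session
```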
Time, as an essential feature for trend detection, is often neglected in topic modeling. Adding a weighted temporal feature to bias K-means clustering toward temporally localized groups of articles is a promising way to detect trends. In this project, Latent Dirichlet Allocation (LDA) and Singular Value Decomposition (SVD) were used to parameterize finance journal abstracts. A trend score for automatic trend detection was constructed from the silhouette score, which measures topic interpretability, and the standard deviation of publication years, which quantifies localization in time. By introducing time into topic clustering, we can better identify historical trends, which ultimately enables better prediction of the direction of academic research going forward.
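A minimal sketch of the idea, assuming precomputed document vectors, an illustrative time weight, and one plausible formulation of the trend score (mean silhouette divided by the standard deviation of years; the project's exact formula may differ):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples

rng = np.random.default_rng(4)
docs = rng.standard_normal((200, 10))      # stand-in for LDA/SVD document vectors
years = rng.integers(1990, 2024, size=200).astype(float)

w = 0.5                                    # weight on the temporal feature
years_scaled = (years - years.mean()) / years.std()
X = np.hstack([docs, (w * years_scaled)[:, None]])  # time biases the clustering

labels = KMeans(n_clusters=5, n_init=10, random_state=4).fit_predict(X)
sil = silhouette_samples(X, labels)

# High silhouette (interpretable topic) + low year spread (localized in time)
# yields a high trend score.
for k in range(5):
    mask = labels == k
    trend_score = sil[mask].mean() / (years[mask].std() + 1e-9)
    print(f"cluster {k}: trend score = {trend_score:.3f}")
```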
Using a combination of Gaussian mixture modeling and hidden Markov models, we have developed an app that automatically clusters speech in recorded audio to isolate and identify each unique speaker. The current version of the app records and analyzes a conversation to identify who is speaking and when. After a conversation, the app reports the total time each person spoke and provides a scrollable piano plot presenting the timing and interpersonal dynamics between speakers throughout the conversation. Android users can download the released “Conversation Moderator” app from the Google Play Store.
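A minimal sketch of the diarization core, using MFCC frames and a Gaussian mixture, with median-filter smoothing standing in for the HMM's transition constraints; the file name and speaker count are assumptions:

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture
from scipy.ndimage import median_filter

y, sr = librosa.load("conversation.wav", sr=16000)    # hypothetical recording
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (n_frames, 13)

n_speakers = 2
gmm = GaussianMixture(n_components=n_speakers, covariance_type="diag", random_state=0)
labels = gmm.fit_predict(mfcc)                        # per-frame speaker labels

# Temporal smoothing stands in for the HMM's sticky transitions
labels = median_filter(labels, size=51)

frame_sec = 512 / sr  # librosa's default hop length
for spk in range(n_speakers):
    print(f"speaker {spk}: {np.sum(labels == spk) * frame_sec:.1f} s of speech")
```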
Bruce Gaynes, an ophthalmologist at Loyola Medicine, has established a biomarker for Parkinson’s disease that relies on the rate of pupil dilation under a particular combination of drug and light exposure. We have created a program for him that automatically measures the amount of pupil dilation in video. The program accepts video of a pupil reflexively constricting and dilating upon exposure to particular frequencies of light, and returns parameters clinicians can use to judge whether the subject has a related degenerative disease.
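A minimal sketch of the per-frame measurement, assuming a dark-pupil thresholding approach and a hypothetical file name; the clinical program fits dilation-rate parameters to the resulting radius trace:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("pupil_video.mp4")  # hypothetical recording
radii = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    # The pupil is the darkest large region: threshold, take the biggest contour
    _, mask = cv2.threshold(gray, 40, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        biggest = max(contours, key=cv2.contourArea)
        (_, _), radius = cv2.minEnclosingCircle(biggest)
        radii.append(radius)
cap.release()

radii = np.array(radii)
if radii.size > 1:
    print(f"min radius {radii.min():.1f} px, max {radii.max():.1f} px")
    print(f"peak frame-to-frame dilation rate: {np.diff(radii).max():.2f} px/frame")
```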
Originally, physical therapists proposed a project to count instructed motions during their therapy sessions. However, many of their sessions involve movements that are nonstandard, possibly unique to the individual. The goal of this project was to create a system that robustly counts periodic motions of any type using a wearable device. Such a system can help patients perform and evaluate exercises at home with minimal help from a physical therapist.
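A minimal sketch of type-agnostic repetition counting, assuming a band-passed accelerometer signal and illustrative peak parameters:

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

fs = 50.0  # wearable sample rate (Hz), assumed
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(5)
accel = np.sin(2 * np.pi * 0.8 * t) + 0.3 * rng.standard_normal(t.size)  # ~0.8 Hz reps

# Band-pass around plausible human movement frequencies (0.3-3 Hz)
b, a = butter(2, [0.3, 3.0], btype="band", fs=fs)
smoothed = filtfilt(b, a, accel)

# One repetition per prominent peak, regardless of the motion's exact shape
peaks, _ = find_peaks(smoothed, prominence=0.5, distance=int(0.4 * fs))
print(f"counted {len(peaks)} repetitions in {t[-1]:.0f} s")
```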
In this project, a hidden Markov model (HMM) was used to improve the accuracy of a clip-based static classifier in a patient activity recognition task using wearable sensors, for subjects with incomplete spinal cord injury. Tailored activity recognition like this is important for creating a high-accuracy real-world log of patient activity, so that therapies can be targeted to demonstrably improve at-home movement and quality of life.
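A minimal sketch of the smoothing step, treating the static classifier's per-clip posteriors as emission scores (an assumption) and decoding with Viterbi under a sticky transition matrix:

```python
import numpy as np

n_states = 3  # e.g., lying, sitting, walking (illustrative)
stay = 0.95   # sticky self-transition probability penalizes rapid switches
A = np.full((n_states, n_states), (1 - stay) / (n_states - 1))
np.fill_diagonal(A, stay)

rng = np.random.default_rng(6)
T = 20
post = rng.dirichlet(np.ones(n_states) * 0.5, size=T)  # stand-in classifier output

# Viterbi decoding in log space, uniform initial distribution
log_A = np.log(A)
delta = np.log(post[0] + 1e-12) - np.log(n_states)
back = np.zeros((T, n_states), dtype=int)
for t in range(1, T):
    scores = delta[:, None] + log_A              # (from_state, to_state)
    back[t] = scores.argmax(axis=0)              # best predecessor per state
    delta = scores.max(axis=0) + np.log(post[t] + 1e-12)

path = [int(delta.argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(back[t][path[-1]]))
print("smoothed activity sequence:", path[::-1])
```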
We present a precision study of the process e+e−→π+π− using data collected with the CLEO-c detector. By analyzing the cross section in a specific mass range with the initial state radiation method, we provide critical insights into the hadronic contribution to the muon’s anomalous magnetic moment. These results refine a key parameter in testing the Standard Model and contribute to ongoing efforts to understand potential New Physics underlying discrepancies in the muon g−2.
Using data collected with the CLEO-c detector at 4170 MeV, we analyzed the decay ψ(4160) → π⁺π⁻J/ψ and observed the charged particle Zc±(3900) decaying into π⁺J/ψ or π⁻J/ψ, with a significance >5σ. We measured its mass as 3886 ± 4 (stat) ± 2 (syst) MeV and its width as 37 ± 4 (stat) ± 8 (syst) MeV, consistent with prior results from BES III and Belle. Importantly, we report the first discovery of the neutral particle Zc0(3900), observed in its decay to π⁰J/ψ with a significance of 3.5σ.