The NC State CS AI Seminar Series, hosted by the Department of Computer Science, is held biweekly on Fridays in a hybrid format (online and/or in-person). If you are interested in giving a talk or would like to recommend a speaker, please contact the organizers at xliu96@ncsu.edu or submit a nomination through the form.
Faculty Organizers: Xiaorui Liu, Dongkuan Xu, Kaixiong Zhou, Xipeng Shen, Munindar Singh
Date: 11/14/2025, Friday
Time: 11:00 AM - 12:00 PM (EST)
Location: Online Only
Zoom: https://ncsu.zoom.us/j/91683735738
Title: Accelerating Biomolecular Design with Generative AI
Speaker: Wengong Jin (Computer Sciences Department at Northeastern University)
Abstract: The discovery of biomolecules with desired properties is critical to advances in drug discovery and synthetic biology. This problem is challenging because the search space of biomolecules is combinatorially large. In this talk, I will present how generative AI can be used to accelerate the discovery process across small molecules, proteins, and RNAs. First, I will present a generative deep learning approach for de novo antibiotic design, in which AI methods discovered two lead compounds with in vivo bactericidal efficacy against multidrug-resistant bacteria in mouse models. Second, I will present an energy-based modeling approach for protein design named BindEnergyCraft, which provides a principled way of calculating the likelihood of 3D structures and substantially improves the in silico binder success rate of current state-of-the-art binder design methods. Lastly, I will present a diffusion-model-based approach for designing RNA translational control elements, using internal ribosome entry sites (IRESs) as a model system. In validation experiments in human cells, we find that AI-generated IRESs circumvent natural sequence constraints and improve IRES activity nearly 10-fold. In summary, our in silico and experimental results highlight the potential of generative AI for accelerating biomolecular design.
Biography: Wengong Jin is an assistant professor at the Khoury College of Computer Sciences at Northeastern University and a visiting research scientist in the Eric and Wendy Schmidt Center at the Broad Institute. His research focuses on geometric and generative AI models for drug discovery and synthetic biology. His work has been published in conferences including ICML, NeurIPS, and ICLR and in journals including Nature, Science, Cell, and PNAS, and has been covered by outlets such as the Guardian, BBC News, CBS Boston, and the Financial Times. He is the recipient of the Google Research Scholar Award, the BroadIgnite Award, the Dimitris N. Chorafas Prize, and the MIT EECS Outstanding Thesis Award.
Date: 10/17/2025, Friday
Time: 11:00 AM - 12:00 PM (ET)
Location: EB2-3-Bridge-3001B-Lactation Room
Zoom: https://ncsu.zoom.us/j/91683735738
Title: Behavior-Aware Data Valuation for LLMs at Scale
Speaker: Zhaozhuo Xu (Computer Science Department at Stevens Institute of Technology)
Abstract: Large Language Models (LLMs) depend on massive datasets whose quality and influence remain largely opaque. Data valuation offers principled methods to quantify how training data contributes to model performance and behavior. Yet scaling classical approaches such as influence functions to trillion-token corpora remains a major challenge. This talk introduces recent advances that address this gap, including the linearized influence kernel, a new and efficient metric that scales to LLMs with billions of parameters. We will also highlight system-level frameworks such as RapidIn and present empirical findings from LLM training, including the slow-change phenomenon, which enables forward-looking valuation of future training data. By combining principled algorithms, system optimizations, and case studies, the talk aims to bridge the gap between theory and practice.
Biography: Zhaozhuo Xu is an Assistant Professor in the Department of Computer Science at Stevens Institute of Technology. He received his Ph.D. from Rice University and an M.S. from Stanford University. His research develops randomized algorithms to enhance the efficiency of AI systems on commodity hardware. Dr. Xu’s work has appeared in leading venues such as NeurIPS, ICML, ICLR, OSDI, and ACL, as well as in journals including Nature NPJ AI. His innovations in scalable AI have been integrated into widely used libraries like Hugging Face. He serves as an Associate Editor for Neurocomputing and as an Area Chair for major conferences, including NeurIPS, ICLR, ICML, ACL, EMNLP, NAACL, and COLING. He is a recipient of the AAAI New Faculty Highlights (2025), the NSF CRII Award (2025), and the Stevens Bridging Award.
Date: 10/03/2025, Friday
Time: 11:00 AM - 12:00 PM (ET)
Location: EB II, Conference Room 3211
Zoom: https://ncsu.zoom.us/j/91683735738
Title: Synergizing Sparse Sequence, Experimental, and AI-Predicted Structures for Protein-Nucleic Acid Interaction Predictions
Speaker: Xingcheng Lin (Physics Department at NC State University)
Abstract: Sequence-specific nucleic acid recognition underlies essential processes in gene regulation, yet experiment-independent methods for simultaneously predicting genomic DNA recognition sites and their binding affinities remain limited. Our group developed data-driven methods and simulation tools to predict and elucidate protein-nucleic acid interactions and their contributions to reshaping chromatin structure. Specifically, we introduce the Interpretable protein-DNA Energy Associative (IDEA) model, an interpretable residue-level biophysical model capable of predicting binding sites and affinities of DNA-binding proteins without relying on experimental binding data. By integrating the structures and sequences of known protein-DNA complexes into an optimized energy model, IDEA enables a direct interpretation of the physicochemical interactions among individual amino acids and nucleotides. Using transcription factors as examples, we demonstrate that IDEA accurately predicts genomic DNA recognition sites and their binding strengths. Additionally, IDEA is incorporated into a coarse-grained simulation framework that quantitatively captures absolute protein-DNA binding free energies. Collectively, IDEA provides an integrated computational platform that reduces experimental costs and biases in the assessment of DNA recognition and can be used for mechanistic studies of various DNA recognition processes. Finally, I will present our recent progress in extending this framework to predict protein-single-stranded nucleic acid interactions and to design therapeutic aptamers.
Biography: Xingcheng Lin has been an assistant professor in the Physics Department at North Carolina State University since August 2023. He is also affiliated with the Bioinformatics Cluster of the Chancellor’s Faculty Excellence Program. Dr. Lin earned his Ph.D. in Biological Physics from the Center for Theoretical Biological Physics and the Physics Department at Rice University. During his graduate studies, he employed both atomistic and coarse-grained simulations to investigate the molecular mechanisms behind the invasion of influenza viruses. He also developed simulation-based tools to refine folded protein structures and to simulate intrinsically disordered proteins. Following his doctorate, Dr. Lin conducted postdoctoral research in the Chemistry Department at the Massachusetts Institute of Technology (MIT), where he broadened his research focus to include the chromatin system. The Lin group focuses on integrating top-down data-driven approaches with bottom-up biophysical simulations to predict protein-nucleic acid interactions and understand their implications for genome regulation.
Date: 09/19/2025, Friday
Time: 11:00 AM - 12:00 PM (ET)
Location: Online Only
Zoom: https://ncsu.zoom.us/j/91683735738
Title: Breaking Barriers: Advancing Long Context LLMs
Speaker: Zirui Liu (University of Minnesota)
Abstract: LLMs have demonstrated impressive conversational abilities. However, scaling them to handle longer contexts, such as extracting information from lengthy articles (a critical task in healthcare, law, and finance applications), presents significant challenges. There are two main obstacles: first, LLMs struggle to process inputs longer than those encountered during pre-training; second, even when information can be accurately extracted from extended contexts, deploying LLMs in real-world scenarios is limited by hardware capacity. I will discuss recent advances in serving long-context LLMs at scale. To address the first challenge, I’ll present our work on extending LLM context length 10× by coarsening the positional encoding. For the second challenge, I will highlight our recent success in 2-bit KV cache quantization. Lastly, I will briefly discuss reproducibility issues in reasoning evaluation.
Biography: Zirui Ray Liu is an Assistant Professor of Computer Science at the University of Minnesota. His interests lie in the broad area of machine learning and data mining. He regularly publishes in top venues such as NeurIPS, ICML, ICLR, and MLSys. His work has been integrated into widely used NLP tools such as llama.cpp and Hugging Face Transformers, and has been highlighted at Google I/O sessions. Website: https://zirui-ray-liu.github.io/
Date: 09/05/2025, Friday
Time: 11:00 AM - 12:00 PM (ET)
Location: EB II, Conference Room 3211
Zoom: https://ncsu.zoom.us/j/91683735738
Title: Toward Real-Time Ultrasound Computed Tomography: Bridging Wave Physics and Data-Driven Learning
Speaker: Youzuo Lin (University of North Carolina at Chapel Hill & Los Alamos National Laboratory)
Abstract: Ultrasound Computed Tomography (USCT), also known as Full Waveform Inversion (FWI), reconstructs the mechanical properties of biological tissues by modeling the full propagation of ultrasound waves. This modality shows great promise for advanced applications such as breast, neuro, and prostate imaging, yet its clinical adoption has been limited by the trade-off between accuracy and computational efficiency. Physics-based reconstruction methods achieve high-resolution, quantitative maps of tissue properties but are computationally demanding and sensitive to model uncertainties. Data-driven approaches, particularly deep learning, have recently offered accelerated solutions but often lack robustness and generalizability. In this work, we present hybrid USCT strategies that bridge wave physics and machine learning. By embedding physical principles into self-supervised learning frameworks, our methods substantially reduce computational cost while maintaining reconstruction fidelity. We demonstrate their efficacy in challenging prostate imaging scenarios, highlighting their potential to advance USCT toward real-time clinical translation.
Biography: Youzuo Lin is an Associate Professor in the School of Data Science and Society at the University of North Carolina at Chapel Hill. Previously, he served as a Senior Scientist at Los Alamos National Laboratory. He earned his Ph.D. in Applied and Computational Mathematics from Arizona State University in 2010. Youzuo’s research focuses on scientific machine learning methods and their applications, particularly in computational wave imaging, ultrasound tomography, geophysical inversion, and UAV image analysis. He has published over 100 articles in leading journals and conference proceedings and is a co-inventor on several U.S. patents related to ultrasound imaging techniques.