General Papers on Generative Artificial Intelligence and Health Sciences
-Fahrner LJ, Chen E, Topol E, Rajpurkar P. The generative era of medical AI. Cell. 2025 Jul 10;188(14):3648-60.
-Omar M, Nadkarni GN, Klang E, Glicksberg BS. Large language models in medicine: a review of current clinical trials across healthcare applications. PLOS Digital Health. 2024 Nov 19;3(11):e0000662.
-Liu F, Zhou H, Gu B, Zou X, Huang J, Wu J, Li Y, Chen SS, Hua Y, Zhou P, Liu J. Application of large language models in medicine. Nature Reviews Bioengineering. 2025 Apr 7:1-20.
-Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023 Aug;29(8):1930-1940.
-Clusmann J, Kolbinger FR, Muti HS, Carrero ZI, Eckardt JN, Laleh NG, Löffler CML, Schwarzkopf SC, Unger M, Veldhuizen GP, Wagner SJ, Kather JN. The future landscape of large language models in medicine. Commun Med (Lond). 2023 Oct 10;3(1):141.
-Meng X, Yan X, Zhang K, Liu D, Cui X, Yang Y, Zhang M, Cao C, Wang J, Wang X, Gao J, Wang YG, Ji JM, Qiu Z, Li M, Qian C, Guo T, Ma S, Wang Z, Guo Z, Lei Y, Shao C, Wang W, Fan H, Tang YD. The application of large language models in medicine: A scoping review. iScience. 2024 Apr 23;27(5):109713.
-Haug CJ, Drazen JM. Artificial intelligence and machine learning in clinical medicine, 2023. New England Journal of Medicine. 2023 Mar 30;388(13):1201-8.
-Guo D, Yang D, Zhang H, Song J, Wang P, Zhu Q, Xu R, Zhang R, Ma S, Bi X, Zhang X. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature. 2025 Sep 18;645(8081):633-8.
-Gallifant J, Fiske A, Levites Strekalova YA, Osorio-Valencia JS, Parke R, Mwavu R, Martinez N, Gichoya JW, Ghassemi M, Demner-Fushman D, McCoy LG. Peer review of GPT-4 technical report and systems card. PLOS Digital Health. 2024 Jan 18;3(1):e0000417.
-Comanici G, Bieber E, Schaekermann M, Pasupat I, Sachdeva N, Dhillon I, Blistein M, Ram O, Zhang D, Rosen E, Marris L. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint, 2025.
-Bai S, Chen K, Liu X, Wang J, Ge W, Song S, Dang K, Wang P, Wang S, Tang J, Zhong H. Qwen2.5-VL technical report. arXiv preprint, 2025.
Generative Artificial Intelligence and Health Sciences Education
-Russell RG, Lovett Novak L, Patel M, Garvey KV, Craig KJT, Jackson GP, Moore D, Miller BM. Competencies for the Use of Artificial Intelligence-Based Tools by Health Care Professionals. Acad Med. 2023 Mar 1;98(3):348-356.
-Masters K, Herrmann-Werner A, Festl-Wietek T, Taylor D. Preparing for Artificial General Intelligence (AGI) in Health Professions Education: AMEE Guide No. 172. Medical Teacher. 2024 Aug 7:1-4.
-Tolsgaard MG, Pusic MV, Sebok-Syer SS, Gin B, Svendsen MB, Syer MD, Brydges R, Cuddy MM, Boscardin CK. The fundamentals of Artificial Intelligence in medical education research: AMEE Guide No. 156. Med Teach. 2023 Jun;45(6):565-573.
-Boscardin CK, Gin B, Golde PB, Hauer KE. ChatGPT and Generative Artificial Intelligence for Medical Education: Potential Impact and Opportunity. Acad Med. 2024 Jan 1;99(1):22-27.
-He K, Mao R, Lin Q, Ruan Y, Lan X, Feng M, Cambria E. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. Preprint, 2023.
-Hussain Z, Binz M, Mata R, Wulff DU. A tutorial on open-source large language models for behavioral science. Behavior Research Methods. 2024 Dec;56(8):8214-37.
-Zaghir J, Naguib M, Bjelogrlic M, Névéol A, Tannier X, Lovis C. Prompt engineering paradigms for medical applications: Scoping review. Journal of Medical Internet Research. 2024 Sep 10;26:e60501.
-Lin Z. How to write effective prompts for large language models. Nature Human Behaviour. 2024 Apr;8(4):611-5.
-Schulhoff S, Ilie M, Balepur N, Kahadze K, Liu A, Si C, Li Y, Gupta A, Han H, Schulhoff S, Dulepet PS. The Prompt Report: A Systematic Survey of Prompting Techniques. Preprint, 2024.
-Dijkstra P, Greenhalgh T, Mekki YM, Morley J. How to read a paper involving artificial intelligence (AI). BMJ Medicine. 2025 Apr 14;4(1).
Reporting Guidelines and Generative Artificial Intelligence
-Huo B, Collins GS, Cacciamani GE, Guyatt G. Reporting guidelines for studies involving generative artificial intelligence applications: what do I use, and when? NPJ Digit Med. 2025 Nov 7;8(1):646.
-Lekadir K, Frangi AF, Porras AR, ..., Salahuddin Z, Starmans MPA; FUTURE-AI Consortium. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ. 2025 Feb 5;388:e081554.
-Miao BY, Chen IY, Williams CY, Davidson J, Garcia-Agundez A, Sun S, Zack T, Saria S, Arnaout R, Quer G, Sadaei HJ. The MI-CLAIM-GEN checklist for generative artificial intelligence in health. Nature Medicine. 2025 Feb 6:1-5.
-Gallifant J, Afshar M, Ameen S, Aphinyanaphongs Y, Chen S, Cacciamani G, Demner-Fushman D, Dligach D, Daneshjou R, Fernandes C, Hansen LH. The TRIPOD-LLM reporting guideline for studies using large language models. Nature Medicine. 2025 Jan 8:1-0.
-Park SH, Suh CH, Lee JH, Kahn CE, Moy L. Minimum Reporting Items for Clear Evaluation of Accuracy Reports of Large Language Models in Healthcare (MI-CLEAR-LLM). Korean J Radiol. 2024 Oct;25(10):865-868.
-Vaira LA, Lechien JR, Abbate V, Allevi F, Audino G, Beltramini GA, Bergonzani M, Boscolo-Rizzo P, Califano G, Cammaroto G, Chiesa-Estomba CM. Validation of the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool: a new tool to assess the quality of health information provided by AI platforms. European Archives of Oto-Rhino-Laryngology. 2024 Nov;281(11):6123-31.
Confabulations and Generative Artificial Intelligence
-Smith AL, Greaves F, Panch T. Hallucination or Confabulation? Neuroanatomy as metaphor in Large Language Models. PLOS Digit Health. 2023 Nov 1;2(11):e0000388.
-Farquhar S, Kossen J, Kuhn L, Gal Y. Detecting hallucinations in large language models using semantic entropy. Nature. 2024 Jun;630(8017):625-630.
-Cheng A, Nagesh V, Eller S, Grant V, Lin Y. Exploring AI Hallucinations of ChatGPT: Reference Accuracy and Citation Relevance of ChatGPT Models and Training Conditions. Simul Healthc. 2025 Dec 1;20(6):413-418.
-Cossio M. A comprehensive taxonomy of hallucinations in Large Language Models. arXiv preprint, 2025.
-Kim Y, Jeong H, Chen S, Li SS, Lu M, Alhamoud K, Mun J, Grau C, Jung M, Gameiro RR, Fan L. Medical Hallucination in Foundation Models and Their Impact on Healthcare. Preprint, medRxiv. 2025.
-Bedi S, Liu Y, Orr-Ewing L, Dash D, Koyejo S, Callahan A, Fries JA, Wornow M, Swaminathan A, Lehmann LS, Hong HJ. Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review. JAMA. 2025;333(4):319-328.
-Lobentanzer S, Feng S, Bruderer N, Maier A; BioChatter Consortium; Wang C, Baumbach J, Abreu-Vicente J, Krehl N, Ma Q, Lemberger T, Saez-Rodriguez J. A platform for the biomedical application of large language models. Nat Biotechnol. 2025 Feb;43(2):166-169.
-Longpre S, Mahari R, Chen A, Obeng-Marnu N, Sileo D, Brannon W, Muennighoff N, Khazam N, Kabbara J, Perisetla K, Wu X. A large-scale audit of dataset licensing and attribution in AI. Nature Machine Intelligence. 2024 Aug;6(8):975-87.
Generative Artificial Intelligence as Research Tool
-Messeri L, Crockett MJ. Artificial intelligence and illusions of understanding in scientific research. Nature. 2024 Mar;627(8002):49-58.
-Lehr SA, Caliskan A, Liyanage S, Banaji MR. ChatGPT as Research Scientist: Probing GPT’s capabilities as a Research Librarian, Research Ethicist, Data Generator, and Data Predictor. Proceedings of the National Academy of Sciences. 2024 Aug 27;121(35):e2404328121.
-Vaccaro M, Almaatouq A, Malone T. When combinations of humans and AI are useful: A systematic review and meta-analysis. Nat Hum Behav. 2024 Dec;8(12):2293-2303.
-Swanson K, Wu W, Bulaong NL, Pak JE, Zou J. The Virtual Lab of AI agents designs new SARS-CoV-2 nanobodies. Nature. 2025 Jul 29:1-3.
-Gartlehner G, Kahwati L, Nussbaumer-Streit B, Crotty K, Hilscher R, Kugley S, Viswanathan M, Thomas I, Konet A, Booth G, Chew R. From promise to practice: challenges and pitfalls in the evaluation of large language models for data extraction in evidence synthesis. BMJ Evid Based Med. 2024 Dec 20:bmjebm-2024-113199.
-Lieberum JL, Töws M, Metzendorf MI, Heilmeyer F, Siemens W, Haverkamp C, Böhringer D, Meerpohl JJ, Eisele-Metzger A. Large language models for conducting systematic reviews: on the rise, but not yet ready for use - a scoping review. J Clin Epidemiol. 2025 Feb 26;181:111746.
-Scherbakov D, Hubig N, Jansari V, Bakumenko A, Lenert LA. The emergence of large language models as tools in literature reviews: a large language model-assisted systematic review. Journal of the American Medical Informatics Association. 2025 Jun;32(6):1071-86.
-Au LS, Qu L, Nielsen J, Ge Z, Gurrin LC, Mol BW, Wang R. Using artificial intelligence to semi-automate trustworthiness assessment of randomized controlled trials: a case study. J Clin Epidemiol. 2025 Apr;180:111672.
-Reynolds SA, Christie AP, Dicks LV, Jaffer S, Madhavapeddy A, Smith RK, Sutherland WJ. Will AI speed up literature reviews or derail them entirely? Nature. 2025 Jul 10;643(8071):329-31.
-Hauser AS. The future of reviews: Will LLMs render them obsolete? EMBO reports. 2025 Aug 26:1-5.
-Nejjar M, Zacharias L, Stiehle F, Weber I. LLMs for science: Usage for code generation and data analysis. Journal of Software: Evolution and Process. 2025 Jan;37(1):e2723.
-Dobler D, Binder H, Boulesteix AL, Igelmann JB, Köhler D, Mansmann U, Pauly M, Scherag A, Schmid M, Al Tawil A, Weber S. ChatGPT as a Tool for Biostatisticians: A Tutorial on Applications, Opportunities, and Limitations. Statistics in Medicine. 2025 Oct;44(23-24):e70263.
-Moore JH, Tatonetti N. Vibe coding: a new paradigm for biomedical software development. BioData Mining. 2025 Jul 1;18:46.
-Chow M, Ng O. From technology adopters to creators: Leveraging AI-assisted vibe coding to transform clinical teaching and learning. Medical Teacher. 2025 Apr 9:1-3.
-Huang Y, Wu R, He J, Xiang Y. Evaluating ChatGPT-4.0's data analytic proficiency in epidemiological studies: A comparative analysis with SAS, SPSS, and R. J Glob Health. 2024 Mar 29;14:04070.
-Mohamed AM. A comparative evaluation of statistical product and service solutions (SPSS) and ChatGPT-4 in statistical analyses. Cureus. 2024 Oct 28;16(10).
-Lubiana T, Lopes R, Medeiros P, Silva JC, Goncalves ANA, Maracaja-Coutinho V, Nakaya HI. Ten quick tips for harnessing the power of ChatGPT in computational biology. PLoS Comput Biol. 2023 Aug 10;19(8):e1011319.