AI-Research

Diego A. Forero, MD, PhD

Key Papers on Artificial Intelligence and Scientific Research

General Papers on Generative Artificial Intelligence and Health Sciences

-Fahrner LJ, Chen E, Topol E, Rajpurkar P. The generative era of medical AI. Cell. 2025 Jul 10;188(14):3648-60.

-Omar M, Nadkarni GN, Klang E, Glicksberg BS. Large language models in medicine: a review of current clinical trials across healthcare applications. PLOS Digital Health. 2024 Nov 19;3(11):e0000662.

-Liu F, Zhou H, Gu B, Zou X, Huang J, Wu J, Li Y, Chen SS, Hua Y, Zhou P, Liu J. Application of large language models in medicine. Nature Reviews Bioengineering. 2025 Apr 7:1-20.

-Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023 Aug;29(8):1930-1940.

-Clusmann J, Kolbinger FR, Muti HS, Carrero ZI, Eckardt JN, Laleh NG, Löffler CML, Schwarzkopf SC, Unger M, Veldhuizen GP, Wagner SJ, Kather JN. The future landscape of large language models in medicine. Commun Med (Lond). 2023 Oct 10;3(1):141.

-Meng X, Yan X, Zhang K, Liu D, Cui X, Yang Y, Zhang M, Cao C, Wang J, Wang X, Gao J, Wang YG, Ji JM, Qiu Z, Li M, Qian C, Guo T, Ma S, Wang Z, Guo Z, Lei Y, Shao C, Wang W, Fan H, Tang YD. The application of large language models in medicine: A scoping review. iScience. 2024 Apr 23;27(5):109713.

-Haug CJ, Drazen JM. Artificial intelligence and machine learning in clinical medicine, 2023. New England Journal of Medicine. 2023 Mar 30;388(13):1201-8.

-Bedi S, Cui H, Fuentes M, Unell A, Wornow M, Banda JM, Kotecha N, Keyes T, Mai Y, Oez M, Qiu H. Holistic evaluation of large language models for medical tasks with MedHELM. Nature Medicine. 2026 Jan 20:1-9.

-Guo D, Yang D, Zhang H, Song J, Wang P, Zhu Q, Xu R, Zhang R, Ma S, Bi X, Zhang X. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature. 2025 Sep 18;645(8081):633-8.

-Gallifant J, Fiske A, Levites Strekalova YA, Osorio-Valencia JS, Parke R, Mwavu R, Martinez N, Gichoya JW, Ghassemi M, Demner-Fushman D, McCoy LG. Peer review of GPT-4 technical report and systems card. PLOS Digital Health. 2024 Jan 18;3(1):e0000417.

-Comanici G, Bieber E, Schaekermann M, Pasupat I, Sachdeva N, Dhillon I, Blistein M, Ram O, Zhang D, Rosen E, Marris L. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint, 2025.

-Bai S, Chen K, Liu X, Wang J, Ge W, Song S, Dang K, Wang P, Wang S, Tang J, Zhong H. Qwen2.5-VL technical report. arXiv preprint, 2025.

Generative Artificial Intelligence and Health Sciences Education

-Russell RG, Lovett Novak L, Patel M, Garvey KV, Craig KJT, Jackson GP, Moore D, Miller BM. Competencies for the Use of Artificial Intelligence-Based Tools by Health Care Professionals. Acad Med. 2023 Mar 1;98(3):348-356.

-Masters K, Herrmann-Werner A, Festl-Wietek T, Taylor D. Preparing for Artificial General Intelligence (AGI) in Health Professions Education: AMEE Guide No. 172. Medical Teacher. 2024 Aug 7:1-4.

-Tolsgaard MG, Pusic MV, Sebok-Syer SS, Gin B, Svendsen MB, Syer MD, Brydges R, Cuddy MM, Boscardin CK. The fundamentals of Artificial Intelligence in medical education research: AMEE Guide No. 156. Med Teach. 2023 Jun;45(6):565-573.

-Ke Y, Jin L, Ong JC, Thirunavukarasu AJ, Car J, Cheung CY, Tham YC, Ting DS, Ong ME, Compton S, Narayan A. AI-induced never-skilling in medical education. Nature Medicine. 2026 May 22:1-0.

-Boscardin CK, Gin B, Golde PB, Hauer KE. ChatGPT and Generative Artificial Intelligence for Medical Education: Potential Impact and Opportunity. Acad Med. 2024 Jan 1;99(1):22-27.

-He K, Mao R, Lin Q, Ruan Y, Lan X, Feng M, Cambria E. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. Preprint, 2023.

-Hussain Z, Binz M, Mata R, Wulff DU. A tutorial on open-source large language models for behavioral science. Behavior Research Methods. 2024 Dec;56(8):8214-37.

-Zaghir J, Naguib M, Bjelogrlic M, Névéol A, Tannier X, Lovis C. Prompt engineering paradigms for medical applications: Scoping review. Journal of Medical Internet Research. 2024 Sep 10;26:e60501.

-Lin Z. How to write effective prompts for large language models. Nature Human Behaviour. 2024 Apr;8(4):611-5.

-Schulhoff S, Ilie M, Balepur N, Kahadze K, Liu A, Si C, Li Y, Gupta A, Han H, Schulhoff S, Dulepet PS. The Prompt Report: A Systematic Survey of Prompting Techniques. Preprint, 2024.

-Dijkstra P, Greenhalgh T, Mekki YM, Morley J. How to read a paper involving artificial intelligence (AI). BMJ Medicine. 2025 Apr 14;4(1).

Reporting Guidelines and Generative Artificial Intelligence

-Huo B, Collins GS, Cacciamani GE, Guyatt G. Reporting guidelines for studies involving generative artificial intelligence applications: what do I use, and when? NPJ Digit Med. 2025 Nov 7;8(1):646.

-Lekadir K, Frangi AF, Porras AR, ..., Salahuddin Z, Starmans MPA; FUTURE-AI Consortium. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ. 2025 Feb 5;388:e081554.

-Miao BY, Chen IY, Williams CY, Davidson J, Garcia-Agundez A, Sun S, Zack T, Saria S, Arnaout R, Quer G, Sadaei HJ. The MI-CLAIM-GEN checklist for generative artificial intelligence in health. Nature Medicine. 2025 Feb 6:1-5.

-Gallifant J, Afshar M, Ameen S, Aphinyanaphongs Y, Chen S, Cacciamani G, Demner-Fushman D, Dligach D, Daneshjou R, Fernandes C, Hansen LH. The TRIPOD-LLM reporting guideline for studies using large language models. Nature Medicine. 2025 Jan;31(1):60-9.

-Park SH, Suh CH, Lee JH, Kahn CE, Moy L. Minimum Reporting Items for Clear Evaluation of Accuracy Reports of Large Language Models in Healthcare (MI-CLEAR-LLM). Korean J Radiol. 2024 Oct;25(10):865-868.

-Vaira LA, Lechien JR, Abbate V, Allevi F, Audino G, Beltramini GA, Bergonzani M, Boscolo-Rizzo P, Califano G, Cammaroto G, Chiesa-Estomba CM. Validation of the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool: a new tool to assess the quality of health information provided by AI platforms. European Archives of Oto-Rhino-Laryngology. 2024 Nov;281(11):6123-31.

-Sounderajah V, Guni A, Liu X, Collins GS, Karthikesalingam A, Markar SR, Golub RM, Denniston AK, Shetty S, Moher D, Bossuyt PM. The STARD-AI reporting guideline for diagnostic accuracy studies using artificial intelligence. Nature Medicine. 2025 Sep 15:1-7.

-Collins GS, Moons KG, Dhiman P, Riley RD, Beam AL, Van Calster B, Ghassemi M, Liu X, Reitsma JB, Van Smeden M, Boulesteix AL. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024 Apr 16;385.

-Liu X, Rivera SC, Moher D, Calvert MJ, Denniston AK, Ashrafian H, Beam AL, Chan AW, Collins GS, Deeks AD, ElZarrad MK. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. The Lancet Digital Health. 2020 Oct 1;2(10):e537-48.

-Vasey B, Nagendran M, Campbell B, Clifton DA, Collins GS, Denaxas S, Denniston AK, Faes L, Geerts B, Ibrahim M, Liu X. Reporting guideline for the early stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI. BMJ. 2022 May 18;377.

Confabulations and Generative Artificial Intelligence

-Smith AL, Greaves F, Panch T. Hallucination or Confabulation? Neuroanatomy as metaphor in Large Language Models. PLOS Digit Health. 2023 Nov 1;2(11):e0000388.

-Forero DA. Multiple Confabulations Found in Bioinformatics Tasks Carried Out by Several Free Large Language Models. Current Genomics. 2026.

-Farquhar S, Kossen J, Kuhn L, Gal Y. Detecting hallucinations in large language models using semantic entropy. Nature. 2024 Jun;630(8017):625-630.

-Cheng A, Nagesh V, Eller S, Grant V, Lin Y. Exploring AI Hallucinations of ChatGPT: Reference Accuracy and Citation Relevance of ChatGPT Models and Training Conditions. Simul Healthc. 2025 Dec 1;20(6):413-418.

-Cossio M. A comprehensive taxonomy of hallucinations in Large Language Models. arXiv preprint, 2025.

-Topaz M, Roguin N, Gupta P, Zhang Z, Peltonen LM. Fabricated citations: an audit across 2.5 million biomedical papers. The Lancet. 2026 May;407(10541):1779-81.

-Kim Y, Jeong H, Chen S, Li SS, Lu M, Alhamoud K, Mun J, Grau C, Jung M, Gameiro RR, Fan L. Medical Hallucination in Foundation Models and Their Impact on Healthcare. Preprint, medRxiv. 2025.

-Bedi S, Liu Y, Orr-Ewing L, Dash D, Koyejo S, Callahan A, Fries JA, Wornow M, Swaminathan A, Lehmann LS, Hong HJ. Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review. JAMA. 2025;333(4):319-328.

-Holst D, Moenck K, Koch J, Schmedemann O, Schüppstuhl T. Transparent reporting of AI in systematic literature reviews: Development of the PRISMA-trAIce checklist. JMIR AI. 2025 Dec 10;4:e80247.

-Lobentanzer S, Feng S, Bruderer N, Maier A; BioChatter Consortium; Wang C, Baumbach J, Abreu-Vicente J, Krehl N, Ma Q, Lemberger T, Saez-Rodriguez J. A platform for the biomedical application of large language models. Nat Biotechnol. 2025 Feb;43(2):166-169.

-Longpre S, Mahari R, Chen A, Obeng-Marnu N, Sileo D, Brannon W, Muennighoff N, Khazam N, Kabbara J, Perisetla K, Wu X. A large-scale audit of dataset licensing and attribution in AI. Nature Machine Intelligence. 2024 Aug;6(8):975-87.

Generative Artificial Intelligence as Research Tool, Original Articles

-Lehr SA, Caliskan A, Liyanage S, Banaji MR. ChatGPT as Research Scientist: Probing GPT’s capabilities as a Research Librarian, Research Ethicist, Data Generator, and Data Predictor. Proceedings of the National Academy of Sciences. 2024 Aug 27;121(35):e2404328121.

-Vaccaro M, Almaatouq A, Malone T. When combinations of humans and AI are useful: A systematic review and meta-analysis. Nat Hum Behav. 2024 Dec;8(12):2293-2303.

-Hao Q, Xu F, Li Y, Evans J. Artificial intelligence tools expand scientists’ impact but contract science’s focus. Nature. 2026 Jan 14:1-7.

-Swanson K, Wu W, Bulaong NL, Pak JE, Zou J. The Virtual Lab of AI agents designs new SARS-CoV-2 nanobodies. Nature. 2025 Jul 29:1-3.

-Gottweis J, Weng WH, Daryin A, Tu T, Sirkovic P, Myaskovsky A, Glowaty G, Weissenberger F, Orlandi A, Popovici D, Palepu A. Accelerating scientific discovery with Co-Scientist. Nature. 2026 May 19:1-3.

-Aygün E, Belyaeva A, Comanici G, Coram M, Cui H, Garrison J, Johnston R, Kast A, McLean CY, Norgaard P, Shamsi Z. An AI system to help scientists write expert-level empirical software. Nature. 2026 May 19:1-3.

-Ghareeb AE, Chang B, Mitchener L, Yiu A, Szostkiewicz CJ, Shved D, Gyimesi GJ, Laurent JM, Wright SM, Razzak MT, White AD. A multi-agent system for automating scientific discovery. Nature. 2026 May 19:1-3.

-Lieberum JL, Töws M, Metzendorf MI, Heilmeyer F, Siemens W, Haverkamp C, Böhringer D, Meerpohl JJ, Eisele-Metzger A. Large language models for conducting systematic reviews: on the rise, but not yet ready for use-a scoping review. J Clin Epidemiol. 2025 Feb 26;181:111746.

-Scherbakov D, Hubig N, Jansari V, Bakumenko A, Lenert LA. The emergence of large language models as tools in literature reviews: a large language model-assisted systematic review. Journal of the American Medical Informatics Association. 2025 Jun;32(6):1071-86.

-Flemyng E, Noel‐Storr A, Macura B, Gartlehner G, Thomas J, Meerpohl JJ, Jordan Z, Minx J, Eisele‐Metzger A, Hamel C, Jemioło P. Position statement on artificial intelligence (AI) use in evidence synthesis across Cochrane, the Campbell Collaboration, JBI and the Collaboration for Environmental Evidence 2025. Cochrane Database of Systematic Reviews. 2025(10).

-Au LS, Qu L, Nielsen J, Ge Z, Gurrin LC, Mol BW, Wang R. Using artificial intelligence to semi-automate trustworthiness assessment of randomized controlled trials: a case study. J Clin Epidemiol. 2025 Apr;180:111672.

-Forero DA, Abreu SE, Tovar BE, Oermann MH. Automated analyses of risk of bias and critical appraisal of systematic reviews (ROBIS and AMSTAR 2): a comparison of the performance of 4 large language models. Journal of the American Medical Informatics Association. 2025 Sep;32(9):1471-6.

-Forero DA, Abreu SE, Tovar BE, Oermann MH. Large Language Models and the Analyses of Adherence to Reporting Guidelines in Systematic Reviews and Overviews of Reviews (PRISMA 2020 and PRIOR). Journal of Medical Systems. 2025 Jun 12;49(1):80.

-He Y, Bu Y. Academic journals’ AI policies fail to curb the surge in AI-assisted academic writing. Proceedings of the National Academy of Sciences. 2026 Mar 3;123(9):e2526734123.

-Erol G, Ergen A, Gülşen Erol B, Kaya Ergen Ş, Bora TS, Çölgeçen AD, Araz B, Şahin C, Bostancı G, Kılıç İ, Macit ZB. Can we trust academic AI detective? Accuracy and limitations of AI-output detectors. Acta Neurochirurgica. 2025 Aug 7;167(1):214.

-Nejjar M, Zacharias L, Stiehle F, Weber I. LLMs for science: Usage for code generation and data analysis. Journal of Software: Evolution and Process. 2025 Jan;37(1):e2723.

-Dobler D, Binder H, Boulesteix AL, Igelmann JB, Köhler D, Mansmann U, Pauly M, Scherag A, Schmid M, Al Tawil A, Weber S. ChatGPT as a Tool for Biostatisticians: A Tutorial on Applications, Opportunities, and Limitations. Statistics in Medicine. 2025 Oct;44(23-24):e70263.

-Daniotti S, Wachs J, Feng X, Neffke F. Who is using AI to code? Global diffusion and impact of generative AI. Science. 2026 Jan 22:eadz9311.

-Huang Y, Wu R, He J, Xiang Y. Evaluating ChatGPT-4.0's data analytic proficiency in epidemiological studies: A comparative analysis with SAS, SPSS, and R. J Glob Health. 2024 Mar 29;14:04070.

-Ahn S. Data science through natural language with ChatGPT’s Code Interpreter. Translational and Clinical Pharmacology. 2024 May 29;32(2):73.

-Mohamed AM. A comparative evaluation of statistical product and service solutions (SPSS) and ChatGPT-4 in statistical analyses. Cureus. 2024 Oct 28;16(10).

Generative Artificial Intelligence as Research Tool, Narrative Reviews

-Messeri L, Crockett MJ. Artificial intelligence and illusions of understanding in scientific research. Nature. 2024 Mar;627(8002):49-58.

-Musslick S, Bartlett LK, Chandramouli SH, Dubova M, Gobet F, Griffiths TL, Hullman J, King RD, Kutz JN, Lucas CG, Mahesh S. Automating the practice of science: Opportunities, challenges, and implications. Proceedings of the National Academy of Sciences. 2025 Feb 4;122(5):e2401238121.

-Binz M, Alaniz S, Roskies A, Aczel B, Bergstrom CT, Allen C, Schad D, Wulff D, West JD, Zhang Q, Shiffrin RM. How should the advancement of large language models affect the practice of science? Proceedings of the National Academy of Sciences. 2025 Feb 4;122(5):e2401227121.

-Lubiana T, Lopes R, Medeiros P, Silva JC, Goncalves ANA, Maracaja-Coutinho V, Nakaya HI. Ten quick tips for harnessing the power of ChatGPT in computational biology. PLoS Comput Biol. 2023 Aug 10;19(8):e1011319.

-Helmy M, Jin L, Alhossary A, Mansour T, Pellagrina D, Selvarajoo K. Ten simple rules for optimal and careful use of generative AI in science. PLOS Computational Biology. 2025 Oct 28;21(10):e1013588.

-Postill G, Sedlakova J, Tancredi S, Smit F, Ekongefeyin S, Baumer AM, Chiolero A, Bernard J, Rosella LC, von Wyl V. Integrating artificial intelligence tools in health research. npj Digital Medicine. 2026 May 16.

-Li B, Saini AK, Hernandez JG, Moore JH. Agentic AI and the rise of in silico team science in biomedical research. Nature Biotechnology. 2026 Feb 24:1-5.

-Bann D, Lowther E, Wright L, Kovalchuk Y. Why can’t epidemiology be automated (yet)? International Journal of Epidemiology. 2026 Feb;55(1):dyaf210.

-Gartlehner G, Kahwati L, Nussbaumer-Streit B, Crotty K, Hilscher R, Kugley S, Viswanathan M, Thomas I, Konet A, Booth G, Chew R. From promise to practice: challenges and pitfalls in the evaluation of large language models for data extraction in evidence synthesis. BMJ Evid Based Med. 2024 Dec 20:bmjebm-2024-113199.

-Reynolds SA, Christie AP, Dicks LV, Jaffer S, Madhavapeddy A, Smith RK, Sutherland WJ. Will AI speed up literature reviews or derail them entirely? Nature. 2025 Jul 10;643(8071):329-31.

-Hauser AS. The future of reviews: Will LLMs render them obsolete? EMBO reports. 2025 Aug 26:1-5.

-Cleland J, Driessen E, Masters K, Lingard L, Maggio LA. When and how to disclose AI use in academic publishing: AMEE Guide No. 192. Medical Teacher. 2025 Dec 29:1-2.

-Masters K, Cleland J. When and how to disclose AI use in academic peer review. Medical Teacher. 2026 Jan 9:1-3.

-Naddaf M, Quill E. Hallucinated citations are polluting the scientific literature. What can be done? Nature. 2026 Apr 1;652(8108):26-9.

-Moore JH, Tatonetti N. Vibe coding: a new paradigm for biomedical software development. BioData Mining. 2025 Jul 1;18:46.

-Chow M, Ng O. From technology adopters to creators: Leveraging AI-assisted vibe coding to transform clinical teaching and learning. Medical Teacher. 2025 Apr 9:1-3.
