Kyubum Lee (Ph.D.)
Postdoctoral Fellow
Moffitt Cancer Center
Tampa, Florida, USA

E-mail: KYUBUM.LEE [at] moffitt [dot] org
            KYUBUMLEE [at] gmail [dot] com

  • Expertise in machine learning, natural language processing (NLP), and biomedical informatics
  • Extensive experience and in-depth knowledge of deep learning, statistics, data/text analysis, visualization, data integration, and biomedical text data
  • Spearhead multiple projects collaborating with researchers in the computer science and biomedical fields
    - First author of 8 publications and co-author of 13 publications in peer-reviewed SCI journals
    - Built 10 web-based services each with more than 1K users including LitVar, ChimerDB 3.0, HiPub, BEST, and BRONCO
  • Fluent in Python, with extensive experience in various machine learning and BioNLP tools
           • Multi-omics analysis for Kras-driven lung adenocarcinoma using a deep learning approach
           • Comprehensive ORAL cancer Explorer (CORALE) project
           • Tumor histopathology image analysis using deep learning methods
           • Cancer-related biomedical informatics projects
• Orchestrated a research project using deep learning for biomedical literature triage, in collaboration with the European Bioinformatics Institute (UK) and the Swiss Institute of Bioinformatics (Switzerland); reduced the manual workload by nearly 70% using machine learning and NLP
• Develop a web-based machine learning platform that provides tools for classifying biomedical literature
• Conduct analyses of publications on precision health to identify human genes of translational value – Collaboration with the Centers for Disease Control and Prevention (CDC)
• Participated in designing the search engine LitVar which retrieves genomic variants in biomedical documents in PubMed and PMC 

• Orchestrated a research project on building a cancer mutation knowledge-base (VarDrugPub) which involved using deep learning for extracting gene-drug-disease-mutation relations from biomedical literature 
• Mentoring graduate/undergraduate students
• Built Biomedical Entity Relation Corpus (BRONCO) which contains gene-drug-disease-mutation relations found in biomedical texts
• Extracted gene-drug relations from biomedical texts using NLP tools for creating the Drug Signatures Database (DSigDB)
• Worked jointly on biomedical projects with researchers in translational bioinformatics and cancer biology 

Links for Recent Projects

LitSuggest: A Web-based System for Literature Recommendation and Curation using Machine Learning (Co-First Author / Developed ML core and Data processing part.) [Link] [Publication in Nucleic Acids Research]

Literature Triage using Machine Learning (First Author) [GitHub] [Publication in PLOS Computational Biology]

VarDrugPub: Variant-Gene-Drug relations Database (First Author) [Link] [Publication in BMC Bioinformatics] 

LitVar: Search Engine for Genomic Variants in PubMed and PMC [Link] [Publication in Nucleic Acids Research]

HiPub: An application that translates biomedical texts to networks (First Author) [Link] [Publication in Bioinformatics] [AltMetric]

BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations (First Author) [Link] [Publication in Database]

ChimerDB 3.0 (Co-first author, Built a fusion gene DB using Text-mining - ChimerPub) [Link] [Wikipedia] [Publication in Nucleic Acids Research] [PDF]

BEST:  Biomedical Entity Search Tool (Actively Involved) [Link] [Publication in PLOS ONE]

BEReX:  Biomedical Entity-Relationship eXplorer (Actively Involved) [Link] [Publication in Bioinformatics]

 Drug SIGnatures DataBase (Involved in building text-mined drug-gene sets) [Link] [Publication in Bioinformatics]

Other Links:

Google Scholar

  • Recognition for Honorary Award (Recognition and Appreciation of Special Achievement): National Library of Medicine, National Institutes of Health, U.S. Department of Health and Human Services; Dec. 2018
  • Best Paper of the Year Award: Korea University; Feb. 2017



    • Alexis Allot†, Kyubum Lee†, Qingyu Chen†, Ling Luo, and Zhiyong Lu*: LitSuggest: A Web-based System for Literature Recommendation and Curation using Machine Learning. Nucleic Acids Research, 2021 († These authors contributed equally) [Full Text] [Database Link]
    • Mengyu Xie, Kyubum Lee, John H. Lockhart, Scott D. Cukras, Rodrigo Carvajal, Amer A. Beg, Elsa R. Flores, Mingxiang Teng, Christine H. Chung, and Aik Choon Tan*: TIMEx: tumor-immune microenvironment de-convolution web-portal for bulk transcriptomics using pan-cancer scRNA-seq signatures. Bioinformatics, 2021 [Full Text] [Link]
    • Kyubum Lee, Chih-Hsuan Wei, and Zhiyong Lu*: Recent advances of automated methods for searching and extracting genomic variant information from biomedical literature. Briefings in Bioinformatics, 2020 [Full Text]

    • Qingyu Chen, Kyubum Lee, Shankai Yan, Sun Kim, and Zhiyong Lu*: BioConceptVec: creating and evaluating literature-based biomedical concept embeddings on a large scale. PLOS Computational Biology, 2020. [Full Text]

    • Kyubum Lee, Mindy Clyne, Wei Yu, Zhiyong Lu*, and Muin Khoury*: Tracking human genes along the translational continuum. npj Genomic Medicine, 2019. [Full Text]

    • Kyubum Lee, Maria Livia Famiglietti, Aoife McMahon, Chih-Hsuan Wei, Jacqueline Ann Langdon MacArthur, Sylvain Poux, Lionel Breuza, Alan Bridge, Fiona Cunningham, Ioannis Xenarios and Zhiyong Lu*: Scaling up data curation using deep learning: An application to literature triage in genomic variation resources. PLOS Computational Biology, 2018. [Full Text] [Hot paper of the week at NIH Intramural Research News Letter]

    • Alexis Allot†, Yifan Peng†, Chih-Hsuan Wei, Kyubum Lee, Lon Phan, and Zhiyong Lu*: LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC. Nucleic Acids Research, 2018. [Full Text] [Database Link]

    • Kyubum Lee, Byounggun Kim, Yonghwa Choi, Sunkyu Kim, Wonho Shin, Sunwon Lee, Sungjoon Park, Seongsoon Kim, Aik Choon Tan* and Jaewoo Kang*: Deep learning of mutation-gene-drug relations from the literature. BMC Bioinformatics, 2018. DOI: 10.1186/s12859-018-2029-1 [Full Text] [Database Link]

    • Sangrak Lim, Kyubum Lee, Jaewoo Kang: Drug-drug interaction extraction from the literature using a recursive neural network. PLoS ONE, 2018. DOI: 10.1371/journal.pone.0190926 [Full Text]

    • Seongsoon Kim, Donghyeon Park, Yonghwa Choi, Kyubum Lee, Byounggun Kim, Minji Jeon, Jihye Kim, Aik Choon Tan, Jaewoo Kang*: A Pilot Study of Biomedical Text Comprehension using an Attention-Based Deep Neural Reader: Design and Experimental Analysis. JMIR Medical Informatics, 2017. DOI: 10.2196/medinform.8751 [Full Text]

    • Myunggyo Lee†, Kyubum Lee†, Namhee Yu†, Insu Jang†, Ikjung Choi, Pora Kim, Ye Eun Jang, Byounggun Kim, Sunkyu Kim, Byungwook Lee, Jaewoo Kang*, and Sanghyuk Lee*: ChimerDB 3.0: an enhanced database for fusion genes from cancer transcriptome and literature data mining. Nucleic Acids Research, 2017. DOI:10.1093/nar/gkw1083 († These authors contributed equally) [Full Text] [Database Link]

    • Kyubum Lee, Wonho Shin, Byunggun Kim, Sunwon Lee, Yonghwa Choi, Sunkyu Kim, Minji Jeon, Aik Choon Tan* and Jaewoo Kang*: HiPub: Translating PubMed and PMC Texts to Networks for Knowledge Discovery. Bioinformatics 08/2016; 32(18). DOI:10.1093/bioinformatics/btw511 [Full Text] [Link]

    • Kyubum Lee, Sunwon Lee, Sungjoon Park, Sunkyu Kim, Suhkyung Kim, Kwanghun Choi, Aik Choon Tan* and Jaewoo Kang*: BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations. Database: The Journal of Biological Databases and Curation 04/2016; 2016. DOI:10.1093/database/baw043 [Full Text]

    • Jocelyn Barbosa, Kyubum Lee, Sunwon Lee, Bilal Lodhi, Jae-Gu Cho, Woo-Keun Seo, Jaewoo Kang*: Efficient quantitative assessment of facial paralysis using iris segmentation and active contour-based key points detection with hybrid classifier. BMC Medical Imaging 12/2016; 16(1). DOI:10.1186/s12880-016-0117-0 [Full Text]

    • Sunwon Lee†, Donghyeon Kim†, Kyubum Lee, Jaehoon Choi, Seongsoon Kim, Minji Jeon, Sangrak Lim, Donghee Choi, Sunkyu Kim, Aik-Choon Tan, Jaewoo Kang*: BEST: Next-Generation Biomedical Entity Search Tool for Knowledge Discovery from Biomedical Literature. PLoS ONE 10/2016; 11(10). DOI:10.1371/journal.pone.0164680 († These authors contributed equally to the work.) [Full Text] [Link]

    • Minjae Yoo, Jimin Shin, Jihye Kim, Karen A Ryall, Kyubum Lee, Sunwon Lee, Minji Jeon, Jaewoo Kang, Aik Choon Tan*: DSigDB: Drug Signatures Database for Gene Set Analysis. Bioinformatics 05/2015; 31(18). DOI:10.1093/bioinformatics/btv313 [Link] [Full Text]

    • Woo Keun Seo, Jaewoo Kang, Minji Jeon, Kyubum Lee, Sunwon Lee, Ji Hyun Kim, Kyungmi Oh, Seong Beom Koh: Feasibility of Using a Mobile Application for the Monitoring and Management of Stroke-Associated Risk Factors. Journal of Clinical Neurology 04/2015; 11(2). DOI:10.3988/jcn.2015.11.2.142 [Link]

    • Minji Jeon, Sunwon Lee, Kyubum Lee, Aik-Choon Tan, Jaewoo Kang*: BEReX: Biomedical Entity-Relationship eXplorer. Bioinformatics 01/2014; 30(1). DOI:10.1093/bioinformatics/btt598 [Link] [Full Text]

    • Junkyu Lee, Seongsoon Kim, Sunwon Lee, Kyubum Lee, Jaewoo Kang*: On the efficacy of per-relation basis performance evaluation for PPI extraction and a high-precision rule-based approach. BMC Medical Informatics and Decision Making 04/2013; 13(1). DOI:10.1186/1472-6947-13-S1-S7 [Link]

    • Jaehoon Choi, Donghyeon Kim, Seongsoon Kim, Sunwon Lee, Kyubum Lee, Jaewoo Kang*: BOSS: Context-enhanced search for biomedical objects. BMC Medical Informatics and Decision Making 04/2012; 12 Suppl 1(Suppl 1). DOI:10.1186/1472-6947-12-S1-S7 [Link]

    • Hanjun Shin, Ki Hoon Kim, Chihwan Song, Injoon Lee, Kyubum Lee, Jaewoo Kang, Yoon Kyoo Kang*: Electrodiagnosis support system for localizing neural injury in an upper limb. Journal of the American Medical Informatics Association 05/2010; 17(3). DOI:10.1136/jamia.2009.001594 [Link]

    • Sunwon Lee, Kyubum Lee, Jaewoo Kang*, Jaehoon Choi, Junho Oh: Trends in Personalized Medicine Research. Communications of the Korean Institute of Information Scientists and Engineers, Vol. 29, Issue 4, Pages 19-25, Apr 2011 [Written in Korean]


        Conferences / Meetings:


  • Chih-Hsuan Wei, Kyubum Lee, Robert Leaman, Zhiyong Lu: Biomedical Mention Disambiguation Using a Deep Learning Approach. The 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB 2019), Niagara Falls, NY; Sept 2019 [Link]
  • Donghyeon Kim†, Sunwon Lee†, Kyubum Lee, Jaehoon Choi, Seongsoon Kim, Minji Jeon, Sangrak Lim, Donghee Choi, Aik-Choon Tan, Jaewoo Kang*: BEST: Next-Generation Biomedical Entity Search Tool for Knowledge Discovery from Biomedical Literature. The 5th Annual Translational Bioinformatics Conference (TBC 2015), Tokyo, Japan; Sept. 2015 († These authors contributed equally to the work.)
  • Kyubum Lee, Sunwon Lee, Minji Jeon, Jaehoon Choi, Jaewoo Kang*: Drug-drug interaction analysis using heterogeneous biological information network. IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2012), Philadelphia, USA; Oct. 2012 [Link]
  • Junkyu Lee, Seongsoon Kim, Sunwon Lee, Kyubum Lee, Jaewoo Kang*: High Precision Rule Based PPI Extraction and Per-Pair Basis Performance Evaluation. ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO 2012), Maui, Hawaii, USA; Oct. 2012
  • Kyubum Lee, Sunwon Lee, Jaewoo Kang*: SNP Grouping Method Based on PPI Network Information. The 37th Conference of the Korea Information Processing Society, Apr 2012 [Written in Korean]
  • Taewon Joh, Kyubum Lee, Jaewoo Kang*: Comparative analysis of Biomedical Databases and Text mining Technologies. The 34th Conference of the Korea Information Processing Society, Nov 2010 [Written in Korean]
  • Hojun Kim, Seongyeon Won, Seungwoo Gang, Kyubum Lee, Byounggun Kim, Sunkyu Kim, Jaewoo Kang*: Research on Identifying Mutation-Drug Relationship in Biomedical Literature Using Biomedical Context based pre-trained word embedding. KIPS 2017 Spring, Jeju, Korea; April 2017 [Written in Korean]
    • John H. Lockhart, Hayley D. Ackerman, Kyubum Lee, Mahmoud Abdalah, Andrew Davis, Nicole Montey, Theresa Boyle, James Saller, Aysenur Keske, Kay Hänggi, Brian Ruffell, Olya Stringfield, Aik Choon Tan, Elsa R. Flores: Automated tumor segmentation, grading, and analysis of tumor heterogeneity in preclinical models of lung adenocarcinoma. AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; Jan. 13-14, 2021 [Abstract]
    • John H. Lockhart, Kyubum Lee, Hayley D. Ackerman, Mahmoud Abdalah, Andrew Davis, Nicole Montey, Theresa Boyle, James Saller, Aysenur Keske, Kay Hänggi, Olya Stringfield, Aik Choon Tan and Elsa R. Flores: Spatial genomics coupled with machine learning to identify p53-driven molecular signatures that are predictive of lung adenocarcinoma progression. AACR Virtual Special Conference on Tumor Heterogeneity: From Single Cells to Clinical Impact; September 17-18, 2020 [Abstract]
    • Kyubum Lee, Chih-Hsuan Wei, Livia Famiglietti, Sylvain Poux, Lionel Breuza, Alan Bridge, Ioannis Xenarios and Zhiyong Lu*: Scaling up data curation using deep learning: An application to literature triage in genomic variation resources. ISMB 2018, Chicago, USA; July 2018 
    • Kyubum Lee, Byounggun Kim, Yonghwa Choi, Sunkyu Kim, Wonho Shin, Sunwon Lee, Sungjoon Park, Seongsoon Kim, Aik Choon Tan and Jaewoo Kang*: Deep learning of mutation-gene-drug relations from the literature for precision medicine. ISMB/ECCB 2017, Prague, Czech Republic; July 2017 (doi: 10.7490/f1000research.1114641.1) [Poster] [Link]


    • Applying Machine Learning to Biomedical Literature Mining: ‘Natural Language Processing & Health’ class, Population Health Sciences, Weill Cornell Medicine; Mar. 2021

    • Machine Learning for Literature Mining: ‘Introduction to Text Mining’ class, FAES, NIH; Mar. 2020
    • Scaling up data curation using machine learning: An application to literature triage in genomic variation resources. CBB Seminar, NCBI, NLM, NIH; Feb. 2019
    • Biomedical Literature Search, Mining and Applications: College of Medicine, Seoul National University; Nov. 2018 [Online talk]
    • Machine-assisted Variant Curation: Biomedical Linked Annotation Hackathon 4 (BLAH4), Kashiwa, Japan; Jan. 2018 [Link]


    Korea University / Data Mining & Information Systems Lab (Advisor: Prof. Jaewoo Kang) - Seoul, Korea

        Ph.D. in Computer Science and Engineering (Data Mining and Machine Learning): September 2012 to February 2017

        M.S. in Computer Science and Bioinformatics (Bioinformatics): September 2010 to August 2012

        B.S. in Computer Science: September 2008 to August 2010

        B.S. in Life Science: March 2002 to August 2008 (On leave: 2003–2005, Military Service)