Current Positions

  • Assistant Professor at Oregon Health & Science University
  • Research Collaborator at Mayo Clinic
My past employers include Trapit, Mayo Clinic, Starkey Laboratories, Honeywell Research, and Medtronic.  For more details (and up-to-date info), check out my LinkedIn profile.

Research Interests

My current work on my NLM-funded R01 (at OHSU) is focused on information retrieval.  I also have an NIAID-funded R21 on aggregating patient information over time. There are some foundational interests that guide how I think about these problems:
  • Computational semantics.  I am interested in distributional/geometric models of semantics and in ontological semantics.  Each type of representation may provide complementary benefits in NLP applications.  I am also interested in compositionality and similarity in these semantic frameworks.
  • Clinical information extraction/retrieval.  Much of the important information in electronic medical records (EMRs) is in free-text clinical notes rather than form fields.  With clinical NLP, this information may be accessed and used to improve patient care.  I am also interested in implementing the systems that make this information available to clinicians and researchers, hence my interest in information retrieval.
Some of what I've been interested in is embodied in a workshop I organized: Computational Semantics in Clinical Text (CSCT) 2013.


    In Winter Term 2016, I taught CS 562/662 Natural Language Processing.  I wrote a blog post about my experiences, advocating for a new teaching framework called techied. I have since taught CS 555/655 Analyzing Sequences and a second installment of NLP, and made some minor refinements to the framework. Feel free to clone/use/fork my techied repository and build off of it!


    • cTAKES: Apache Clinical Text Analysis and Knowledge Extraction System.  Originally developed at Mayo Clinic, this is a customizable UIMA-based pipeline that takes clinical text as input and produces many kinds of clinical IE/NLP output.  I'm a Project Management Committee member.
    • UIMA: Apache's (formerly IBM's) Unstructured Information Management Architecture.  Useful for making components somewhat plug-and-play, so that with relative ease you can swap out components for the latest and greatest, or add new components.
    • modelblocks: Contains a C++ template library for random variables, a speech decoder that takes semantic context into account, and various parsers such as the right-corner HHMM parser.  Formerly by the NLP group at the University of Minnesota; I was originally a developer.  The SourceForge project has since been repurposed, and much of what I worked on is no longer there.
    • NLTK: Natural Language Toolkit.  Easy-to-use toolbox for natural language processing, written in Python.  Includes great introductory material for those new to NLP.  Good stuff, but I am not a developer.
    • scikit-learn: Machine learning in Python.


    University of Minnesota, Minneapolis, MN
    PhD, Computer Science (Natural Language Processing)
    University of Minnesota, Minneapolis, MN
    MS, Electrical Engineering (Digital and Wireless Communications)
    Duke University, Durham, NC
    BSE, Electrical Engineering and Biomedical Engineering

    Other links