Current Positions

  • Associate Professor at UTHealth (University of Texas Health Science Center at Houston)
My past employers include Addis Ababa University, OHSU, Trapit, Mayo Clinic, Starkey Laboratories, Honeywell Research, and Medtronic.  For more details (and up-to-date info), check out my LinkedIn profile.

Research Interests

I secured and led an NLM-funded R01 (at OHSU) focused on information retrieval.  I also had an NIAID-funded R21 (with Mayo Clinic) on aggregating patient information over time. There are some foundational interests that guide how I think about these problems:
  • Computational semantics.  I am interested in distributional/geometric models of semantics and in ontological semantics.  Each type of representation may provide complementary benefits in NLP applications.  I am also interested in compositionality and similarity in these semantic frameworks.
  • Clinical information extraction/retrieval.  Much of the important information in electronic medical records (EMRs) is in free-text clinical notes rather than form fields.  With clinical NLP, this information can be accessed and used to improve patient care.  I am also interested in implementing the systems that make this information available to clinicians and researchers, hence my interest in information retrieval.
Some of what I've been interested in is embodied in a workshop I organized: Computational Semantics in Clinical Text (CSCT) 2013.

During my time in Ethiopia (2017-2018), I began working on deep learning NLP for Semitic languages.


    In Winter Term 2016, I taught CS 562/662 Natural Language Processing at OHSU.  I wrote a blog post about my experiences, advocating for a new teaching framework called techied.

    In Spring 2018, I taught ITSC-5C22 Social Network Analysis at Addis Ababa University, and wrote another blog post about this very cross-cultural teaching experience.

    I have also taught CS 555/655 Analyzing Sequences at OHSU.

    Feel free to clone/use/fork my techied repository and build off of it!


    • cTAKES: Apache Clinical Text Analysis and Knowledge Extraction System.  Originally developed at Mayo Clinic, this is a customizable UIMA-based pipeline that takes clinical text as input and gives lots of different clinical IE/NLP output.  I'm a Project Management Committee member.
    • UIMA: Apache's (formerly IBM's) Unstructured Information Management Architecture.  Useful for making components somewhat plug-and-play, so that with relative ease you can swap out components for the latest and greatest, or add new components.
    • NLTK: Natural Language Toolkit.  Easy-to-use toolbox for natural language processing, written in Python.  Includes great introductory material for those new to NLP.  Good stuff, but I am not a developer.
    • Keras: The deep learning toolkit that I typically use.


    University of Minnesota, Minneapolis, MN
    PhD, Computer Science (Natural Language Processing)
    University of Minnesota, Minneapolis, MN
    MS, Electrical Engineering (Digital and Wireless Communications)
    Duke University, Durham, NC
    BSE, Electrical Engineering and Biomedical Engineering

    Other links