SemGen help

This is the help document for SemGen, software for semantics-based composition and decomposition of biosimulation models.

We welcome feedback from the user community in order to improve SemGen's utility.

Please send feedback and bug reports to:

Maxwell Neal: maxneal[at]gmail[dot]com

SemGen origins

A number of years ago I became interested in creating integrated models of human physiology for use as medical decision support tools. At the time, I was new to the field of simulation and modeling. I quickly learned how difficult and time-consuming it was to integrate two models created by authors in different research labs. I was discouraged by the difficulties in obtaining code for published models, and when the code was not available, I was discouraged by the tedious and error-prone tasks associated with re-coding the model myself from its source publication. I wanted to be able to search for and download two models from a physiological model repository and have a computer automate their integration. Later, after I had amassed a collection of large models, I found that many of my modeling tasks required extracting a subcomponent from a larger model. Therefore, many of my modeling tasks during this time, whether integrations or decompositions, would have been accelerated by a modular modeling framework. This is why my colleagues at the University of Washington and I created the SemSim model description format, which provides a modular framework for model composition and decomposition. Soon thereafter I created SemGen as part of my dissertation work because we needed a software tool to help automate the creation, annotation, composition and decomposition of SemSim models.

In a nutshell, SemSim is a semantics-based approach to modular modeling. If you're interested in the gory details of the SemSim approach, you can find them in my dissertation and in the publications of the Semantics of Biological Processes group. In my experience, modelers do not create models with the intention of integrating them with models developed by outside researchers. Doing so would require knowing in advance how the model will be repurposed once it is disseminated among the modeling community. As it is not feasible to create a model that anticipates these diverse purposes, it becomes difficult to specify how a model should interface with others in a broad sense. Therefore, to automate model integration, a computer needs to recognize how two models should interface with each other. For biological modeling, this task requires identifying where the models describe the same biology. In other words, the computer needs to know where the models are semantically equivalent, because those are the biologically meaningful points of coupling between the models, that is, the interface. The SemSim format provides the rich semantics needed to capture a model's biological contents in a machine-readable way, and SemGen provides the tools for adding these semantic annotations to SemSim models. Semantically enriched models offer more opportunities to automate model composition and decomposition, and SemGen was designed to take advantage of these opportunities.

The SemSim vision of modular modeling

Ultimately, the SemSim vision is to allow a modeler to download models from any of the major declarative repositories (BioModels, the CellML model repository, and the physiome.org model repository) and convert them into interoperable SemSim models. Using SemGen, modelers will be able to automate, as much as possible, the modular composition and decomposition of SemSim models without the need for manual coding. Integrated and extracted SemSim models will in turn be added to a model repository for dissemination and reuse. Eventually there will be a number of encoding tools within SemGen that will translate SemSim models into a variety of simulation languages so that modelers can generate executable simulation code in their language of choice.

Differences between SemSim and other declarative modeling languages

Unlike the Systems Biology Markup Language (SBML), the SemSim approach to modularity and interoperability scales across biological levels of organization and research domains. Whereas SBML models carry an underlying assumption that they represent a set of chemical reactions, the SemSim framework is intended to be multi-scale and multi-domain. To realize this vision, SemSim leverages the wealth of semantic information contained in standardized reference ontologies, and together, these ontologies provide annotation concepts across modeling scales and domains. Thus, SemSim provides an annotation framework that is more expressive and explicit than other declarative modeling formats. Although the SBML standard allows for some semantic annotation against reference ontologies, it cannot currently capture the full biological meaning of model codewords in a machine-readable way. The same is true with the CellML modeling language.

SemSim also differs from the CellML approach to model sharing and modularity in that SemSim models do not delineate their internal components into a single set of specific sub-components. Instead, by leveraging the rich semantic annotations in SemSim models, modelers can use SemGen's Extractor tool to decompose a SemSim model in a number of different ways. This enables modelers to "carve out" the exact parts of a model they want to extract, without being constrained by a pre-coordinated decomposition.

A work in progress

Both SemSim and SemGen are works in progress. My colleagues and I hope that the broader biosimulation community will find these technologies useful, and we look forward to improving SemGen to meet users' needs. We welcome constructive feedback from anyone interested in applying this semantics-based modular modeling approach. Thanks for your interest!

-Maxwell Neal


SemGen version 4.0 and higher

SemGen is an experimental software tool for automating the modular composition and decomposition of biosimulation models.

SemGen facilitates the construction of complex, integrated models, and the swift extraction of reusable sub-models from larger ones. SemGen relies on the semantically-rich SemSim model description format to help automate these modeling tasks.

With SemGen, users can:

  • Visualize models using D3 force-directed networks

  • Create SemSim versions of existing models and annotate them with rich semantic data

  • Automatically decompose models into interoperable sub-models

  • Semi-automatically merge models into more complex systems

  • Encode models in executable simulation formats

Prerequisites

SemGen is a Java-based program and requires Java Runtime Environment version 1.7 (64-bit) or higher to execute.

To check your Java version, go to a command prompt and enter:

java -version
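
If you want to script this check, the following is a minimal Python sketch and is not part of SemGen. It assumes the usual output format of java -version, in which releases up to Java 8 report themselves as 1.x (so 1.7 is Java 7) and later releases report the major number first. The function names are illustrative. Note that java -version writes its text to standard error rather than standard output.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    second = int(match.group(2)) if match.group(2) else 0
    # Legacy scheme: "1.7.0_80" means Java 7. Newer scheme: "17.0.2" means Java 17.
    return second if first == 1 else first


def meets_semgen_requirement(version_output):
    """True if the reported version is 1.7 (Java 7) or higher."""
    return java_major_version(version_output) >= 7


def installed_java_output():
    """Run `java -version`; the version text arrives on stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr


print(meets_semgen_requirement('java version "1.7.0_80"'))  # True
print(meets_semgen_requirement('java version "1.6.0_45"'))  # False
```

The sketch checks only the version number; it does not verify that the runtime is 64-bit.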

Installing

Simply download the appropriate build for your operating system from the releases page.

Windows: Run the Windows installer. You will then be able to run SemGen from the location where you installed it by double-clicking the SemGen.exe file, or if using installation defaults, from the Windows Start menu.

Mac: Open the SemGen .dmg file, and drag SemGen.app to the Applications folder. Double-click SemGen.app to start the program.

Linux: Unarchive the SemGen .tar.gz file. Double-click the "SemGen.jar" file in the main SemGen directory to start the program.

Running SemGen

Here is a primer on how to use SemGen to load, visualize, annotate, extract, and merge models.

In SemGen, the Project tab will be your main workspace:

  • Search: Hovering your cursor over the magnifying glass brings up the search bar. You can search for example models or currently visualized nodes by typing in search terms. The search can be performed over names, descriptions, or annotations.

  • Project Actions: The menu on the left side contains project-level actions. This menu can be collapsed/expanded by clicking the chevron on the left edge.

  • Stage Options: The menu on the right side contains visualization options, as well as additional information about the selected node. This menu can be collapsed/expanded by clicking the chevron on the right edge.

  • Selection/Navigation: The buttons in the top right corner toggle click-and-drag between moving the visualization and selecting multiple nodes. Additionally, the mouse scroll wheel can be used to zoom in/out of the visualization.

Loading a model

To load a model, click the Open model button under Project Actions on the left-hand side. This will prompt you to select a model file to load (SemGen currently supports the SemSim, CellML, SBML, and JSim file formats).

Once you select a model, it will be loaded in SemGen and visualized as a model node.

Alternatively, SemGen comes with a library of example models. These can be accessed by using the search bar. Hover over the magnifying glass on the top left and type in terms to search for. Click the model name in the results to load the model.

Visualizing a model

Once a model is loaded in SemGen, there are several ways to visualize and explore the model.

Select the model you want to visualize by clicking the model node (the selected node will have a yellow ring around it). Then click one of the visualizations from the Project Actions menu on the left-hand side.

An entire model or submodel can be moved by clicking and dragging the *hull* surrounding the group of nodes. You can also adjust the view by clicking and dragging the whitespace around the model or zooming in and out using the mouse wheel.

Submodels

The submodel visualization shows the hierarchical and/or compartmental organization of the model.

Each submodel node can be further expanded by double-clicking it.

Dependencies

The dependency visualization shows the mathematical dependency network in the model.

Different node types can be hidden or shown in the Stage Options menu, which can be useful for visualizing large models.
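
To make the idea of a dependency network concrete, here is a minimal Python sketch. It is not SemGen's implementation: the toy model, the codeword names, and the node type labels ("state", "rate", "constitutive") are assumptions made for illustration. Each codeword lists the codewords its equation reads, and an edge is drawn from each input to the codeword that depends on it. Hiding a node type simply drops those nodes and their edges.

```python
# Hypothetical toy model: codeword -> (node type, codewords its equation reads).
MODEL = {
    "V": ("state", {"F"}),
    "F": ("rate", {"P", "R"}),
    "P": ("constitutive", {"V", "C"}),
    "C": ("constitutive", set()),
    "R": ("constitutive", set()),
}


def dependency_edges(model, hidden_types=frozenset()):
    """Return (input, dependent) edges between nodes whose type is not hidden."""
    visible = {name for name, (kind, _) in model.items() if kind not in hidden_types}
    edges = []
    for name in sorted(visible):
        _, inputs = model[name]
        for source in sorted(inputs):
            # Keep an edge only when both of its endpoints are still shown.
            if source in visible:
                edges.append((source, name))
    return edges


print(dependency_edges(MODEL))
print(dependency_edges(MODEL, hidden_types={"constitutive"}))  # [('F', 'V')]
```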

PhysioMap

PhysioMap displays the physiological processes and their participants (sources, sinks, and mediators) based on the semantics of the biological processes and entities.
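
The process-and-participant structure can be sketched as a small data structure. The Python below is an illustration only, not SemGen code: the Process class, the link labels, and the example process (glucose phosphorylation mediated by hexokinase) are assumptions chosen to show how sources, sinks, and mediators could become links in a network view.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Process:
    """A physical process with its participating physical entities (hypothetical type)."""
    name: str
    sources: List[str] = field(default_factory=list)
    sinks: List[str] = field(default_factory=list)
    mediators: List[str] = field(default_factory=list)


def physiomap_links(processes: List[Process]) -> List[Tuple[str, str, str]]:
    """Return (from, to, role) links: sources feed a process, a process feeds its sinks."""
    links = []
    for process in processes:
        links += [(entity, process.name, "source") for entity in process.sources]
        links += [(process.name, entity, "sink") for entity in process.sinks]
        links += [(entity, process.name, "mediator") for entity in process.mediators]
    return links


phosphorylation = Process(
    name="glucose phosphorylation",
    sources=["glucose"],
    sinks=["glucose 6-phosphate"],
    mediators=["hexokinase"],
)

for link in physiomap_links([phosphorylation]):
    print(link)
```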

Annotator

Comprehensive Annotator Tutorial

With the Annotator tool, you can convert mathematical models into the SemSim format and annotate the model's codewords using concepts from online reference ontologies. Currently the Annotator can convert MML, SBML, and CellML models into the SemSim format. The Semantics of Biological Processes group maintains a protocol for annotating a model which can help guide the annotation process.

To annotate a model, click the Annotate button under Project Actions. This will create a new Annotation tab.

Composite annotations

Each composite annotation consists of a physical property term connected to a physical entity or physical process term. The physical entity term can itself also be a composite of ontology terms. We recommend using only terms from the Ontology of Physics for Biology (OPB) for the physical property annotation components. For the physical entity annotations we recommend using robust, thorough, and widely accepted online reference ontologies like the Foundational Model of Anatomy (FMA), Chemical Entities of Biological Interest (ChEBI), and Gene Ontology cellular components (GO-cc). For physical process annotations, we recommend creating custom terms and defining them by identifying their thermodynamic sources, sinks and mediators from the physical entities in the model.

When you edit a composite annotation for a model codeword, the Annotator provides an interface for rapid searching and retrieval of reference ontology concepts via the BioPortal web service.

Example: Suppose you are annotating a beta cell glycolysis model that includes a codeword representing glucose concentration in the cytosol of the cell.

A detailed composite annotation would be:

OPB:Chemical concentration <propertyOf> CHEBI:glucose <part_of> FMA:Portion of cytosol <part_of> FMA:Beta cell

In this case we use the term Chemical concentration from the OPB for the physical property part of the annotation, and we compose the physical entity part by linking three concepts - one from ChEBI and two from the FMA. This example illustrates the post-coordinated nature of the SemSim approach to annotation and how it provides high expressivity for annotating model terms.
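
As a rough illustration of this structure, the Python sketch below models a composite annotation as a physical property term plus an ordered chain of physical entity terms, each one part of the next. The class and method names are hypothetical and do not reflect the SemSim API; the rendered text simply follows the notation used in the example above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CompositeAnnotation:
    """Hypothetical container: one property term and a chain of entity terms."""
    physical_property: str
    entity_chain: Tuple[str, ...]  # most specific entity first; each is part_of the next

    def __post_init__(self):
        if not self.entity_chain:
            raise ValueError("a composite annotation needs at least one entity term")

    def render(self) -> str:
        """Write the annotation in the notation used in this document."""
        entities = " <part_of> ".join(self.entity_chain)
        return f"{self.physical_property} <propertyOf> {entities}"


cytosolic_glucose = CompositeAnnotation(
    physical_property="OPB:Chemical concentration",
    entity_chain=("CHEBI:glucose", "FMA:Portion of cytosol", "FMA:Beta cell"),
)
print(cytosolic_glucose.render())
```

Because the entity chain is assembled from individual ontology terms at annotation time, the same small set of terms can be recombined to describe many different model quantities.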

The above example represents a very detailed composite annotation; however, such detail may not be necessary to disambiguate concepts in a given model. For example, there may not be any other portions of glucose within the model apart from that in the cytosol. In this case, one could use the first three terms in the composite annotation and still disambiguate the model codeword from the rest of the model's contents:

OPB:Chemical concentration <propertyOf> CHEBI:glucose <part_of> FMA:Portion of cytosol

Although this annotation approach does not fully capture the biophysical meaning of the model codeword, SemGen is more likely to find semantic overlap between models if they use this shallower annotation style. This is mainly because the SemGen Merger tool currently only recognizes semantic equivalencies; it does not identify semantically similar terms in models that a user wants to integrate. Therefore, if a user wants to integrate our example glycolysis model with a TCA cycle model based on cardiac myocyte metabolism, the shallower approach would likely identify more semantic equivalencies than the more detailed approach.

Nonetheless, we recommend using the more detailed approach, given that future versions of SemGen will include a "Merging Wizard" that will identify and rank codewords that are semantically similar, not just semantically identical.
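
The trade-off between detailed and shallow annotations can be seen in a small Python sketch. It assumes, for illustration only, that an annotation is an ordered tuple of terms and that two codewords are treated as equivalent only when their tuples are identical. The cardiac myocyte annotation is a hypothetical counterpart to the beta cell example, and the function names are not part of SemGen.

```python
# Detailed annotations for cytosolic glucose concentration in two hypothetical models.
BETA_CELL = (
    "OPB:Chemical concentration", "CHEBI:glucose",
    "FMA:Portion of cytosol", "FMA:Beta cell",
)
CARDIAC_MYOCYTE = (
    "OPB:Chemical concentration", "CHEBI:glucose",
    "FMA:Portion of cytosol", "FMA:Cardiac myocyte",
)


def shallow(annotation, n_terms):
    """Keep only the first n_terms terms of an annotation."""
    return annotation[:n_terms]


def semantically_equivalent(first, second):
    """Exact-match test: equivalent only if every term is identical."""
    return first == second


# The detailed annotations differ in their last term, so no match is found.
print(semantically_equivalent(BETA_CELL, CARDIAC_MYOCYTE))  # False

# Truncated to the same depth, the two annotations coincide.
print(semantically_equivalent(shallow(BETA_CELL, 3), shallow(CARDIAC_MYOCYTE, 3)))  # True
```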

Extractor

The Extractor tool provides ways to decompose SemSim models into sub-models. This decomposition process is useful if you want to "carve out" a smaller portion of a given model in order to remove extraneous model features.

Below is a step-by-step example of an extraction:

1. Load a model and select one or more nodes you would like to extract by left-clicking. Multiple nodes can be selected by control+click (command+click on Mac), or by toggling selection in the top right corner.

2. Right-click one of the selected nodes and click Extract Selected. In case you want to extract the majority of the model, it may be more convenient to select the nodes you do not wish to save in the extraction, and click Extract Unselected.

3. Enter a new name for the extracted nodes, and the newly extracted nodes will appear in SemGen.

4. Extraction can also be performed on submodel and PhysioMap nodes.
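
The difference between Extract Selected and Extract Unselected can be sketched in Python as follows. This is an illustration under stated assumptions, not SemGen's algorithm: the toy model maps each hypothetical codeword to the codewords its equation reads, and the sketch additionally reports which inputs fall outside the extraction, since this document does not describe how SemGen treats them.

```python
# Hypothetical toy model: codeword -> codewords its equation reads.
MODEL = {
    "V": {"F"},
    "F": {"P", "R"},
    "P": {"V", "C"},
    "C": set(),
    "R": set(),
}


def extract(model, selected, keep_selected=True):
    """Return (sub-model, inputs left outside it).

    keep_selected=True mimics Extract Selected; False mimics Extract Unselected.
    """
    selected = set(selected)
    unknown = selected - set(model)
    if unknown:
        raise KeyError(f"not in model: {sorted(unknown)}")
    chosen = selected if keep_selected else set(model) - selected
    sub_model = {name: set(model[name]) for name in chosen}
    needed = set().union(*sub_model.values()) if sub_model else set()
    # Inputs that the extracted equations read but that were not extracted.
    outside = sorted(needed - chosen)
    return sub_model, outside


sub_model, outside = extract(MODEL, {"V", "F"})
print(sorted(sub_model), outside)  # ['F', 'V'] ['P', 'R']

sub_model, outside = extract(MODEL, {"C"}, keep_selected=False)
print(sorted(sub_model), outside)  # ['F', 'P', 'R', 'V'] ['C']
```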

Merger

The Merger tool helps automate the integration of two SemSim models. The Merger identifies the interface between two models by comparing the biological meaning of the models' codewords as expressed by their composite and singular annotations. If the two models share the same biological concept, the codewords representing this concept are mapped to each other and the user must decide which computational representation of the concept they want to preserve in the integrated model.
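
The mapping step described above can be sketched in Python. The sketch is a simplification under stated assumptions, not the Merger's implementation: each model is a dictionary from hypothetical codeword names to an annotation string (or None when unannotated), overlaps are found by exact annotation equality, and the user's decision for each overlap is passed in as "a" or "b".

```python
def find_overlaps(model_a, model_b):
    """Pair codewords from the two models whose annotations are identical."""
    by_annotation = {}
    for name_b, annotation in model_b.items():
        if annotation is not None:
            by_annotation.setdefault(annotation, []).append(name_b)
    return [
        (name_a, name_b)
        for name_a, annotation in model_a.items()
        for name_b in by_annotation.get(annotation, [])
    ]


def merged_codewords(model_a, model_b, choices):
    """List the codewords kept after merging.

    choices maps each overlapping (name_a, name_b) pair to "a" or "b",
    naming the model whose computational representation is preserved.
    """
    dropped_a = {a for (a, _), keep in choices.items() if keep == "b"}
    dropped_b = {b for (_, b), keep in choices.items() if keep == "a"}
    kept = [("a", name) for name in model_a if name not in dropped_a]
    kept += [("b", name) for name in model_b if name not in dropped_b]
    return kept


# Hypothetical codewords; only the glucose concentration is annotated in both models.
MODEL_A = {"Cglc": "OPB:Chemical concentration <propertyOf> CHEBI:glucose", "Jhk": None}
MODEL_B = {"glc": "OPB:Chemical concentration <propertyOf> CHEBI:glucose", "Jcs": None}

overlaps = find_overlaps(MODEL_A, MODEL_B)
print(overlaps)  # [('Cglc', 'glc')]
print(merged_codewords(MODEL_A, MODEL_B, {overlaps[0]: "a"}))
```

Unannotated codewords never match each other in this sketch, which mirrors the point that the interface is found only where both models carry the same annotation.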

Below is a step-by-step example of a merge between a cardiovascular dynamics model and a baroreceptor model:

1. Load two models you would like to merge in SemGen.

2. Drag-and-drop one of the models on top of the other. SemGen will automatically find semantic overlaps between the two models.

3. Click the panel to see more information about the codewords. Click the Preview button to visualize what the local dependency network would look like for the merged model using each computational representation.

4. Indicate which computational representation of the concept you wish to preserve in the merged model.

5. If SemGen did not include all of the merge points, you can manually add mappings by selecting individual codewords from the bottom panels and clicking the Add manual mapping button.

Manual mappings can also be added visually by clicking Visualize and dragging and dropping one node onto another (a blue link indicates an exact semantic match, and a yellow link indicates a manual mapping).

6. Once you have indicated all of the merge resolution points, click Resolve Merge Conflicts. Resolve any duplicate code name or unit conversion conflicts.

7. When finished, click Merge to save the new merged model!

Authors

Dr. Maxwell Neal originally developed the SemGen software as part of his dissertation research.

Currently, Dr. Neal leads a team of developers to further augment, test and evaluate SemGen under an R01 grant from the National Library of Medicine (PIs: John Gennari and Brian Carlson) that aims to accelerate model-driven research.

Contributors to SemGen development include Christopher Thompson, Graham Kim and Ryan James.

SemGen development is currently supported by a grant from the National Library of Medicine and through the Virtual Physiological Rat project.

SemGen version 3.0

SemGen 3.0 help can be found here.