Are you:
• A researcher who periodically reads and analyzes biomedical articles to extract molecular interactions described in the literature?
• A researcher who has text documents (e.g., reports, published articles, notes) from which you need to get insights specific to your research topic?
• A researcher, biologist or computer scientist dedicated to providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of a particular organism?
• A student who needs training data/labeled data for your new AI algorithm?
• A researcher interested in creating an annotated corpus?
What can an annotator do with TeamTat?
A unique session ID will be assigned to you, and you will be able to build an annotation task.
The unique link allows you to have your own workspace, where you can:
Save this unique link so that you can come back to this workspace and continue the work over a period of time. Your data will be available for you to download at any time. However, if a profile has been inactive for a certain period of time, the profile and all data associated with it will be erased.
Clicking on the Project tab brings you to the list of Projects. You can try the Sample projects that we have prepared for you, or start a new one.
Let us create a new project:
TeamTat allows a lot of flexibility in retrieving documents:
The first step in setting up a project is to Add Documents to it. For this project we will randomly select two PMC articles from the Open Access dataset and add their IDs.
Now the project will list the two IDs:
Clicking on any of the documents brings us to the document annotation editor page:
TeamTat works seamlessly with the BioC format and recognizes the article structure, which is displayed in the Outline on the left-hand side of the screen. The Table of contents allows the annotator to browse the different sections of the article. A built-in memory feature also remembers the annotator's position: when the annotator returns, the article automatically loads at the last paragraph that was being reviewed.
The middle of the screen shows the article content. The title and metadata are listed first, along with the automatically recognized PMID and PMCID, which give easy access to the article's PubMed and PubMed Central pages.
The right-hand side, once populated, will list the annotated entities and relations.
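To make the idea of an outline built from article structure more concrete, here is a minimal sketch of how a list of sections could be read from a BioC XML document. It is not TeamTat's own code. The sample document is hand-written for illustration, and the `section_type` infon key and the `build_outline` helper are assumptions; real BioC files may label their sections differently.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written BioC-style document (illustrative content, not a real article).
SAMPLE_BIOC = """<collection>
  <source>example</source>
  <document>
    <id>DOC1</id>
    <passage>
      <infon key="section_type">TITLE</infon>
      <offset>0</offset>
      <text>An example title</text>
    </passage>
    <passage>
      <infon key="section_type">INTRO</infon>
      <offset>17</offset>
      <text>First paragraph of the introduction.</text>
    </passage>
  </document>
</collection>"""


def build_outline(bioc_xml, infon_key="section_type"):
    """Return one (section label, offset, text preview) tuple per passage.

    `infon_key` is an assumption about how sections are labeled.
    """
    root = ET.fromstring(bioc_xml)
    outline = []
    for passage in root.iter("passage"):
        label = "UNKNOWN"
        for infon in passage.findall("infon"):
            if infon.get("key") == infon_key:
                label = infon.text or "UNKNOWN"
        offset = int(passage.findtext("offset", default="0"))
        text = passage.findtext("text", default="")
        outline.append((label, offset, text[:30]))  # keep a short preview only
    return outline


outline = build_outline(SAMPLE_BIOC)
for label, offset, preview in outline:
    print(f"{label:8} offset={offset:<4} {preview}")
```

Each passage carries its own offset, which is what lets a tool jump to a given section or restore the last reading position.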
The second step in setting up a project is to define the annotation task.
For example, for this project, we will annotate Genes, and Diseases.
We give a name, and select a highlight color as shown below:
And define a gene-disease relation:
The third step in setting up a project is to assign annotators to it. For the purpose of this test project, we will create two new workspaces: one for annotator “john” and one for “alice”. We will send both john and alice their unique URLs so that they may annotate Genes and Diseases in this document set.
For this purpose, we will manually assign both documents to both annotators. For large projects with many documents and many annotators, the tool offers a random assignment option and an option to upload an assignment matrix. Both options aim to balance the workload among annotators so that each document is annotated by at least X annotators (where X is the input parameter).
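The balancing idea behind a random assignment can be sketched as follows. This is only an illustration of one possible approach (a greedy choice of the least-loaded annotators with random tie-breaking), not the algorithm TeamTat actually uses. The function name `assign_documents` and its parameters are hypothetical.

```python
import random


def assign_documents(doc_ids, annotators, per_doc, seed=None):
    """Assign each document to `per_doc` distinct annotators.

    Greedy sketch: for every document, pick the annotators with the
    lightest current workload, breaking ties at random.
    """
    if per_doc > len(annotators):
        raise ValueError("per_doc cannot exceed the number of annotators")
    rng = random.Random(seed)
    load = {name: 0 for name in annotators}
    assignment = {}
    for doc in doc_ids:
        candidates = list(annotators)
        rng.shuffle(candidates)                 # random tie-breaking
        candidates.sort(key=lambda a: load[a])  # stable sort keeps the shuffled order among ties
        chosen = candidates[:per_doc]
        for name in chosen:
            load[name] += 1
        assignment[doc] = chosen
    return assignment


# The two-document, two-annotator project from this walkthrough (X = 2).
assignment = assign_documents(["doc1", "doc2"], ["john", "alice"], per_doc=2, seed=0)
print(assignment)
```

With X equal to the number of annotators, as in this test project, every annotator receives every document, which matches the manual assignment described above.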
After completing the project definition you can start annotation.
This will create two copies of the documents, which will be placed in John’s workspace, and Alice’s workspace. They can see the documents and start annotating.
Click on a document in a project in your workspace.
Look at the annotation types and relation types. If you find a text string that corresponds to one of those entity types, highlight it with your mouse. A pop-up window will open as below.
Here, John opened the first document in his project and found the term “Mik1”, which corresponds to a gene in the organism S. pombe. John consults the NCBI Gene database and adds the corresponding GENE ID to the Concept ID box. John can choose to annotate all occurrences of this term in the whole document and link them to the same GENE ID.
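Annotating every occurrence of a term with the same Concept ID amounts to a whole-word search over the document text. The sketch below illustrates that idea only; it is not how TeamTat implements the feature. The function name, the dictionary fields, the sample sentence and the placeholder ID are all assumptions made for the example.

```python
import re


def annotate_all_occurrences(text, term, concept_id, entity_type="Gene"):
    """Return one annotation per whole-word occurrence of `term` in `text`."""
    # The lookarounds prevent matches inside longer tokens (e.g. "Mik1p").
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")
    return [
        {
            "type": entity_type,
            "concept_id": concept_id,
            "offset": match.start(),
            "length": len(term),
            "text": match.group(0),
        }
        for match in pattern.finditer(text)
    ]


# Made-up sentence; the concept ID is a placeholder, not a real NCBI Gene ID.
sample = "Mik1 was mentioned here. Later, Mik1 appears again, but Mik1p is a different token."
annotations = annotate_all_occurrences(sample, "Mik1", "GENE_ID_PLACEHOLDER")
for ann in annotations:
    print(ann["offset"], ann["text"], ann["concept_id"])
```

Matching on whole words is a design choice for this sketch: it avoids tagging a longer token that merely contains the term, at the cost of missing spelling variants.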
For the other document, John annotates the gene BRCA1 and the disease “ovarian carcinoma”.
Meanwhile, Alice is also working in her workspace and makes her own annotations. When both are done reading the documents, they mark them as complete, and the project manager can see the result of the annotation round.
How to keep track of annotation progress and annotator disagreements?
In this figure, we see that the gene “Chk1” has a grey underline, “Wee1” has a black underline, and “Cdc25” has no underline. The project manager sees that the annotators agree on the annotation of “Chk1”, that they disagree on the Concept ID for the gene “Wee1”, and that only one annotator has annotated “Cdc25”.
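The three situations described here (full agreement, a Concept ID mismatch, and a span marked by only one annotator) can be told apart by comparing the two annotators' spans. The following is a minimal sketch under the assumption that each annotator's work is stored as a mapping from (offset, length) to Concept ID. The function name, the category labels, the spans and the IDs are hypothetical.

```python
def compare_annotations(ann_a, ann_b):
    """Classify every span marked by at least one of two annotators.

    Each argument maps a (offset, length) span to a concept ID.
    """
    result = {}
    for span in sorted(set(ann_a) | set(ann_b)):
        if span in ann_a and span in ann_b:
            if ann_a[span] == ann_b[span]:
                result[span] = "agree"
            else:
                result[span] = "concept_id_mismatch"
        else:
            result[span] = "single_annotator"
    return result


# Hypothetical spans and placeholder IDs, for illustration only.
john = {(10, 4): "ID_1", (40, 4): "ID_2", (70, 5): "ID_3"}
alice = {(10, 4): "ID_1", (40, 4): "ID_9"}
status = compare_annotations(john, alice)
for span, label in status.items():
    print(span, label)
```

This sketch only compares exact spans; partially overlapping spans would need an additional rule.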
On the other document we can annotate a relation:
The project manager can end the round, review inter-annotator agreement, and decide whether they want to improve annotation quality or finalize the corpus.
Data is available for download at any time.
Let us try the Sample Project 1.
This project contains three documents: two PubMed abstracts and one PubMed Central full-text article. This project is rich in annotations, because it defines these annotation types: