Red Hen Edge2 Search Engine


The Edge2 search engine is an experimental project designed to give you access to the metadata in the NewsScape dataset. While the Edge search engine gives you access to the transcripts (closed captioning, teletext, and speech-to-text), Edge2 will allow you to search for the different types of tags that have been added to the dataset, such as parts of speech in several languages, linguistic frames, gestures, named entities, and sentiment (see Current state of text tagging).


  • Introduction
  • How to search for gestures
  • How to search for frames
  • How to search for parts of speech
  • How to search for linguistic patterns
  • How to search for clauses where a field has anything (i.e. field exists)
  • How to search for clauses in the same sentence (i.e. with the same start/end times)

How to search for gestures

The Edge2 search engine is designed as a query builder. This software architecture allows you to construct your search incrementally, selecting which dimensions of the dataset you want to query.

1. On the opening screen, click on "Advanced Search":

2. In the Advanced Search interface, click on "add" next to "word/phrase":

3. Under "tag(s)", select "GES_02"

4. Under "field(s)", select a field that you would like to search in

5. Under "word(s)/phrase(s)", click on "add". A text box appears.

6. Enter the word or phrase to search for:

7. Click on "Search"

8. Display the results

How to search for frames

Frame information is generated with FrameNet and Semafor. There are three types of frame information that can be searched.

  1. Frame name (e.g. "calendric_unit", "leadership", "entity"). Add prefix "[[framename=" and suffix "]]" to your search term, e.g. "[[framename=leadership]]".
  2. Frame element name (e.g. "theme", "unit", "locale"). Add prefix "[[frameelement=" and suffix "]]" to your search term, e.g. "[[frameelement=theme]]".
  3. Semantic role (e.g. "I", "you", "people"). Add prefix "[[semanticrole=" and suffix "]]" to your search term, e.g. "[[semanticrole=people]]".

Note: The prefixes and suffixes are subject to change in the future.
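
The three term types above share one shape: a field name, an equals sign, and a value, wrapped in double square brackets. As an illustration only, a small Python helper can assemble such terms before you paste them into the search box. The function name `tag_term` is hypothetical and is not part of Edge2; the bracket syntax is copied from the list above and, as noted, may change.

```python
def tag_term(field, value):
    """Wrap a value in the [[field=value]] form described above.

    `tag_term` is a hypothetical helper for building search terms by hand;
    it does not call Edge2.
    """
    if not field or not value:
        raise ValueError("field and value must be non-empty")
    return f"[[{field}={value}]]"


# Build one term of each type listed above.
print(tag_term("framename", "leadership"))
print(tag_term("frameelement", "theme"))
print(tag_term("semanticrole", "people"))
```

The same helper would work for any other bracketed field name, since it only formats text.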

Example: Searching for the frame name "Experiencer".

Advanced search page at

Click on the "add" button for "word/phrase," and see

Search result without highlighting.

Search result with highlighting

How to search for parts of speech

Part-of-speech information is generated by MBSP and the Stanford Part-of-Speech Tagger. There are three types of part-of-speech information that can be searched.

  1. Part of speech (e.g. "nn", "dt", "in"). Add prefix "[[partofspeech=" and suffix "]]" to your search term, e.g. "[[partofspeech=nn]]".
  2. Chunk (e.g. "o"). Add prefix "[[chunk=" and suffix "]]" to your search term, e.g. "[[chunk=o]]".
  3. Lemma (e.g. "be"). Add prefix "[[lemma=" and suffix "]]" to your search term, e.g. "[[lemma=be]]".

Note: The prefixes and suffixes are subject to change in the future.

First example: Searching for lemma "be". Note: The screenshot is outdated. Type "[[lemma=be]]" instead of "[[lemma be]]" in the word(s)/phrase(s) input.

Advanced search page.

Search result without highlighting

Search result with highlighting

Second example: search for instances of the word “account” used as a verb.

1. Add a "regex" search criterion in advanced search.

2. Change "regex mode" to "multi".

3. Select the tags and field. In this case they are "POS_01" and "POS_01 Text" respectively.

4. Click on "add" next to "patterns".

5. Enter a pattern, for instance:

[[lemma=account]] & ([[partofspeech=vb]] | [[partofspeech=vbd]] | [[partofspeech=vbg]] | [[partofspeech=vbn]] | [[partofspeech=vbp]] | [[partofspeech=vbz]])

In other words, the pattern matches words whose lemma is "account" (e.g. account, accounts, accounted) and whose part of speech is one of the verb tags (e.g. "vb" for base form, "vbd" for past tense).

6. Make sure that "regex mode" is "multi". This is very important because the default ("raw") takes a long time and won't give you meaningful results. The "regex" criterion will look like this (note that you can use POS_01 and POS_02 for English text):

7. If everything looks good, click on the "Search" button. If everything goes well, the result should look like this:

8. And if you want highlighting, you can click on "matched X time(s)".

This ends the instructions for searching for the word "account" as a verb.
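
Typing the six verb tags by hand is error-prone, so here is a minimal sketch that assembles the pattern from step 5 as a string. The function name `lemma_with_pos` and the `VERB_TAGS` list are illustrative assumptions, not part of Edge2; the output is meant to be pasted into the pattern box with "regex mode" set to "multi".

```python
# Verb tags used in the pattern from step 5, lower-cased as in that pattern.
VERB_TAGS = ["vb", "vbd", "vbg", "vbn", "vbp", "vbz"]


def lemma_with_pos(lemma, tags):
    """Build a pattern string: the lemma AND any one of the given tags.

    Hypothetical helper; it only formats text and does not query Edge2.
    """
    alternatives = " | ".join(f"[[partofspeech={tag}]]" for tag in tags)
    return f"[[lemma={lemma}]] & ({alternatives})"


print(lemma_with_pos("account", VERB_TAGS))
```

Swapping in another lemma or tag list gives the corresponding pattern for a different word or word class.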

How to search for linguistic patterns

In the Advanced Search interface, click on "add" next to "regex". Change "regex mode" to "multi". Click on "add" next to "pattern(s)". Enter a pattern without double quotes.

The following pattern forms are supported.

Pattern: word
Meaning: exact word or word with wildcards
Example: surprise
Matches: the word "surprise".
Notes: Wildcard characters are supported: "?" stands for one character and "*" stands for zero or more characters. E.g. "an?" matches both "and" and "ant"; "ask*" matches all of "ask", "asking", "asked", "askew", etc.

Pattern: /regex/
Meaning: any words that match the given regex
Example: /.*ization/
Matches: words ending in "ization", such as "kardashianization", "putinization", "goldwaterization".
Notes: Uses regex syntax. A pattern that is too broad can be quite slow and return a lot of results.

Pattern: word1 word2 word3
Meaning: words as a phrase in the given order
Example: the kardashianization of
Matches: the phrase "the kardashianization of".
Notes: Separate the words in the phrase with spaces. You can specify a regex (with slashes at both ends) instead of a word, e.g. "the /.*ization/ of".

Pattern: word1 & word2
Meaning: overlap, or "is also a"
Example: [[lemma=surprise]] & [[partofspeech=vb]]
Matches: the word that has the lemma "surprise" and is also used as a verb (base form).
Notes: See the list of field names for a list of lemmas, parts of speech, etc.

Pattern: word1 | word2
Meaning: any of the words
Example: this | that
Matches: the word "this" or the word "that".
Notes: Parentheses are optional (but recommended) unless needed to limit the "any" effect. E.g. "one two | three four" matches the phrase "one two" or "three four", while "one (two | three) four" matches the phrase "one two four" or "one three four".

Pattern: word1 | word2 | ... | word N
Meaning: any of the words
Example: this | that | these | those
Matches: any of the four words.

Pattern: (word)?
Meaning: optional word
Example: this is (so)? cool
Matches: the phrase "this is so cool" or "this is cool".
Notes: Cannot be used by itself, i.e. "(so)?" is not a valid pattern, but "this is (so)? cool" is. There must be no space before the question mark; otherwise the pattern matches a phrase ending with a question mark (which is treated as a separate token). Do not omit the parentheses, or the pattern's meaning changes (see the "word" pattern).

Pattern: ^(word)
Meaning: excluded word
Example: ^(you know) what
Matches: "guess what" but not "you know what".
Notes: Parentheses are required after the caret.

Pattern: *
Meaning: placeholder for one word
Example: the * of *
Matches: "the" followed by one word followed by "of" followed by one word, e.g. "the group of people".

Pattern: *?
Meaning: optional placeholder for one word
Example: it is *? raining
Matches: "it is not raining", "it is so raining", or "it is raining".
Notes: Cannot be used by itself, i.e. "*?" is not a valid pattern by itself.
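
To make the in-word wildcard rules concrete ("?" for exactly one character, "*" for zero or more), the sketch below emulates them in Python by translating a wildcard word into a regular expression. This is an illustration of the described behavior only, not Edge2's implementation; the names `wildcard_to_regex` and `matches` are hypothetical, and the sketch assumes case-sensitive matching on a single token.

```python
import re


def wildcard_to_regex(word):
    """Translate a wildcard word into a compiled regex.

    "?" becomes "." (exactly one character) and "*" becomes ".*"
    (zero or more characters); every other character is matched literally.
    """
    parts = []
    for ch in word:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern, token):
    """Return True if the whole token matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(token) is not None


for pattern, token in [("an?", "and"), ("an?", "an"), ("ask*", "ask"), ("ask*", "askew")]:
    print(pattern, token, matches(pattern, token))
```

Because the whole token must match, "an?" accepts "and" and "ant" but not the two-letter "an", while "ask*" accepts "ask" itself as well as longer words.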

How to search for clauses where a field has anything (i.e. field exists)

1. In the Advanced Search interface, click on "add" next to "field exists".

2. Select the "NER_03" tag and the "NER_03 TIME" field.

3. Click on "Search".

4. Display the results (with highlighting).


How to search for clauses in the same sentence (i.e. with the same start/end times)

Example: Look for "trump" in the transcript/caption in the same sentence as a time reference.

1. In the Advanced Search interface, click on "add" next to "same start/end time".

2. Click on "add" next to "word/phrase".

3. Select "TEXT" tag and "TEXT Text" field.

4. Enter the search word/phrase (e.g. "trump"). Then, click on "field exists" in the "Same Start/End Time" clause.

5. Select the "NER_03" tag and the "NER_03 TIME" field.

6. Make sure the "NER_03" tag and the "NER_03 TIME" field are selected. Scroll down to the "Search" button at the bottom of the page.

7. Click on "Search".

8. Display the results.

9. Display the results (with highlighting).
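
The idea behind "same start/end time" can be sketched in a few lines of Python: annotations are grouped by their (start, end) pair, and a group is a hit when it contains both the search word in the text and a non-empty time entity. Everything in this sketch is an assumption made for illustration: the record layout, the sample values, and the function name `same_span_matches` are invented here and do not reflect how Edge2 stores or queries the data.

```python
from collections import defaultdict

# Hypothetical annotation records; times and values are made up.
records = [
    {"start": 10.0, "end": 14.5, "field": "TEXT Text", "value": "Trump spoke on Tuesday"},
    {"start": 10.0, "end": 14.5, "field": "NER_03 TIME", "value": "Tuesday"},
    {"start": 14.5, "end": 18.0, "field": "TEXT Text", "value": "Trump left the stage"},
    {"start": 18.0, "end": 21.0, "field": "TEXT Text", "value": "The vote is next week"},
    {"start": 18.0, "end": 21.0, "field": "NER_03 TIME", "value": "next week"},
]


def same_span_matches(records, word):
    """Return the (start, end) spans whose text contains `word` and that
    also carry a non-empty NER_03 TIME annotation."""
    by_span = defaultdict(list)
    for rec in records:
        by_span[(rec["start"], rec["end"])].append(rec)

    hits = []
    for span, group in by_span.items():
        has_word = any(
            rec["field"] == "TEXT Text" and word.lower() in rec["value"].lower().split()
            for rec in group
        )
        has_time = any(rec["field"] == "NER_03 TIME" and rec["value"] for rec in group)
        if has_word and has_time:
            hits.append(span)
    return hits


print(same_span_matches(records, "trump"))
```

With the sample records, only the first span qualifies: the second mentions "trump" without a time entity, and the third has a time entity without "trump".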