Red Hen Edge2 Search Engine

Introduction

The Edge2 search engine is an experimental project designed to give you access to the metadata in the NewsScape dataset. While the Edge search engine gives you access to the transcripts (closed captioning, teletext, and speech-to-text), Edge2 will allow you to search for the different types of tags that have been added to the dataset, such as parts of speech in several languages, linguistic frames, gestures, named entities, and sentiment (see Current state of text tagging).

Contents

  • Introduction
  • How to search for gestures
  • How to search for frames
  • How to search for parts of speech
  • How to search for linguistic patterns
  • How to search for clauses where a field has anything (i.e. field exists)
  • How to search for clauses in the same sentence (i.e. with the same start/end times)

How to search for gestures

The Edge2 search engine is designed as a query builder. This software architecture allows you to construct your search incrementally, selecting which dimensions of the dataset you want to query.

1. In the opening screen of https://tvnews.sscnet.ucla.edu/edge2/, click on "Advanced Search":

2. In the Advanced Search interface, click on "add" next to "word/phrase":

3. Under "tag(s)", select "GES_02"

4. Under "field(s)", select a field that you would like to search in

5. Under "word(s)/phrase(s)", click on "add".

6. In the text box that appears, enter the word or phrase to search for.

7. Click on "Search"

8. Display the results

How to search for frames

Frame information is generated with FrameNet and Semafor. There are three types of frame information that can be searched.

  1. Frame name (e.g. "calendric_unit", "leadership", "entity"). Add prefix "[[framename=" and suffix "]]" to your search term, e.g. "[[framename=leadership]]".
  2. Frame element name (e.g. "theme", "unit", "locale"). Add prefix "[[frameelement=" and suffix "]]" to your search term, e.g. "[[frameelement=theme]]".
  3. Semantic role (e.g. "I", "you", "people"). Add prefix "[[semanticrole=" and suffix "]]" to your search term, e.g. "[[semanticrole=people]]".

Note: The prefixes and suffixes are subject to change in the future.
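The prefix and suffix convention above can be sketched as a small helper. This is only an illustration of the notation: `tagged_term` is a hypothetical name, not part of Edge2, and the resulting strings are meant to be pasted into the search form by hand.

```python
def tagged_term(field: str, value: str) -> str:
    """Wrap a value in the [[field=value]] notation described above.

    Hypothetical helper for illustration; Edge2 itself only sees the
    resulting string when it is typed into the search form.
    """
    return f"[[{field}={value}]]"


# The three kinds of frame information listed above.
print(tagged_term("framename", "leadership"))
print(tagged_term("frameelement", "theme"))
print(tagged_term("semanticrole", "people"))
```

Because the prefixes and suffixes may change, keeping the notation in one place like this means only the helper would need updating.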

Example: Searching for the frame name "Experiencer".

Advanced search page at https://tvnews.sscnet.ucla.edu/edge2/search.php

Click on the "add" button for "word/phrase" and enter the search term with its prefix and suffix.

Search result without highlighting.

Search result with highlighting

How to search for parts of speech

Part-of-speech information is generated by MBSP and the Stanford Part-of-Speech Tagger. There are three types of part-of-speech information that can be searched.

  1. Part of speech (e.g. "nn", "dt", "in"). Add prefix "[[partofspeech=" and suffix "]]" to your search term, e.g. "[[partofspeech=nn]]".
  2. Chunk (e.g. "o"). Add prefix "[[chunk=" and suffix "]]" to your search term, e.g. "[[chunk=o]]".
  3. Lemma (e.g. "be"). Add prefix "[[lemma=" and suffix "]]" to your search term, e.g. "[[lemma=be]]".

Note: The prefixes and suffixes are subject to change in the future.

First example: Searching for the lemma "be". Note: The screenshot is outdated. Type "[[lemma=be]]" instead of "[[lemma be]]" in the word(s)/phrase(s) input.

Advanced search page.

Search result without highlighting

Search result with highlighting

Second example: search for instances of the word “account” used as a verb.

1. Add a "regex" search criterion in advanced search.

2. Change "regex mode" to "multi".

3. Select the tags and field. In this case they are "POS_01" and "POS_01 Text" respectively.

4. Click on "add" next to "patterns".

5. Enter a pattern, for instance:

[[lemma=account]] & ([[partofspeech=vb]] | [[partofspeech=vbd]] | [[partofspeech=vbg]] | [[partofspeech=vbn]] | [[partofspeech=vbp]] | [[partofspeech=vbz]])

This pattern matches words with the lemma "account" (e.g. account, accounts, accounted) that are tagged with a verb part of speech (e.g. "vb" for base form, "vbd" for past tense).

6. Make sure that "regex mode" is "multi". This is very important because the default ("raw") takes a long time and won't give you meaningful results. The "regex" criterion will look like this (note that you can use POS_01 and POS_02 for English text):

7. If everything looks good, click on the "Search" button. If everything goes well, the result should look like this:

8. And if you want highlighting, you can click on "matched X time(s)".

This ends the instructions for searching for the word "account" as a verb.
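The pattern entered in step 5 is repetitive, so it can be convenient to assemble it from a list of tags. The sketch below does that in Python. The helper name `lemma_with_pos` and the `VERB_TAGS` list are assumptions made for illustration, and the output is simply the text to paste into the "patterns" box.

```python
# Verb tags used in the pattern from step 5.
VERB_TAGS = ["vb", "vbd", "vbg", "vbn", "vbp", "vbz"]


def lemma_with_pos(lemma: str, pos_tags: list[str]) -> str:
    """Build '[[lemma=X]] & ([[partofspeech=a]] | [[partofspeech=b]] | ...)'.

    Hypothetical helper: it only formats a string in the notation
    shown in this section and does not contact Edge2.
    """
    alternatives = " | ".join(f"[[partofspeech={tag}]]" for tag in pos_tags)
    return f"[[lemma={lemma}]] & ({alternatives})"


pattern = lemma_with_pos("account", VERB_TAGS)
print(pattern)
```

Swapping in another lemma or another list of tags gives the corresponding pattern for a different word or word class.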

How to search for linguistic patterns

In the Advanced Search interface, click on "add" next to "regex". Change "regex mode" to "multi". Click on "add" next to "pattern(s)". Enter a pattern without double-quotes.

Pattern

word

Meaning

exact word or word with wildcards

Example

surprise

Explanation

Matches the word "surprise".

Notes

Wildcard characters are supported: "?" stands for one character. "*" stands for zero or more characters. E.g. "an?" matches both "and" and "ant". "ask*" matches all of "ask", "asking", "asked", "askew", etc.

Pattern

/regex/

Meaning

any words that match the given regex

Example

/.*ization/

Explanation

Matches words ending in "ization", such as "kardashianization", "putinization", "goldwaterization".

Notes

Uses regex syntax. A search can be quite slow and return a lot of results if the pattern is too broad.

Pattern

word1 word2 word3

Meaning

words as a phrase in the given order

Example

the kardashianization of

Explanation

Matches the phrase "the kardashianization of".

Notes

Separate the words in the phrase with spaces. A regex (with slashes at both ends) can be used in place of a word, e.g. "the /.*ization/ of".

Pattern

word1 & word2

Meaning

overlap, or "is also a"

Example

[[lemma=surprise]] & [[partofspeech=vb]]

Explanation

Matches the word that has the lemma "surprise" and is also used as a verb (base form).

Notes

See the list of field names for a list of lemmas, parts of speech, etc.

Pattern

word1 | word2

Meaning

any of the words

Example

this | that

Explanation

Matches the word "this" or the word "that".

Notes

Parentheses are optional (but recommended) unless needed to limit the "any" effect. E.g. "one two | three four" matches the phrase "one two" or "three four", while "one (two | three) four" matches the phrase "one two four" or "one three four".

Pattern

word1 | word2 | ... | word N

Meaning

any of the words

Example

this | that | these | those

Explanation

Matches any of the four words.

Notes

Pattern

(word)?

Meaning

optional word

Example

this is (so)? cool

Explanation

Matches the phrase "this is so cool" or "this is cool".

Notes

Cannot be used by itself, i.e. "(so)?" is not a valid pattern, but "this is (so)? cool" is. There must be no space before the question mark. Otherwise, the pattern matches a phrase ending with a question mark (which is treated as a separate token). Do not omit the parentheses, or the pattern's meaning changes (see the "word" pattern).

Pattern

^(word)

Meaning

excluded word

Example

^(you know) what

Explanation

Matches "guess what" but not "you know what".

Notes

Parentheses are required after the caret.

Pattern

*

Meaning

placeholder for one word

Example

the * of *

Explanation

Matches "the" followed by one word followed by "of" followed by one word. E.g. "the group of people".

Notes

Pattern

*?

Meaning

optional placeholder for one word

Example

it is *? raining

Explanation

Matches "it is not raining", "it is so raining", or "it is raining"

Notes

Cannot be used by itself, i.e. "*?" is not a valid pattern by itself.
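The single-word wildcard and /regex/ patterns above can be tried out locally before running a slow search. The sketch below approximates them with Python's standard library. It rests on two assumptions: that the "?" and "*" wildcards behave like shell-style wildcards, and that a /regex/ must cover the whole word. The function names are hypothetical, and Edge2's own matcher may differ in details such as case handling.

```python
import re
from fnmatch import fnmatchcase


def wildcard_matches(pattern: str, word: str) -> bool:
    """Approximate the "?" (one character) and "*" (zero or more) wildcards."""
    return fnmatchcase(word, pattern)


def regex_matches(pattern: str, word: str) -> bool:
    """Approximate /regex/ by requiring the regex to match the entire word."""
    return re.fullmatch(pattern, word) is not None


# A small made-up word list for experimenting.
words = ["and", "ant", "an", "ask", "asking", "askew", "putinization", "organize"]

print([w for w in words if wildcard_matches("an?", w)])
print([w for w in words if wildcard_matches("ask*", w)])
print([w for w in words if regex_matches(r".*ization", w)])
```

Testing a pattern against a short word list like this is a cheap way to check that it is not broader than intended.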

How to search for clauses where a field has anything (i.e. field exists)

1. In the Advanced Search interface, click on "add" next to "field exists".

2. Select the "NER_03" tag and the "NER_03 TIME" field.

3. Click on "Search".

4. Display the results (with highlighting).

How to search for clauses in the same sentence (i.e. with the same start/end times)

Example: Look for "trump" in the transcript/caption in the same sentence as a time reference.

1. In the Advanced Search interface, click on "add" next to "same start/end time".

2. Click on "add" next to "word/phrase".

3. Select "TEXT" tag and "TEXT Text" field.

4. Enter the search word/phrase (e.g. "trump"). Then, click on "field exists" in the "Same Start/End Time" clause.

5. Select the "NER_03" tag and the "NER_03 TIME" field.

6. Make sure the "NER_03" tag and the "NER_03 TIME" field are selected. Scroll down to the "Search" button at the bottom of the page.

7. Click on "Search".

8. Display the results.

9. Display the results (with highlighting).
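The logic behind this query (a word match and a "field exists" condition sharing the same start/end times) can be illustrated on toy data. The sketch below is not how Edge2 or NewsScape store or query annotations. The record layout, the timestamps, the sample text, and the function name `same_time_matches` are all invented for illustration.

```python
from collections import defaultdict

# Invented toy records: (start, end, tag, field, value).
records = [
    ("00:00:05", "00:00:09", "TEXT", "TEXT Text", "trump spoke on tuesday"),
    ("00:00:05", "00:00:09", "NER_03", "NER_03 TIME", "tuesday"),
    ("00:00:09", "00:00:12", "TEXT", "TEXT Text", "trump left the stage"),
    ("00:00:12", "00:00:15", "NER_03", "NER_03 TIME", "today"),
]


def same_time_matches(records, word, tag, field):
    """Return (start, end) spans where the TEXT contains `word` and the
    given tag/field has any value at the same start/end times."""
    by_span = defaultdict(list)
    for start, end, rec_tag, rec_field, value in records:
        by_span[(start, end)].append((rec_tag, rec_field, value))

    hits = []
    for span, entries in by_span.items():
        # "word/phrase" condition on the TEXT tag.
        has_word = any(t == "TEXT" and word in v.split() for t, _, v in entries)
        # "field exists" condition: the field is present with a non-empty value.
        field_exists = any(t == tag and f == field and v for t, f, v in entries)
        if has_word and field_exists:
            hits.append(span)
    return hits


print(same_time_matches(records, "trump", "NER_03", "NER_03 TIME"))
```

Only the first span qualifies in this toy data: the second "trump" sentence has no time entity at the same start/end times, and the last time entity has no matching text.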