https://tvnews.sscnet.ucla.edu/edge/
Introduction
Front Page - Basic Search
Advanced Search
Display Format
Search Boxes
Search
Export
Regular Expression Mode
Browse
Regular Expression
Search Results
Video
Text
Montage
Image Flow
Metadata
Permalink
Bookmark
Exporting
Job List
Various sites capture TV news from around the world. Closed-captioning text in English and other languages is digitized, captured, and loaded into this search engine. The search engine can potentially also include on-screen text (harvested through optical character recognition), transcripts, and other aspects of the broadcast.
A UCLA account is required to access this archive. (For researchers in the Distributed Little Red Hen Lab, one of the co-directors must approve your acquisition of a UCLA account and monitor your research.) Upon logging in, you will be taken to the main page. The basic search features can be accessed here. The search will show 10 programs/results per page.
You can also access Advanced Search and Browse from this page.
The advanced search screen lets you choose the display format and specify networks, series, and date ranges through menus.
There are three different display formats available: list, table, and chart.
The search button will lead you to the results page.
The export button will lead you to the Job List page, where you can export a search to a file that you can open in another program on your computer. You can continue searching while the export job is completing.
Searches are exported to CSV (comma-separated values) files, which can be read by spreadsheet programs such as Excel, Numbers, and OpenOffice. If prompted, select line 6 to start the rows, the UTF-8 character set, comma as the field delimiter, and double quote as the text delimiter.
In the spreadsheet application, you can convert the URLs in the CSV file into hyperlinks:
Numbers: triple-click on the link and press Control-K
Excel: see instructions for bulk conversion of URLs to hyperlinks
Click a hyperlink to call up that search result.
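The import settings described above (rows starting at line 6, UTF-8, comma delimiter, double-quote text delimiter) can also be applied outside a spreadsheet. The following is a minimal Python sketch using the standard csv module. The function name read_export and the file name are hypothetical, and the sketch assumes only what is stated above about the file layout; it returns the rows as plain lists and makes no assumption about what each column holds.

```python
import csv

def read_export(path, first_row_line=6):
    """Read an exported search, skipping the lines before the first row.

    Assumes the layout described in the text: rows start at line 6,
    UTF-8 encoding, comma field delimiter, double-quote text delimiter.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        # Skip the lines that precede the first row.
        for _ in range(first_row_line - 1):
            next(handle, None)
        return list(csv.reader(handle, delimiter=",", quotechar='"'))

# Hypothetical usage:
# rows = read_export("my_search_export.csv")
# for row in rows:
#     print(row)
```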
Test: opens a new tab to the test page, which allows you to test a regex pattern.
The browse function will take you to the image flow set to today's date.
You may test a regex pattern through the test link found in the Advanced Search page.
A regular expression pattern can contain subpatterns separated by spaces. Each subpattern matches one word, consecutive subpatterns match consecutive words, and the pattern as a whole matches a phrase.
A regular expression takes the form of /subpattern1 subpattern2 ... subpatternN/
Each subpattern can contain the following:
String  Matches                                                               Example
.       any character (within a word)
[a-z]   a single character from a to z
|       any of the elements                                                   (he|she|it) matches any of the words "he", "she" or "it"
?       the preceding element occurring at most once                          (en)?large matches the words "large" or "enlarge"
{m,n}   the preceding element occurring at least m times and at most n times  grea{2,4}t matches the words "greaat", "greaaat" or "greaaaat"
The regular expression search uses the BRICS automaton package.
Currently, please use lower case to enter a pattern.
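To illustrate the word-by-word matching described above, here is a minimal Python sketch. It is not the archive's implementation, which uses the BRICS automaton package; it relies on Python's re module, whose behaviour agrees with the constructs listed above but may differ for other syntax. The function name match_phrase is hypothetical, and lower-casing the searched text is an assumption made so that lower-case patterns can match.

```python
import re

def match_phrase(pattern, text):
    """Return the phrases in text matched by a space-separated word pattern.

    Each subpattern must match one whole word, and consecutive
    subpatterns must match consecutive words.
    """
    subpatterns = [re.compile(p) for p in pattern.strip("/").split()]
    # Assumption: the text is lower-cased and split on whitespace.
    words = text.lower().split()
    count = len(subpatterns)
    hits = []
    for start in range(len(words) - count + 1):
        window = words[start:start + count]
        if all(sp.fullmatch(w) for sp, w in zip(subpatterns, window)):
            hits.append(" ".join(window))
    return hits

print(match_phrase("/vote for (obama|romney)/", "Please vote for Romney today"))
print(match_phrase("/(en)?large/", "enlarge the large map"))
print(match_phrase("/grea{2,4}t/", "greaat great greaaaaat"))
```

The sketch does not strip punctuation from words and does not handle the placeholders described next.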
A placeholder can be used as a whole subpattern, in place of a single word or multiple words.
Placeholder  Matches
*            0 or more words
*+           1 or more words
*?           0 or 1 words
*{n}         exactly n words
*{m,}        m or more words
*{,n}        up to n words
*{m,n}       at least m words and up to n words
Please note that a placeholder must be used between non-placeholders. For example, the pattern /*+/ does not match anything. Currently, leading and trailing placeholders are discarded. For example, /*+ of the *+/ behaves the same as /of the/.
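The placeholder forms and the discard rule above can be sketched in Python as follows. This is an illustration only, not the search engine's code: the names placeholder_bounds and strip_placeholders are hypothetical, and the sketch only interprets placeholders as word-count ranges and drops leading and trailing ones, as the text describes.

```python
import re

# One of: *  *+  *?  *{n}  *{m,}  *{,n}  *{m,n}
PLACEHOLDER = re.compile(r"\*(?:(\+)|(\?)|\{(\d*),(\d*)\}|\{(\d+)\})?")

def placeholder_bounds(token):
    """Return (min_words, max_words) for a placeholder, or None otherwise.

    max_words is None when the placeholder has no upper limit.
    """
    m = PLACEHOLDER.fullmatch(token)
    if m is None:
        return None  # an ordinary word subpattern
    plus, question, low, high, exact = m.groups()
    if plus:
        return (1, None)
    if question:
        return (0, 1)
    if exact is not None:
        return (int(exact), int(exact))
    if low is not None:  # the {m,n} family; either side may be empty
        return (int(low) if low else 0, int(high) if high else None)
    return (0, None)  # a bare *

def strip_placeholders(tokens):
    """Drop leading and trailing placeholders from a list of subpatterns."""
    start, end = 0, len(tokens)
    while start < end and placeholder_bounds(tokens[start]) is not None:
        start += 1
    while end > start and placeholder_bounds(tokens[end - 1]) is not None:
        end -= 1
    return tokens[start:end]

print(placeholder_bounds("*{2,5}"))
print(strip_placeholders("*+ of the *+".split()))
print(strip_placeholders(["*+"]))
```

Under these assumptions, a pattern consisting only of placeholders is reduced to an empty list, which is consistent with the statement that /*+/ does not match anything.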
More Examples
Pattern                         Matches
/[A-Za-z]{10}/                  words at least ten letters long
/[0-9A-Za-z]{10}/               words 10 alphanumeric characters long
/[A-Za-z]{10,12} /              10 to 12 letters followed by a space
/[A-Za-z]{10,12}(,|.)/          10-12 followed by a space, comma, or period
/[A-Za-z-]{10,12}S/             including hyphens, and end in S
/[A-Za-z-]{12,}(!|?|.|,|:|;)/   12+ and then space or punctuation
/G[A-Za-z-]{10,12} /            G followed by 10-12 letters and then a space
/ G[A-Za-z-]{10,12} /           same, but G is the first letter
/ A G[A-Za-z-]{10,12} /         same, preceded by the indefinite article "A"
/ A G[A-Za-z-]{10,} /           same, but no maximum number of letters
/IS NOW [A-Za-z-]{2,}ING /      "is now *ing"
/* is the * of */               "is the", followed by exactly one word, followed by "of"
/has .*ed now/                  "has", followed by a word that ends with "ed", followed by "now"
/vote for (obama|romney)/       "vote for", followed by either "obama" or "romney"
There are two times listed for each program. The first is the local time at which it was broadcast. Although most of the programs are broadcast in California, some come from other countries, so the second time is given in UTC.
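The relationship between the two listed times can be sketched in Python with the standard zoneinfo module (Python 3.9 or later). The function name local_to_utc, the example date, and the choice of the America/Los_Angeles zone for a California broadcast are illustrative assumptions, not values taken from the archive.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def local_to_utc(local_time, zone_name):
    """Attach the named time zone to a naive local time and convert it to UTC."""
    return local_time.replace(tzinfo=ZoneInfo(zone_name)).astimezone(timezone.utc)

# A hypothetical program broadcast in California at 6:00 pm local time.
broadcast = datetime(2012, 11, 6, 18, 0)
print(local_to_utc(broadcast, "America/Los_Angeles"))
```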
Clicking on the video link or on the thumbnail will play the video in the player on the right. You can skip forward or backward by using the buttons in the video player.
Note: On a given search results page, when you click on a thumbnail to cue the video player you may notice a discrepancy between the thumbnail image and the frame that appears in the video player. There is a 10-second difference between the two. This is due to the closed captioning text, which always lags in timecode behind the actual video. This is not an error in the DCL's timecode structure. If you immediately click on the "skip ahead 10" link in the video player after loading the clip, you will see identical images.
The text link will take you to a page with the transcript of the video.
Clicking on the montage link will lead you to thumbnails taken every 10 seconds of the show. Clicking on a thumbnail will start the video at that specific time of the show.
Metadata is where you can find the closed captioning along with the corresponding time stamp. Clicking on the time will play the video at that specific moment in the show. You may also bookmark the video.
The permalink takes you to a page you can use to link to the video. A bookmark made from the permalink starts at the beginning of the show.
The bookmark brings you to a page where the video starts at a specific timecode in the show.
To bookmark a video: Simply right-click (or Control-click on a Mac) the link "permalink" and select "Bookmark This Link." Type in a reference for the bookmark in the "name" field, then select "ok." The reference will now appear under the bookmarks menu. If using a browser other than Firefox, note that the particular steps for accessing the bookmark may differ slightly. Refer to your browser's "help" section if necessary. This bookmark will cue the video at the beginning of the clip. If you desire a bookmark for a specific time in the video, use the second option.
Bookmark a video at a specific timecode: After playing the video and noting the desired timecode, click on the paper icon located at the end of each caption preview. This will open a page containing only that particular video clip. In the URL field at the top of your browser, change the last set of numbers to your desired timecode. Note that you must convert the timecode into seconds, and the number must be in ten-second increments. Press Enter to load the page for that specific timecode. Then select Bookmark from the browser menu bar. Select Bookmark this page. Fill in a reference in the "name" field.
Example:
Noted timecode is 15 minutes and 23 seconds into a given clip (923 total seconds).
After clicking on the paper icon, the following appears in the URI field: "http://dcl.sscnet.ucla.edu/search/video,20279,170".
Change the "170" to 920 since the timecode must be converted to seconds and be in ten second increments. The URI would now read: http://dcl.sscnet.ucla.edu/search/video,20279,920
Select bookmark from the browser menu bar.
Select Bookmark this page.
Fill in a reference in the "name" field.
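The arithmetic in the example above (convert the timecode to seconds, round down to a ten-second increment, and replace the last number in the address) can be sketched in Python. The function names timecode_to_offset and bookmark_url are hypothetical; the address is the one quoted in the example.

```python
def timecode_to_offset(minutes, seconds):
    """Convert a timecode to seconds, rounded down to a ten-second increment."""
    total = minutes * 60 + seconds
    return total - total % 10

def bookmark_url(clip_url, minutes, seconds):
    """Replace the last comma-separated number in the clip address."""
    base, _, _old_offset = clip_url.rpartition(",")
    return f"{base},{timecode_to_offset(minutes, seconds)}"

clip = "http://dcl.sscnet.ucla.edu/search/video,20279,170"
print(timecode_to_offset(15, 23))   # 923 seconds rounds down to 920
print(bookmark_url(clip, 15, 23))
```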
There are two options for exporting: "export this page" or "export all pages." Both will lead you to the Job List page.
Export this page: Exports the programs listed on that page.
Export all pages: Exports all of the programs found based on that search.
Upon completion of the export, you may download it and open the file using another program such as Excel. The export is text-only.
The job list shows your activity, which allows you to go back to previous searches and download exports. The job list can be viewed by clicking your name at the top right corner. For each activity, it displays the type, start/end, query, status, message, and action.
Type:
Start/End:
Query:
Status:
Message:
Action:
Files rejected by the Edge import script are listed in tvnews:/data/tna/edge/solr/invalid_files.txt.
We should monitor this file. There are three common failure types: