Red Hen Lab is a global laboratory and consortium for research into multimodal communication. We aim to develop a multilevel integrated research infrastructure that supports a wide community of diverse researchers. Here is an overview of some of the dimensions involved.
Red Hen Lab works with a variety of datasets, from television news to art. The research methods we develop can typically be applied to a variety of data.
Our main large dataset is the UCLA NewsScape Library of International Television News, a digital collection of around 420,000 television news programs. It was initiated and developed by the Department of Communication Studies at UCLA and is hosted by the UCLA Library. Currently, 10% of the collection is international, including news programs from Belgium, Brazil, Czechia, Denmark, Egypt, France, Germany, Italy, Japan, Norway, Qatar, Poland, Portugal, Russia, Spain, and Sweden. A growing list of collaborators will greatly increase the proportion of international newscasts, making the repository a vibrant archive for research in a wide variety of disciplines. Along with the video, we capture closed captioning / teletext and align it with transcripts when available; the annotations now comprise more than three billion words.
Snapshot at 2017-03-12 8:29 GMT (parenthetical figures are from the previous checkpoint at 2016-02-20 3:30 GMT)
Total networks: 51
Total series: 2,798
Total duration in hours: 320,792 (275,647)
Total metadata files (CC, OCR, TPT): 847,204 (732,732)
Total words in metadata files (CC, OCR, TPT): 3.92 billion, 3,922,794,392 exactly (3.40)
Total caption files: 413,387 (355,338)
Total words in caption files: 2.59 billion, 2,594,702,167 exactly (2.25)
Total OCR files: 397,637 (344,734)
Total words in OCR files: 899.54 million, 899,535,745 exactly (758.65)
Total TPT files: 36,180 (32,660)
Total words in TPT files: 428.56 million, 428,556,480 exactly (386.82)
Total video files: 413,199 (355,155)
Total thumbnail images: 115,485,216 (99,233,008)
Storage used for core data: 114.03 terabytes (100.39)
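The component word counts above should sum to the metadata total; a quick sanity check, with the figures copied from the table:

```python
# Word counts from the 2017-03-12 snapshot above.
cc_words = 2_594_702_167   # caption files
ocr_words = 899_535_745    # on-screen text (OCR)
tpt_words = 428_556_480    # aligned transcripts (TPT)

total = cc_words + ocr_words + tpt_words
print(f"{total:,}")  # prints 3,922,794,392 -- matches the reported metadata total
```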
While text corpora pose well-known problems with at least partial solutions, massive video corpora remain largely inaccessible to systematic analysis. Textual and visual information is complementary rather than duplicative, which adds complexity to the parsing task.
The computational challenges exist at three levels:
- Surface ontologies -- classifying, identifying, and labeling people, actions, objects, and places shown in video frames
- Syntagmatic ontologies -- detecting story boundaries, story topics, scene types, and spatiotemporal patterns
- Communicative ontologies -- camera techniques, presentational patterns, persuasion effects
Because television news programs, unlike surveillance video, are professionally constructed as intentional acts of communication, the computational challenges can only be met through a close collaboration of statisticians / computer scientists and media scholars / cognitive scientists. Teams need to work on all three levels of ontologies at the same time to make meaningful progress, so that communicative frames inform the choice of computational techniques, and available computational techniques inform which cognitive effects to focus on. At each level, textual / verbal information can be recruited for probability weightings. Coverage from different networks and countries can be used to triangulate on a single event, creating a multiperspective construct.
A. Red Hen's integrated research workflow
The research workflow we are developing for Red Hen scholars and students integrates established manual analytical practices in the multimodal study of language with a new generation of computational tools centered on "deep learning". This integrated research workflow can be described as a sequence of topics in a Red Hen Summer School:
- Introduction to multimodal communication research
- Red Hen Primer: selected Unix shell commands, applied Python, deploying NLP engines, and the statistical package R
- Multimodal analysis with Elan -- hands-on, best coding practices, annotations integrated into the Red Hen dataset
- Research question development in the student's area of interest -- e.g., crossmodal constructions, complex blends, multimodal disambiguation
- Apply machine learning tools to annotations to create feature-specific classifiers on Red Hen servers, semi-supervised through feedback in Elan
- Run the classifier for feature extraction and annotation on high-performance computing clusters at CWRU, UCLA, and Erlangen (cf. Audio processing pipeline)
- Search for complex correlations between linguistic, auditory, and visual annotations in the Red Hen dataset
- Interpretation, qualitative and quantitative/statistical analysis of the search results
- Experimental testing of communicative effects
- Visualization, write-up, presentations
We invite contributions to any stage of this workflow; see for instance Barnyard.
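Elan stores its annotations in EAF, an XML format, so integrating them into the Red Hen dataset starts with extracting tiers as time-stamped records. A minimal sketch (the sample document is a synthetic fragment of the EAF structure, and the tier name is illustrative):

```python
import xml.etree.ElementTree as ET

# A tiny synthetic EAF fragment in the structure Elan exports:
# time slots in a TIME_ORDER block, annotations referencing them by ID.
EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1850"/>
  </TIME_ORDER>
  <TIER TIER_ID="gesture">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>beat gesture</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

def tier_annotations(eaf_xml, tier_id):
    """Return (start_ms, end_ms, value) for each annotation on a tier."""
    root = ET.fromstring(eaf_xml)
    # Resolve time-slot IDs to millisecond offsets.
    slots = {ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
             for ts in root.iter("TIME_SLOT")}
    out = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            out.append((slots[ann.get("TIME_SLOT_REF1")],
                        slots[ann.get("TIME_SLOT_REF2")],
                        ann.findtext("ANNOTATION_VALUE")))
    return out

print(tier_annotations(EAF, "gesture"))  # [(1200, 1850, 'beat gesture')]
```

Records in this shape can then be joined against caption timestamps when searching for correlations across modalities.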
B. Levels of the Red Hen project
- Data acquisition -- global capture of television news; we can now set up wholly automated capture stations using a Raspberry Pi
- Data storage -- NewsScape's home is the UCLA Library, on secure servers located in the ITS machine room
- Data enhancement -- machine processing requires high-quality data; we do spell-checking, speech-to-text, download and align transcripts
- Data mining research -- a vast task at the cutting edge of computer science and statistics that includes establishing stable and extensible HPC processing pipelines
- Communications research -- media studies on effects, political communication, advertising, etc.
- Linguistics research -- verbal, textual, and visual modes of communication
- Cognitive science -- underlying mental processes enabling and recruited by multimodal news
- Neuroscience -- multimodal integration, underlying neurocognitive processes
- Search engines -- multimodal search
- User interfaces -- research and instruction interfaces, search, visual browsing, video annotation
- Presentational tools -- visualizations, publication platforms
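The transcript-alignment step in the data-enhancement level above can be sketched with Python's difflib: match noisy caption tokens against a clean transcript so that transcript wording can correct caption errors. This is a toy illustration with invented data, not Red Hen's production aligner:

```python
from difflib import SequenceMatcher

# Toy data: closed captions with a typo vs. a clean transcript.
captions   = "the presidant spoke to reporters today".split()
transcript = "the president spoke to reporters this morning".split()

# SequenceMatcher works on any sequences, including word lists.
sm = SequenceMatcher(a=captions, b=transcript)
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    # Each opcode maps a caption span onto a transcript span:
    # 'equal' spans agree, 'replace' spans are correction candidates.
    print(tag, captions[i1:i2], transcript[j1:j2])
```

Word-level opcodes like these are enough to propagate transcript corrections back onto time-stamped captions.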
C. Grants and awards
- Spain's Excelencia grant for fundamental research awarded to Red Hen researchers Inés Olza and Cristóbal Pagán Cánovas at the University of Navarra for a project on language and gesture
- Anneliese Maier Research Prize from the Alexander von Humboldt Foundation awarded to Red Hen co-director Mark Turner (2016-2020).
- Grant from the Research Council of Norway awarded to Red Hen co-director Francis Steen for the NECORE project
- Google Summer of Code in 2015 and 2016 awarded to Red Hen Lab.
- Grant from the Cyber-Enabled Discovery and Innovation (CDI) program of the US National Science Foundation, CNS 1028381 and 1027965 (2010-2016), awarded to Red Hen researchers Song-Chun Zhu (Statistics, UCLA), Tim Groeling and Francis Steen (both Communication Studies, UCLA), & Cheng Zhai (Computer Science, UIUC, a text mining expert). Focus on levels 4 and 5, touching all.
- Several OID grants for the development of a search engine. Focus level 9.
- Several CCLE grants to develop an online video annotation tool. Focus level 10. Collaboration with HyperCities.
D. Research network
(A few examples from a large and diverse group)
- Francis Steen, Co-Director of Red Hen Lab -- communication researcher at UCLA
- Mark Turner, Chair of the International Advisory Committee -- cognitive scientist and linguist
- Gerard Steen, International Advisory Committee, director of Metaphor Lab Amsterdam
- Song-Chun Zhu, Statistics, UCLA -- computer vision, hierarchical image parsing
- Irene Mittelberg, U Aachen -- gesture analysis
- Anders Hougaard, Communication Studies, University of Southern Denmark -- satellite capture station, research team
- Erik Bucy, leading political communication / visual studies scholar, Texas Tech (visiting scholar with NewsScape 2012)
- Javier Valenzuela, University of Murcia, Spain -- linguistics
- Rajesh Kasturirangan, National Institute for Advanced Studies, Bangalore -- multimodal ontologies
- Cristóbal Pagán Cánovas, University of Navarra, Spain -- humanities
E. Research projects
- NSF/CDI: Computer vision has focused on surveillance video. NewsScape contains deliberate communicative acts in a multimodal format; this is a new challenge to the computer vision community. Joint text/image mining is also nearly untouched; we have billions of words of time-aligned text. Progress: story segmentation and hierarchical topic modeling using WordNet, the Stanford parser, and UIUC Named Entity Recognition. On-screen text OCR. Human figure detection, face detection. Visual identification of frequently recurring people in the news. Commercial detection. Alignment of transcripts with closed captioning. Topic modeling.
- Syntactic parsing: Eckhart Bick at the University of Southern Denmark
- Minding the News: Turner & Steen book project, articles
- Multimodal integration: Iacoboni, Enyedy, Steen fMRI pilot data, article project; meeting planned with Turner for NIH project development
- Gesture Group: Eve Sweetser, Berkeley; seminars and workshops at CWRU and Aarhus U, Denmark this week and next
- Interface development: the UCLA Library's simul8 group is working with the CDI team to prepare a new image-based search and browsing visualization; the Processing Foundation has applied for NEH/DFG funding to work with NewsScape on new modes of perspectivized visualization
F. Challenges and opportunities
Research on multimodal communication is an exploding opportunity filled with hard questions at multiple interacting levels:
- Global data collection
- Metadata development
- Data mining -- joint text/image parsing and persuasion techniques
- Media effects studies, political communication
- Information processing model development
- Cognitive neuroscience
- User experience and interface development
- Multimodal search engines
- Educational outreach
- Publishing platforms
G. Disciplinary opportunities and challenges
The core research issues relate to multimodal information processing:
- Computer Science / Statistics: the challenge is to approximate the very sophisticated abilities of the human brain in joint image/text parsing
- Political Science / Communication: the challenge is to understand how integrating verbal and visual information impacts audiences and affects elections
- Humanities / Art History: communicative strategies in painting, sculpture, and other forms of multimodal art
- Linguistics: the challenge is to extend the theories developed for language to multimodal information
- Neuroscience: the challenge is to understand how the brain integrates multimodal information
- Education: how can instructors and students make the best use of their cognitive capacities in multimodal information processing?
- Library science: how can libraries provide the support researchers need to work with multimodal datasets?
- Design: how to design intuitive and powerful user interfaces to massive multimodal datasets with n-dimensional annotations
A note on the last point: the distinguishing factor in the most successful technology companies these days is often user experience design -- this is what Apple and Samsung are suing each other over.