Overview of research

Introduction

Red Hen Lab is a global laboratory and consortium for research into multimodal communication. We aim to develop a multilevel integrated research infrastructure that supports a wide community of diverse researchers. Here is an overview of some of the dimensions involved.

Datasets

Red Hen Lab works with a variety of datasets, from television news to art. The research methods we develop can typically be applied to a variety of data.

Our main large dataset is the UCLA NewsScape Library of International Television News, a digital collection of around 420,000 television news programs. It was initiated and developed by the Department of Communication Studies at UCLA and is hosted by the UCLA Library. Currently, 10% of the collection is international, including news programs from Belgium, Brazil, Czechia, Denmark, Egypt, France, Germany, Italy, Japan, Norway, Qatar, Poland, Portugal, Russia, Spain, and Sweden. A growing list of collaborators will greatly increase the proportion of international newscasts, making the repository a vibrant archive for research in a wide variety of disciplines. Along with the video, we capture closed captioning / teletext and align it with transcripts when available; the corpus is now annotated with more than three billion words.

NewsScape summary statistics as of 2018-04-08 08:00 GMT (values in parentheses are from the previous checkpoint at 2016-02-20 03:30 GMT)

Total networks: 51

Total series: 3,111

Total duration in hours: 367,717 (275,647)

Total metadata files (CC, OCR, TPT): 958,748 (732,732)

Total words in metadata files (CC, OCR, TPT): 4.43 billion, 4,433,470,505 exactly (3.40 billion)

Total caption files: 474,740 (355,338)

Total words in caption files: 2.96 billion, 2,958,967,839 exactly (2.25 billion)

Total OCR files: 446,461 (344,734)

Total TPT files: 37,547 (32,660)

Total words in OCR files: 1,029.37 million, 1,029,369,623 exactly (758.65 million)

Total words in TPT files: 445.13 million, 445,133,043 exactly (386.82 million)

Total video files: 473,994 (355,155)

Total thumbnail images: 132,378,119 (99,233,008)

Storage used for core data: 127.38 terabytes (100.39 terabytes)
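
The growth between the two checkpoints can be worked out from the figures above with a little arithmetic. The following is a minimal Python sketch that uses only values copied from the list; the dictionary keys and the helper name percent_growth are illustrative and not part of any Red Hen tool.

```python
# Each entry is (value at 2018-04-08, value at 2016-02-20), copied from the list above.
checkpoints = {
    "duration_hours": (367_717, 275_647),
    "metadata_files": (958_748, 732_732),
    "caption_files": (474_740, 355_338),
    "ocr_files": (446_461, 344_734),
    "tpt_files": (37_547, 32_660),
    "video_files": (473_994, 355_155),
    "thumbnail_images": (132_378_119, 99_233_008),
}


def percent_growth(current, previous):
    """Relative growth from the previous value to the current one, in percent."""
    return 100.0 * (current - previous) / previous


growth = {
    name: percent_growth(current, previous)
    for name, (current, previous) in checkpoints.items()
}

for name, pct in growth.items():
    print(f"{name:>18}: {pct:5.1f}%")
```

Every count in the list grew between the two checkpoints; the total duration, for example, grew by roughly a third.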

Computational analysis

While the problems posed by text corpora are known and partially solved, massive video corpora remain largely inaccessible to systematic analysis. Textual and visual information is complementary rather than duplicative, adding complexity to the parsing task.

The computational challenges exist at three levels:

Surface ontologies -- classifying, identifying, and labeling people, actions, objects, and places shown in video frames
Syntagmatic ontologies -- detecting story boundaries, story topics, scene types, and spatiotemporal patterns
Communicative ontologies -- camera techniques, presentational patterns, persuasion effects

Because television news programs, unlike surveillance video, are professionally constructed as intentional acts of communication, the computational challenges can only be met through a close collaboration of statisticians / computer scientists and media scholars / cognitive scientists. Teams need to work on all three levels of ontologies at the same time to make meaningful progress, so that communicative frames inform the choice of computational techniques, and available computational techniques inform which cognitive effects to focus on. At each level, textual / verbal information can be recruited for probability weightings. Coverage from different networks and countries can be used to triangulate on a single event, creating a multiperspective construct.
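
One way to read "probability weightings" is as a prior derived from the text that rescales the scores of a visual classifier. The sketch below is an illustration under that assumption only, not a description of Red Hen's actual pipeline; the labels, the scores, the floor parameter, and the function name reweight are all hypothetical.

```python
def reweight(visual_scores, text_prior, floor=0.05):
    """Multiply visual classifier scores by a text-derived prior and renormalize.

    Labels that the text does not mention receive the small `floor` weight,
    so they are down-weighted rather than ruled out.
    """
    combined = {
        label: score * max(text_prior.get(label, 0.0), floor)
        for label, score in visual_scores.items()
    }
    total = sum(combined.values())
    if total == 0:
        raise ValueError("all combined scores are zero; nothing to normalize")
    return {label: value / total for label, value in combined.items()}


# Hypothetical face-identification scores for a single video frame.
visual_scores = {"person_a": 0.40, "person_b": 0.35, "person_c": 0.25}

# Hypothetical prior: person_b is named in the captions around this frame.
text_prior = {"person_a": 0.1, "person_b": 0.8}

posterior = reweight(visual_scores, text_prior)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 2))
```

Here the visual scores alone slightly favor person_a, while the caption evidence shifts the decision to person_b.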

A. Red Hen's integrated research workflow

The research workflow we are developing for Red Hen scholars and students integrates established manual analytical practices in the multimodal study of language with a new generation of computational tools centered around "deep learning". This integrated research workflow can be described as a sequence of topics in a Red Hen Summer School:

Introduction to multimodal communication research
Red Hen Primer: selected Unix shell commands, applied Python, how to deploy NLP engines, and the statistical package R
Multimodal analysis with Elan -- hands-on, best coding practices, annotations integrated into the Red Hen dataset
Research question development in the student's area of interest -- e.g., crossmodal constructions, complex blends, multimodal disambiguation
Apply machine learning tools to annotations to create feature-specific classifiers on Red Hen servers, semi-supervised through feedback in Elan
Run the classifier for feature extraction and annotation on high-performance computing clusters at CWRU, UCLA, and Erlangen (cf. Audio processing pipeline)
Search for complex correlations between linguistic, auditory, and visual annotations in the Red Hen dataset
Interpretation, qualitative and quantitative/statistical analysis of the search results
Experimental testing of communicative effects
Visualization, write-up, presentations
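
The search step in the workflow above, looking for correlations between linguistic, auditory, and visual annotations, can be pictured as a query over time-aligned annotation tiers. The following is a minimal sketch assuming each annotation is a labeled time interval; the Annotation type, the function name overlapping, and the sample data are hypothetical and do not reflect the format of the Red Hen dataset.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    start: float  # seconds from the start of the recording
    end: float
    label: str


def overlapping(tier_a, tier_b, min_overlap=0.0):
    """Return (a, b, overlap_seconds) for every pair that overlaps in time."""
    hits = []
    for a in tier_a:
        for b in tier_b:
            overlap = min(a.end, b.end) - max(a.start, b.start)
            if overlap > min_overlap:
                hits.append((a, b, overlap))
    return hits


# Hypothetical tiers: time expressions in speech, and co-speech gestures.
speech = [Annotation(1.0, 2.5, "from now on"), Annotation(6.0, 7.0, "back then")]
gesture = [Annotation(2.0, 3.0, "forward sweep"), Annotation(10.0, 11.0, "shrug")]

hits = overlapping(speech, gesture)
for a, b, seconds in hits:
    print(f"{a.label!r} co-occurs with {b.label!r} for {seconds:.1f} s")
```

The nested loop keeps the idea visible; for tiers with many annotations, sorting by start time and sweeping through both tiers once would avoid comparing every pair.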

We invite contributions to any stage of this workflow; see for instance Barnyard.

B. Levels of the Red Hen project

1. Data acquisition -- global capture of television news; we can now set up wholly automated capture stations using a Raspberry Pi
2. Data storage -- NewsScape's home is the UCLA Library, on secure servers located in the ITS machine room
3. Data enhancement -- machine processing requires high-quality data; we do spell-checking, speech-to-text, download and align transcripts
4. Data mining research -- this is a vast task at the cutting edge of computer science and statistics, which includes the establishment of stable and extensible HPC processing pipelines
5. Communications research -- media studies on effects, political communication, advertising, etc.
6. Linguistics research -- verbal, textual, and visual modes of communication
7. Cognitive science -- underlying mental processes enabling and recruited by multimodal news
8. Neuroscience -- multimodal integration, underlying neurocognitive processes
9. Search engines -- multimodal search
10. User interfaces -- research and instruction interfaces, search, visual browsing, video annotation
11. Presentational tools -- visualizations, publication platforms
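
The transcript alignment mentioned under data enhancement can be illustrated with a word-level sequence match: closed captions carry timestamps but contain errors, while transcripts are cleaner but untimed. The sketch below uses Python's standard difflib module to copy caption timestamps onto matching transcript words. It is one possible approach shown under stated assumptions, not Red Hen's actual alignment method; the function name transfer_timestamps and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def transfer_timestamps(caption_words, caption_times, transcript_words):
    """Give each transcript word the time of its matching caption word, or None."""
    matcher = SequenceMatcher(
        a=[word.lower() for word in caption_words],
        b=[word.lower() for word in transcript_words],
        autojunk=False,
    )
    times = [None] * len(transcript_words)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            times[block.b + offset] = caption_times[block.a + offset]
    return times


# Hypothetical caption with a typo, and the corresponding clean transcript.
caption_words = ["the", "presidnt", "spoke", "today"]
caption_times = [0.0, 0.4, 0.9, 1.3]  # seconds
transcript_words = ["The", "president", "spoke", "today"]

aligned = transfer_timestamps(caption_words, caption_times, transcript_words)
print(list(zip(transcript_words, aligned)))
```

Words without an exact match, such as the misspelled one here, are left untimed; a fuller version could interpolate their times from the neighboring matches.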

C. Grants

Grant from DFG (the Deutsche Forschungsgemeinschaft) for Red Hen's International Conference on Multimodal Communication in 2017.
Spain's Excelencia grant for fundamental research awarded to Red Hen researchers Inés Olza and Cristóbal Pagán Cánovas at the University of Navarra for a project on language and gesture in time expressions
Anneliese Maier Research Prize from the Alexander von Humboldt Foundation awarded to Red Hen co-director Mark Turner (2016-2020).
Grant from the Research Council of Norway awarded to Red Hen co-director Francis Steen for the NECORE project
Google Summer of Code in 2015, 2016, 2017, 2018 and 2019 awarded to Red Hen Lab.
Grant by KONWIHR awarded to Peter Uhrig.
Grant from the Cyber-Enabled Discovery and Innovation program of the US National Science Foundation, CNS 1028381 and 1027965 (2010-2016), awarded to Red Hen researchers Song-Chun Zhu (Statistics, UCLA), Tim Groeling and Francis Steen (both Communication Studies, UCLA), and Cheng Zhai (Computer Science, UIUC, a text mining expert). Focus levels 4 and 5, touching all.
Several OID grants for the development of a search engine. Focus level 9.
Several CCLE grants to develop an online video annotation tool. Focus level 10. Collaboration with HyperCities.

D. Research network

(A few examples from a large and diverse group)

Francis Steen, Co-Director of Red Hen Lab -- communication researcher at UCLA
Mark Turner, Chair of the International Advisory Committee -- cognitive scientist and linguist
Gerard Steen, International Advisory Committee, director of Metaphor Lab Amsterdam
Song-Chun Zhu, Statistics, UCLA -- computer vision, hierarchical image parsing
Irene Mittelberg, U Aachen -- gesture analysis
Anders Hougaard, Communication Studies, University of Southern Denmark -- satellite capture station, research team
Erik Bucy, leading political communication / visual studies scholar, Texas Tech (visiting scholar with NewsScape 2012)
Peter Uhrig, FAU Erlangen-Nürnberg -- cognitive linguistics, corpus linguistics, data processing
Javier Valenzuela, University of Murcia, Spain -- linguistics
Rajesh Kasturirangan, National Institute for Advanced Studies, Bangalore -- multimodal ontologies
Cristóbal Pagán Cánovas, University of Navarra, Spain -- humanities
Anna Wilson (formerly Pleshakova), University of Oxford, UK -- cognitive linguistics, media linguistics, multimodal analyses

E. Research projects

NSF/CDI: Computer vision is focused on surveillance video. NewsScape contains deliberate communicative acts in a multimodal format; this is a new challenge to the computer vision community. Joint text/image mining is also nearly untouched; we have around a billion words. Progress: story segmentation and hierarchical topic modeling using WordNet, the Stanford parser, and UIUC Named Entity Recognition; on-screen text OCR; human figure detection and face detection; visual identification of frequently recurring people in the news; commercial detection; alignment of transcripts with closed captioning; topic modeling.
Syntactic parsing: Eckhard Bick at the University of Southern Denmark
Minding the News: Turner & Steen book project, articles
Multimodal integration: Iacoboni, Enyedy, Steen fMRI pilot data, article project; meeting planned with Turner for NIH project development
Gesture Group: Eve Sweetser, Berkeley; seminars and workshops at CWRU and Aarhus U, Denmark this week and next
Interface development: the UCLA Library's simul8 group is working with the CDI team to prepare a new image-based searching and browsing visualization; the Processing Foundation has applied for NEH/DFG to work with NewsScape on new modes of perspectivized visualizations

F. Challenges and opportunities

Research on multimodal communication is a rapidly expanding field, with hard questions at multiple interacting levels:

Global data collection
Metadata development
Data mining -- joint text/image parsing and persuasion techniques
Media effects studies, political communication
Information processing model development
Cognitive neuroscience
User experience and interface development
Multimodal search engines
Educational outreach
Publishing platforms

G. Disciplinary opportunities and challenges

The core research issues relate to multimodal information processing:

Computer Science / Statistics: the challenge is to approximate the very sophisticated abilities of the human brain in joint image/text parsing
Political Science / Communication: the challenge is to understand how integrating verbal and visual information impacts audiences and affects elections
Humanities / Art History: communicative strategies in painting, sculpture, and other forms of multimodal art
Linguistics: the challenge is to extend the theories developed for language to multimodal information
Neuroscience: the challenge is to understand how the brain integrates multimodal information
Education: how can instructors and students make the best use of their cognitive capacities in multimodal information processing?
Library science: how can libraries provide the support researchers need to work with multimodal datasets?
Design: how to design intuitive and powerful user interfaces to massive multimodal datasets with n-dimensional annotations

A note on the last point: the distinguishing factor in the most successful technology companies these days is often user experience design -- this is what Apple and Samsung are suing each other over.
