Closed captioning is the visual display (as text) of the audio on a piece of media. This is the most common form of captioning and can be identified by the [CC] symbol. Closed captioning is often referred to as subtitles or subtitling, despite their distinct differences. While subtitling involves translation into another language, closed captions are in the same language as the original audio. Closed captioning is also distinct from open captioning, in that it can be toggled on or off by users. Open captions are instead burned into the video - think English subtitles on a French film at the cinema!
For people who are d/Deaf or hard of hearing (HoH), closed captioning allows greater accessibility to content - be it on television, video or other media sources. Think of hearing loss as being on a spectrum; everyone is different and has a different type or degree of hearing loss.
Captioning at universities and other educational institutions provides critical access and improved literacy for deaf and hard of hearing students, but there is also evidence of the positive impact of captioning lecture content more broadly. Breaking this down, captioning can function as a tool for all students to improve their note-taking, revision and day-to-day study habits.
Closed captions display not only spoken dialogue, but also sound effects and music. But how do viewers know who is saying what, and when? Well, you may have noticed that captioners use colour changes or speaker tags to indicate who is talking. This is a particularly important feature when a character is off-screen, or when the speaker changes. It's worth noting that a speaker change may also be represented by a dash. Different networks or clients have different standards, but these are a few common conventions to keep an eye out for! For recorded programs, the timing of each caption (how long it stays on screen) is set to match that of the speaker, while allowing enough time for viewers to read and process the information.
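To make those conventions concrete, here's a short, made-up caption excerpt showing a speaker tag, dashes for a speaker change, and bracketed sound effects. The names and lines are purely illustrative, and exact styling varies by network or client:

```
MARIE: Have you seen the forecast?

- Not yet.
- It's meant to rain all weekend.

[thunder rumbling]

♪ upbeat jazz music ♪
```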
Closed captioning can either be done live (in real-time) or in a recorded (offline) format. Chances are, you thought it was all done by a computer, but captioning is a very human thing!
The three main methods of captioning are respeaking, stenocaptioning, and automatic speech recognition (ASR). Let's break down the captioning jargon a little further to understand exactly what each entails, and how they differ.
Respeaking
A respeaker listens to an audio feed from the TV show or live event and ‘re-speaks’ it, repeating what they hear into voice recognition software, complete with punctuation, grammar, and any number of formatting instructions. Their computer then ‘translates’ this into the text you see on screen. All the while, they are simultaneously thinking about positioning, colouring, speakers and more. The ultimate goal is to make sure the captions are clear, accurate and accessible for the viewer, with as little delay as possible and without distracting too much from what’s happening on screen.
Stenocaptioning
Stenographers are trained to use a special shorthand keyboard to type out dialogue in real time, sometimes working at over 300 words per minute! With their specially adapted keyboard, a stenocaptioner can spell out a whole syllable or word with a single keystroke. To do so, they have to remember the thousands of different key combinations required to produce the words in their dictionaries (and then spit them all back out at lightning speed!). Like respeakers, stenocaptioners consider line positioning, speaker changes and more. While respeakers usually work in pairs, taking turns live captioning every 15 minutes, a stenocaptioner can go for over three hours by themselves.
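As a rough illustration of how those chord-to-word dictionaries work, here's a toy Python sketch. It is not real captioning software; the strokes are loosely modelled on the notation used by the open-source Plover steno engine and are illustrative only:

```python
# Toy steno dictionary: each chord (one simultaneous keystroke) maps to
# a whole syllable, word or phrase. Real dictionaries hold many
# thousands of entries; these strokes are illustrative only.
STENO_DICTIONARY = {
    "THE": "the",
    "KAT": "cat",
    "SAT": "sat",
    "OPB": "on",
    "PHAT": "mat",
}

def translate(strokes):
    """Turn a sequence of chords into text, one lookup per stroke."""
    return " ".join(STENO_DICTIONARY.get(s, f"[{s}?]") for s in strokes)

print(translate(["THE", "KAT", "SAT", "OPB", "THE", "PHAT"]))
# -> the cat sat on the mat
```

One keystroke, one whole word - which is how a skilled stenocaptioner can keep pace with speech that would leave a conventional typist far behind.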
Automatic Speech Recognition (ASR)
When machines create captions, it’s called Automatic Speech Recognition (ASR). Most video hosting sites provide auto-generated closed captions for uploaded videos. This is certainly a step in the right direction for global accessibility; however, auto-generated captions can be extremely inaccurate. While the wrong captions may seem pretty funny, they can actually make things confusing or even misleading for viewers who rely on them. But it’s not all bad: ASR output provides a good baseline, and you can, in fact, edit it to be accurate.
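For a sense of what ASR looks like under the hood, here's a minimal Python sketch using OpenAI's open-source Whisper model - just one ASR tool among many, and the audio file name is hypothetical. A production captioning pipeline would add far more segmentation, correction and formatting on top of this:

```python
import whisper  # pip install openai-whisper

# Load a small pretrained speech recognition model.
model = whisper.load_model("base")

# Transcribe an audio file; Whisper returns the full text plus
# timestamped segments that could later be edited and exported as captions.
result = model.transcribe("lecture_audio.mp3")

for segment in result["segments"]:
    print(f'{segment["start"]:.1f}s - {segment["end"]:.1f}s: {segment["text"]}')
```

Those timestamped segments are exactly the kind of baseline a human editor can then correct for accuracy.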
When it comes to captions, does one file fit all? There’s a range of file formats for providing closed captions, with various levels of compatibility and functionality. These include STL, SCC, CAP, WebVTT, SRT, and XML, to name a few. The main format used at E&H for closed caption files is an SRT, or ‘SubRip Subtitle’ file. This is one of the most common captioning file types, and is used due to its compatibility with a range of major platforms, including Facebook, YouTube, Vimeo, Kaltura, and various lecture-capture programs.
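For reference, an SRT file is just plain text: each caption (or ‘cue’) gets a sequence number, a start and end timecode separated by ‘-->’, and one or two lines of text, with a blank line between cues. A minimal made-up example:

```
1
00:00:01,000 --> 00:00:03,500
Welcome back to the lecture.

2
00:00:04,000 --> 00:00:07,200
- Today we're covering closed captions.
- And how they're made!
```

Note the comma (rather than a full stop) before the milliseconds - a quirk of the SubRip format worth remembering if you ever hand-edit a caption file.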