Hi! This time, we will just go a little bit further than in the previous examples.
When we click on a button, we force the loading of the track, read its content, and add cue listeners in order to trigger events while the video is playing. This time we display the cues on the side, as hyperlinks you can click: if I click somewhere in the transcript, the video jumps to the corresponding location, and you can see that the cues are highlighted in black as the video advances.
There are not a lot of subtitles at this location but... you can see that the transcript is highlighted as the video is playing.
What have we added to the previous example? The first thing is that we defined a rectangular area here: it is just a div with id="transcript", and we added some CSS for placing the video and the transcript side by side, so that the video can grow and the transcript too.
We use float positioning. I put 'float: left;' on the transcript, even though it is displayed on the right of the video, because with 'float: right;' it would grow while staying aligned to the right... and I prefer it aligned to the left.
We can take a look at the CSS: there is nothing complicated, and the transcript can scroll thanks to the overflow: auto; rule we added to the div.
When we click on the buttons here, we call a function named loadTranscript() instead of forceLoadTrack(0) and forceLoadTrack(1). This time we just ask for a particular language and, implicitly, we are also looking for track files that are not chapters.
Let's have a look at this loadTranscript() function here.
So the loadTranscript() function has a parameter that is the language.
The first thing we do is clear the div: we just empty its content. Then we disable all the tracks: we set the mode of all the tracks to 'disabled', because when we click here to change the language of the transcript, we need to disable all the tracks and enable only the one we are interested in.
How can we locate the right track for the requested language? We just iterate over the tracks... this is the textTracks object... and we get the current track both as an HTML element and as a TextTrack object, and using the TextTrack we check the language and the kind.
If the language is equal to the one we are looking for, and if the kind is different from 'chapters', then this is the track we want.
By forcing the mode to "showing", in case the track has not been loaded yet, we ask the browser to load the file.
This is where we test whether the file has already been loaded; it is exactly the same test we did in the previous example.
If the track is already loaded, we just display the cues in sync with the video; if it has not been loaded, we display the cues once the track has finished loading.
This is the same function as the getContent() function we had before, except that we renamed it.
It takes the track as an HTML element and the TextTrack object as parameters.
Let's have a look at how it is done.
"displayCuesAfterTrackLoaded" just waits for the load event to be triggered and then it called display the cues function that will display the cues in sync.
Either we call it directly if the track is loaded, or we know that the loading has been triggered if necessary, and we just wait in the load event listener.
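To sum up these two steps, here is a minimal sketch of the heart of loadTranscript(); it uses the same variable names as the complete function given in the JS listing further down (a readyState equal to 2 means the track file is loaded):
// Sketch of the track selection / forced loading logic of loadTranscript()
for (let i = 0; i < tracks.length; i++) {
    let track = tracks[i];               // the track as a TextTrack object
    let trackAsHtmlElem = trackElems[i]; // the matching <track> HTML element
    if ((track.language === lang) && (track.kind !== "chapters")) {
        track.mode = "showing";          // asks the browser to load the file if needed
        if (trackAsHtmlElem.readyState === 2) {
            displayCues(track);          // already loaded: display the cues right away
        } else {
            // not loaded yet: display the cues once the 'load' event fires
            displayCuesAfterTrackLoaded(trackAsHtmlElem, track);
        }
    }
}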
Let's have a look at what the displayCues() function does.
The displayCues() function is exactly the same as the readContent() function we had earlier.
It gets the cue list for the given track and adds listeners to each cue.
And instead of just displaying the plain text below the video as we did earlier, we format it nicely and we add the id of the cue to the element we are creating.
Let's have a look... I'm calling the inspector... let's have a look at one of the list items here.
You can see that in the list item we use the CSS class named cues, just for the formatting: it puts the text in blue and adds an underline when the mouse is over it. We also use the id of the cue as the id of the list item.
So the id is 10, and we also created an onclick listener that calls a function we will detail later, called jumpTo().
And here is the starting time of the cue.
What we did is create a list item with a given id; it has a click listener that will call jumpTo() with the cue's start time as a parameter.
This is the trick: this onclick listener is what makes the video jump to the right position.
How did we create that? We used classic techniques.
We created a string called clickableTransText that is just an HTML list item with the id and the onclick listener (built with the start time of the cue), and we add this list item to the div.
This function here, addToTranscriptDiv(), just adds the HTML fragment to the DOM.
We can take a look at addToTranscriptDiv().
It just does transcriptDiv.innerHTML += the text it receives.
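For a single cue, here is a sketch of the string that is built and appended; cue and transText are the loop variables of displayCues() (shown in the JS listing below), and the id and start-time values in the comments are made up for illustration:
// Sketch: building one clickable list item for a cue
let clickableTransText = "<li class='cues' id=" + cue.id +      // e.g. id=10
        " onclick='jumpTo(" + cue.startTime + ");'>" +          // e.g. jumpTo(25.5);
        transText + "</li>";
addToTranscriptDiv(clickableTransText); // i.e. transcriptDiv.innerHTML += clickableTransText
// Resulting HTML: <li class='cues' id=10 onclick='jumpTo(25.5);'>...the cue text...</li>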
The jumpTo() function makes the video jump: in order to jump to a particular time in the video, we just set the currentTime property of the video element, and we make it start playing as soon as the jump is done.
So video = document.querySelector("#myVideo") is just the video HTML element.
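Here is that function again as a tiny sketch, with an example call (the 25.5 value is just an illustration):
// jumpTo(): seek the video to the given time (in seconds) and resume playback
function jumpTo(time) {
    video.currentTime = time; // video is the #myVideo element
    video.play();
}
// e.g. clicking the list item shown above would run jumpTo(25.5);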
I recommend going through the code slowly; it is a bit longer because we added some formatting for the voices and so on, but it is not complicated.
Take your time and look at how it's done....
Foreword about the set of five examples presented in this section: the code of the examples is larger than usual, but each example integrates blocks of code already presented and detailed in the previous lessons.
It might be interesting to read the content of a track before playing the video. This is what the edX video player does: it reads a single subtitle file and displays it as a transcript on the right. In the transcript, you can click on a sentence to make the video jump to the corresponding location. We will learn how to do this using the track API.
Read the WebVTT file at once using the track API and make a clickable transcript
Here we decided to code something similar, except that we will offer a choice of track/subtitle language. Our example offers English or German subtitles, and also another track that contains the chapter descriptions (more on that later). Using a button to select a language (track), the appropriate transcript is displayed on the right. Like the edX player, we can click on any sentence in order to force the video to jump to the corresponding location. While the video is playing, the current text is highlighted.
Some important things here:
Browsers do not load all the tracks at the same time, and the way they decide when and which track to load differs from one browser to another. So, when we click on a button to choose the track to display, we need to force the loading of the track, if it has not been loaded yet.
Once a track file is loaded, we iterate over its cues and generate the transcript as a set of <li>...</li> elements, one <li> per cue/sentence.
We define the id attribute of the <li> to be the same as the cue.id value. In this way, when we click on a <li> we can get its id and find the corresponding cue start time, and make the video jump to that time location.
We add an enter and an exit listener to each cue. These will be useful for highlighting the current cue. Note that these listeners are not yet supported by Firefox (you can use a cuechange event listener on the TextTrack instead; the Firefox variant is present, commented out, in the source code of the example, and a sketch of it is shown just after this list).
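For reference, here is what the cuechange-based variant could look like. It is adapted from the code commented out in the JS listing below (track is the selected TextTrack); the un-highlighting of the previous cue is an addition of our own:
// Sketch: highlight the active cue with a cuechange listener on the TextTrack
track.addEventListener("cuechange", function (e) {
    // remove the highlight from whatever was highlighted before
    document.querySelectorAll(".cues.current")
            .forEach((elem) => elem.classList.remove("current"));
    // highlight the (first) currently active cue, if any
    let cue = this.activeCues[0];
    if (cue !== undefined) {
        document.getElementById(cue.id).classList.add("current");
    }
});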
HTML code:
<html lang="en">
<head>
<meta charset=utf-8>
<title>Video player with clickable transcript</title>
</head>
<body>
<section id="all">
<h1>Using the track API to extract the content of WebVTT files in <code>&lt;track&gt;</code> elements</h1>
<p>Click on the buttons under the video to extract the English or German subtitles
</p>
<p>Look at the HTML and JS code.</p>
<p>
<button disabled
id="buttonEnglish"
onclick="loadTranscript('en');">
Display English transcript
</button>
<button disabled
id="buttonDeutsch"
onclick="loadTranscript('de');">
Display Deutsch transcript
</button>
</p>
<video id="myVideo" preload="metadata" controls crossOrigin="anonymous">
<source src="https://mainline.i3s.unice.fr/mooc/elephants-dream-medium.mp4" type="video/mp4">
<source src="https://mainline.i3s.unice.fr/mooc/elephants-dream-medium.webm" type="video/webm">
<track label="English subtitles" kind="subtitles" srclang="en" src="https://mainline.i3s.unice.fr/mooc/elephants-dream-subtitles-en.vtt" >
<track label="Deutsch subtitles" kind="subtitles" srclang="de" src="https://mainline.i3s.unice.fr/mooc/elephants-dream-subtitles-de.vtt" default>
<track label="English chapters" kind="chapters" srclang="en" src="https://mainline.i3s.unice.fr/mooc/elephants-dream-chapters-en.vtt">
</video>
<div id="transcript"></div>
</section>
</body>
</html>
CSS code:
#all {
background-color: lightgrey;
border-radius:10px;
padding: 20px;
border:1px solid;
display:inline-block;
/*height:500px;*/
margin:30px;
width:90%;
}
.cues {
color:blue;
}
.cues:hover {
text-decoration: underline;
}
.cues.current {
color:black;
font-weight: bold;
}
#myVideo {
display: block;
float : left;
margin-right: 2.85714%;
width: 65.71429%;
background-color: black;
position: relative;
}
#transcript {
padding: 10px;
border:1px solid;
float: left;
max-height: 225px;
overflow: auto;
width: 25%;
margin: 0;
font-size: 14px;
list-style: none;
}
JS code:
let video, transcriptDiv;
let tracks, trackElems, tracksURLs = [];
let buttonEnglish, buttonDeutsch;
window.onload = () => {
console.log("init");
// when the page is loaded
video = document.querySelector("#myVideo");
transcriptDiv = document.querySelector("#transcript");
// The tracks as HTML elements
trackElems = document.querySelectorAll("track");
for(let i = 0; i < trackElems.length; i++) {
let currentTrackElem = trackElems[i];
tracksURLs[i] = currentTrackElem.src;
}
buttonEnglish = document.querySelector("#buttonEnglish");
buttonDeutsch = document.querySelector("#buttonDeutsch");
// we enable the buttons once the page is loaded
buttonEnglish.disabled = false;
buttonDeutsch.disabled = false;
// The tracks as JS objects
tracks = video.textTracks;
};
function loadTranscript(lang) {
// clear current transcript
clearTranscriptDiv();
// set all track mode to disabled. We will only activate the
// one whose content will be displayed as transcript
disableAllTracks();
// Locate the track with language = lang
for(let i = 0; i < tracks.length; i++) {
// current track
let track = tracks[i];
let trackAsHtmlElem = trackElems[i];
if((track.language === lang) && (track.kind !== "chapters")) {
track.mode="showing";
if(trackAsHtmlElem.readyState === 2) {
// the track has already been loaded
displayCues(track);
} else {
displayCuesAfterTrackLoaded(trackAsHtmlElem, track);
}
/* FOR FIREFOX....
track.addEventListener("cuechange", function(e) {
var cue = this.activeCues[0];
console.log("cue change");
var transcriptText = document.getElementById(cue.id);
transcriptText.classList.add("current");
});
*/
}
}
}
function displayCuesAfterTrackLoaded(trackElem, track) {
// Create a listener that will be called only when the track has
// been loaded
trackElem.addEventListener('load', (e) => {
console.log("track loaded");
displayCues(track);
});
}
function disableAllTracks() {
for(let i = 0; i < tracks.length; i++)
tracks[i].mode = "disabled";
}
function displayCues(track) {
let cues = track.cues;
// append all the cue texts to the transcript div
for(let i=0, len = cues.length; i < len; i++) {
let cue = cues[i];
addCueListeners(cue);
let voices = getVoices(cue.text);
let transText="";
if (voices.length > 0) {
for (let j = 0; j < voices.length; j++) { // how many voices ?
transText += voices[j].voice + ': ' + removeHTML(voices[j].text);
}
} else
transText = cue.text; // not a voice text
let clickableTransText = "<li class='cues' id=" + cue.id + " onclick='jumpTo(" + cue.startTime + ");'" + ">" + transText + "</li>";
addToTranscriptDiv(clickableTransText);
}
}
function getVoices(speech) { // takes a text content and check if there are voices
let voices = []; // inside
let pos = speech.indexOf('<v'); // voices are like <v michel> ....
while (pos != -1) {
let endVoice = speech.indexOf('>');
let voice = speech.substring(pos + 2, endVoice).trim();
let endSpeech = speech.indexOf('</v>');
let text = speech.substring(endVoice + 1, endSpeech);
voices.push({
'voice': voice,
'text': text
});
speech = speech.substring(endSpeech + 4);
pos = speech.indexOf('<v');
}
return voices;
}
function removeHTML(text) {
let div = document.createElement('div');
div.innerHTML = text;
return div.textContent || div.innerText || '';
}
function jumpTo(time) {
video.currentTime = time;
video.play();
}
function clearTranscriptDiv() {
transcriptDiv.innerHTML = "";
}
function addToTranscriptDiv(htmlText) {
transcriptDiv.innerHTML += htmlText;
}
function addCueListeners(cue) {
cue.onenter = (e) => {
console.log('enter id=' + e.target.id);
let transcriptText = document.getElementById(e.target.id);
transcriptText.classList.add("current");
};
cue.onexit = (e) => {
console.log('exit id=' + e.target.id);
let transcriptText = document.getElementById(e.target.id);
transcriptText.classList.remove("current");
};
}
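To make the getVoices() / removeHTML() pair more concrete, here is a small usage sketch; the cue text is made up for illustration and simply follows the WebVTT voice-span syntax <v Name>...</v> that the function expects:
// Made-up cue text containing a WebVTT voice span
let sample = "<v Proog>Oh, sorry mister!</v>";
let voices = getVoices(sample);
// voices is [ { voice: "Proog", text: "Oh, sorry mister!" } ]
console.log(voices[0].voice + ": " + removeHTML(voices[0].text));
// → "Proog: Oh, sorry mister!"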
This is an old example written in 2012, at a time when the track API was not supported by browsers. It downloads the WebVTT file using Ajax and parses it "by hand". Notice the complexity of the code compared to the previous example, which uses the track API instead. We give this example as is. Sometimes, bypassing the APIs can be a valuable solution, especially when support for the track API is sporadic, as it was in 2012...
This example, adapted from an example from (now offline) dev.opera.com, uses some JavaScript code that takes a WebVTT subtitle (or caption) file as an argument, parses it, and displays the text on screen, in an element with an id of transcript.
Extract from HTML code:
...
<video preload="metadata" controls >
<source src="https://..../elephants-dream-medium.mp4" type="video/mp4">
<source src="https://..../elephants-dream-medium.webm" type="video/webm">
<track label="English subtitles" kind="subtitles" srclang="en"
src="https://..../elephants-dream-subtitles-en.vtt" default>
<track label="Deutsch subtitles" kind="subtitles" srclang="de"
src="https://..../elephants-dream-subtitles-de.vtt">
<track label="English chapters" kind="chapters" srclang="en"
src="https://..../elephants-dream-chapters-en.vtt">
</video>
...
<h3>Video Transcript</h3>
<button onclick="loadTranscript('en');">English</button>
<button onclick="loadTranscript('de');">Deutsch</button>
</div>
<div id="transcript"></div>
...
JavaScript code:
// Transcript.js, by dev.opera.com
function loadTranscript(lang) {
var url = "https://mainline.i3s.unice.fr/mooc/" +
'elephants-dream-subtitles-' + lang + '.vtt';
// Will download using Ajax + extract subtitles/captions
loadTranscriptFile(url);
}
function loadTranscriptFile(webvttFileUrl) {
// Using Ajax/XHR2 (explained in detail in Module 3)
var reqTrans = new XMLHttpRequest();
reqTrans.open('GET', webvttFileUrl);
// callback, called only once the response is ready
reqTrans.onload = function(e) {
var pattern = /^([0-9]+)$/;
var patternTimecode = /^([0-9]{2}:[0-9]{2}:[0-9]{2}[,.]{1}[0-9]{3}) --\> ([0-9]{2}:[0-9]{2}:[0-9]{2}[,.]{1}[0-9]{3})(.*)$/;
var content = this.response; // content of the webVTT file
var lines = content.split(/\r?\n/); // Get an array of text lines
var transcript = '';
for (var i = 0; i < lines.length; i++) {
var identifier = pattern.exec(lines[i]);
// is there an id for this line, if it is, go to next line
if (identifier) {
i++;
var timecode = patternTimecode.exec(lines[i]);
// is the current line a timecode?
if (timecode && i < lines.length) {
// if it is go to next line
i++;
// it can only be a text line now
var text = lines[i];
// is the text multiline?
while (lines[i] !== '' && i < lines.length) {
text = text + '\n' + lines[i];
i++;
}
var transText = '';
var voices = getVoices(text);
// is the extracted text multi voices ?
if (voices.length > 0) {
// how many voices ?
for (var j = 0; j < voices.length; j++) {
transText += voices[j].voice + ': '
+ removeHTML(voices[j].text)
+ '<br />';
}
} else
// not a voice text
transText = removeHTML(text) + '<br />';
transcript += transText;
}
}
var oTrans = document.getElementById('transcript');
oTrans.innerHTML = transcript;
}
};
reqTrans.send(); // send the Ajax request
}
function getVoices(speech) { // takes a text content and check if there are voices
var voices = []; // inside
var pos = speech.indexOf('<v'); // voices are like <v Michel> ....
while (pos != -1) {
var endVoice = speech.indexOf('>');
var voice = speech.substring(pos + 2, endVoice).trim();
var endSpeech = speech.indexOf('</v>');
var text = speech.substring(endVoice + 1, endSpeech);
voices.push({
'voice': voice,
'text': text
});
speech = speech.substring(endSpeech + 4);
pos = speech.indexOf('<v');
}
return voices;
}
function removeHTML(text) {
var div = document.createElement('div');
div.innerHTML = text;
return div.textContent || div.innerText || '';
}
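As a side note, here is what the two regular expressions above are designed to match; the cue block below is made up for illustration, but it follows the identifier / timecode / text layout of the subtitle files used in these examples:
// A typical WebVTT cue block (made-up content):
//
//   1
//   00:00:15.000 --> 00:00:17.951
//   <v Proog>At the left we can see...</v>
//
// pattern matches the "1" identifier line, and patternTimecode matches the
// timecode line and captures both timestamps:
var patternTimecode = /^([0-9]{2}:[0-9]{2}:[0-9]{2}[,.]{1}[0-9]{3}) --\> ([0-9]{2}:[0-9]{2}:[0-9]{2}[,.]{1}[0-9]{3})(.*)$/;
var m = patternTimecode.exec("00:00:15.000 --> 00:00:17.951");
console.log(m[1], m[2]); // "00:00:15.000" "00:00:17.951"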