Accessible Audio Descriptions for HTML5 Video

James Edwards

A client recently asked me to produce an accessible video player, and one of the features she was very keen to have was audio descriptions. Audio descriptions are intended for people who are blind or have impaired vision, providing additional spoken information to describe important visual details.

Traditionally, audio-described videos have to be made specially, with the audio encoded in a separate track of the single video file. It takes pretty specialised video-editing equipment to encode these audio tracks, and that raises the bar for most content producers beyond a practical level.

All the audio-described content I’ve seen on the web is like this. For example, BBC iPlayer has a selection of such content, but the video player doesn’t give you control over the relative volumes, and you can’t turn the audio descriptions off; you can only watch separate described or non-described versions of the programme.

Enter HTML5

The HTML5 video specification does provide an audioTracks object, which would make it possible to implement an on/off button, and to control the audio and video volumes separately. But its browser support is virtually non-existent — at the time of writing, only IE10 supports this feature.
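
For reference, here’s roughly how that would look. It’s only a sketch, assuming a browser with spec-compliant audioTracks support; each track in the list has "kind" and "enabled" properties, and described audio has the kind "descriptions":

var video = document.getElementById('video');

// switch any described audio tracks on or off
// (assumes the browser implements the audioTracks API)
function setDescriptions(enabled)
{
  for(var i = 0; i < video.audioTracks.length; i++)
  {
    if(video.audioTracks[i].kind === 'descriptions')
    {
      video.audioTracks[i].enabled = enabled;
    }
  }
}

setDescriptions(true);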

In any case, what my client wanted was audio descriptions in a separate file, which could be added to a video without needing to create a separate version, and which would be easy to make without specialised software. And of course, it had to work in a decent range of browsers.

So my next thought was to use a MediaController, which is a feature of HTML5 audio and video that allows you to synchronise multiple sources. However, browser support for this is equally scant: at the time of writing, only Chrome supports it.
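
For completeness, here’s a minimal sketch of what that usage looks like, assuming a supporting browser. The elements are slaved to a single controller, and playback is then driven through the controller itself:

var controller = new MediaController();

document.getElementById('video').controller = controller;
document.getElementById('audio').controller = controller;

// playing or pausing the controller plays or pauses
// every media element that's slaved to it, in sync
controller.play();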

But you know, even without that support, it’s clearly not a problem to start two media files at the same time; it’s just a case of keeping them in sync. So can we use existing, widely-implemented features to make that work?

Video Events

The video API provides a number of events we can hook into, which should make it possible to synchronise the audio playback with events from the video:

  • The "play" event (which fires when the video is played).
  • The "pause" event (which fires when the video is paused).
  • The "ended" event (which fires when the video ends).
  • The "timeupdate" event (which fires continually while the video is playing).

It’s the "timeupdate" event that’s really crucial. The frequency at which it fires is not specified, and in practice it varies considerably, but as a rough overall average it amounts to 3–5 times per second, which is enough for our purposes.
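
That figure is easy enough to verify for yourself. Here’s a quick sketch that logs the interval between successive "timeupdate" events, so you can see how often it fires in any given browser:

var video = document.getElementById('video');
var lastTime = null;

video.addEventListener('timeupdate', function()
{
  var now = Date.now();
  if(lastTime !== null)
  {
    // typically somewhere in the region of 200–350ms
    console.log('timeupdate interval: ' + (now - lastTime) + 'ms');
  }
  lastTime = now;
}, false);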

I’ve seen a similar approach being tried to synchronise two video files, but it isn’t particularly successful, because even tiny discrepancies are very obvious. But audio descriptions generally don’t need to be so precisely in sync — a delay of 100ms either way would be acceptable — and playing audio files is far less work for the browser anyway.

So all we need to do is use the video events we have, to lock the audio and video playback together:

  • When the video is played, play the audio.
  • When the video is paused, pause the audio.
  • When the video ends, pause the video and audio together.
  • When the time updates, set the audio time to match the video time, if they’re different.

After some experimentation, I discovered that the best results are achieved by comparing the time in whole seconds, like this:

if(Math.ceil(audio.currentTime) != Math.ceil(video.currentTime))
{
  audio.currentTime = video.currentTime;
}

This seems counter-intuitive; initially I had assumed we’d need as much precision as the data provides, but that doesn’t seem to be the case. By testing with a literal audio copy of the video’s soundtrack (i.e. so the audio and video both produce identical sound), it’s easy to hear when the synchronisation is good or bad. Experimenting on that basis, I got much better synchronisation when rounding the figures than when I didn’t.

So here’s the final script. If the browser supports MediaController then we just use that, otherwise we implement manual synchronisation, as described:

var video = document.getElementById('video');
var audio = document.getElementById('audio');

// use a native MediaController where the browser supports it;
// otherwise the controller stays null and we sync the media manually
var controller = null;

if(typeof(window.MediaController) === 'function')
{
  controller = new MediaController();
  video.controller = controller;
  audio.controller = controller;
}
    
// set the video slightly quieter, so the descriptions stand out above it
video.volume = 0.8;
audio.volume = 1;
    
video.addEventListener('play', function() 
{
  if(!controller && audio.paused)
  {
    audio.play();
  }
}, false);
    
video.addEventListener('pause', function()
{
  if(!controller && !audio.paused)
  {
    audio.pause();
  }
}, false);
    
video.addEventListener('ended', function()
{
  if(controller)
  {
    controller.pause();
  }
  else
  {
    video.pause();
    audio.pause();
  }
}, false);
    
video.addEventListener('timeupdate', function()
{
  // only re-sync when the audio has enough data to play through
  // (readyState 4 is HAVE_ENOUGH_DATA), comparing in whole seconds
  if(!controller && audio.readyState >= 4)
  {
    if(Math.ceil(audio.currentTime) != Math.ceil(video.currentTime))
    {
      audio.currentTime = video.currentTime;
    }
  }
}, false);

Note that the MediaController in that script is defined only through scripting, but it’s also possible to define a controller declaratively, using the static "mediagroup" attribute:

<video mediagroup="foo"> ... </video>
<audio mediagroup="foo"> ... </audio>

If we did that, then it would work without JavaScript in Chrome. It would sync the media sources, but the user would have no control over the audio (including not being able to turn it off), because the browser wouldn’t know what the audio represents. This is the case in which it would be better to have the audio encoded into the video, because then it could appear in the audioTracks object, and the browser could recognise that and be able to provide native controls.

But since we have no audioTracks data, that’s rather a moot point! So if scripting is not available, the audio simply won’t play.

Here’s the final demo, which will work in any recent version of Opera, Firefox, Chrome or Safari, or in IE9 or later:

This is just a simple proof-of-concept demo, of course — there’s no initial feature detection, and it only has the basic controls provided by the native "controls" attribute. For a proper implementation it would need custom controls, to provide (among other things) a button to switch the audio on and off, and separate volume sliders. The interface should also be accessible to the keyboard, which is not the case in some browsers’ native controls. And it would need to handle buffering properly — as it is, if you seek past the point where the video has preloaded, the audio will continue to play freely until the video has loaded enough to bring it back into sync.
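
To give a flavour of that, here’s a rough sketch of how the on/off button and a separate volume slider for the descriptions might be wired up. The markup and IDs are purely hypothetical, just for illustration:

<button id="audio-toggle" aria-pressed="true">Descriptions</button>
<input id="audio-volume" type="range" min="0" max="100" value="100">

And the script to drive them:

var audio = document.getElementById('audio');
var toggle = document.getElementById('audio-toggle');
var slider = document.getElementById('audio-volume');

// mute or unmute the descriptions, and expose the button's
// state to assistive technologies through "aria-pressed"
toggle.addEventListener('click', function()
{
  audio.muted = !audio.muted;
  toggle.setAttribute('aria-pressed', audio.muted ? 'false' : 'true');
}, false);

// scale the slider's 0–100 range down to the 0–1 volume range
slider.addEventListener('change', function()
{
  audio.volume = slider.value / 100;
}, false);

Since these are native form controls, they’re keyboard-accessible by default, which deals with at least part of the keyboard issue.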

I might also mention that the descriptions themselves are hardly up to professional standards! That’s my voice you can hear, recorded and converted using Audacity. But such as it is, I think it makes an effective demonstration of how low the technical barrier to entry is with this approach: I didn’t have to edit the video, and I made the audio in an hour with free software.

As a proof of concept, I’d say it was pretty successful — and I’m sure my client will be very pleased!