An Introduction to the getUserMedia API

Aurelio De Rosa

In the mid-90s, chat was one of the best products available on the web. Raise your hand if you were young and thought how cool it would be to develop your own chat application. One of the best features of chat applications was their ability to capture audio from a microphone and/or video from a webcam, and send it over the Internet. To implement these features, developers relied for a long time on plugins like Flash and Silverlight. However, such plugins can be a problem if you don’t have the permissions needed to install them or you’re not tech-savvy. Today, they aren’t required anymore thanks to the WebRTC project and its related APIs. This article will introduce the getUserMedia API, one of the APIs derived from the WebRTC project.

What’s the getUserMedia API?

The getUserMedia API provides access to multimedia streams (video, audio, or both) from local devices. There are several use cases for this API. The first one is obviously real-time communication, but we can also employ it to record tutorials or lessons for online courses. Another interesting use case is the surveillance of your home or workplace. On its own, this API is only capable of acquiring audio and video, not sending the data or storing it in a file. To have a complete working chat, for example, we need to send data across the Internet. This can be done using the RTCPeerConnection API. To store the data we can use the MediaStreamRecorder API.

The getUserMedia API is amazing for both developers and users. Developers can now access audio and video sources with a single function call, while users don’t need to install additional software. From the user’s perspective, this also means less time spent before they can start using the feature, and it makes the software more accessible to people who aren’t tech-savvy.

Although the getUserMedia API has been around for a while now, as of December 30th, 2013, it’s still a W3C Working Draft, so the specification may still change. The API exposes only one method, getUserMedia(), that belongs to the window.navigator object. The method accepts three parameters: an object of constraints, a success callback, and a failure callback. The constraints parameter is an object having one or both of the properties audio and video. The value of each property is a Boolean, where true requests the stream (audio or video) and false doesn’t. So, to request both audio and video, pass the following object.

{
  video: true,
  audio: true
}

Alternatively, the value can be a Constraints object. This type of object gives us more control over the requested stream. For instance, we can choose to retrieve a video source at a high resolution, such as 1280×720, or a low one, such as 320×180. Each Constraints object contains two properties, mandatory and optional. mandatory is an object that specifies the set of constraints that the user agent (UA) must satisfy or else call the error callback. optional is an array of objects that specifies the set of constraints that the UA should try to satisfy but may ignore if they cannot be satisfied.

Let’s say that we want audio and video of the user, where the video must have a resolution of at least 1280×720 and a framerate of at least 30. In addition, if available, we want the video at a framerate of at least 60. To perform this task, we have to pass the following object.

{
  video: {
    mandatory: {
      minWidth: 1280,
      minHeight: 720,
      minFrameRate: 30
    },
    optional: [
      { minFrameRate: 60 }
    ]
  },
  audio: true
}

You can find more information on the properties available in the specifications.
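As a minimal sketch of how such an object might be assembled in code, the helper below produces the same mandatory/optional structure as the example above. Note that `buildConstraints` is hypothetical and not part of the API; its name and parameters are assumptions made purely for illustration.

```javascript
// Hypothetical helper, not part of the getUserMedia API: it only assembles
// a constraints object in the mandatory/optional format described above.
function buildConstraints(minWidth, minHeight, minFrameRate, preferredFrameRate) {
  var video = {
    // Requirements the user agent must satisfy, or the error callback is called
    mandatory: {
      minWidth: minWidth,
      minHeight: minHeight,
      minFrameRate: minFrameRate
    },
    // Preferences the user agent should try to satisfy but may ignore
    optional: []
  };

  // The preferred framerate is optional; skip it when it isn't provided
  if (typeof preferredFrameRate === 'number') {
    video.optional.push({ minFrameRate: preferredFrameRate });
  }

  return { video: video, audio: true };
}

// Same object as the example above: at least 1280x720 at 30fps, ideally 60fps
var constraints = buildConstraints(1280, 720, 30, 60);
```

Keeping the hard requirements and the preferences in separate arguments mirrors the distinction the constraints format itself makes.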

The other two arguments to getUserMedia() are simply two callbacks invoked on success or failure, respectively. On success, the retrieved stream(s) are passed to the callback. The error callback is passed a MediaError object containing information on the error that occurred.
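To tie the three arguments together, here is a minimal sketch of a complete call. The `requestAudioAndVideo` function and its `nav` parameter are assumptions made for illustration: `nav` stands in for window.navigator, and the sketch assumes that the unprefixed getUserMedia() method is available on it.

```javascript
// Hypothetical wrapper: `nav` is expected to be window.navigator (or any
// object exposing the callback-based getUserMedia() described above).
function requestAudioAndVideo(nav, onStream, onError) {
  var constraints = { video: true, audio: true };

  nav.getUserMedia(
    constraints,
    function(stream) {
      // Success: the retrieved stream is handed to the caller
      onStream(stream);
    },
    function(error) {
      // Failure: the error object is handed to the caller
      onError(error);
    }
  );
}

// Usage in a browser; skipped where window or the method is missing
if (typeof window !== 'undefined' &&
    window.navigator &&
    typeof window.navigator.getUserMedia === 'function') {
  requestAudioAndVideo(
    window.navigator,
    function(stream) { console.log('Stream retrieved', stream); },
    function(error) { console.log('Request failed', error); }
  );
}
```

Passing the navigator object as a parameter is only a convenience of this sketch: it makes the flow easy to try with a stand-in object outside a browser.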

Browser Compatibility

Support for the getUserMedia API is decent on desktop but quite poor on mobile. Besides, most of the browsers that support it still require a vendor prefix. Currently, the desktop browsers that implement the API are Chrome 21+ (-webkit prefix), Firefox 17+ (-moz prefix), and Opera 12+ (unsupported from version 15 to 17), with some issues in older versions. On mobile, only Chrome 21+ (-webkit prefix) and Opera 12+ (-webkit prefix from version 16) support the API. Also note that a page that uses this API won’t work in Chrome if it’s opened through the file:// protocol.
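Given the prefixes listed above, a minimal sketch of the feature detection might look like the following. The `findGetUserMedia` name is hypothetical, and its `nav` parameter stands in for window.navigator.

```javascript
// Hypothetical helper: returns the first available implementation
// (standard, -webkit or -moz) bound to `nav`, or null if there is none.
function findGetUserMedia(nav) {
  var method = nav.getUserMedia       ||
               nav.webkitGetUserMedia ||
               nav.mozGetUserMedia    ||
               null;

  // Bind so the method keeps `nav` as its `this` value when called later
  return method === null ? null : method.bind(nav);
}

// Usage in a browser; skipped where window is missing
if (typeof window !== 'undefined' && window.navigator) {
  var getUserMedia = findGetUserMedia(window.navigator);
  console.log(getUserMedia === null ? 'API not supported' : 'API supported');
}
```

Returning a bound function instead of overwriting a property of the navigator object leaves the browser’s own objects untouched.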

The case of Opera is really interesting and deserves a note. This browser implemented the API but, for a reason unknown to me, dropped support after the switch to the Blink rendering engine in version 15. Support for the API was finally restored in version 18. What’s more, Opera 18 is the first version to support the audio stream too.

That said, we can ignore the compatibility issues thanks to a shim called getUserMedia.js. The shim tests the browser and, if the API isn’t implemented, falls back to Flash.

Demo

In this section I’ll show you a basic demo so that you can see how the getUserMedia API works and see its parameters in action. The goal of this demo is to create a “mirror”, in the sense that everything captured from the webcam and the microphone will be played back through the screen and the audio speakers. We’ll ask the user for permission to access both multimedia streams, and then output them using the HTML5 video element. The markup is pretty simple. In addition to the video element, we have two buttons: one to start execution and one to stop it.

Regarding the scripting part, we first test for browser support. If the API isn’t supported, we display the message “API not supported” and disable the two buttons. If the browser supports the getUserMedia API, we check whether we’re dealing with an old version of Opera, because of the issues described in the previous section, and then attach a listener to the click event of each button. When the “Play demo” button is clicked, we request the audio and video data from the user’s device. If the request is successful, we stream the data using the video element; otherwise, we log the error to the console. The “Stop demo” button causes the video to be paused and the streams to be stopped.

A live demo of the code below is available here.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    <title>getUserMedia Demo</title>
    <style>
      body
      {
        max-width: 500px;
        margin: 2em auto;
        font-size: 20px;
      }

      h1
      {
        text-align: center;
      }
         
      .buttons-wrapper
      {
        text-align: center;
      }

      .hidden
      {
        display: none;
      }

      #video
      {
        display: block;
        width: 100%;
      }

      .button-demo
      {
        padding: 0.5em;
        display: inline-block;
        margin: 1em auto;
      }

      .author
      {
        display: block;
        margin-top: 1em;
      }
    </style>
  </head>
  <body>
    <h1>getUserMedia API</h1>
    <video id="video" autoplay="autoplay" controls="true"></video>
    <div class="buttons-wrapper">
      <button id="button-play-gum" class="button-demo">Play demo</button>
      <button id="button-stop-gum" class="button-demo">Stop demo</button>
    </div>
    <span id="gum-unsupported" class="hidden">API not supported</span>
    <span id="gum-partially-supported" class="hidden">API partially supported (video only)</span>
    <script>
      var videoStream = null;
      var video = document.getElementById("video");

      // Test browser support
      window.navigator = window.navigator || {};
      navigator.getUserMedia = navigator.getUserMedia       ||
                               navigator.webkitGetUserMedia ||
                               navigator.mozGetUserMedia    ||
                               null;

      if (navigator.getUserMedia === null) {
        document.getElementById('gum-unsupported').classList.remove('hidden');
        document.getElementById('button-play-gum').setAttribute('disabled', 'disabled');
        document.getElementById('button-stop-gum').setAttribute('disabled', 'disabled');
      } else {
        // Opera <= 12.16 accepts the direct stream.
        // More on this here: http://dev.opera.com/articles/view/playing-with-html5-video-and-getusermedia-support/
        var createSrc = window.URL ? window.URL.createObjectURL : function(stream) {return stream;};

        // Opera <= 12.16 support video only.
        var audioContext = window.AudioContext       ||
                           window.webkitAudioContext ||
                           null;
        if (audioContext === null) {
          document.getElementById('gum-partially-supported').classList.remove('hidden');
        }

        document.getElementById('button-play-gum').addEventListener('click', function() {
          // Capture user's audio and video source
          navigator.getUserMedia({
            video: true,
            audio: true
          },
          function(stream) {
            videoStream = stream;
            // Stream the data
            video.src = createSrc(stream);
            video.play();
          },
          function(error) {
            console.log("Video capture error: ", error.code);
          });
        });
        document.getElementById('button-stop-gum').addEventListener('click', function() {
          // Pause the video
          video.pause();
          // Stop the stream, if one has been started
          if (videoStream !== null) {
            videoStream.stop();
            videoStream = null;
          }
        });
      }
    </script>
  </body>
</html>

Conclusion

This article has introduced you to the WebRTC project, one of the most exciting web projects in recent years. In particular, this article discussed the getUserMedia API. The possibility of creating a real-time communication system using only the browser and very few lines of code is terrific and opens up a lot of new opportunities.

As we’ve seen, the getUserMedia API is simple yet very flexible. It exposes just one method, but its first parameter, constraints, allows us to request the audio and video streams that best fit our application’s needs. Browser support isn’t very wide yet, but it’s increasing, and this is good news! To better understand the concepts in this article, don’t forget to play with the provided demo. As a final note, I strongly encourage you to try changing the code to perform some other task, for example applying a CSS filter to change how the video stream is shown.
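As a starting point for that exercise, here is a minimal sketch that applies a grayscale filter to the demo’s video element. The `applyVideoFilter` helper is hypothetical, and the -webkit prefixed property is set on the assumption that some browsers don’t yet support the unprefixed one.

```javascript
// Hypothetical helper: sets a CSS filter on the given element's inline style.
function applyVideoFilter(element, filterValue) {
  // Prefixed property, assumed to be needed by some WebKit/Blink versions
  element.style.webkitFilter = filterValue;
  element.style.filter = filterValue;
  return element;
}

// Usage in a browser with the demo's markup; skipped elsewhere
if (typeof document !== 'undefined' && document.getElementById('video')) {
  applyVideoFilter(document.getElementById('video'), 'grayscale(100%)');
}
```

Other filter values, such as `sepia(100%)` or `blur(3px)`, can be passed in the same way.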