Under the hood: What happens to your videos after upload
Anil Murching, Senior Program Manager, July 27, 2016
Using Stream to share videos is really easy - all you have to do is upload your video, and within a short time, you get a link that you can share within your organization. You don't need to worry about the technical details of the video (its format, its resolution) or how it was captured. Nor do you need to worry about which devices your colleagues use, or their playback capabilities. We take care of all the heavy lifting, so your colleagues can view your video at a quality that is tailored to its content, as well as to the capabilities of their devices. A lot of intelligent processing goes on in our cloud service in order to give you this experience. In this blog post, I will explain, in layman's terms, how we go about accomplishing this.
Encoding for Adaptive Streaming
Every video that is uploaded to Stream is encoded - processed and converted to a format that makes it ready for streaming. The technology by which we ensure we can stream your videos to all devices is called adaptive streaming. The basic idea is to take your input video and encode it into multiple renditions, starting from a high quality rendition and stepping down gradually to a low quality rendition. The playback device is made aware of all the available renditions. It can then adapt to fluctuations in network conditions. When the available bandwidth is high, the device can download and play the high quality video. When the available bandwidth drops, the device can drop down to a lower quality rendition.
The net result: your viewing experience degrades gracefully if available bandwidth drops, and playback never stalls. Every video streaming service makes use of adaptive streaming; what makes Stream different is that we are smart about how we produce the different quality renditions.
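To make the adaptation step concrete, here is a minimal sketch of how a player might choose among renditions. It is not Stream's actual player logic: the `Rendition` type, the `pick_rendition` function, the `safety_factor` margin, and every bitrate in `LADDER` except the roughly 6 Mbps figure at 1080p are assumptions made for illustration.

```python
from typing import NamedTuple, Sequence


class Rendition(NamedTuple):
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # average encoded bitrate


def pick_rendition(renditions: Sequence[Rendition],
                   measured_kbps: float,
                   safety_factor: float = 0.8) -> Rendition:
    """Return the highest-bitrate rendition that fits the usable bandwidth.

    Only a fraction (safety_factor, an assumed value) of the measured
    bandwidth is treated as usable, to leave headroom for fluctuations.
    """
    if not renditions:
        raise ValueError("at least one rendition is required")
    budget = measured_kbps * safety_factor
    by_bitrate = sorted(renditions, key=lambda r: r.bitrate_kbps)
    choice = by_bitrate[0]  # fall back to the lowest rendition
    for rendition in by_bitrate:
        if rendition.bitrate_kbps <= budget:
            choice = rendition
    return choice


# Illustrative ladder; the bitrates are assumed, not Stream's real values.
LADDER = [
    Rendition(1080, 6000),
    Rendition(720, 3400),
    Rendition(360, 1000),
    Rendition(180, 400),
]
```

A player following this sketch would re-run `pick_rendition` whenever its bandwidth estimate changes, moving up or down the ladder as conditions allow.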
Deciding the Number of Renditions
Our experience with Office365 Video has taught us to anticipate a variety of uploaded videos, ranging from professional marketing videos authored in full 1080p HD resolution, to legacy Windows Media files recorded at much smaller resolutions and meant for display on VGA monitors. We made use of this experience to build some smarts around how we pick the number of renditions, or qualities, that we encode a video into. For example, if the input video is at full 1080p HD resolution (1920x1080 pixels), then we would use 6 steps, starting from 1920x1080 and going down to 320x180. If instead the input video is of standard definition resolution (e.g. 640x480 pixels), then we would pick just 3 steps, from 640x480 down to 240x180. Naturally, we would never exceed the resolution of the input video.
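The sketch below illustrates the general idea of capping a resolution ladder at the input resolution. It is only an approximation under stated assumptions: the intermediate heights in `ASSUMED_HEIGHTS` and the `build_resolution_ladder` function are hypothetical, since the real step-selection rules are not described here. It reproduces the 1080p example, but for a 640x480 input it yields 4 steps rather than the 3 that Stream would pick.

```python
from typing import List, Tuple

# Assumed candidate heights; only 1080 and 180 come from the examples above.
ASSUMED_HEIGHTS = (1080, 720, 540, 360, 270, 180)


def build_resolution_ladder(src_width: int, src_height: int) -> List[Tuple[int, int]]:
    """Return (width, height) steps, highest first, never above the source."""
    if src_width <= 0 or src_height <= 0:
        raise ValueError("source dimensions must be positive")
    # The top step is the source height itself; lower steps come from the
    # assumed list, keeping only those strictly below the source.
    heights = [src_height] + [h for h in ASSUMED_HEIGHTS if h < src_height]
    ladder = []
    for height in heights:
        # Preserve the source aspect ratio.
        width = round(src_width * height / src_height)
        width -= width % 2  # keep the width even, as many encoders expect
        ladder.append((width, height))
    return ladder


hd_ladder = build_resolution_ladder(1920, 1080)
sd_ladder = build_resolution_ladder(640, 480)
```

Here `hd_ladder` runs from 1920x1080 down to 320x180 in 6 steps, and `sd_ladder` runs from 640x480 down to 240x180 without ever exceeding the input resolution.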
Deciding the Bitrate for each Rendition
Once the number of renditions is decided, the next stage is to determine the bitrate for each rendition. Naturally, the higher the quality of the rendition, the more bits it requires - but not all videos are created equal. Different types of videos require different bitrates to achieve 'high quality' - so we needed to be smart about choosing the bitrate. Here too, our experience with Office365 Video came in handy. We've observed, for example, that marketing videos are delivered at high bitrates, since they are most often produced by professional agencies. We also receive a ton of PowerPoint presentations which are captured at full 1080p HD resolution, but at very low bitrates - the screen has mostly static text content. And of course, there are legacy Windows Media videos that were authored for delivery over dial-up connections on VGA monitors.
We used all this information to come up with a simple yet elegant function that measures the characteristics of the input video and recommends a bitrate for each rendition. In our tests, this function is holding up well - the marketing videos end up getting encoded at close to 6 Mbps at 1080p, whereas a PowerPoint presentation would use just around 500 kbps.
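The actual function is not spelled out in this post, so the following is only a hypothetical stand-in that shows the shape such a function could take. It uses the input's own bitrate as a proxy for content complexity and caps the result per resolution. The name `recommend_bitrate_kbps`, the `exponent` of 0.75, and the 100 kbps floor are assumptions; the 6000 kbps ceiling at 1080p is taken from the marketing-video figure above.

```python
PIXELS_1080P = 1920 * 1080


def recommend_bitrate_kbps(source_kbps: float,
                           source_pixels: int,
                           rendition_pixels: int,
                           ceiling_1080p_kbps: float = 6000.0,
                           floor_kbps: float = 100.0,
                           exponent: float = 0.75) -> int:
    """Hypothetical bitrate recommendation for one rendition.

    The source bitrate is scaled by the pixel-count ratio (raised to an
    assumed exponent), then clamped between a floor and a per-resolution
    ceiling derived from the 1080p ceiling.
    """
    if source_kbps <= 0 or source_pixels <= 0 or rendition_pixels <= 0:
        raise ValueError("all inputs must be positive")
    # Bitrate the rendition would need if it matched the source's density.
    needed = source_kbps * (rendition_pixels / source_pixels) ** exponent
    # Upper bound for this resolution, scaled down from the 1080p ceiling.
    ceiling = ceiling_1080p_kbps * (rendition_pixels / PIXELS_1080P) ** exponent
    return round(max(floor_kbps, min(needed, ceiling)))


# A 1080p marketing video delivered at 20 Mbps is capped at the ceiling.
marketing_1080p = recommend_bitrate_kbps(20000, PIXELS_1080P, PIXELS_1080P)
# A 1080p PowerPoint capture at 500 kbps keeps its low bitrate.
slides_1080p = recommend_bitrate_kbps(500, PIXELS_1080P, PIXELS_1080P)
# The 720p rendition of the marketing video gets a lower bitrate.
marketing_720p = recommend_bitrate_kbps(20000, PIXELS_1080P, 1280 * 720)
```

The point of the clamp is that a low-bitrate source such as a slide capture is never inflated to a bitrate it does not need, while a high-bitrate source is held to a sensible streaming ceiling.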
When you are viewing a video on Stream, if you click on the bars icon on the bottom right, you can bring up the list of available renditions. The screenshot above shows how we processed a marketing video, with renditions going from 1080p down to 180p, with the bitrates changing accordingly.
The screenshot above shows how we processed a PowerPoint presentation, where the resolution was kept at full 1080p to preserve the clarity of text, but the assigned bitrate was much lower.
Need for Speed
It's great fun for our engineers to work on intelligent algorithms to produce great quality videos. But it would not have been much fun for you, if we made you wait all day to get a video processed and ready for streaming. So we also optimized our pipeline for speed. When you upload a video, we first generate a medium quality rendition as quickly as possible and release it to you for viewing/sharing. Then, in the background, we do the heavy lifting to generate all the other renditions required. We take pains to ensure your viewers have a seamless experience, so that as soon as the multiple renditions are ready, the player automagically discovers them and makes use of them.
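As a rough illustration of this "fast first rendition" idea, the sketch below orders encoding work so that a medium step is produced first and the rest follow. The `plan_encode_order` helper is hypothetical, and treating the middle entry of the ladder as the "medium quality" rendition is an assumption; the real pipeline's choice is not described here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def plan_encode_order(ladder: Sequence[T]) -> List[T]:
    """Return the ladder reordered so the middle step is encoded first.

    The remaining steps keep their original relative order and would be
    encoded in the background after the first one is released.
    """
    if not ladder:
        return []
    mid = len(ladder) // 2  # assumed definition of "medium quality"
    rest = [step for index, step in enumerate(ladder) if index != mid]
    return [ladder[mid]] + rest


order = plan_encode_order([1080, 720, 540, 360, 270, 180])
```

With the six heights shown, `order` starts with 360, so a viewer could begin watching at that quality while the other renditions are still being generated.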
How has your experience been with videos you've uploaded? Do you have ideas on how we could improve? Please add to our idea board on the Community site!