Microsoft Silverlight and Smooth Streaming – and why you should care
July 18, 2009
Microsoft Silverlight 3 was released last week, and this seems like a good time to write an article about the technology. This article provides a quick overview of adaptive streaming (branded “Smooth Streaming”), which is a key feature of Silverlight. Microsoft’s Silverlight is well positioned to compete with Adobe Flash, the other dominant streaming technology. In particular, this article will discuss what adaptive streaming means to the videophile community. If time permits, I may do a second, more in-depth article on the technology itself.
Let me start by admitting that I have a recurring perception of Microsoft as a tech giant more interested in strip mining customers for every last dollar than in doing something truly innovative (Vista comes to mind). Sometimes change for the sake of change is used as an excuse to generate profits without providing true value to customers. Every once in a while, though, something comes out of Microsoft that shakes this negative perception and makes me realize that they actually do have some smart people in Redmond (although having met Stacey Spears, I already knew that). Microsoft Silverlight is one such product.
Prior State of the Art in Video Streaming
First, let’s discuss traditional video streaming, examine its issues, and see why something better is needed. Traditional streaming video has used a “progressive download” format: a client requests a video, and the server satisfies the request by dumping a lot of data, file-transfer style, over the network, which the client queues up in a play-out buffer. Once the play-out buffer has filled sufficiently to ensure a few seconds of uninterrupted playback, the video is displayed. This is a very common approach that is still in widespread use today. Unfortunately, this “dump and play” format has a lot of drawbacks, which I’ll highlight below:
- Dump and play places large, unrestricted bandwidth demands on a network. The dump process will use as much bandwidth as is available so that it can transfer as much data to the client as quickly as it can. This is great for downloading files, but if a user is going to be downloading and watching, say, a 2 hr video, it might be better to throttle (control) the download, spread it out over time, and give some of the network bandwidth to other users. In some cases throttling mechanisms exist, but they require equipment, planning, usage policies, etc., and all of this “OA&M” adds cost for the content providers.
- If the user terminates playback prematurely, the play-out buffer is wasted which in turn means wasted bandwidth that could have been allocated to other users.
- Random access is a cumbersome process with the dump and play format because the data at the new playback location may not have been downloaded yet. If that is the case, satisfying the request means tossing the old play-out buffer and streaming from the new location, which again wastes bandwidth. It can also mean long delays while the new play-out buffer fills.
- Streamed files are difficult to cache and current systems do not scale well.
- Older streaming technology isn’t web friendly; it often uses custom ports that aren’t well known, which means it may be blocked by firewalls.
- TCP is often used as the transport, which, while reliable, can add retransmission delays when data is lost. These retransmission delays are detrimental to a user who wants smooth, uninterrupted playback; for video, it can be better to skip the lost data and accept some visible noise than to stall while waiting for a retransmission.
- In addition to the retransmission issue, TCP is also not an ideal transport from a video streaming perspective because it incorporates a congestion control mechanism known as “slow start.” Data is initially sent slowly using a reduced window size that is gradually ramped up over time so long as the data is received okay on the other end. For video streaming we need to send the data quickly, as needed and on demand, without these ramp-up delays.
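To make the slow-start point above concrete, here is a toy sketch (not a real TCP stack) of how the congestion window grows: it doubles each round trip until a threshold is reached, then grows linearly. The segment counts and threshold are hypothetical illustration values.

```python
def slow_start_windows(initial_segments, ssthresh_segments, round_trips):
    """Return the congestion window (in segments) at each round trip."""
    windows = []
    cwnd = initial_segments
    for _ in range(round_trips):
        windows.append(cwnd)
        if cwnd < ssthresh_segments:
            cwnd = min(cwnd * 2, ssthresh_segments)  # slow start: exponential growth
        else:
            cwnd += 1  # congestion avoidance: linear growth
    return windows

print(slow_start_windows(1, 16, 8))  # [1, 2, 4, 8, 16, 17, 18, 19]
```

The first few round trips move very little data, which is exactly the startup delay that hurts an on-demand video fetch.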
To solve the dump and play problems described above, various improvements were attempted, such as using RTSP as the protocol and thinning the download bandwidth. These features improved things for both users and content delivery networks (CDNs), but despite the improvements, serious problems remained.
Enter Smooth Streaming (aka Silverlight)
Adaptive streaming, and in particular Microsoft’s Smooth Streaming, incorporates several clever features designed to improve life for both the user and the CDNs. Rather than treating video streaming as a large file transfer with local playback at the client, adaptive streaming breaks the video into small, manageable chunks and sends the client only the few seconds of video that are currently needed. This helps ensure a fast and responsive user experience, and it eliminates wasted download data and therefore wasted bandwidth. The first question that may come to mind is, “Well, that sounds great, but what happens if the network is slow?” This is where the “adaptive” in adaptive streaming comes in. The video isn’t just one encoded stream; it is actually encoded in multiple formats, from higher quality (higher encoder bit rates and resolutions) to lower quality (lower bit rates and resolutions), and each is available as a separate stream. The client is then free to request video from any of the available streams.
If the client is not able to receive the video chunks in a timely fashion (fast enough to ensure uninterrupted playback), it can request a stream whose chunks are smaller and, hopefully, easier for the network to deliver. It’s important to realize that the smaller chunks don’t represent shorter intervals of time; instead, they contain video encoded at a lower resolution, a lower bit rate, or, more typically, a combination of both. A user with a fast connection, for example, may receive chunks encoded at high resolution and high bit rates, while a user with a slower connection will get video of lesser quality. In both cases, however, the user experience in terms of responsiveness should be similar.
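The client-side decision just described can be sketched as a simple heuristic: pick the highest encoded bit rate that the recently measured throughput can sustain, with some headroom. The bit-rate ladder and the 0.8 safety factor below are assumptions for illustration, not Silverlight’s actual algorithm.

```python
AVAILABLE_BITRATES_KBPS = [350, 700, 1500, 2500, 3500]  # low to high quality

def select_bitrate(measured_throughput_kbps, safety_factor=0.8):
    """Return the highest stream bit rate the measured throughput can sustain."""
    budget = measured_throughput_kbps * safety_factor
    candidates = [b for b in AVAILABLE_BITRATES_KBPS if b <= budget]
    # Fall back to the lowest-quality stream if even that exceeds the budget.
    return candidates[-1] if candidates else AVAILABLE_BITRATES_KBPS[0]

print(select_bitrate(2000))  # 1500: a 1600 kbps budget can't sustain 2500
print(select_bitrate(100))   # 350: lowest quality is the only fallback
```

A real client would also weigh buffer fullness and recent throughput variance, but the core idea is the same: re-evaluate before each chunk request and switch streams freely.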
Adaptively switching between streams seems easy enough in theory, but MPEG technology was built on the premise of video frames being encoded as differences between past and future frames. Randomly picking a location in a stream that starts on, say, a P or B frame will result in a messy image that may not even be recognizable. To fix this, every chunk in every stream must start at the beginning of a GOP, and each GOP must be closed, with no references to other GOPs. This allows the adaptive switching to be seamless; the only artifact the user will see is a difference in the quality of the video.
Since data is being sent as small chunks or fragments, Microsoft has opted to use a new file format based on the MPEG-4 container format, the first new file format adopted by Microsoft since ASF. Since each chunk of video data is small and completely self-contained, the client need only rely on its name as a way to reference its stream location and quality, which makes the files easily cacheable by the CDN and therefore easily shared with other clients who may at some point want to download the same file.
Silverlight takes full advantage of the fact that these files can be cached by using HTTP as the mechanism for fetching them. Along with the smooth stream switching, using standard HTTP commands to fetch these files is the other key benefit of Silverlight. Suddenly we have an architecture that is completely compatible with the World Wide Web: we’ve gone from an older style of video streaming, with heavyweight file transfer operations, custom servers, bandwidth issues, and so on, to something that pulls data on demand from the client side via HTTP and is therefore fully compatible with typical web operations.
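To illustrate how these HTTP requests look, here is a sketch that builds fragment URLs following the IIS Smooth Streaming convention (QualityLevels/Fragments). The host and media name are hypothetical, and a real client would derive the available bit rates and fragment times from the manifest rather than hard-coding them.

```python
BASE = "http://media.example.com/movie.ism"  # hypothetical publishing point

def fragment_url(bitrate_bps, start_time_hns):
    """Build the cache-friendly URL for one video fragment.

    start_time_hns is the fragment start time in 100-nanosecond units,
    the timescale Smooth Streaming manifests typically use.
    """
    return f"{BASE}/QualityLevels({bitrate_bps})/Fragments(video={start_time_hns})"

# Two-second fragments at 1.5 Mbps: each URL is a stable, unique name,
# so any ordinary HTTP cache along the path can serve it to other clients.
for t in range(3):
    print(fragment_url(1_500_000, t * 20_000_000))
```

Because every fragment has a deterministic URL, the CDN’s existing edge caches treat video exactly like any other static web content.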
So let’s summarize the benefits that we’ve talked about so far:
- User interactions are quick and responsive.
- Bandwidth usage is more predictable.
- More users can be supported over a given bandwidth because every user consumes bandwidth only for what they are currently viewing, rather than downloading large chunks to be played later.
- HTTP based access plays well with the Internet (no firewall problems). This also facilitates video playback from a browser (note: up until version 3, Silverlight was a browser plug-in technology, but version 3 supports desktop playback as well).
- Small files are completely cacheable so that the “edge caching” that is already in place by CDNs can be used.
- No specialty infrastructure is required – standard HTTP Internet equipment will do the job.
- Since a file isn’t transferred in its entirety, the only portion of a video resident on the client is the small chunk currently being played, so in theory, copyright protection is improved.
In addition, Silverlight 3 also has some other key features that we haven’t yet discussed, including:
- Improved Digital Rights Management (DRM).
- A rich multimedia toolkit that allows videos to be interactive.
- A 3D graphical toolkit that allows for transformations, animations, etc.
So what does this mean to the Videophile Community?
So far Silverlight seems like a huge positive and a very clever solution to a problem that has needed a fix for quite some time. Because of these advantages, it is very likely that this technology will be rapidly adopted. Here is the rub, though: there are really two use case scenarios that need to be addressed. The first represents viewers who are surfing the web and viewing something like YouTube. In this scenario the user wants fast response and may be heavily interacting with the stream – starting, stopping, replaying segments, skipping ahead, etc. Where fast response and quick playback are the norm, Silverlight is a huge boon to both users and CDNs. But consider the use case of the videophile who wants to stream and view a movie in its entirety. Here, user interactions will be very limited, and the viewer is most interested in watching the movie at as high a quality as possible. Quick playback takes a back seat to video quality, and the user may be willing to queue up the video 30 minutes or an hour beforehand if it means the quality will be the best it can be. In this scenario, all forms of adaptive streaming that rely on transiently switching to a lower-quality stream may come up short.
Unfortunately, the videophile community is small, and it’s unlikely that the second use case described above will weigh much on the minds of content providers. In the spectrum of video quality, this means that most of these adaptive streaming technologies will fall well below the current king of consumer video quality, the Blu-ray Disc. They are likely to fall well below even broadcast HD quality. The videophile community has few options other than to vote with their pocketbooks and use their collective purchasing power to favor those content providers who stream movies with the best video quality. Hopefully, at some point, videophiles can have the best of both worlds: fast, responsive streaming video when surfing the web, and high-quality streaming video when watching an uninterrupted 2 hr movie.