Remote display protocols for VDI: will RDP be enough?

It seems like hundreds of VDI solutions are popping up now. Some are more complete than others, but all share one common fact: they are server-based computing. In other words, they all involve the remote execution of a Windows instance that sends screen updates across a network to a client display device. For years, the protocol carrying those screen updates was either RDP or ICA. Moving forward, however, those two as we know them might not be enough. Quoting Provision Networks’ Peter Ghostine from a recent conversation:

Historically, ICA and RDP were designed to flush the video framebuffer to the client roughly once every 100 milliseconds, which is fine for most Windows GDI apps, but not suitable for graphics-intensive apps, the Aero experience, 3D apps, and especially apps that require audio-video synchronization. We hardly had much use for more than this functionality until recently. But now that VDI is being promoted as a "desktop replacement," remote display protocols will have to rise to the occasion.

In this world of VDI desktops replacing traditional local desktops, how much does the protocol matter? Where are ICA and RDP today? Who are the other players? This is what we’ll look at in today’s article.

Desktop remoting techniques

Fundamentally there are several different ways that a desktop running at one place can show up on a screen of a client at another location:

  • The “screen scrape” method
  • Screen scrape + multimedia redirection
  • Server graphics system virtualization
  • Hardware acceleration on the server and client


The general idea with “screen scraping” is that whatever graphical elements are painted to the “screen” on the host are then scraped by the protocol interface and sent down to the client. This can happen in two ways:

  • The client can contact the server and pull a new “snapshot” of the screen from the frame buffer. This is how VNC works.
  • The server can continuously push its screen activity to the client. This can be at the framebuffer level, the GDI / window manager level, or a combination of both. (This is how RDP and ICA work.)
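To make the push model a bit more concrete, here is a minimal Python sketch of framebuffer-level change detection: the host compares the current frame with the previous one in fixed-size tiles and only queues the tiles that changed. The tile size, the frame representation (a list of rows of pixel values), and the function name `changed_tiles` are all illustrative assumptions. This is not how RDP, ICA, or VNC are actually implemented.

```python
def changed_tiles(prev, curr, tile=4):
    """Return (x, y, w, h) rectangles for tiles that differ between frames.

    Frames are lists of rows; each row is a list of pixel values.
    Both frames must have the same dimensions.
    """
    height = len(curr)
    width = len(curr[0]) if height else 0
    dirty = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            h = min(tile, height - ty)
            w = min(tile, width - tx)
            # Compare the tile row by row; stop at the first difference.
            for y in range(ty, ty + h):
                if prev[y][tx:tx + w] != curr[y][tx:tx + w]:
                    dirty.append((tx, ty, w, h))
                    break
    return dirty


# An 8x8 all-black "screen" where a single pixel changes.
before = [[0] * 8 for _ in range(8)]
after = [row[:] for row in before]
after[5][6] = 255

# Only the tile containing pixel (x=6, y=5) needs to go over the wire.
print(changed_tiles(before, after))
```

A pull-style client would run the same comparison only when it asks for a snapshot, while a push-style host would run it every time the screen is flushed.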

Login Consultants' Benny Tritsch adds a note of caution regarding the term "screen scraping":

Over the years, this screen-scraping has become very advanced. RDP, ICA, and other protocols don't simply look at pixels on the screen and compress them into graphical images. Instead this process is enhanced by analyzing the screen content and identifying screen regions that are being reused (such as icons, fonts / glyphs, dialog boxes, etc.). Those graphics elements can be cached at the client side, so if the host needs to send one of these elements again, it only transmits the reference number of the cached element and the new coordinates. This dramatically reduces the amount of data transmitted and thus increases performance and user experience. This cached information can even be used for enhanced local echo effects, like Citrix's SpeedScreen Local Text Echo for the standard GDI output.

So even though the specific term "screen scraping" is no longer an exact literal representation of what is happening, the term is used more broadly to describe this general concept.
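The caching idea Benny describes can be sketched in a few lines of Python. The first time the host sees a bitmap it sends the pixels along with a cache ID; after that it sends only the ID and the new coordinates. The class names, message tuples, and the use of a SHA-256 hash as the lookup key are assumptions made for illustration, and a real protocol would also need cache size limits and eviction, which this sketch leaves out.

```python
import hashlib


class BitmapCacheEncoder:
    """Host side: replace repeated bitmaps with short cache references."""

    def __init__(self):
        self._ids = {}  # content hash -> cache id the client already holds

    def encode(self, bitmap: bytes, x: int, y: int):
        key = hashlib.sha256(bitmap).digest()
        if key in self._ids:
            # Client already has these pixels: send only id + position.
            return ("DRAW_CACHED", self._ids[key], x, y)
        cache_id = len(self._ids)
        self._ids[key] = cache_id
        # First sighting: send the pixels once and ask the client to cache them.
        return ("CACHE_AND_DRAW", cache_id, x, y, bitmap)


class BitmapCacheClient:
    """Client side: store bitmaps by id and record what gets drawn where."""

    def __init__(self):
        self._store = {}
        self.drawn = []

    def handle(self, msg):
        if msg[0] == "CACHE_AND_DRAW":
            _, cache_id, x, y, bitmap = msg
            self._store[cache_id] = bitmap
        else:
            _, cache_id, x, y = msg
            bitmap = self._store[cache_id]
        self.drawn.append((x, y, bitmap))


encoder, client = BitmapCacheEncoder(), BitmapCacheClient()
icon = bytes(32 * 32 * 4)  # a 32x32 icon at 4 bytes per pixel (4096 bytes)

first = encoder.encode(icon, 10, 10)    # carries all 4096 bytes of pixels
second = encoder.encode(icon, 200, 10)  # carries only an id and coordinates
client.handle(first)
client.handle(second)
print(first[0], second)
```

The second message is a handful of integers instead of four kilobytes of pixels, which is where the bandwidth savings come from.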

Screen scrape + multimedia redirection

As most people reading this article know, any screen scraping-like approach works fine for applications that don’t have a lot of graphically intensive screen elements or where relatively low frame rates (~10 fps) are acceptable. But these approaches are not good with multimedia content.

The “screen scrape” method can be combined with “multimedia redirection,” a technique whereby server-side multimedia elements are sent in their native formats down to the client devices. Then the client can play the multimedia streams locally and dynamically insert them back into the proper position on the screen.

This works well if (1) your client has the technical capability and hardware specs to render the multimedia, and (2) your client has the proper codec installed so that it knows how to render the multimedia content. In effect, this means that your clients can’t be “too thin.”
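The two conditions above boil down to a simple per-stream decision on the host. Here is a rough Python sketch of that decision. The function name, the codec strings, and the `client_can_decode_video` flag are hypothetical and do not correspond to any actual Citrix or Wyse API.

```python
def choose_delivery(stream_codec, client_codecs, client_can_decode_video):
    """Pick how to deliver a media stream to a given client.

    Returns "redirect" when the native stream can be sent to the client
    for local playback, otherwise "screen_scrape" (render on the host and
    remote the resulting pixels like any other screen content).
    """
    has_hardware = client_can_decode_video   # condition (1): client is capable
    has_codec = stream_codec in client_codecs  # condition (2): codec installed
    if has_hardware and has_codec:
        return "redirect"
    return "screen_scrape"


# A reasonably capable client with the right codec gets the native stream.
print(choose_delivery("wmv", {"wmv", "mpeg2"}, True))
# A client that is "too thin" falls back to plain screen scraping.
print(choose_delivery("wmv", {"wmv", "mpeg2"}, False))
# So does a capable client that is missing the codec.
print(choose_delivery("h264", {"wmv", "mpeg2"}, True))
```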

This is what Citrix does in ICA with their “SpeedScreen” multimedia acceleration enhancements. It’s also what Wyse does in RDP with their TCX enhancements.

Server graphics system “virtualization”

"Virtualizing" the entire graphics system of the host was also explained to me in the conversation I had with Peter Ghostine, so I'll quote / paraphrase his explanation here:

In “virtualizing” the graphics system of the host, software on the host captures all possible graphical layers (GDI, WPF, DirectX, etc.) and renders them into a remote protocol stream (like RDP) where they’re sent down to the client as fast as possible. (Certainly much faster than the default of 10x per second.) This will give the client an experience which is very close to local performance, regardless of the client device (even on very low-end WinCE and Linux clients).

The challenge here is that GPU capabilities must exist on the server side where the rendering is taking place. This is fine if you plug a physical graphics card into physical hardware running a physical OS. But in a VDI scenario, your hypervisor must be able to virtualize the GPU just like any other piece of hardware. This means that the Windows desktop OS running inside the VM must be able to detect the “virtual” GPU so that it can enable all of its cool graphical features.

This is what Calista Technologies does today: full desktop-like remote experience to any RDP client, even low-end ones, over the regular RDP protocol.

In the future, it's even conceivable that you could somehow hook this in to those GPU computing servers that are starting to hit the market now. (NVidia has a Tesla series of hardware which is basically 1U servers stuffed full of GPUs). (And fans. Lots of fans.)

Proprietary chipset-based solution

The final remote desktop option requires special hardware on the host and on the client side. Screen and video content is captured on the host via a special chipset and sent across the network in a proprietary way to a client device with a matching special chipset.

This is what Teradici does. Today their solution works with physical blades (with their special TERA chips) and their clients (also with TERA chips), but in the future something like this might (in theory anyway) work with something like the NVidia Tesla GPU server (except with a Teradici chip server instead).

What about bandwidth?

This "server-based computing" technology is of course also known as "thin client computing" technology. But what does "thin" refer to? The client device? The protocol? (The LCD screen? :)

In the early days of RDP and ICA, it could be said that the protocol was the "thin" part, and in fact many people used Terminal Server and Citrix to make three-tiered apps work across WAN links. But now that we're talking about remoting full and true desktops, that whole "20kbps" per session thing can be thrown out the window.

Regardless of protocol, regardless of technique, a true “desktop-like” experience is only going to happen with bandwidth. Some of these approaches require more bandwidth than others. As Peter Ghostine said, “No one is going to be able to squeeze an elephant through the eye of a needle. While compression algorithms will always advance, if a user wants to watch a video at 24 frames per second, that’s a lot of data that needs to go across the network. Period.”
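Some back-of-the-envelope arithmetic shows how big the elephant is. The Python sketch below computes the uncompressed bitrate of a video window and how much compression would be needed to fit it into a given link. The 640x480 window, 24-bit color depth, and the link speeds are example values I picked for illustration (0.02 Mbps is the old "20kbps per session" figure). They are not measurements of any real protocol.

```python
def raw_video_mbps(width, height, bits_per_pixel, fps):
    """Uncompressed video bitrate in megabits per second."""
    return width * height * bits_per_pixel * fps / 1_000_000


def required_compression_ratio(raw_mbps, link_mbps):
    """How many times smaller the stream must get to fit the link."""
    return raw_mbps / link_mbps


# A 640x480 window, 24-bit color, 24 frames per second.
raw = raw_video_mbps(640, 480, 24, 24)
print(f"raw stream: {raw:.1f} Mbps")

# Compare against a 20 kbps session, a 1.5 Mbps WAN link, and a 100 Mbps LAN.
for link in (0.02, 1.5, 100.0):
    ratio = required_compression_ratio(raw, link)
    print(f"{link:>6} Mbps link needs about {ratio:,.0f}:1 compression")
```

Even a modest video window works out to roughly 177 Mbps before compression, so the old 20kbps session budget would need the data shrunk by a factor of several thousand.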

Delivering a few business apps in 1998 via RDP or ICA is very different than delivering a whole and completely functional desktop in 2008!

Where does this leave RDP and ICA?

Microsoft’s RDP protocol is sort of the standard for a lot of remote computing conversations since it’s been built into Windows for the past eight years. RDP is a good protocol. RDP version 6, built into Windows Server 2008, will support all those new-fangled features like seamless windows, RemoteApps, TS EasyPrint, etc.

Fundamentally, Citrix’s ICA protocol is not that different than Microsoft’s RDP protocol. In practical use, yes, connections over ICA typically have a better experience than the same connection over RDP, but that’s because Citrix has chosen to enable their advanced features (SpeedScreen, UPD printing, compression, virtual channel limits, etc.) only when using the ICA protocol. It’s not because ICA is any different than RDP. This is why RDP was classified as "screen scraping," while ICA was classified as "screen scraping + multimedia enhancements."

Why does Citrix even bother with ICA today? Remember that Citrix actually developed ICA (and the multi-user kernel technology that eventually became Terminal Server) as an add-on to Windows on their own in the mid-1990s. When Citrix licensed their core “MultiWin” technology back to Microsoft in 1997, Citrix kept ICA for themselves. Microsoft went out and developed RDP on their own (actually based on some of the work they’d been doing with NetMeeting).

Sure, Citrix could have just used RDP back then, but in 1997/1998 it was really important that Citrix had a hard-core feature to differentiate themselves from Terminal Server. (Remember this was before the days of application publishing.)

Over the years, RDP got better and better, but Citrix couldn’t ever “switch” because they spent so many years telling people how crappy RDP was. Plus, other companies like Provision and Ericom and Jetro came out with ICA-like extensions to RDP, and Citrix wanted to keep the ICA brand in order to discredit their competition as using “just” RDP.

The other (and lesser-known) driver that really required Citrix to hang on to ICA was that in versions of Windows before Server 2008 (even including Windows Server 2003), Microsoft didn’t expose everything that Citrix needed to integrate ICA with Terminal Server. This meant that Citrix had to find their own way of doing things, which in turn meant that the way ICA hooked into TS was very proprietary.

But in Windows Server 2008, Microsoft (with the help of Citrix I’m sure) finally created all the “proper” and fully-documented interfaces Citrix needs to snap ICA into Windows. This is great for Citrix! And it’s also great for Provision and Ericom and HOBlink and everyone else who wants to enhance RDP. (Ironically this also means that moving forward there is absolutely nothing holding Citrix to ICA except for marketing and backwards-compatibility.) But if they wanted to, Citrix could transfer all of their "+ multimedia" away from ICA and into RDP.

What will VMware, Microsoft and the other VDI vendors do?

By now I hope you understand that if you want to do VDI with a “real” local desktop experience, you need more than pure RDP to make it happen. (Well, perhaps I should phrase that as "I hope you understand that I think this is more than traditional RDP.")

Citrix has announced their VDI strategy based around their XenDesktop product—a combination of Citrix Desktop Server, Citrix Provisioning Server (Ardence), and XenServer. XenDesktop will use the ICA protocol with direct connections into workstation VMs. Citrix will leverage the same SpeedScreen multimedia acceleration technologies as Presentation Server to deliver a decent remote desktop experience beyond what “pure ICA” could do. So they’re all set.

Teradici and Calista Technologies are doing some interesting things with regard to server-side hardware and software, so they’re all set too.

Wyse and Provision Networks are enhancing the RDP protocol in much the same way that Citrix is enhancing the ICA protocol, so they’re all set.

Several other VDI vendors are doing interesting things in the protocol space too. Qumranet (creators of the KVM “Kernel-based Virtual Machine” hypervisor for Linux) have developed a purpose-built remote desktop protocol called SPICE, so they’re all set.

DeskTone is using something they call the “dynamic best fit” protocol for remoting desktops, so (one assumes) they’re all set too.

Who’s missing from this “all set” list? The two biggest vendors are VMware and Microsoft, who both (at least today) are basing their VDI solutions around an unmodified RDP.

I want to reiterate that I’m not suggesting that RDP is “bad” for VDI or that it can’t be used. My point is that RDP will work fine for a lot of line-of-business apps where 10fps is acceptable and not too much changes on the screen. But for companies looking to replace their desktops (and all of their apps), RDP by itself is just not going to cut it.

What will VMware do?

Good question.

There’s a new standard that’s working its way through the Video Electronics Standards Association (VESA) called “Net2Display.” The basic idea is that Net2Display will be a remote display protocol (like RDP, ICA, X, VNC, etc.) that’s purpose-built for remoting entire desktops to remote clients (with full USB support even!). This standard is being developed now (and should be ready very soon), and will be available for basically any company to use when they build their VDI software/server/client/whatever.

There’s not too much available on Net2Display right now. (There's a four-page overview that's very good. Direct link to PDF.) This paper is very recent and even talks about RDP 6. What’s interesting is that the people on the Net2Display committee work for companies we all know in this space: IBM, Teradici, DeskTone, Avocent (the IP-KVM people who are probably nervous about all this remoting), and...VMware!

Who knows where this Net2Display standard will go, but the people behind it are no dummies. The idea of an open standard for this kind of thing as opposed to a proprietary vendor-controlled protocol is extremely interesting.

Of course VMware could also just buy one of the other companies mentioned already instead of trying to develop something from scratch.

What will Microsoft do?

Maybe they’ll ditch RDP and modify Terminal Server and Vista to also meet the Net2Display standard?

Ok, maybe not. But they're also no dummies. Microsoft have a lot of developers and a lot of motivation to make sure users' Windows desktop (and especially Aero) experiences are as good as they can be. And you can bet they're not going to do that via an old RDP. Benny Tritsch suggests some things that Microsoft could do:

  1. Microsoft changed the window manager architecture in Vista and Server 2008. The new GDI framebuffer is embedded in the WPF hierarchy instead of being a single standalone layer of pixels. Today, Server 2008's RDP only uses the single GDI framebuffer instead of remoting the entire hierarchy. What if RDP were extended to remote the entire WPF framebuffer tree?
  2. .NET includes concepts like .NET Remoting which allows the communication of application components over the network. Will Microsoft sort of merge RDP and .NET remoting? (Today, .NET remoting does not include the transmission of graphics objects as we use them for SBC.)
  3. Microsoft uses real-time protocols (RTP) for their conferencing software. Can this be used to “tunnel” RDP in a better way? This reminds us a little bit of what Citrix does via Session Reliability / CGP for improving the stability of the ICA protocol when it’s being used on lower-quality networks. The same could be done for reducing delays by using a real-time protocol as a transport medium for RDP.

Fun times indeed! And big thanks to Peter Ghostine and Benny Tritsch for contributing their ideas and visions as to where we're headed.