Monthly Archives: January 2011

Over the last year or so I have been (very) quietly working on a new project. It is a complete media center environment similar to XBMC, Boxee and MythTV, but with some twists which I think will add a lot to the domain. The system is using GStreamer for handling the details of media files, and OpenGL (of course!) for displaying the user interface and video.

It is interesting to note that two of the three big players in the open source HTPC domain are built on the same code base (Boxee is derived from XBMC). Building an application of this scale is a difficult process: it has taken quite a lot of time and effort to get as far as I have, and there is still a lot of work left to do.

One of the first big technical challenges was welding my chosen media framework (GStreamer) to OpenGL efficiently. Since I am using Mono and C# to write the application, my first approach was to decouple Mono’s performance profile from the potential performance of the video system by handing off video frame conversion and OpenGL texture uploading to the experimental GStreamer-OpenGL plugins, which are not yet shipped with Ubuntu out of the box. While I knew this would add a dependency, I assumed (incorrectly, as you will see) that the performance boost would more than make up for the additional installation complexity.

The mechanism that facilitates texture sharing between the GStreamer GL plugins and the larger OpenGL application is GL context sharing. Basically, the application process allocates multiple OpenGL “contexts”, which can be switched between using a platform-specific API: GLX context sharing on Linux, WGL on Windows, and a similar facility in Mac OS X’s AGL API. My plan was to use the “glupload” element to convert video/x-raw-yuv frames from GStreamer into video/x-opengl frames, each of which consists merely of a 4-byte texture identifier. glupload launches a new thread that reads a YUV buffer, converts it to RGB (using GLSL shaders where possible), reuses or creates a texture to hold the data, and uploads it, then sends the texture identifier downstream to the next element. glupload (and apparently the rest of the GL elements) can also render the video into a private video window, though this was of little importance for my purposes.
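For the curious, this path can be exercised from the command line without any application code. A rough sketch (0.10-era element names from gst-plugins-gl; exact element availability depends on your install, and the package is not shipped with Ubuntu by default):

```shell
# Hypothetical test pipeline: decode/generate YUV frames, let glupload push
# them into an OpenGL texture, and have glimagesink display the result.
# Requires the separately-installed gst-plugins-gl package.
gst-launch-0.10 videotestsrc ! glupload ! glimagesink
```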

It didn’t take long to come up with the initial code to implement this scheme, and it didn’t take long before I could watch garbled texture memory racing across the otherwise pristinely rendered scene. In between studying at university I tried to determine what I was doing wrong… why wasn’t I getting a picture!? After some time I began to suspect that the problem was not of my doing: no matter how much synchronization I added between the glupload thread and the renderer thread, I could not get a clean picture. Around this time I began to look for other people making use of texture sharing, particularly on the Intel drivers I was using on my laptop. It turns out that since context sharing is almost never used these days, the Intel drivers have severe synchronization bugs of their own which prevent the feature from working at all. The same code that gave me problems on Intel worked fine on a system with an NVIDIA GPU. Unfortunately I did not, and still do not, have a system with a discrete GPU, and I suspect many users of my software won’t have such hardware either.

I decided that my application would require an alternative method of converting video frames into useful OpenGL textures. Since I knew this code would have to become a branched implementation that could use different techniques on different hardware, I decided to challenge my assumption that the straightforward path would be too slow to be useful. After all, if it WAS too slow, we could fall back to the threaded glupload code path on hardware that supports it and call the rest a technical limitation.

With my newly found extra time over winter break I have managed to write the new code path. It can use either GStreamer’s autoconvert element or a YUV-to-RGB GLSL shader to perform the colorspace conversion inside the rendering pipeline itself. A playbin feeds its video frames to an appsink, and my app then passes each frame buffer directly to OpenGL to bring the video data into a texture. This process can involve a fair amount of CPU grinding if the GLSL shader or autoconvert operation is not fully hardware accelerated.
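The shader itself is nothing exotic: it applies the standard YCbCr-to-RGB matrix per texel. As a sketch of that arithmetic (BT.601, video-range coefficients; the function name and rounding details here are my illustration, not code from the app), the same conversion in Python looks like this:

```python
def yuv_to_rgb(y, cb, cr):
    """Convert one BT.601 video-range YCbCr sample to 8-bit RGB.

    This mirrors what a YUV-to-RGB GLSL fragment shader does for each
    texel: expand luma from [16, 235] to full range, then mix in the
    chroma differences.
    """
    c = y - 16
    d = cb - 128
    e = cr - 128
    r = 1.164 * c + 1.596 * e
    g = 1.164 * c - 0.392 * d - 0.813 * e
    b = 1.164 * c + 2.017 * d
    clamp = lambda v: max(0, min(255, int(round(v))))
    return clamp(r), clamp(g), clamp(b)


# Black, white, and pure red in YCbCr map to the expected RGB triples.
print(yuv_to_rgb(16, 128, 128))   # (0, 0, 0)
print(yuv_to_rgb(235, 128, 128))  # (255, 255, 255)
```

In GLSL the same three dot products run on the GPU for every texel, which is exactly why this path stays cheap when the shader is available.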

Once all of this was fully implemented, I could assess the level of performance the app could deliver on my mid-range 2010 laptop (a ThinkPad SL410: Core 2 Duo @ 2.0 GHz, 3 GB DDR2, Intel Mobile 4 graphics). SD video played adequately, but 720p and up played jaggy compared to Totem and other video players, which can easily dole out HD with swagger.

I set out to optimize this code path to bring HD into the fold; I could not offer an HTPC application that uses OpenGL for fancy, flashy graphics and then delivers substandard video performance compared to less flashy, more vanilla media players. After playing with the details a bit, I discovered that my use of mipmaps on the video frames was both wholly unnecessary and tripling (or more) the time needed to upload a frame to OpenGL. Mipmaps are miniaturized copies of the image data, used to speed up textured rendering when a texture must be scaled down significantly. This is most useful in 3D games (where OpenGL came from), where the constantly changing camera angle and distance can easily lead to rendering a large number of textures in a single frame. Mipmapping imitates a property of real life: we can’t see the same detail from far away as we can up close.
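It’s worth noting that the extra texels themselves aren’t where the 3x cost goes: a full mipmap chain only adds about a third more data than the base image, so the slowdown most likely comes from gluBuild2DMipmaps() downscaling every level in software on the CPU before uploading. A quick back-of-the-envelope (the function here is my illustration):

```python
def mipmap_texels(width, height):
    """Total texels in a full mipmap chain: halve (flooring) each
    dimension per level, down to 1x1, the way OpenGL sizes levels."""
    total = 0
    while True:
        total += width * height
        if width == 1 and height == 1:
            break
        width = max(1, width // 2)
        height = max(1, height // 2)
    return total


base = 1280 * 720
chain = mipmap_texels(1280, 720)
print(chain / base)  # roughly 1.33 -- the extra levels add about a third
```

So for a 720p frame the chain is only ~33% more pixels, yet each of those levels is produced by a CPU-side rescale of the frame, every frame, sixty times a second.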

The video frames in my application are either drawn at full size (or somewhere very near it) or in a small miniaturized form in the corner of the window. Additionally, the texture is drawn no more than two or three times per frame, unlike many uses of mipmapping where a texture is tiled or drawn many times in a single frame. Once gluBuild2DMipmaps() was replaced with glTexImage2D(), HD video in my application reached almost identical performance to Totem and the rest of the optimized, stable video players. Performance isn’t _perfect_ on this little laptop, but it’s clear that this code path will scale nicely up to the stronger CPU/GPU combos found in the HTPC and enthusiast/gaming rigs my app is most likely to be deployed on.

So I’ve now got several pluggable implementations for creating OpenGL GStreamer video sinks, which should allow faster video on capable drivers using GST-GL while falling back to the regular ol’ manual pump-based solution elsewhere. Additionally, this allows us to make the gstreamer-opengl package optional since, as mentioned earlier, it is not yet included in all Linux distributions.