
Since I last updated this blog, I got a job at 906 Technologies, a local computer repair and website development shop. This was driven by my need to get out of my childhood home and become independent, but it has turned into one of the most useful learning experiences of my life. We primarily work on web projects using the LAMP stack (Linux, Apache, MySQL, and PHP). I had worked on a number of web projects before I came to work here, but this is where I learned of the true power of the web. While the web is still young and has its share of difficulties, it has become clear to me that HTML5 is the development platform of the future. It is simply fruitless to start a piece of software on any other platform. By the time it's ready, the world will be done with your platform and will be demanding that your software work on devices you never intended it for. HTML5 lessens this burden extensively.

HTML5 goes hand in hand with the cloud. While the cloud has its share of both patrons and detractors, HTML5 is now reaching levels of performance that make it attractive as a local, native platform as well. It becomes one as soon as you equip a computer with a (secured) web server.

And this was the foundation for my decision to reboot Silverscreen, my HTPC solution. Silverscreen was previously a C# app written to use GStreamer and OpenGL to deliver video in gorgeous fashion to the television and film enthusiasts among us. C# as a platform has serious issues in terms of ease of deployment, and it is philosophically limited in its use on non-Microsoft platforms. C# also has less support out of the box from content vendors. The most ironic situation was my near inability to interface directly with Netflix, despite the fact that Netflix is delivered using Microsoft Silverlight, a platform based on C# and .NET.

Yes, an HTPC application built in HTML5, but also deliverable as an application just like Boxee or XBMC. This means Silverscreen will work from the web without installation, or it can be downloaded, installed, and enjoyed when no network connection is around.

I have replaced C# with the Web, and it opens new possibilities which will change the course of entertainment history forever.

Over the last year or so I have been (very) quietly working on a new project. It is a complete media center environment similar to XBMC, Boxee, and MythTV, but with some twists which I think will add a lot to the domain. The system uses GStreamer for handling the details of media files, and OpenGL (of course!) for displaying the user interface and video.

It is interesting to note that two of the three big players in the open source HTPC domain are based on the same code base. Building an application of this scale is a difficult process. It's taken quite a lot of time and effort to get as far as I have, and there is still a lot of work left to do.

One of the first big technical challenges was welding my chosen media framework (GStreamer) to OpenGL efficiently. Since I am using Mono and C# to write the application, I at first aimed to separate Mono's performance profile from the potential performance of the video system by passing off the video frame conversion and OpenGL texture uploading to the experimental GStreamer-OpenGL plugins, which are not yet shipped with Ubuntu out of the box. While I knew this would add a dependency, I assumed (incorrectly, as you will see) that the performance boost would more than make up for the additional installation complexity.

The mechanism that facilitates texture sharing between the GStreamer GL plugins and the larger OpenGL application is GL context sharing. The application process is allocated multiple OpenGL "contexts" which can be switched between using a platform-specific API: GLX context sharing on Linux, WGL on Windows, and a similar feature in the AGL APIs on Mac OS X. My plan was to use the "glupload" element to convert video/x-raw-yuv video frames from GStreamer to video/x-opengl frames, which consist merely of a 4-byte texture identifier. "glupload" launches a new thread that reads a YUV buffer, converts it to RGB (using GLSL shaders if possible), reuses or creates a texture to hold the data, and uploads it. It then sends the texture identifier down the line to the next element. glupload (and apparently the rest of the GL elements) can additionally drop the video onto a private video window, though this was of little importance for my purposes.
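
For illustration, here is a rough sketch of the pipeline shape, written against the 0.10-era Python bindings for brevity (my application itself is C# on Mono, and exact element behavior varies between gst-plugins-gl releases):

import pygst
pygst.require("0.10")
import gst

# glupload pulls raw YUV frames off the pipeline, converts and uploads them on
# its own thread/GL context, and pushes buffers downstream that carry little
# more than a texture identifier. (The real pipeline decodes movie files
# rather than using videotestsrc.)
pipeline = gst.parse_launch("videotestsrc ! glupload ! fakesink")
pipeline.set_state(gst.STATE_PLAYING)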

It didn't take long to come up with the initial code to implement this scheme. And it didn't take long before I could watch garbled texture memory racing across the otherwise pristinely rendered scene. In between studying at university I tried to determine what I was doing wrong… why wasn't I getting a picture!? After some time I began to suspect that the problem was not my doing. No matter how much synchronization I did between the glupload thread and the renderer thread, I could not get a clear picture. At about this time I began to look around for other people making use of texture sharing, particularly on the Intel drivers I was using for my laptop. It turns out that since context sharing is almost never used these days, the Intel drivers have severe synchronization bugs of their own which prevent the feature from working at all. The same code that gave me problems on Intel worked fine on a system with an NVIDIA GPU. Unfortunately I did not then, and do not today, have a system with a discrete GPU. I suspect that many users of my software won't have such hardware either.

I decided that my application would require an alternative method of converting video frames into useful OpenGL textures. Since I knew this code would have to become a branched implementation that could use different techniques on different hardware, I decided to challenge my assumption that the straightforward path would be too slow to be useful. After all, if it WAS too slow, we could instead use the threaded glupload code path on hardware that supports it and call it a technical limitation.

With my newly found extra time over winter break I have managed to write the new code path. It can use either GStreamer's autoconvert element or a YUV-to-RGB GLSL shader that performs the conversion in hardware inside the rendering pipeline itself. The playbin feeds its video frames to an appsink, and my app then passes the frame buffer directly to OpenGL to bring the video data into a texture. This process can involve a fair bit of CPU grinding if the GLSL shader or autoconvert operation is not fully hardware accelerated.
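
My application itself is written in C# against the GStreamer API of the day, but the playbin-to-appsink arrangement looks roughly like this sketch, written with the current GStreamer Python bindings (the file URI is a placeholder):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# playbin handles demuxing/decoding; the appsink hands each raw RGB frame
# back to the application instead of drawing it itself.
playbin = Gst.ElementFactory.make("playbin", None)
playbin.set_property("uri", "file:///path/to/movie.mkv")

appsink = Gst.ElementFactory.make("appsink", "frames")
appsink.set_property("caps", Gst.Caps.from_string("video/x-raw,format=RGB"))
appsink.set_property("max-buffers", 2)   # don't let undisplayed frames pile up
appsink.set_property("drop", True)
playbin.set_property("video-sink", appsink)

playbin.set_state(Gst.State.PLAYING)

# Inside the render loop: pull the newest frame and hand its bytes to OpenGL.
sample = appsink.emit("pull-sample")
if sample is not None:
    structure = sample.get_caps().get_structure(0)
    width = structure.get_value("width")
    height = structure.get_value("height")
    buf = sample.get_buffer()
    ok, info = buf.map(Gst.MapFlags.READ)
    if ok:
        rgb_bytes = info.data            # feed these bytes to glTexImage2D
        buf.unmap(info)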

Once all of this was fully implemented I could assess the level of performance the app could deliver on my mid-range 2010 laptop (a ThinkPad SL410 with a 2.0 GHz Core 2 Duo, 3 GB of DDR2, and Intel Mobile 4 graphics). SD video played adequately, but 720p and up played back choppily compared to Totem and other video players, which can easily dole out HD with swagger.

I set out to optimize this code path to get HD into the fold. I could not offer an HTPC application that uses OpenGL for fancy, flashy graphics and then deliver substandard video performance compared to less flashy, more vanilla media players. After playing with the details a bit I discovered that my use of mipmaps on the video frames was both wholly unnecessary and tripling (or more) the time needed to upload a frame to OpenGL. Mipmaps are miniaturized copies of the image data which are used to speed up textured rendering when the texture needs to be scaled down significantly. This is most useful in 3D games (where OpenGL came from), where the constantly changing camera angle and distance can easily lead to rendering a large number of textures in a single frame. Mipmapping copes with this by imitating a property of real life: we can't see the same detail from far away as we see up close.

The video frames in my application are either drawn at full size (or somewhere very near it) or in a small miniaturized form in the corner of the window. Additionally, the texture will be drawn no more than 2-3 times per frame, unlike many uses of mipmapping where a texture is tiled or drawn many, many times in a single frame. Once gluBuild2DMipmaps() was replaced with glTexImage2D(), HD video in my application reached performance levels almost identical to Totem and the rest of the optimized, stable video player applications. Performance isn't _perfect_ on this little laptop, but it's clear that this code path will scale nicely up to the stronger CPU/GPU combos found in the HTPC and enthusiast/gaming rigs my app is most likely to be deployed on.
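
In rough terms, the per-frame upload now looks something like the following sketch (PyOpenGL is used here for illustration; my real code is C#, and the texture id, dimensions, and RGB bytes are assumed to come from the appsink side):

from OpenGL.GL import (
    GL_LINEAR, GL_RGB, GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER,
    GL_TEXTURE_MIN_FILTER, GL_UNSIGNED_BYTE,
    glBindTexture, glTexImage2D, glTexParameteri,
)

def upload_frame(texture_id, width, height, rgb_bytes):
    # Upload one decoded video frame as a plain, mipmap-free texture.
    glBindTexture(GL_TEXTURE_2D, texture_id)
    # Without mipmaps, the default minification filter expects mipmap levels
    # that no longer exist, so plain linear filtering must be chosen explicitly.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
    # This call replaces gluBuild2DMipmaps(), which generated and uploaded an
    # entire pyramid of scaled-down copies for every single video frame.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, rgb_bytes)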

So I've now got several pluggable implementations for creating OpenGL GStreamer video sinks, which should allow for faster video on capable drivers using GST-GL while falling back to the regular ol' manual pump-based solution elsewhere. Additionally, this will allow us to make the gstreamer-opengl package optional since, as mentioned earlier, it is not yet included in all the Linux distributions.

So this semester I took CS322 Principles of Programming with one John Sarkela, who was there at Xerox PARC in the days when object-oriented programming and the graphical user interface were being invented. We studied Forth and Scheme before the final section of the class, where we learned Smalltalk. Smalltalk is Mr. Sarkela's area of expertise, so I was enthused to be learning the basics from someone so well versed in the idioms of the language and community.

One of the first things I noticed was how much I liked the environment. Those who know will tell you that Smalltalk is where most of the fancy development tools we use today originated. Things like refactoring, IDE code intelligence, class browsing, and built-in debugging appeared first in Smalltalk, and they remain in excellent form there.

Since Smalltalk is an image-based system, all code and objects live on indefinitely, until you lose or delete the image which contains them. You can use a workspace to assign an object to a variable, save the image and exit, and when you start it up later, the object will be waiting for you, just as you left it. With a system like this, the environment can let you do strange things like modify the structure of a user interface on the fly. Also, since code is just another object in the system, the source code is carried around with it, and each image can have vastly different implementations of the system APIs or indeed any code in the system. This is very, very useful for molding the system libraries and frameworks to do exactly what you need, without having to maintain the housekeeping details you would with other languages (i.e., a branch in a DVCS or the directory structure where you keep your modified libraries).

So the image-based nature of Smalltalk does great things for the system as a development environment, since it lets programmers organize their environment in just the way they want. But the disparity between "the image" and "the executable" has become, in my opinion, one of the biggest stumbling blocks for teaching the virtues of message-oriented objects to programmers trained on Java/C# and C++.

Dear KDE 4, we need to talk.

Don't get me wrong, KDE 4, you are an awesome beast.

But I have some big gripes with you, and they are very sad gripes indeed. I don’t know who’s responsible for them and I don’t really care. I just hope they get fixed by the time 4.4 is official.

For one thing, KDE Wallet needs to go or be seriously overhauled. It is not uncommon, when you first log in and your session is restored from your previous boot, to be barraged with seven or so KDE Wallet password windows. KNetworkManager asks for my wallet password, presumably to store a security key, even when we're connecting to an unsecured network. Then Kopete asks for my wallet password, because its window was lurking behind the first one, even if I already entered my password there. And if you don't want to type it and didn't enter the password on the first window (because, say, you're trying to just get something done without answering the stupid barrage of password windows in the center of your screen), it will proceed to pop up password requests for all the IM accounts you have configured. Personally, I have six of them. I also use some pretty fancy window effects, which make clicking through those password windows really painful.

At one point, when reconnecting after losing my wireless connection, I was literally hit with about 14 password windows for no apparent reason. I never want to see that stupid window modal and centered again. If you must have it, put it in the damn corner where it belongs.

Naturally, these are just two applications (KNetworkManager and Kopete) which have managed to throw seven password windows at me. I just want them to go away at this point; I don't care if I can't log in to my IM services.

Second point: the folder view in Plasma is just broken all to hell. The whole point of having folder view (afaict) is to allow you to have multiple views of your filesystem, not just a single limited Desktop folder. However, folder views are notoriously unable to remember where they are supposed to be, or what size they are. So you come to a situation like this when you log in:

KDE 4: We add new stuff for a specific purpose and then leave it broken so that purpose is left unfulfilled.

No matter how much you try, they never stay in place.

What's worse, oftentimes you will be resizing your folder views and they will suddenly bounce into the wrong position and become stuck in place, with no Plasma handle on the side of the folder view at all. You can still interact with it, but you cannot move it around.

KRunner is so great. I got rid of my KDE launcher button in the corner because I just don't use it and both launchers suck pretty bad (ESPECIALLY Lancelot: possibly the worst possible design evar). The problem, though, is that the autocomplete functionality is broken. If you type, say, 'kcha' and you see 'rselect' in gray, then press Enter, nothing happens. You must first press the left arrow key to complete the text. This is so stupid it's mind-numbing, and it makes a lot of the utility of KRunner just *disappear*.

Also, though probably not KDE's fault at all, kde4-window-decorator for Compiz has the most vicious bug, where hundreds of fake square windows are launched continuously for no reason until there are 336 windows open, all grouped on your taskbar (thankfully!!). Each time the system launches one of these windows it steals focus from what I'm typing in, so for a while every time I logged in I had to fight it for control of the KRunner window to fervently type "emerald --replace" and press Enter. Luckily I found out what setting Compiz was using to launch the KDE 4 decorator and switched it to Emerald permanently.

On a different note, good luck with the stupid KDE SC branding crap. If you want to run a marketing campaign at least come up with a good one.

In conclusion, I hope you get your act together, KDE (and, to an equally large extent, Canonical with Kubuntu). KDE 3.5 was fantastic for its era, and I just want a version of KDE 4 with that same rock-solid stability. Please!?

This is the third part of my Python iPhone development series. Here are the first and second parts.

Some Performance Testing
I've done some preliminary performance testing, with results you can seethe at. It takes approximately 7 seconds to start iPhone Python apps. This is excruciatingly slow, especially for some of the apps I'm writing, which replace built-in apps that load damn-near instantly. I've observed that almost 100% of this time is spent loading PyObjC and performing bundle/function loading (objc.loadBundle/objc.loadBundleFunctions).
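
For reference, the sort of timing I did looks roughly like this sketch (it assumes UIKit lives at its usual framework path on the device):

import time
import objc

start = time.time()
# Scan every Objective-C class in UIKit into this module; in my measurements
# this bundle loading is where nearly all of the seven seconds go.
objc.loadBundle(
    "UIKit",
    globals(),
    bundle_path="/System/Library/Frameworks/UIKit.framework",
)
print("UIKit load took %.2f seconds" % (time.time() - start))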

There's virtually nothing that can be done about this except to somehow make PyObjC faster. On the bright side, defining a large number of classes (like enumerations) adds almost no startup time, which is good for the toolkit I'm writing (see below).

Performance within the apps is pretty decent, but here it is obvious that there is some overhead in the translation between Objective-C and Python. I created a simple custom dialer control using CoreGraphics, implemented with a single image displayed nine times (one for each digit) and nine separate renderings of a single character each. When you tap a button, the whole view is repainted with the selected button highlighted, and then repainted again when the finger is lifted. There is a noticeable lag when redrawing. This is a naive way to implement a control (drawRect gives you a clipping rectangle for a reason, use it!), but the performance of such simple operations should presumably be better. I'm well aware that CoreGraphics is not the fastest public graphics API available, and I have not drafted a similar test in Objective-C for lack of a toolchain to do so, so please let me know if this operation in CoreGraphics just sucks.

Memory usage is actually quite reasonable, which is very good considering the paltry 50-60M of RAM apps have to work with once you subtract the amount used by iPhoneOS and Springboard. A simple app I wrote to display debug logs weighs in at an RSIZE of about 10M (VSIZE 67M). Comparatively, MobileSafari in its native backgrounded state with two tabs open weighs 16M RSIZE, 95M VSIZE. MobileAddressBook (a small, simple application similar in scope to my debug logs app) weighs 12M RSIZE, 68M VSIZE when in the foreground. These are not scientifically gathered numbers, but it's not unreasonable to assume they are pretty typical. Even with the additional weight of the Python runtime, memory usage is not significantly increased.

Springboard App Launching Restrictions

Perhaps these are documented elsewhere, but it's worthwhile to get another copy out onto the web. Springboard imposes a number of restrictions on the apps that it starts. These restrictions do not apply to normal processes, insofar as that matters, considering you cannot run UIKit apps from the command line anyway.

  • The executable which calls UIApplicationMain must lie within the application's bundle. This is why the example Python app bundles contain a symbolic link to /usr/bin/python. It also means you cannot directly start a Python app using the interpreter directive ("#!/usr/bin/python").
  • The process ID must be the same as the one spawned by Springboard. This is why the BASH script does ‘exec’ and it must be this way. You cannot spawn a new process to take care of the app’s UI, though it seems to be perfectly acceptable to start processes to do other things.
  • When Springboard launches an app which uses a script as its executable, it checks that the interpreter directive points to an executable within '/bin' (there may be other allowed directories, but '/bin' is the only one I've confirmed). Scripts using /usr/bin interpreters (or any other directory) will not be executed.

So why does this matter? Well, currently Python apps on the iPhone are a bit clunky. You've got a BASH script which execs your symbolic link to Python with your app's main Python script as an argument, and optionally does some logging redirection. I discovered most of these limitations by trying to take BASH out of the equation. I failed for the reasons above, though it seems entirely possible to create a simple C app which does the setup much quicker than loading BASH. For now, though, I've found a clever solution which at least cleans up an app's directory structure. Toss this header onto your app's main Python script, make it executable, and set it in your Info.plist's CFBundleExecutable key:

"""":
# This section is a BASH initialization script which redirects stderr to
# /var/mobile/<AppName>.err.log to catch Python errors
exec "$(dirname "$0")"/Python -u "$0" 2>/var/mobile/"$(basename "$0" .py)".err.log
# -----------------------------------------------------------------------#
# End BASH, Begin Python """
import sys

As you might be able to see, the script is first executed by BASH. The line """": is both a no-op in BASH and the beginning of a multi-line string in Python, which serves as a comment (the final double quote and the colon are simply part of the string's contents). Since we 'exec' within BASH on ourselves, nothing past the long dash line is ever executed by BASH. When Python takes over, it skips past the string and starts executing at the 'import sys' line. This approach is clean and simple, allows you to free up slots in your editor, and makes editing your logging directives easier. As in my previous BASH app launcher examples, this will create a log in /var/mobile with the name MyCoolApp.err.log, where MyCoolApp is the name of your app (at least, according to the bundle's directory name).

An iPhone Toolkit for Python

As I briefly mentioned in the last installment, I am working on a Python toolkit for iPhone app development. The toolkit will do all the PyObjC wrapping for you, and will provide a large number of convenience classes which ease the process of writing your apps. The toolkit will also include a set of scripts which make testing and debugging your code much easier.

A surprising number of iPhone APIs are C-oriented, probably owing to the limited storage and memory of the devices which run iPhoneOS. These APIs will be offered in their awkward C glory in addition to Python-specific APIs which are much easier to use. The toolkit already contains a Python class for CoreGraphics contexts, and most of that API is already covered. The toolkit will also strive to create bindings for private frameworks, to the extent possible given the near complete lack of community documentation on them. Why would we not provide them, considering Apple has already banned your Python apps from being listed in the App Store?

Having log files of your app's standard output and Python errors is great, but having to type a command to print the log, or in some way refresh your view of it, is tedious. It would be better to have a live view of the output log as it happens.

Initially I tried to create a cradle of sorts for launching Python apps in a "debugging mode". This mode would detect exceptions and Python crashes and pop into the debug log viewer application I wrote. This MIGHT still be possible, but it has so far eluded me due to the Springboard restrictions listed above. If anyone has more information on how to facilitate this, I would love to know.

It is possible however to have a live view of Python’s output. Like any decent UNIX implementation, iPhoneOS supports named pipes, also known as FIFOs. Named pipes are special files which allow two processes to send data to each other. All we need to do is modify our logging code to write to a FIFO, and at the same time read it from another process.

So one of the scripts included in my new toolkit does just this. Once launched, it clears the screen and starts reading the FIFO. This blocks until your app opens the FIFO for writing. The data is then written to the screen, and when your app quits and closes the file descriptor, the script stops reading the data. Stick it in an infinite while loop and you have an incredibly useful error console. No more printing log files!
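
A minimal sketch of such a console reader might look like this (the FIFO path here is made up; the real script lets you choose your own):

import os
import sys

FIFO_PATH = "/var/mobile/.console.fifo"   # hypothetical path for illustration

# Create the named pipe once; later runs simply reuse it.
if not os.path.exists(FIFO_PATH):
    os.mkfifo(FIFO_PATH)

while True:
    fifo = open(FIFO_PATH, "r")           # blocks until the app opens it for writing
    line = fifo.readline()
    while line:
        sys.stdout.write(line)            # mirror the app's output as it arrives
        sys.stdout.flush()
        line = fifo.readline()
    fifo.close()                          # the app quit; wait for the next launch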

Setting up your development environment every time you boot can be tedious, so I've created a script to do the lifting for you automatically. All tools provided with the toolkit use a unified method of determining whether your iPhoneOS device is connected, and they allow you to mount it on your own, if you please, without errors.

All this and more will be released as alpha software in my next installment, so stay tuned!