
Monthly Archives: September 2009

I am writing a media metadata subsystem capable of understanding a great deal of detail about the content of a media file. It uses the file name to produce a metadata hint, then uses pluggable services to retrieve potential matches for the content. The result (at least so far) is a tool that can examine a media file and tell you, with very good accuracy, exactly which movie, television episode, etc. it contains, including all the appropriate metadata (even individual episode descriptions!).

If this sounds a lot like what Boxee does, that's because Boxee does do this. Unfortunately, though, Boxee's analysis is rudimentary and requires a particular ordering and formatting of the file name elements (title, year, episode number, etc.) for the client to retrieve listings with decent reliability.

Can’t find a listing? Those Scene Tags Are In The Way.

MediaExpert's approach is different: it filters the "title string" (initially, the file name without its extension, with all non-alphanumeric characters replaced by spaces) until it has something it believes is very likely the title of the content. It does this by recognizing common scene tags and removing them from the title string that will be searched by the metadata services. The removed tags are stored in the hint, allowing the application or other filter plugins to make use of them. It already recognizes a great many formatting- and source-related tags (e.g. 480p, DVDRip), as well as some of the most popular scene release groups. It also treats any words found within square brackets ([, ]) as scene tags. Finally, it looks for a four-digit number starting with either 1 or 2 and, if one is found, considers it the year the content was released.
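As a rough illustration of that filtering pass, here is a minimal Python sketch. The tag sets, the `make_hint` name and the returned dictionary shape are all hypothetical; MediaExpert's real tag lists and plugin interfaces are not described in this post. It collects bracketed words first (since the brackets themselves would otherwise be replaced by spaces), then strips known tags and takes the first four-digit number starting with 1 or 2 as the year.

```python
import re

# Hypothetical tag sets for illustration; the real lists would be much larger.
SOURCE_TAGS = {"480p", "720p", "1080p", "dvdrip", "hdtv", "xvid", "x264"}
RELEASE_GROUPS = {"lol", "fqm"}  # example group names, not an authoritative list


def make_hint(filename: str) -> dict:
    """Filter a file name down to a probable title, a year, and removed tags."""
    # Drop the extension (everything after the last dot).
    stem = filename.rsplit(".", 1)[0]
    tags = []
    # Words inside [square brackets] are scene tags. Collect them before
    # punctuation is replaced, because the brackets would otherwise vanish.
    for bracketed in re.findall(r"\[([^\]]*)\]", stem):
        tags.extend(re.sub(r"[^0-9A-Za-z]+", " ", bracketed).split())
    stem = re.sub(r"\[[^\]]*\]", " ", stem)
    # Replace every run of non-alphanumeric characters with a space.
    words = re.sub(r"[^0-9A-Za-z]+", " ", stem).split()
    title_words, year = [], None
    for word in words:
        if word.lower() in SOURCE_TAGS or word.lower() in RELEASE_GROUPS:
            tags.append(word)
        elif year is None and re.fullmatch(r"[12]\d{3}", word):
            year = int(word)  # first four-digit number starting with 1 or 2
        else:
            title_words.append(word)
    return {"title": " ".join(title_words), "year": year, "tags": tags}


print(make_hint("The.Dark.Knight.2008.720p.[FQM].mkv"))
# {'title': 'The Dark Knight', 'year': 2008, 'tags': ['FQM', '720p']}
```

Only the cleaned title would go to the metadata services; the year and tags ride along in the hint for later scoring.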

The system also supports pluggable "scraper" filters, with one built in: TelevisionScraper, which looks for season/episode information in a variety of formats including sNeN, eN, NxN, and more. Unlike Boxee, it allows anywhere from 1 to 3 digits for both season and episode numbers. This information is stored within the metadata hint.
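The season/episode formats could be expressed as regular expressions along these lines. This is a hypothetical `scrape_episode` helper, not TelevisionScraper's actual code; it covers only the sNeN, NxN and eN forms, each allowing 1 to 3 digits, and tries them in that order.

```python
import re

# Hypothetical patterns; seasons and episodes each allow 1 to 3 digits.
EPISODE_PATTERNS = [
    re.compile(r"\bs(\d{1,3})\s*e(\d{1,3})\b", re.IGNORECASE),  # s01e02, S1 E2
    re.compile(r"\b(\d{1,3})x(\d{1,3})\b", re.IGNORECASE),      # 1x02
    re.compile(r"\be(\d{1,3})\b", re.IGNORECASE),               # e05 (episode only)
]


def scrape_episode(title_string: str):
    """Return season/episode numbers found in the title string, or None."""
    for pattern in EPISODE_PATTERNS:
        match = pattern.search(title_string)
        if match is None:
            continue
        numbers = [int(group) for group in match.groups()]
        if len(numbers) == 2:
            return {"season": numbers[0], "episode": numbers[1]}
        return {"season": None, "episode": numbers[0]}
    return None
```

The `\b` word boundaries keep the patterns from firing inside ordinary words or longer numbers such as a 4-digit year.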

Next, the metadata services query their respective providers and return the results. The system then narrows down the possibilities by merit: a best-score-wins heuristic based on how close the actual year is to the one given in the file name, whether the content type (episode, movie, etc.) matches the metadata provided in the hint (such as season/episode numbers), and how close the expected title is to the real title (measured with Levenshtein distance).
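A minimal sketch of that merit scoring, assuming candidates arrive as dictionaries with `title`, `year` and `kind` keys (hypothetical names). The `levenshtein` function is the standard dynamic-programming edit distance; the weights in `merit` are invented for illustration and are not MediaExpert's.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def merit(hint: dict, candidate: dict) -> int:
    """Higher is better. The weights here are illustrative guesses."""
    # Penalise title differences.
    score = -levenshtein(hint["title"].lower(), candidate["title"].lower())
    # Penalise each year of difference when both years are known.
    if hint.get("year") and candidate.get("year"):
        score -= abs(hint["year"] - candidate["year"])
    # Reward a content type that agrees with the hint: episode numbers in the
    # hint suggest an episode, their absence suggests something else.
    expects_episode = hint.get("episode") is not None
    if expects_episode == (candidate.get("kind") == "episode"):
        score += 5
    return score


def best_match(hint: dict, candidates: list) -> dict:
    """Best score wins."""
    return max(candidates, key=lambda candidate: merit(hint, candidate))
```

With a hint of "The Dark Knight" (2008), an exact-title 2008 movie outscores "The Dark Knight Rises" (2012), which loses points for both the extra title characters and the four-year gap.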

Altogether this makes for a powerhouse of media detection capability, without any compelling need to compulsively rename your media collection (hey, feel free if you want to).

The system will also work with many other media types like music, news, adult content, etc. The media-expert tool will also be capable of exporting the metadata in XML format for caching and distribution.
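The XML format isn't specified, so the following is only a guess at what such an export might look like, using Python's standard `xml.etree.ElementTree`; the `hint_to_xml` name and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def hint_to_xml(hint: dict) -> str:
    """Serialise a metadata hint dict to an XML string (hypothetical schema)."""
    root = ET.Element("media")
    for key in ("title", "year", "season", "episode"):
        value = hint.get(key)
        if value is not None:
            ET.SubElement(root, key).text = str(value)
    tags = ET.SubElement(root, "tags")
    for tag in hint.get("tags", []):
        ET.SubElement(tags, "tag").text = tag
    return ET.tostring(root, encoding="unicode")


print(hint_to_xml({"title": "Lost", "season": 1, "episode": 2, "tags": ["720p"]}))
# <media><title>Lost</title><season>1</season><episode>2</episode><tags><tag>720p</tag></tags></media>
```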


When I was young, in the days before live CDs and Linux netbooks, I lost Windows and all of my data. I had already been upset with the poor craftsmanship, sportsmanship, and lack of competitive decency shown by the players at Redmond, and although it wasn't really Microsoft's fault that I had just lost a few years' worth of projects, writings, and data, I was ready for a change. As I started from ground zero, with a Slackware 7(ish) install not yet able to drive my computer's video, sound, or network and zero experience with Linux, UNIX, or even DOS, it was a very steep battle, but I pressed on.

I read about how this whole Free Software thing got started. I learned about Richard Stallman and his early work developing a ridiculously impressive text editor. He decided to steward its source code differently than anyone else at the time. Instead of charging for the program binaries or merely distributing the source freely, he would codify a new set of principles ensuring that the software remained free and accessible to all users.

Merely open source would not do; no one would be allowed to shackle his code. With this act, a community was born. A user could enjoy a huge library of software, a user-developer could work to improve it, and everyone benefited. Often the user-developers were so taken with the grace of the gesture and the hard work invested by the developers that they began their own projects, expanding the coverage of what quickly became a veritable ecosystem of code. Stallman made no assertions about the sale of the software, only that all modified source code be available to all users. It did not take long for the economically minded to craft business models that resonated with the principles of Free Software.

Though I learned of the power of free software long after the movement was underway, my commitment to the community and the tenets of free and open source software is very strong. A great many others feel this way, most of them even more strongly than I do, with records of community participation and contribution that greatly outweigh my own. I myself have contributed free code to such edge projects as Slicker and Y, and have occasionally submitted patches to a few of the larger desktop projects such as Xorg.

But the work I am most fond of and proud of is my work on the SharpOS kernel. We did something incredibly novel and intriguing, and we managed to meet and exceed our initial goals. The project was truly open source in its creation, evolving from a shared discussion among a number of open source C# developers participating in the Mono Project at the time. During my tenure there I also laid the foundations for a new windowing system (SharpWS), a BASH-like command shell (Nash), and a set of UNIX base utilities to complement the SharpOS base system.

For those who weren't in the know during its development (and I commend you, then, for reading my blog anyway), SharpOS became dormant because of a clash between some developers' commitment to free software and a pull toward a non-copyleft open source model. Mircea Cristian Racasan (Chriss) led the majority of the core developers in holding that our considerable work should remain protected under the terms of the General Public License, but outside influences wanted to see the code released under a non-copyleft open source license instead. I did offer to switch licenses for the secondary projects I had contributed (SharpWS, Nash, and SharpOS-CoreUtils), but could not advocate the same for the SharpOS kernel, nor for the AOT compiler which Chriss had mostly developed.

As a result of the unavoidable schism, more projects with goals similar to SharpOS's appeared: first one from Chad Hower, closely aligned with Microsoft (and thus non-copyleft open source), and then one from Scott Balmos, with whom we merely had design disagreements (licensing was still an issue, but to a lesser extent). Considerable effort was applied to these initiatives, and the scene fragmented. Harsh words were exchanged, a lot of difficult work was duplicated, and wheels were reinvented. I advocated for friendship and respect between the factions, and I, among others, helped push for a joint forum where code could be shared, topics discussed openly, and mutual respect could eventually lead to more progress for all of the groups involved.

And here is where my conflict begins, O friends, for despite my dedication to the ways of Free Software, the church of Richard Stallman has made outcasts of my fellow C# developers and me. I am a principled man, once a radically anti-Microsoft boy. My principles include tolerance and community. Another of my principles is recognizing the accomplishments of a particular technical/engineering solution. Microsoft made an excellent development platform with their .NET efforts, one which far surpasses Java in design, coherence, usability, and features. The CLI is easier to extend and to implement. Microsoft's behavior toward CLR licensing and standardization has been more than adequate, and any of Microsoft's patents which cover the standardized portion of the CLI have been waived under a royalty-free[sic]* license required by the ECMA. This leaves the nonstandard portions like System.Windows.Forms, System.Data, and others, which are distinct and easily removed if Microsoft decides to return to its dirty bastard ways. Listen, Microsoft is not trustworthy, they are not our friend, and they are competition for the Linux desktop. But for the most part Microsoft is reaching out in a positive way, and this needs to be acknowledged.

The truth of the matter is that the code that we write in C# cannot be taken away from us. Microsoft cannot sue anyone for writing software in C#. They cannot sue anyone for including a Common Language Runtime, and they cannot sue anybody for using software written in C#. The FSF knows this well enough to implement the CLI in its DotGNU project. Yet Stallman still preaches against using C# software and against developing it. To me, this means some people who might find my software useful will not try it out because Richard Stallman told them to avoid C#. What do you have against me, RMS? This is incredibly insulting to True Believers of FOSS who code in C#, and I am left hurt.

Microsoft can sue people for patent infringement over unofficial clones of its proprietary APIs, which the community acknowledges. Why isn't this where Richard focuses? He should be advocating the blacklisting of Microsoft's extra APIs, which do pose an actual *threat* to the Linux companies distributing them right now. It seems he has grown out of touch with what is relevant, opting instead to draw a black line between himself and anyone vaguely associated with Microsoft, never hesitating to hurl insults and generally ignoring the voices of the Free Software fellows who see C# and the CLR for what they are: an excellent programming language on top of an excellent development platform.

With his recent comments about Miguel de Icaza and his continued effort to make pariahs of C# supporters in the FOSS community, Stallman is alienating the entirety of the Mono, MOSA, Axiom, Grammatica, Tomboy, and Banshee communities among many others.

Look us all in the eyes, Stallman, and tell us that our contribution means nothing.

* UPDATE: After a bit more reading around forums and such, I realized that I was wrong about ECMA requiring royalty-free licenses for patents related to its standards. While this certainly raises some doubts in my mind, I still have to stand by the CLR as the wonderful platform that it is. Microsoft does have the Community Promise, which basically says "we won't sue you if you implement enough of the specs," but I was left with a sense of ambiguity after reading into it. The solution, though, is to press Microsoft to follow tried-and-true open source patent licenses like those used by the Open Invention Network and IBM, not to push Microsoft away and ignore the progress, alienating the developers who code with Microsoft's frameworks.

So my dad gave me some cash for my 21st birthday, and as soon as I was near a store I went to work figuring out what I wanted to spend it on. I floated through the aisles of the local Walmart electronics section, looking at remotes for my PC, shiny new rechargeable wireless keyboard/mouse combos, Bluetooth dongles and more. I spent so long there that the (bored) associates asked me whether I needed help finding something, despite the usual "I know what the fuck I'm doing so move along" look on my face. They really wanted to ask, "Sir, are you alright?" Undoubtedly, halfway through the mad search for the perfect gift, a hint of desperation could be seen lurking beneath the sure-as-hell geek exterior.

I finally settled on a new portable hard drive. My last portable was a 1TB Hammer moreSpace, a big, clunky drive which required a hefty amount of additional external power to function. It lasted all of 2 weeks, topping out at about 45GB of data before it broke in a terrible coffee-table-to-floor transfer which I have dubbed "The End Of The Beginning" of my data storage renaissance. Here I am six months later getting another one! But the difference is huge. For one, this new drive (the WD My Passport) does not require external power. Yup, it runs entirely on the power supplied by a USB cable. Second, this drive is very small. It fits nicely in my pocket or backpack and is easy to transport. Finally, it's 320GB, not 1TB. This is good! Why? How do you even begin to back up a terabyte-class storage device?? Unless you have the cash to pick up another one, the answer is: you don't. But a 320GB drive is much more manageable (plus it fit into the amount of money I wanted to spend).

One of the cool things I was looking forward to doing was installing Ubuntu on it. This would let me run multiple OSes without dealing with the University tech crew to get a dual-boot partition set up (or going it alone and installing Windows myself; my only copy is installed on my desktop box). With this in mind, I split the hard drive into a number of partitions.

Here's where the Ubuntu Jaunty review part of this post comes in. As always, Ubuntu booted into a graphical desktop from the LiveCD perfectly, with a few caveats. For one, it didn't bother recognizing the highest resolution when showing the Ubuntu boot screen, so it was a little blurry. X itself did, though, and all was good. But when Ubuntu played its well-known startup sound, I was horrified to hear that it overstepped the nominal PCM levels on my Intel HDA card, producing a terribly distorted mess of a startup sound.

Then I opened the Ubuntu installer. I made it to where the partition/disk setup page should have been, and the installer froze. Everything else worked, and the busy cursor was shown fine, but nothing happened. I rebooted and tried again with the same result. Then I rebooted and selected Install from the boot menu. It still took a while to get to the disk setup screen, but it worked this time. Ubuntu told me I had no operating systems (obviously I did: Windows was on my laptop HDD), yet underneath the "Take over the disk" option it said "Warning, Windows XP Professional will be deleted". Haha. Anyway, I'll chalk that up to an Ubuntu dev forgetting that people have multiple hard drives at all. In any case, the install completed and I was able to boot Ubuntu from the drive with no problems. So: only a *very* minor audio problem to get Jaunty installed on this Thinkpad R61.

Now, prior to this I had been having issues with disk performance on my laptop HDD. Speed had been gradually dropping; I figured this was due to a lack of defrag runs. Trying to run the defragmenter yielded "No, there is a disk check scheduled for next boot, do that first." But rebooting did not make chkdsk run. I ran chkdsk /f from the command line, in both normal and safe mode, and got "Chkdsk needs exclusive access, would you like to schedule for reboot?" Scheduling it did nothing either.

So, with this back story in mind, I went to reboot into Windows. Unfortunately, I immediately received about 25-30 alert notifications upon logging in saying "Such and such file is corrupted. Please run chkdsk". Riiight. I fumbled with it for a while before I found out about chkdsk /x, which forces the volume to dismount first, but I didn't get a chance to use it. I went to bed.

This morning I attempted to boot, only to find that Windows would not get past its startup screen in either safe or normal mode. I brought the box to the University help desk so I could use a Windows disc to get to the Recovery Console and run chkdsk. The first run (normal chkdsk) froze at 25%. The second (chkdsk /r, which checks for bad sectors) also froze at 25%. I was running out of time and didn't want to take up any more of the help desk guy's time, so I quickly booted Ubuntu from my portable and made an emergency backup of whatever I could before reimaging the hard drive, which finished without problems.

It seems whenever I upgrade my storage capabilities I am downgraded by fate at about the same time. In this case I didn’t wind up losing any capacity but it was still a hassle, and I probably lost some minor data. Maybe this round was a return on the bad karma I accumulated when I broke my 1TB drive :-\. If so, I hope I’ve paid that all off now because the storage renaissance is far overdue.

When I heard about Novell’s new MonoTouch platform for bringing software written using C# (and evidently other .NET-centered languages) to Apple’s iPhone, I was naturally excited and glad to hear it. But looking closer you see that it has almost entirely avoided any semblance of coherence with the usual Mono Project (or open source, in general) principles. The platform is in fact proprietary, offers no source, makes no attempt to broaden OS platform support (it is not unfeasible to make it possible to do basic app development without the iPhone SDK in this context), and above all adds a huge tax on top of the already expensive iPhone development costs. I remember when we were working on SharpOS and Miguel (the leader-guy from Mono; also of GNOME fame) suggested that we use the MIT license. As you may remember, our work revolved chiefly around AOT technology which is very similar to what MonoTouch uses.

Well, I bet he would have liked the SharpOS AOT under the MIT license, because he could’ve integrated that right into MonoTouch and then sold the basic version for $400 a pop. I don’t think Chriss (the main dev of our AOT engine) would’ve seen a penny of that.

I’m glad we never took that advice, and I think it’s just cold that something so attractive to the community and so heavily based on the community’s work would be quite this closed. This is why I don’t use the MIT license.

I can only hope that once Novell feels it has recouped its losses and made a nice chunk of change, it will consider letting us FOSS peons take a crack at it.