bounding brokenness

End of the Google Summer of Code

The Google Summer of Code 2008 coding phase ended a week ago, and final evaluations are now under way. My GSoC project was about integrating Thunderbird with Windows Search, so that people can search for email on Windows as conveniently and quickly as they search for documents and other files.

What’s been done?

How much of what I initially set out to do has been completed?
  • There’s a patch up at bug 430614 that adds almost full-fledged Windows Search integration on Windows Vista to Thunderbird. It looks like it should land for Thunderbird 3 beta 1.
  • A lot of the backend code that sends notifications when messages or folders are added, copied, moved or deleted has been fixed. These notifications are essential for the instantaneous indexing I hoped to achieve. Now, with the profile on an NTFS file system, and the Thunderbird and Windows indexes in their steady state, new messages should be indexed within seconds of arrival. Right now, the indexing code is the only core code that uses the notifications, but I understand that gloda will use them too for its incremental indexing. They should also be useful for extension developers looking to keep track of messages and folders. Fixing the notifications and writing tests for them did take a substantial amount of time, though.
  • Spotlight integration on OS X, which was introduced in Thunderbird 2, has also received several fixes, because the two integrations share a lot of code.
  • Other rewritten code includes GetMsgTextFromStream and the code that opens Thunderbird from a search result.

What hasn’t been done so far?

  • A good UI within Thunderbird to manage indexing. The plan right now is to merge this with the default client dialog, presenting a unified “OS Integration” dialog. Let’s see how it works out. If you have a suggestion, please let me know!
  • Windows XP support. Windows Search for Windows XP lacks a “property handler” that can parse MIME messages, present useful data about the headers in a UI, and allow filtering results based on them. (This is built into Vista, and we make use of it. So, you can type in from:xyz in the search box and get results based on this.) One option is to write our own property handler.
  • Proper support for indexing newsgroups. Specifically, notifications for deleted and copied messages.
  • I guess that if support for indexing newsgroups lands, support for turning off search integration for particular accounts should also be present. Right now it’s a global setting, but adding per-account support shouldn’t be too hard. One thing to figure out is how messages moved between indexed and non-indexed accounts should be handled. Question: should the granularity be per-account or per-folder? Per-folder gives more control but has bigger problems than per-account.
  • One concern that asuth had is that we currently send “message added” notifications for individual messages, as soon as each message is added to the database. If a new folder with thousands of messages is added, we’ll send thousands of notifications, with that many XPConnect boundary crossings. One idea is to implement a token bucket system which batches multiple messages into one notification. This, like all asynchronous programming, has its own set of issues, of course: what if a message is added, but moved or deleted before its notification can be sent?
  • Another aspect that deserves at least a brief look is the possibility of presenting results from Windows Search within the Thunderbird UI.
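
The batching concern above can be made a bit more concrete with a small sketch. This is only an illustration in Python, not Thunderbird code: the class name AddedBatcher, its methods, and the string message keys are all hypothetical, and it is a plain size-triggered batcher rather than a real token bucket. It also assumes a message keeps the same key when it moves, which may not hold in the actual message database. The point it shows is one possible answer to the "moved or deleted before its notification can be sent" question: a pending addition is cancelled on delete and retargeted on move.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shape of one batched notification: (message_key, folder) pairs.
Batch = List[Tuple[str, str]]


class AddedBatcher:
    """Collects "message added" events and delivers them in batches.

    Assumption: message keys stay stable across moves.
    """

    def __init__(self, listener: Callable[[Batch], None], max_batch: int = 100):
        self._listener = listener          # called once per batch
        self._max_batch = max_batch        # flush when this many are pending
        self._pending: Dict[str, str] = {}  # message key -> current folder

    def added(self, key: str, folder: str) -> None:
        self._pending[key] = folder
        if len(self._pending) >= self._max_batch:
            self.flush()

    def moved(self, key: str, new_folder: str) -> bool:
        """Return True if the move was absorbed into a pending addition."""
        if key in self._pending:
            self._pending[key] = new_folder
            return True
        return False  # already announced: caller sends a normal "moved"

    def deleted(self, key: str) -> bool:
        """Return True if a pending addition was cancelled."""
        return self._pending.pop(key, None) is not None

    def flush(self) -> None:
        """Deliver everything pending as a single notification."""
        if not self._pending:
            return
        batch = list(self._pending.items())
        self._pending.clear()
        self._listener(batch)


# Usage: two additions, one cancelled and one moved before the flush.
received: List[Batch] = []
batcher = AddedBatcher(received.append, max_batch=3)
batcher.added("m1", "Inbox")
batcher.added("m2", "Inbox")
batcher.deleted("m1")            # never announced at all
batcher.moved("m2", "Archive")   # announced at its final location
batcher.flush()                  # e.g. driven by a timer in a real system
```

With this design the listener sees one notification containing only m2 in Archive, so the thousands-of-crossings case collapses to one call per batch. The cost is latency: nothing is indexed until the batch fills or a flush is triggered.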

What have I learnt?

This was my first time participating in a large open source project, and I’m grateful to Google for the incentive to do so. My job definitely isn’t over yet, and I’d like to continue to work on this until it gets into a very usable state. (Apologies for any rambling below.)
  • The final product might be completely different from the initial plan. The plan I’d put forward in my GSoC application just couldn’t work.
  • There are surprises everywhere. Every time you accomplish something and think that it’s finally about to be in a usable state, you realize another part needs fixing. First it was the notification code. Right now my concern is with the UI, and clarkbw has some great ideas, one of which (the unified default client/search integration dialog) looks to be the way to go. Apart from the UI, there don’t seem to be any major, debilitating issues with the integration. I have hit a few edge cases where the search result handling breaks, which seem to occur when the account settings are absolutely mauled, and I’m looking for fixes to them.
  • The old Windows developers had it easy. A separation of your program into administrative and non-administrative parts forces you to think carefully about what to add to each section, and about how to go about doing this with a minimum of hassle to the user. Right now we require one UAC prompt to enable integration and one to disable it, which looks like it’ll be the minimum required.
  • Wherever it can be used, automated unit testing works. It makes you a bit more confident about the code than if you had relied on manual testing, and it protects against regressions that others or you yourself introduce. (I still go “doh” at that. :( ) Full credit to Standard8 for driving automated testing for mail/news.
  • Tougher reviews are generally better. Old code that doesn’t seem to follow any particular coding style is just plain harder to fix. Good, readable code is also easy to modify. Of course, there’s the issue of patches being held up for too long.
  • You can’t assume any code that interacts with the OS platform, however old and well-tested, will work the same across platforms. Apple should – no, must – allow its OS to be virtualized, because debugging over IRC just isn’t fun.

Thanks to…

(hopefully I’m not forgetting anyone)
  • Jeff Beckley, my GSoC mentor, for spending a ridiculous amount of time with me (I guess far more than I deserved), with guidance in both overall direction and specific implementation details.
  • David Bienvenu and everyone else at #maildev and #developers, for answering every potentially stupid question I had (though I hope the SNR wasn’t too low :) ).
  • Leslie Hawthorn and everybody else in charge of GSoC at Google, and Gervase Markham, the Mozilla GSoC administrator, for making this possible.
  • … and Arun Raghavan, a GSoC 2007 veteran from my college, for holding a small lecture about GSoC (I think on March 13) that got me interested in it in the first place.

What a wonderful experience GSoC was!