Thursday 9 October 2008

fuzzziness

I've been using topic maps in my day job, so I decided to try out http://www.fuzzzy.com/, a social bookmark engine that uses an underlying topic map engine.
I tried to approach fuzzzy with an open mind, but the increasingly stumbling on really annoying (mis-)features.
  1. This is the first bookmark engine I've ever used hat doesn't let users migrate their bookmarks with them. This is perhaps the biggest single feature fuzzzy could add to attract new users, since it seems that most people who're likely to use a bookmark engine have already played with another one long enough to have dozens or hundreds of bookmarks they'd like to bring with them. I know this is non-ideal from the point of view of the social bookmark engine they're migrating too, since it makes it hard to do things completely differently, but users have baggage.
  2. While it'd possible to vote up or vote down just about everything (bookmarks, tags, bookmark-tags, users, etc), very little is actually done with these votes. If I've viewed a bookmark once and voted it down, why is it added to my "most used Bookmarks"? Surely if I've indicated I don't like it the bookmark should be hidden from me, not advertised to me.
  3. For all the topic map goodness on the site, there is no obvious way to link from the fuzzzy topic map to other topic maps.
  4. There doesn't seem to be much in the way of interfacing with other semantic web standards (i.e. RDF).
  5. The help isn't. Admittedly this may be partly because many of the key participants have English as a second language.
  6. There's a spam problem. But then everywhere has a spam problem.
  7. It's not obvious that I can export my bookmarks out of fuzzzy in a form that any other bookmark engine understands.
These (mis-)features are a pity, because at NZETC we use topic maps for authority (in the librarianship sense), and it would be great to have a compatible third party that could be used for non-authoritative stuff and which would just work seamlessly.

Sunday 5 October 2008

Place name inconsistencies

I've been looking at the "Dataset of New Zealand Geographic Place Names" from LINZ. This appears to be as close as New Zealand comes to an Official list of place names. I've been looking because it would be great to use as an authority in the NZETC.

Coming to the data I was aware of a number of issues:
  1. Unlike most geographical data users, I'm primarily interested in the names rather than the relative positions
  2. New Zealand is currently going through an extended period of renaming of geographic features to their original Māori names
  3. The names in the dataset are primarily map labels and are subject to cartographic licence
What I didn't expect was the insanity in the names. I know that there are some good historical reasons for this insanity, but that doesn't make it any less insane.
  1. Names can differ only by punctuation. There is a "No. 1 Creek" and a "No 1 Creek".
  2. Names can differ only by presentation. There is a "Crook Burn or 8 Mile Creek", an "Eight Mile Creek or Boundary Creek" and an "Eight Mile Creek" (each in a different province).
  3. There is no consistent presentation of alternative names. There is "Saddle (Mangaawai) Bivouac", "Te Towaka Bay (Burnside Bay)", "Queen Charlotte Sound (Totaranui)", "Manawatawhi/Three Kings Islands", "Mount Hauruia/Bald Rock", "Crook Burn or 8 Mile Creek" and "Omere, Janus or Toby Rock"
  4. There is no machine-readable source of the Māori place names with macrons, and the human readable version has contains subtle difference to the machine-readable database (which contains no non-ASCII characters). For example "Franz Josef Glacier/Kā Roimata o Hine Hukatere (Glacier)" and "Franz Josef Glacier/Ka Roimata o Hine Hukatere" differ by more than the macrons. There appears to be no information on which are authoritative.
Right now I'm find finding this rather frustrating.

(grammar edit)

Tuesday 2 September 2008

Does anyone publish the Dataset of New Zealand Geographic Place Names already in XML form?

I've been playing with the Dataset of New Zealand Geographic Place Names which is a set of CSV files published by Toitū te whenua / Land Information New Zealand (LINZ). The data takes quite a bit of massaging, and I was wondering whether anyone else had already done the work of making acceptable XML out of the data rather than doing all the work myself.


I've attached the script I have so far, but it's not perfect. In particular:


  1. It doesn't include place names with Macrons
  2. It makes lots of ASCII-type assumptions
  3. Many of the element names are poorly named and map non-obviously to fields in the CSV files.
  4. The script isn't very generic and does little or no checking

Anyway, here's he script, hopefully it's successfully escaped. The basics are that it creates an sqlite database and streams the CSV files into it direct from the zip (which it expects to have been downloaded into the current directory). It then streams each point out using awk to transform it to XML.




#!/bin/bash
# script to import data from
# http://www.linz.govt.nz/placenames/search/place-names-dataset-download/index.aspx
# into an XML file.
# this script licensed under the GPL/BSD/Apache 2 licences

echo \(re\)creating the database, expect DROP errors the first time you run this
sqlite nzgeonames.db << EOF
DROP TABLE name;
CREATE TABLE name (id, name, east, north, pdescription, district, sheet, lat, long);

DROP TABLE district;
CREATE TABLE district (district, description);


DROP TABLE pdescription;
CREATE TABLE pdescription (pdescription, short, description);


DROP TABLE sheet;
CREATE TABLE sheet (edition, map, sheet);

VACUUM;
EOF

echo importing the names
unzip -p nznames_6Aug08.zip namedata.txt | sed 's/\r//' | sed 's/`/","/g' | awk -F^ '{print "INSERT INTO name VALUES (\"" $0 "\");"}' | sqlite nzgeonames.db

echo importing the districts
unzip -p nznames_6Aug08.zip landdist.txt | sed 's/\r//' | sed 's/`/","/g' | awk -F^ '{print "INSERT INTO district VALUES (\"" $0 "\");"}' | sqlite nzgeonames.db

echo importing the point descriptions \(expect two lines of errors\)
unzip -p nznames_6Aug08.zip pointdes.txt | sed 's/\r//' | sed 's/`/","/g' | awk -F^ '{print "INSERT INTO pdescription VALUES (\"" $0 "\");"}' | sed 's/:/","/' | sqlite nzgeonames.db

echo importing the sheet names
unzip -p nznames_6Aug08.zip sheetnam.txt | sed 's/\r//' | sed 's/`/","/g' | awk -F^ '{print "INSERT INTO sheet VALUES (\"" $0 "\");"}' | sqlite nzgeonames.db


# pick up the ugly duckling
sqlite nzgeonames << EOF
INSERT INTO pdescription VALUES ("MRFM","MARINE ROCK FORMATION","Marine Rock Formation");
EOF

echo exporting points as xml
echo "<document source=\"Sourced from Land Information New Zealand, [date]. Crown copyright reserved.\">" > nzgeonames.xml
sqlite nzgeonames.db "SELECT name.id, name.name, name.east, name.north, name.pdescription, name.district, name.sheet, name.lat, name.long, district.description, pdescription.short, pdescription.description AS descriptionA, sheet.edition, sheet.map FROM name, district, pdescription, sheet WHERE name.district = district.district AND name.pdescription = pdescription.pdescription AND name.sheet = sheet.sheet;" | awk -F\| '{print "<point><id>" $1 "</id><name>" $2 "</name><east>" $3 "</east><north>" $4 "</north><pdescription>" $5 "</pdescription><district>" $6 "</district><sheet>" $7 "</sheet><lat>" $8 "</lat><long>" $9 "</long><description>" $10 "</description><short>" $11 "</short><descriptionA>" $12 "</descriptionA> <edition>" $13 "</edition> <map>" $14 "</map> </point>"}' | sed 's/&/&amp;/' >> nzgeonames.xml
echo "</document>" >> nzgeonames.xml

echo formatting the points nicely
xmllint --format nzgeonames.xml > nzgeonames-formatted.xml


Library of Congress flickr experiment

While processing the photos from my parent's ruby wedding anniversary, I ran into the Library of Congress's flickr experiment.



I probably shouldn't have been, but I was astounded. It looks the the bastion of old-school cataloguing is coming to bathe in the fountain of social tagging.



This is part of a larger effort described at http://www.flickr.com/commons/


Tuesday 26 August 2008

Saxon joy!

I've just moved to saxon from libxml for some XSLT stuff I'm doing, and I'm really loving it.

Not only does saxon run take much less memory, it also speaks XSLT 2.0.

Sunday 10 August 2008

Tuesday 5 August 2008

moving back to google reader from bloglines

A couple of months ago I migrated to bloglines from google reader, not because I was necessarily unhappy with google reader, but because I was interested in seeing what else was available and how it might differ. I've just moved back to google reader.

OMPL just worked. I was able to move my RSS "reading list" from google reader to bloglines and back again with no fuss, no hassle and no duplication.

The advantages of google reader over bloglines are:
  1. AJAX - whereas bloglines marks all items on a page as read when you browse to it, google reader marks them as read when you scroll past them.
  2. Ordering - google entwines items from all feeds in time order, bloglines presents items feed by feed
  3. Better integration with other services
The advantages of bloglines over google reader are:
  1. Fast scanning of voluminous feeds
  2. Fast browsing (it seems _much_ faster when there are thousands of items)
  3. Less integration with other services
You'll notice that better integration is both a positive and a negative.

The fact that I have several google accounts and and only one of them is tied to my RSS reading means that there are tasks I can't multi-task between, even at the coarsest of levels and also means that contacts from the google account almost never get forwarded articles I discover via RSS.

The fact that my blogger.com account and my google reader accounts magically know about each other is great, as is being able to sign in once to a whole suite of tools.

In the end the reason for changing back was ordering. I read too many RSS feeds that cover the same topic for reading them out of order to make sense.

I've also just culled some of my RSS feeds, with the a prime criterion being the quality of their RSS. A number of web comics require one to click a link to read the strip and I no longer read them, but I still read Unshelved, which has the strip (and an ad) in the RSS.

Monday 4 August 2008

Decent editor for blogger.com?

Can someone recommend a decent replacement for the default editor for blogger.com?

Before it drives me insane...

KDE/Gnome Māori localisation on the rocks?

It looks like Maori localisation has been removed from the KDE 4.0 repository:

stuartyeates@stuartyeates:~/tmp/mi$ svn co svn://anonsvn.kde.org/home/kde/trunk/l10n-kde4/mi/messages
svn: URL 'svn://anonsvn.kde.org/home/kde/trunk/l10n-kde4/mi/messages' doesn't exist
stuartyeates@stuartyeates:~/tmp/mi$ svn co svn://anonsvn.kde.org/home/kde/trunk/l10n-kde4/mi/docmessages
svn: URL 'svn://anonsvn.kde.org/home/kde/trunk/l10n-kde4/mi/docmessages' doesn't exist
stuartyeates@stuartyeates:~/tmp/mi$ svn co svn://anonsvn.kde.org/home/kde/branches/stable/l10n-kde4/mi/messages
svn: URL 'svn://anonsvn.kde.org/home/kde/branches/stable/l10n-kde4/mi/messages' doesn't exist

Things don't look good for the upcoming 4.* releases, with the stats for translation at 0%: http://l10n.kde.org/stats/gui/trunk-kde4/team/

Gnome Māori localisation is not much better: stable at 1%: http://l10n.gnome.org/teams/mi

In the medium/long term there is hope that much of this localisation can be bootstrapped by application-centric localisation that appears to be thriving, particularly with respect to firefox, thunderbird and OOo.

Sunday 3 August 2008

Leaving catalyst :( joining NZETC :)

Last week I gave notice at my current employer (Catalyst.net.nz) and accepted a job at Victoria University's New Zealand Electronic Text Centre. The NZETC is primarily a TEI/XSLT/Cocoon-house which publishes digital versions of culturally significant works. It also runs a number of other digital services for the university library (into which it is currently being integrated). As such it's significantly closer to what I've been doing previously in terms of environment, content and technology.
Exciting things about the NZETC from my point of view:
The commute to work will be slightly longer, with me either getting off the bus one stop earlier and catching the cablecar up the hill, or getting off at my current stop and walking up. I'm hoping to do mainly the later.

Wednesday 9 July 2008

Who should I nominate for the NZ Open Source Awards?

So nominations are open for the New Zealand Open Source Awards and I have to decide who I should nominate. There doesn't seem to be anything stopping me nominating several, but picking one contender and throwing my weight behind them seems like the right thing to do. The ideas I've come up with so far are:

Kiharoa Dear for excellent work in getting firefox, thunderbird and open office working in Māori contexts:

http://kiharoa.dear.maori.nz/

Standards New Zealand for sanity control in the OOXML fiasco:

http://www.standards.co.nz/news/Media+releases/NZ+maintains+negative+vote+on+OOXML+Standard.htm

Hagley Community College for rolling out Ubuntu in a secondary school:

http://computing.hagley.school.nz/about/opensource

Who should I nominate? Is there someone I've missed?

Monday 30 June 2008

I'm confused about hardy heron and default applications

Back in the day you told your linux system which applications you wanted to use with environmental variables things like:
export EDITOR=/usr/bin/emacs
Then along came the wonderful debianness of the apt-family and the alternatives system.
update-alternatives --config vi
Now this system too is being undermined by various systems, leaving me uncertain where to set things. What I'm trying to do is:
  • have Sound Juicer and not Music Player (RhythmBox) launched when a CD is inserted. There is an entry for "Multimedia" under the "Preferred Applications" menu option, but this seems to be about opening files, not responding to newly-mounted media and Sound Juicer is not listed as an option. There doesn't seem to be anything about CDs under the "Removable Drives and Media Preferences" (although this is where the setting are that automatically load F-Spot when I attach my camera, which seems like the same kind of thing).
  • configure which applications I can launch on the .cr2/TIFF/Canon RAW files produced by my digital camera I want the same applications to appear in both the file browser and F-Spot (which look like they're presenting the same interface but apparently aren't). ufraw seems to be the tool of choice here (either standalone or as a gimp plugin), but I'd like to pass it some command line args. I can find no entry or this under the "Preferred Applications" menu option.
There are lots of menus with a "Help" as an option, but very few of them seem to be.

Mike O'Connor at Friday drinks


Mike O'Connor
Originally uploaded by Stuart Yeates
I took some photos at Friday drinks, trying to do the whole wide-aperture-to-isolate-visual-elements thing. I wasn't really aware of just how much it is dependent on the relative position of the photographer, subject and background.

Some of them turned out better than others.

Sunday 29 June 2008

What should the ohloh homepage look like?

In a previous post I criticised ohloh homepage for being completely useless for current users of the site. This was somewhat unfair, since I provided no concrete constructive suggestions as to what should be on the page. This blog post, hopefully, fixes that.

To my mind there are two classes of information that should be on the homepage: (a) things that lots of users are confused about and (b) things that are 'new' (think customised rss feeds) (c) combinations of both.

Finding out what people are confused about is easy, just look in the forums, where people are most confused about:

  1. changes in their kudos
  2. why their enlistment hasn't been updated
  3. why version control system of choice isn't supported

The list of 'new' things is:

  1. new / updated projects
  2. new / updated users
  3. new / updated enlistments
  4. new forum posts
  5. new RSS items in projects RSS feeds

These (1), (3) and (5) can be filtered by the users connection to the project (contributor/user/none).

So the trick now is to find combinations which help users understand what's going on and encourage users to engage with ohloh and the projects.

Idea X: A feed of updated enlistments a user is a contributor or user of:

  • Project A's enlistment at http://example.com/git updated 24st June 2008 at 24:50 GMT. A, B and C are the biggest commiters to this project, which is in Java and XML. Last updated 1st Feb 1970.
  • Project B's enlistment at http://example.com/git updated 24st June 2008 at 24:50 GMT. D, E and F are the biggest commiters to this project, which is in C and shell script. Last updated 1st Feb 1970.
  • Project C's enlistment at http://example.com/git failed at or about revision 12345. Click here for instrustions on what to do about this. Last updated 1st Feb 1970.
  • ...

This not only tells user the status of their projects, but that enlistments are being processed, the expected time between each processing of enlistments, that some processing fails and that there's a link to find out more information. Such a feed also focuses attention on the processing of enlistments---which is the heart of ohloh and the key differentiating factor that seperates ohloh from 15 billion other open source sites.

Idea Y: A mixed feed of upstream bugs that effect ohloh performance and functionality:

  • Ticket "support for .xcu file format" updated in ohcount by user "batman"
  • Post "jabber message length" updated in Help! forum by user "someone else"
  • Ticket "svn branch support" updated in ohloh by user "robin"
  • Ticket "bzr support in ohloh" created in ohloh by user "joker"
  • Post "jabber message length" created in Help! forum by user "someone"

This lets people keep up with the status of ohloh progress on issues such as the implementation branch support for svn and support for hg.

Monday 23 June 2008

New ohloh look and feel

ohloh have changed their look and feel, and I've got to say I hate it.

Once you're logged in, almost nothing above the scroll cut on the front page is useful---we already know what ohloh is and don't need bandwidth-hogging ads to tell us. What we need are deep links into new stuff---projects, users and forum posts.

How about logged in users see content rather than ads on the homepage?

Sunday 15 June 2008

Kernel Hell and what to do about it

I've been in kernel hell with my home system for the past couple of days. What I want to build is a custom kernel that'll do xen, vserver, vmware, selinux, support both my wireless chipsets and support my video chipset. Ideally it should be built the Debian/Ubuntu way, so it just works on my Ubuntu Hardy Heron system.

So far I've had various combinations of four or five out of six working at once.

I'm not a kernel hacker, but I have a PhD in computer science, so I should be able to at least make progress on this, and the fact that I can't is very frustrating. At work I grabbed a kernel off a co-worker, but it wasn't built the Debian/Ubuntu way.

Standing back and looking at the problem, there seem to be two separate contributing factors:


  1. There are a huge number of organically-grown structural layers. I count git, the kernel build scripts, make, Linus's release system, the Debian kernel building system and the ubuntu kernel building system. I won't deny that each of these service a purpose, but that's is different points a which each of the six different things I'm trying to make work can begin their explanation of how to make them work and six different places for things to go wrong.
  2. There are about many Linux distributions and each of the things I'm trying to get working caters to a different set of them.
In many ways the distribution kernel packagers are victims of their own success, most Ubuntu, debian and RedHat kernels just work because they're packagers keep adding more and more features and more and more drivers to the default kernels. With the default kernels working for so many people, fewer and fewer people build their own kernels and the pool of knowledge shrinks. The depth of that knowledge increases too, with the each evolution of the collective build system.

Wouldn't it be great if someone (ideally under the auspices of the OSDL) stepped in and said "This is insane, we need a system to allow users to build their own kernels from a set of <git repository, tag> pairs and a set of flags (a la the current kernel config system). It would download the git repositories and sync to the tags and then compile to the set of flags. Each platform can build their own GUI and their own backend so it works with their widget set and their low level architecture, but here's a prototype."

The system would take the set of repositories and tags in those repositories and download the sources with git, merge the results, use the flags to configure the build and build the kernel. Of course, sometimes the build won't work (in which case the system sends a copy of the config and the last N lines of output to a central server) and sometimes it will (in which case the system sends a copy of the config and an md5 checksum of the kernel to a central server and optionally uploads the kernel to a local repository), but more than anything it'll make it easy and safe for regular users to compile their own kernels. The system would supplant "building kernels the Debian way" or "building kernels the RedHat way" and enable those projects working at the kernel level to provide meaningful support and help to their users on distributions other than slackware.


Potential benefits I can see are:

  1. increasing the number of crash-tolerant users willing to test the latest kernel features (better testing of new kernels and new features, which is something that's frequently asked for on lkml)
  2. easing the path of new device drivers (users get to use their shiny new hardware on linux faster)
  3. increasing the feedback from users to developers, in terms of which features people are using/interested in (better, more responsive, kernel development)
  4. reduce the reliance on linux packagers to release kernels in which an impossible-to-test number of features work flawlessly (less stressed debian/ubuntu/redhat kernel packages)
  5. ease the path to advanced kernel use such as virtualisation

You know the great thing about that list? Everyone who would need to cooperate gets some benefit, which means that it might just happen...

Macrons and URLs

Macrons are allowed in the path part of URLs, but not currently in the machine-name part (or at least, not yet), so http://extensions.services.openoffice.org/project/māori-papakupu is good, but http://www.taiuru.Māori.nz/ is not (use http://www.taiuru.maori.nz/).

A review of how lots of programs handle macrons is at http://research.elabs.govt.nz/macron-support-in-open-source-web-applications/.

Saturday 14 June 2008

Exporting firefox 3.0 history to selenium

In the new firefox 3.0 they've completely changed he way history is recorded, using a SQL engine to record it (details here).

I wrote a quik hack to export the history as a series of selenium tests:

sqlite3 .mozilla/firefox/98we5tz3.default/places.sqlite 'select * from moz_places' | awk -F\| '{print "<tr><td>open</td><td>"$2"</td><td></td></tr>" }'

sqlite3 locks the file, so you'll need to close firefox (or take a copy of the file) first. Cut and paste the results into an empty selenium test.

Obviously, your profile will have a different name.

Sunday 4 May 2008

OpenID - Everybody Wants To Go To Heaven, Nobody Wants To Die

I've been looking recently at OpenId a little more closely. It's a great system, that I've been using for a number of years.

My first free openid provider (http://stuartyeates.videntity.org/) went belly up for reasons I'm still not too sure about, but the life expectancy of internet startups and free services has never been that long. I have a new identity (http://stuartyeates.myopenid.com/) which does me just fine.

One thing I've been noticing is that while a host of the internet services that I use want to be my identity provider, very few of them want to let me login using an openid provided by another identity provider. From a business point of view I can completely understand this: they want to know everything about the user so they can (a) off better, more integrated services and (b) see more customised advertising. I have no problem with (a), but (b) is one of the reasons I'm eying up openid in the first place.

They don't understand that to get to openid heaven, they're going to have to die. By giving away the user-authentication-and-give-us-your-personal-information step, they can drive significantly more logins and significantly deeper interaction with their website and with their content. Sure, they can't necessarily get access to the users email address and whatever fake personal details they submitted at sign up, but I've yet to see this used particularly well anyway. The personal details certainly don't seem to be used to anywhere near the same effect as analysis of user behaviour.

What would be really good is an openid identity provider with (a) shared no information with any of the large advertising groups or dominant internet companies, (b) really clear information about privacy expectations (and I'm not talking here about a page of legalese titled "privacy policy," but discussion and disclosure of things which jurisdictions user data is stored in, steps taken to avoid collection of user data, etc) and (c) a clear sustainability model to prevent it being bought-out for the user data and to ensure it's continuity (I can imagine paying a subscription for it).

I can imagine that this is the kind of thing that google or yahoo might sponsor in their efforts to promote their next generation of web authentication standards. The existance of such an outfit would hugely boost the reputation of such a standard and lacking the huge advertising and existing lockin it would hardly challenge their own identity providers while answering a whole range of independence, privacy and monopoly questions.

Tinkering with ohloh

I've been tinkering with www.ohloh.net, which is probably best described as a web 2.0 freshmeat. Rather than tracking manually-updated releases it relies on automatically detected updates to version control repositories, RSS and geo-urls. It relies on wiki like editing rather than the strict ownership rules of freshmeat. ohloh does automatic language detection and source code analysis based on the version control repository and attributes individual commits to specific developers and their ohloh user account.

I've added and am actively curating a group of go/baduk projects. The overall goal is to encourage reuse and reduce the willingness of hackers to rewrite go/baduk systems from scratch.

My next step on the technical side is to write some GRDDL (Gleaning Resource Descriptions from Dialects of Languages) to transform the XML returned by the API into RDF, which I can them import into simal.

My next step on the social side is to mention what I'm doing in some of the go/baduk mailing lists, but I want to wait until I've got something concrete to provide that Sensei's Library (the current repository of information about go/baduk programs) hasn't already got.

Friday 2 May 2008

Happiness is tests passing


I've just (re-)attained the happy state of the unit tests passing in the key package of jgogears. I've invested a surprising amount of time and energy in the unit tests, mainly as a form of requirements analysis, and I'm really pleased with the result.


Sadly, the core package that contains much of the code has some outstanding, longstanding failures which are going to be challenging to fix. They are represent failures in jgogears' ability to round-trip board states between GNU-Go ASCII and SGF files and will require a bug-for-bug reimplementation of the GNU-Go ASCII board printer.


jgogears has reached the stage where it now plays games that are obviously meant to be go, but is not yet a serious contender.

Friday 18 April 2008

Resetting vmware networking

I had a hardware glitch (apparently chained failure of on-board network card, auto-negotiation on switch, auto-negotiation on larger switch) which left networking on my ubuntu edgy box somewhat randomised. Ubuntu recovered after a reboot, successfully finding the new network card (but calling it eth11). vmware-server didn't recover, and no amount of fiddling with the magic of vmware-config-network.pl would cause vmware to forget all it knew about networking and look only at the current state of play.

In an attempt to erase it's memory I removed /etc/vmware/locations but this caused vmware-config-network.pl not to run at all.

Eventually I worked out that the trick was to remove everything in /etc/vmware/locations other than the lines near the start telling it where to find directories and executables and rerun vmware-config-network.pl selecting all the defaults.

Suddenly everything worked.

Thursday 20 March 2008

Quality explanations of technical issues

I'm a big fan of clear explanations. If you want to explain something to someone (and given the alternative is letting everyone learn from their own mistakes, this has got to be good), clear explanations are really important. I've tutored computer science at uni and I've explained open source concepts to a whole range of people as part of my work at OSS Watch and come to learn that an analogy can be very useful.

Imagine my pleasure at reading this analogy of a really rather complex compiler / interrupt issue.

For the record, I know nothing about the x86 interrupts, since we were taught interrupts using the much simpler RISC SPARC system.

Tuesday 18 March 2008

Tinkering with suffix-trees and algorithms

I've been tinkering with learning algorithms for my computer-go player, jgogears.

It linearises board positions and then uses classic string processing techniques, principally a large suffix-tree. Suffix-trees are widely used in information processing, information theory and compression fields of computer science. I also used them extensively in my recent Ph.D.

Currently I'm training with about 200 go games (~40k moves), giving me about 950K nodes in my suffix tree.

I've just switched my linearisation method from a strict distance measure to one which capitalises on adjacency much better.

There are a number of tuning parameters for the rate at which I grow the tree. I'll be tinkering with them as I increase the number of boards I'm using for training.

Friday 14 March 2008

RedHat's MugShot and lockin

I've been playing with MugShot, the RedHat's venture into the social networking sphere. I was initially impressed by the site, which has builds a dynamic (if HTML-only) website while asking for really very little information and not asking for any confidential information (i.e. no asking of gmail passwords to pester friends to join up). The website is slick and glossy, if a little heavy on my dialup connection.

The more I use MugShot, however, the more I see it supporting rather than undermining lockin in the social networking sphere.

1 Login with Gnome account or local username/password

While many of MugShot's Gnome account initial target audiance may have had Gnome accounts suitable for logging into to MugShot, I'm guessing that a really small proportion of people they knew did. Login using openid or similar would be great.

2 Supporting only a small number of services, and not flexibly

It's great that mugshot supports delicious. However, by only supporting a single service of this kind, not only are my magnolia bookmarks unsupported, but the current delicious control over the social bookmarking market is strengthened. I understand that the differences between the delicious and magnolia APIs are largely cosmetic, and find it hard to believe that much effort would have been required to support it. Similarly, there are a number of exciting competitors to youtube, amazon, facebook and other MugShot supported services which could be supported with very little time and effort on the part of MugShot.

The services that are added, use a one-size-fits all approach, when I know of no two long-term users of (for example) facebook who use it in the same way.

3 No machine-readable export

There is (that I can see) no constructive machine readable export of any kind from MugShot. Two basic RSS feeds (which aren't advertised in the GUI). No RDF/FOAF, no blogroll for integration into the next generation of social networking and web services. MugShot may see itself as the top of the social networking heap, but until that's evident to the rest of us, they need to play nicely with third parties, both below them and above them on the heap. Failing to export anything useful in a machine-readable format is not playing nicely.

4 No "deep connection" back to the data sources

Having configured MugShot, I can see lots of books, photos and links that those in my network find interesting. But even though Mugshot can tell the difference between books, photos and links when it's displaying them and knows about the accounts I have on services for bookmarking, favouriting or commenting on books, photos and links it doesn't offer functionality to bookmark, foavourite or comment on them. By not driving content and information back to the underlying datasources, MugShot undermines the underlying services and devalues them, in so doing it also (a) devalues the commitment and investment I've made to those systems and (b) reduces the likelihood that those services will go out of their way to help MugShot.

5 "Invitation to tinker" only covers look-and-feel

MugShot has a great system for creating skins for music players, and several hundred people appear to have accepted this invitation to tinker and made come great skins. Unfortunately only cosmetic changes are possible. What I would like to see is a generic feed subscription creator which let me add new services that MugShot could listen to. Just like the music player skins, only a very small proportion users would bother, and most of those who did would not produce anything to shout about, but with a feed subscription creator, each success would lead to a whole new service that Mugshot could access and channel to their users. Such a effort would completely alleviate issue problems with the small number of supported services.

MugShot has an invitation to tinker in the traditional open source web 1.0 sense, with a wiki and version control system, but it doesn't have an invitation to tinker in the web 2.0 sense in which users can scratch their itch (and make the project better) from within their browser in the way the yahoo pipes, for example, does.