What's so interesting about URLs anyway?
If I'm honest, we're a bit obsessed with URLs and what makes a good one. You might think this is a little odd, a bit geeky or simply pointless - but you would be wrong: URLs are at the heart of Web 2.0.
Two of the key principles of Web 2.0 are:
- It's the data, stupid (formerly "Data is the Intel Inside")
- Small pieces loosely joined
Much of our current work here, at the Future Media and Technology bit of Audio and Music (where I work), is about trying to make our data available in loosely coupled bite size chunks. We are doing this because we want to build applications, and allow others to build applications, that support collaboration and innovation. We want to make our data accessible and mashable so that we can make it easier for people to find our programmes, discover new music and generally ferret out more information about the subjects that interest them.
So what's this to do with URLs? Loads. URLs are the mechanism we all use to point at stuff on the Internet. This means that URLs need to be persistent - they need to reliably point to the same resource - and ideally that resource is a single thing or concept, so that the data can be easily aggregated and combined with other data sources.
So what are we doing?
We started with programmes by providing a single page for every episode, series and brand (e.g. The Today Programme, or the Chris Moyles Show) the BBC broadcasts. More has been written about this work elsewhere, but what I want to talk about here is our URL design.
We have two classes of URL - 'editorial objects' and 'aggregations'.
The editorial objects are represented with an eight-character alphanumeric key - no title, date, channel etc., just the key. For example, the URL www.bbc.co.uk/programmes/b0088x4z represents an individual episode of Heroes and is loosely joined to its parent series (www.bbc.co.uk/programmes/b007xfkw) and brand (www.bbc.co.uk/programmes/b007vcf4).
The idea is that these URLs are persistent - they won't need to change if the programme title, broadcast date, channel or anything else changes. Each URL also represents only a single concept, so it's easier for people to point to it if they want to discuss or share the programme with others. It also makes it easier for software engineers to write applications that mash up our data to create more context about a programme.
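To make the "just the key" idea concrete, here is a minimal sketch of how a third-party script might pull the key out of a programme URL. The pattern (eight lowercase letters or digits directly under /programmes/) is an assumption inferred from the three example URLs above, and `programme_key` is a hypothetical helper, not part of any published API.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: keys are eight lowercase letters or digits, as in the examples above.
KEY_PATTERN = re.compile(r"^/programmes/([a-z0-9]{8})$")


def programme_key(url: str) -> Optional[str]:
    """Return the opaque key from a programme URL, or None if there is none."""
    # urlparse only recognises the host when a scheme is present, so add one.
    if "://" not in url:
        url = "http://" + url
    match = KEY_PATTERN.match(urlparse(url).path)
    return match.group(1) if match else None


episode = programme_key("www.bbc.co.uk/programmes/b0088x4z")
aggregation = programme_key("www.bbc.co.uk/programmes/a-z/heroes")
```

Because the key carries no title, date or channel, nothing in this parsing has to change when that metadata does. A pattern match alone cannot prove that a key exists, so a real client would still need to request the URL.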
The aggregation views provide a RESTful interface into these editorial objects. Initially this is restricted to a-z (with some nice URL hacks), genre and format aggregations, but we're working on more, such as schedules and tags.
We're also working on additional views, so that in the near future adding .json, .mobile, .rss, .atom, .iCal or .yaml to the end of the URL will give you that resource in that format.
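As a rough illustration of the suffix convention, the sketch below builds the URL for an alternative representation of a resource. The format names come from the list above; `representation_url` is a hypothetical helper, and which formats are actually served at any given time is an assumption.

```python
# Format names are taken from the post; whether each one is served is an assumption.
PLANNED_FORMATS = {"json", "mobile", "rss", "atom", "iCal", "yaml"}


def representation_url(resource_url: str, fmt: str) -> str:
    """Append a format suffix to a resource URL, e.g. '.json'."""
    if fmt not in PLANNED_FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    # Drop any trailing slash so the suffix attaches to the key itself.
    return f"{resource_url.rstrip('/')}.{fmt}"


json_url = representation_url("http://www.bbc.co.uk/programmes/b0088x4z", "json")
```

The point of the design is that the identifier stays the same and only the suffix selects the representation, so a client that knows one URL can derive the others.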
So that's programme support - unique eight-character keys for our programmes - but what about our other primary objects? Well, we think we have two other significant areas to deal with: events (e.g. festivals and the like) and music. For music we are working with MusicBrainz to build a spine.
MusicBrainz provides unique IDs for artists, albums, tracks etc. Using these IDs as the basis for our URL schema for music (e.g. bbc.co.uk/music/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d) we can, as with programmes, provide persistent identifiers for our music information. We are currently working to assign these IDs to the track listings on programme episode pages, and to replace the technology underpinning the bbc.co.uk/music site so that it hangs off the MusicBrainz metadata.
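MusicBrainz IDs are UUIDs, so a music URL can be derived mechanically from one. The sketch below assumes the URL shape shown in the example above (the ID directly under bbc.co.uk/music/); `music_url` and `MUSIC_BASE` are hypothetical names for illustration, and the final URL scheme may differ.

```python
import uuid

# Base path taken from the example URL in the post; the final scheme may differ.
MUSIC_BASE = "http://www.bbc.co.uk/music/"


def music_url(musicbrainz_id: str) -> str:
    """Build a music URL from a MusicBrainz ID, rejecting malformed IDs."""
    # uuid.UUID raises ValueError for anything that is not a well-formed UUID.
    mbid = uuid.UUID(musicbrainz_id)
    # str() normalises the ID to lowercase, hyphenated form.
    return MUSIC_BASE + str(mbid)


artist_url = music_url("b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d")
```

Since the identifier is minted by MusicBrainz rather than by us, anyone who already holds the MusicBrainz ID for an artist can work out the corresponding URL without a lookup.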
When we launch the new music service built around MusicBrainz we will be able to map from music information to programme information very easily. So for example, we will be able to automatically link from track listings on an episode to the relevant artist page and from there to other programmes featuring that artist. Now OK, strictly speaking we don't need good URLs to enable this - but because our URLs have been designed to be RESTful, expose our domain model and provide persistent identifiers for our editorial objects (programme episode, album, artist etc.) it makes it much easier to mash the systems together.
The other nice thing about these IDs is that anyone can use them. Other developers can point to the BBC's programme and online music information and build their own applications around it. For example, someone might want to mash up, in an RDF-triple-like way, something like:
Episode of Heroes 'features'
Episode of Hands on Nature 'isabout'
'influencedby'
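A rough sketch of what such statements could look like in code, using plain tuples rather than a real RDF library. The URLs are the examples used earlier in this post; pairing this episode with this artist ID is purely illustrative and not taken from any real track listing, and the `Triple` type and `subjects_for` helper are hypothetical.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


HEROES_EPISODE = "http://www.bbc.co.uk/programmes/b0088x4z"
EXAMPLE_ARTIST = "http://www.bbc.co.uk/music/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"

# Hypothetical statement: the pairing is illustrative, not a real track listing.
triples = [Triple(HEROES_EPISODE, "features", EXAMPLE_ARTIST)]


def subjects_for(statements: List[Triple], predicate: str, obj: str) -> List[str]:
    """Return every subject linked to `obj` by `predicate`."""
    return [t.subject for t in statements if t.predicate == predicate and t.obj == obj]


episodes = subjects_for(triples, "features", EXAMPLE_ARTIST)
```

Because each URL stands for exactly one thing, statements like these from different people can be combined without first having to work out whether they refer to the same episode or artist.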
The editorial objects are represented with an eight digit alpha numeric key - no title, date, channel etc. just the key. For example, this URL www.bbc.co.uk/programmes/b0088x4z represents an individual episode of Heroes and is loosely joined to its parent series (www.bbc.co.uk/programmes/b007xfkw) and brand (www.bbc.co.uk/programmes/b007vcf4).
I'd be interested to hear why you rejected the "brand/parent/series/episode" format.
What's simpler than www.bbc.co.uk/programmes/heroes/s01/e20 ?
Hi Ceedee
It's not as perverse as it first appears - honest. We thought long and hard about the best way to make programmes addressable and, as ever, there's no perfect solution. So...
...no channel cos not only do episodes get broadcast on multiple channels they can also change "channel ownership" over time. Also networks change over time as a result of marketing decisions: /radio5 > /fivelive
and no brand > series > episode cos so many programmes don't fit this model. Many episodes are one-off commissions. Many one-off series are commissioned and if they work new series come along and the original series becomes part of a brand (so the url would change). Also many brands, orphan series, orphan episodes have ambiguous titles and we didn't want to get into the territory of /programmes/eastenders1
A slightly less noble reason: the data on /programmes takes a very circuitous route to the site. Along the way mistakes happen and the data quality suffers: episodes of dr who turn up in the middle of in the night garden etc. If we went with brand > series > episode the urls would shift as these mistakes were rectified.
We'd love to have made human readable/hackable AND persistent urls (and have on the aggregation pages) but it just wasn't possible, so we made a call that persistent trumped human readable. And recognise that not everyone will agree...
This is interesting, I was just planning what to record with radiotimes.com and I thought it would be good if I could see recommended programs based on bands I like (much like last.fm recommends gigs based on listening habits). Bands pop up for short periods for one song and/or an interview on so many weird programs that there's no real way to keep track of them.
Sounds like you're going to solve at least half the problem and in a way that others can build on the info in specialised ways, starting from Atom feeds of Your Favourite Band's upcoming TV appearances and building from there.
Neato.
It would be interesting to see whether these URLs reach beyond the web platform, e.g. That a consistent URI can be returned as part of TV-Anytime data, or ETSI DAB EPG data - represented using the same CRID.
Although the question then is, does that URI then point to the same resource?
Tom,
we inside the ORF (Austria's PSB) are thinking about assigning DOIs to the individual persistent elements (audio, web documents etc.), as URNs do not have a central, company-independent resolving mechanism.
Did you consider DOI and is there something which speaks against this approach (except the costs...)?
thank you and greetings from Vienna,
karl.
How will third party developers be able to programmatically discover that, for instance, Heroes is represented by the id 'b0088x4z'? Will some kind of lexicon be made available?
Karl
I don't think DOIs are the solution to the problem we wanted to solve.
In essence DOIs provide an additional layer of indirection to enforce persistence. In other words, DOIs are persistent because they are designed to provide persistent identifiers. Apologies if that sounds tautological. You can also just 'decide' to make sure your URLs don't change - and as long as they are designed sensibly there's no reason why they should need to change. And if you do that, and you are working with online resources, DOIs don't add anything.
On the web URLs are more 'useful': everyone can easily use them, whereas DOIs just add an extra hurdle for no additional benefit. Or put another way, DOIs are a solution to a particular problem, just not the problem we needed to solve; for us, persistent URLs are better.
Ben - no we're not planning on publishing a separate lexicon because the site currently can be used as a lexicon.
So with your Heroes example you could go to this url:
/programmes/a-z/heroes
Which gives you all BBC programmes starting with 'heroes', and from there you can follow the 'Heroes' link to this URL:
/programmes/b007vcf4
Right now the data is a bit ropey, which means there are 3 instances of 'heroes' in the a-z list. This needs to be cleaned up.
In terms of getting the data - you are currently limited to using the microformatted HTML to identify and extract the data, but in due course we'll be making other representations available (under license).
(sorry I didn't reply sooner).