We met Hilary Bishop and Jake Berger from BBC Archive Development to get a behind-the-scenes take on BBC Genome.
What is BBC Genome?
The BBC Genome project digitised the issues of the Radio Times magazine published between 1923 and 2009. You can find BBC broadcast information (‘listings’) extracted from each of those editions.
You can also search individual programme titles, contributors and synopsis information, and help clean up the data by editing mistakes in the listings and adding information about the programmes.
Why make it?
Until this data was digitised, the BBC had no comprehensive record of day-to-day broadcast history in a searchable machine-readable form.
How was it made?
Each page of the Radio Times was scanned to produce a high-resolution TIFF image. The programme listings were then ‘zoned’ into blocks so they could be ‘read’ using optical character recognition (OCR) techniques.
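To make the idea of ‘zoning’ a little more concrete, here is a minimal sketch in Python. It is not the project’s actual pipeline: the `Zone` class, the `reading_order` and `read_page` functions, the fixed column width and the stand-in OCR function are all assumptions made up for illustration. It only shows how blocks marked on a page might be put into reading order before each one is handed to an OCR engine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Zone:
    """A rectangular block marked out on a scanned page (pixel coordinates)."""
    left: int
    top: int
    width: int
    height: int


def reading_order(zones: List[Zone], column_width: int) -> List[Zone]:
    """Order zones column by column, then top to bottom within each column."""
    return sorted(zones, key=lambda z: (z.left // column_width, z.top))


def read_page(zones: List[Zone], column_width: int,
              ocr: Callable[[Zone], str]) -> List[str]:
    """Run the supplied OCR callable over each zone in reading order."""
    return [ocr(zone) for zone in reading_order(zones, column_width)]


# Invented example: three zones on a two-column page, 600 pixels per column.
zones = [Zone(600, 100, 580, 300), Zone(0, 500, 580, 300), Zone(0, 100, 580, 300)]

# Stand-in for a real OCR engine: looks up made-up text by zone position.
fake_text = {
    (0, 100): "first block",
    (0, 500): "second block",
    (600, 100): "third block",
}
page_text = read_page(zones, 600, lambda z: fake_text[(z.left, z.top)])
```

Passing the OCR step in as a function keeps the sketch independent of any particular OCR library; a real engine would take the cropped image for each zone rather than a lookup table.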
The web application was designed from the outset to allow speedy browsing and searching of over four million records.
Can you explain more about the crowdsourcing element to BBC Genome?
Our first step has been this digitisation of the BBC radio and TV programme schedules from the Radio Times magazine; the next phase of the project is to incorporate what was actually broadcast.
It’s a crowdsourcing project: anyone can join in and become part of the community that is improving this resource. The scanning process introduced lots of spelling mistakes and punctuation errors, and you can edit the entries so that they accurately reflect the magazine entry. You can also tell us when the schedule changed, and we will hold on to that information for the next stage of this project.
The response to the call for crowdsourcing the edits surpassed our expectations: in less than 4 months, we’ve accepted more than 60,000 edits to the listings made by members of the public.
What’s next?
Our next step is to match the records in our archive catalogue (the programmes that we have a copy of in our physical archives) with the Genome programme listings. This helps us identify what proportion of the broadcasts exists in a potentially ‘playable’ form, and highlights the gaps in our archive.
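As a rough illustration of what such matching could look like, here is a minimal sketch in Python. It assumes, purely for the example, that a listing and a catalogue record can be paired on broadcast date plus a normalised title; the `Listing` class, the `match_key` and `find_gaps` functions and the sample data are all hypothetical, and real matching would need to cope with far messier records.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Listing:
    """One programme entry taken from a digitised schedule."""
    broadcast_date: date
    title: str


def match_key(broadcast_date: date, title: str) -> Tuple[date, str]:
    """Build a comparison key: the date plus a case- and space-normalised title."""
    return (broadcast_date, " ".join(title.casefold().split()))


def find_gaps(listings: List[Listing],
              catalogue: Iterable[Tuple[date, str]]) -> Tuple[float, List[Listing]]:
    """Return the proportion of listings held in the catalogue, and the missing ones."""
    held = {match_key(d, t) for d, t in catalogue}
    missing = [item for item in listings
               if match_key(item.broadcast_date, item.title) not in held]
    if not listings:
        return 0.0, missing
    return (len(listings) - len(missing)) / len(listings), missing


# Invented sample data, not real schedule or archive records.
listings = [
    Listing(date(1950, 1, 1), "Example Programme A"),
    Listing(date(1950, 1, 1), "Example Programme B"),
]
catalogue = [(date(1950, 1, 1), "EXAMPLE  programme a")]

proportion, missing = find_gaps(listings, catalogue)
```

With this sample data, half of the listings have a matching catalogue record, and "Example Programme B" is reported as a gap.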
It is highly likely that somewhere out there, in lofts, sheds and basements across the world, many of these ‘missing’ programmes will have been recorded and kept by generations of TV and radio fans. So we’re hoping to use Genome as a way of bringing copies of those lost programmes back into the BBC archives too.
But even if we don’t have an actual copy of the programme, we’ll also look to publish related items in our archives, such as scripts, photographs and associated paperwork. We’re looking into the logistics of making some of these items available via Genome.
Also, while building Genome, we’ve identified a few ‘chunks’ of data that are missing from the database because, owing to the way OCR works, they weren’t picked up in the original scans. We will be adding these in.