
ABC-IP and work on audio archiving research


Dominic Tinley | 09:09 UK time, Tuesday, 8 November 2011

We're a few months into a new collaboration with MetaBroadcast where we're looking at how to unlock archive content by making more effective use of metadata. The Automatic Broadcast Content Interlinking Project (ABC-IP for short) is researching and developing advanced text processing techniques to link together different sources of metadata around large video and audio collections. The project is part-funded by the Technology Strategy Board (TSB) under its 'Metadata: increasing the value of digital content' competition, awarded earlier this year. The idea is that by cross-matching the various sources of data we have access to - many of which are incomplete - we will be able to build a range of new services that help audiences find their way around content from the ´óÏó´«Ã½ and beyond.

Our starting point is the English component of the massive World Service audio archive. The World Service has been broadcasting since 1932, so deriving tags from this content gives us a hugely rich dataset of places, people, subjects and organisations within a wide range of genres, all mapped against the times they've been broadcast.

Figure: The distribution of programme genres in the World Service radio archive

One of the early innovations on the project has been to improve the way topic tags can be derived from automated speech-to-text transcriptions, which gives us a whole new set of metadata to work with for comparatively little effort. We've optimised various algorithms to cope with the high word error rates that speech recognition produces, and the results so far have been quite impressive.
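To give a flavour of the approach, here's a minimal Python sketch of one way tags might be pulled out of a noisy transcript: candidate terms are matched against a controlled vocabulary and only kept if they're mentioned more than once, so a stray misrecognition is less likely to become a tag. The vocabulary, threshold and function names are purely illustrative and aren't the project's actual algorithms.

from collections import Counter

# A controlled vocabulary of tags we're willing to assign (illustrative only).
KNOWN_TAGS = {"nigeria", "oil", "election", "united nations", "lagos"}

def extract_topics(transcript, min_mentions=2):
    """Return vocabulary tags mentioned at least min_mentions times."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    counts = Counter(words)
    # Count bigrams too, so multi-word tags such as "united nations" can match.
    counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return sorted(tag for tag in KNOWN_TAGS if counts[tag] >= min_mentions)

noisy = ("... the election in nigeria ... oil exports ... "
         "election officials in nigeria ... oil prices ...")
print(extract_topics(noisy))  # ['election', 'nigeria', 'oil']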

Other sources of data include everything from ´óÏó´«Ã½ Programmes, such as the topics manually added by ´óÏó´«Ã½ producers, and everything from ´óÏó´«Ã½ Redux, an internal playable archive of almost everything broadcast since mid-2007. In later stages of the project we'll also add data about what people watch and listen to. Blending all this together provides many different views of ´óÏó´«Ã½ programmes and related content including, for example, topics over time or mappings of where people and topics intersect. The end result is a far richer set of metadata for each unique programme than would be possible with either automatic or manual metadata generation alone.
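As a rough illustration of what blending might look like, the sketch below merges manually curated and automatically derived tags for each programme while keeping track of where each tag came from, with editorial tags taking precedence. The programme identifiers and source labels are invented for the example; the real pipeline is considerably more sophisticated.

from collections import defaultdict

def blend_tags(manual, automatic):
    """Merge manual and automatic tags per programme, recording provenance."""
    blended = defaultdict(dict)
    for pid, tags in automatic.items():
        for tag in tags:
            blended[pid][tag] = "automatic"
    for pid, tags in manual.items():
        for tag in tags:
            # Editorial tags take precedence over automatically derived ones.
            blended[pid][tag] = "manual"
    return dict(blended)

manual = {"b00x1234": {"nigeria", "economics"}}
automatic = {"b00x1234": {"nigeria", "oil"}, "b00y5678": {"cricket"}}
for pid, tags in blend_tags(manual, automatic).items():
    print(pid, tags)
# b00x1234 {'nigeria': 'manual', 'oil': 'automatic', 'economics': 'manual'}  (order may vary)
# b00y5678 {'cricket': 'automatic'}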

Based on the work so far our project partners have built the first user-facing prototype for the project, called Tellytopic, which lets users navigate between programmes using each of the new tags available. You can find more on the MetaBroadcast blog.
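The underlying idea of tag-based navigation can be sketched very simply: given one programme, rank the others by how many tags they share. The data and identifiers below are made up, and Tellytopic itself of course goes well beyond this.

def related_programmes(pid, tags_by_pid):
    """Rank other programmes by how many tags they share with this one."""
    source_tags = tags_by_pid[pid]
    scored = [(len(source_tags & other_tags), other_pid)
              for other_pid, other_tags in tags_by_pid.items()
              if other_pid != pid and source_tags & other_tags]
    return [p for _, p in sorted(scored, reverse=True)]

tags_by_pid = {
    "b00x1234": {"nigeria", "oil", "economics"},
    "b00y5678": {"oil", "economics", "opec"},
    "b00z9999": {"cricket"},
}
print(related_programmes("b00x1234", tags_by_pid))  # ['b00y5678']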

The plan is that the work we're doing will eventually complement projects in other parts of the ´óÏó´«Ã½, such as Audiopedia, which was announced by Mark Thompson last week. We'll say more on this blog over the coming months about other ways we're going to use the data.
