大象传媒

The World Cup and a call to action around Linked Data

John O' Donovan | 13:37 UK time, Friday, 9 July 2010

Underneath the surface of the World Cup site there is a revolution going on in the technology and workflow used to manage and publish our content. To some extent we have been doing this in stealth mode while we worked through a lot of challenges, but as we approach the World Cup Final we'd like to explain more about what we have changed, and why this is an important engagement for us in the development of the Semantic Web and support for the use of Linked Data.

For some time we have been working on using metadata and Linked Data to organise and manage the site dynamically, culminating in the World Cup 2010 site, which uses Linked Data to manage how content is published. We have also had some discussions with other news organisations about how to bring a critical mass to the development of the Semantic Web and what benefits it can bring.

The World Cup site is our first major statement on how we think this can work for mass market media and a showcase for the benefits it brings.

First some background on the World Cup site.

The World Cup site is a large site with over 700 aggregation pages (called index pages) designed to lead you on to the thousands of story pages and content which make up the whole site. Examples of index pages range from the through to or .

Normally, managing all these index pages for the World Cup would not be possible as each of these needs to be curated by an editor, setting up automation rules or keeping it up to date with latest stories and information. To put the scale of this task in perspective, the World Cup site has more index pages than the rest of the !

So how is this possible? Clearly some form of automation is required, but search technologies and previous methods for doing this have proven to be inaccurate and there is no point in having all these pages if the quality of them is perceived to be low. You don't want to get content mixed up between different players with the same surname, for example.

The key change is that we are using advanced methods to analyse content and tag it with precise metadata linked to uniquely identified concepts (a concept usually being a person, place or thing). In the case of the World Cup we are interested in players, teams, matches and so on, but the principle can easily be applied to anything. To do this we are using technology from IBM (Languageware) and Ontotext (BigOWLIM). A high-level view of the process is shown in Fig 1, and we will be following up this post with more details about how this all works.
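As a rough illustration of why uniquely identified concepts matter, the sketch below tags a story with concept identifiers rather than plain names. It is not the Languageware or BigOWLIM pipeline: the concept URIs, the two invented players and the `suggest_tags` helper are all hypothetical, and the matching rule (a surname only counts if the player's team is also mentioned) is a deliberately naive stand-in for real concept extraction.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Concept:
    uri: str      # hypothetical unique identifier for the concept
    label: str    # full name of an invented player
    surname: str
    team: str


# Two invented players who share a surname but play for different teams.
CONCEPTS = [
    Concept("http://example.org/players/a-smith", "Alan Smith", "Smith", "England"),
    Concept("http://example.org/players/b-smith", "Brett Smith", "Smith", "Australia"),
]


def suggest_tags(text: str, concepts: List[Concept] = CONCEPTS) -> List[str]:
    """Return URIs of concepts whose surname and team both appear in the text."""
    lowered = text.lower()
    return [
        c.uri
        for c in concepts
        if c.surname.lower() in lowered and c.team.lower() in lowered
    ]


story = "Smith scores late as England win their group match"
print(suggest_tags(story))
```

Because the tag is a URI and not the string "Smith", a story about one player cannot end up on the other player's index page, which is the mix-up described above.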

Pushing the Boundaries

Though there are lots of dynamically published sites on the internet, the difference here is in the use of Linked Data to build and manage the site. This is incredibly flexible and we are only just starting to explore the possibilities of how this allows us to present and share content. Though we have been using RDF and Linked Data on some other sites (such as 大象传媒 Programmes, 大象传媒 Wildlife finder, Winter Olympics), we believe this is the first large-scale, mass media site to use concept extraction, RDF and a triple store to deliver content.

Another way to think about all this is that we are not publishing pages, but publishing content as assets which are then organised by the metadata dynamically into pages, and which could be re-organised into any format we want much more easily than we could before.
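The idea of assets organised into pages by metadata can be sketched with a toy in-memory set of triples. This is only an assumption-laden illustration, not the site's actual data model: the identifiers, the `about`, `title` and `playsFor` predicates, and the `match` and `index_page` helpers are hypothetical, and a real triple store would hold full URIs and be queried with a language such as SPARQL.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical data: three stories, one team and one player.
TRIPLES: List[Triple] = [
    ("story:1", "about", "team:example"),
    ("story:1", "title", "Squad announced"),
    ("story:2", "about", "player:example-striker"),
    ("story:2", "title", "Striker passes fitness test"),
    ("story:3", "about", "team:example"),
    ("story:3", "title", "Group match report"),
    ("player:example-striker", "playsFor", "team:example"),
]


def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def index_page(concept: str) -> List[str]:
    """Titles of stories about a concept, or about players who play for it."""
    related = {concept} | {s for s, _, _ in match(None, "playsFor", concept)}
    stories = sorted({s for c in related for s, _, _ in match(None, "about", c)})
    return [match(story, "title", None)[0][2] for story in stories]


print(index_page("team:example"))
print(index_page("player:example-striker"))
```

No page is stored anywhere in this sketch. The team page picks up the striker's story only because a `playsFor` relationship exists, so adding a story or a relationship changes every affected page without anyone editing those pages.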

So why is this important?

The principles behind this are the ones at the foundation of the next phase of the internet, sometimes called the Semantic Web, sometimes called Web 3.0. The goal is to be able to more easily and accurately aggregate content, find it and share it across many sources. From these simple relationships and building blocks you can dynamically build up incredibly rich sites and navigation on any platform.

There is also a change in the editorial workflow for creating content and managing the site. It changes from publishing stories and index pages to one where you publish content and check that the suggested tags are correct; the index pages are then published automatically. This checking step is what assures us of high-quality output, yet it still saves a large amount of time in managing the site and makes it possible for us to run so many pages for the World Cup efficiently.
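The workflow described above might be sketched as follows. Everything here is hypothetical (the `Story` class, the tag names, and the `review` and `build_index` helpers); the sketch only shows the division of labour, with automation suggesting tags, an editor rejecting the wrong ones, and index pages derived from whatever was approved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Story:
    title: str
    suggested: Set[str]                               # tags proposed by automated analysis
    approved: Set[str] = field(default_factory=set)   # tags confirmed by an editor


def review(story: Story, rejected: Set[str]) -> None:
    """Editor step: keep every suggested tag except those marked as wrong."""
    story.approved = story.suggested - rejected


def build_index(stories: List[Story]) -> Dict[str, List[str]]:
    """Derive index pages from approved tags; no page is edited by hand."""
    pages: Dict[str, List[str]] = {}
    for story in stories:
        for tag in sorted(story.approved):
            pages.setdefault(tag, []).append(story.title)
    return pages


draft = Story("Match report", suggested={"team:a", "team:b", "player:wrong-smith"})
review(draft, rejected={"player:wrong-smith"})
print(build_index([draft]))
```

The editor's effort here is one check per story, however many index pages that story ends up appearing on.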

To make all this possible there has been fantastic support from the Sport team, engaging with new tools and workflows. We are all looking forward to the London Olympics, where there will be over 12,000 athletes and index pages to manage. Without this type of technology, we would not be able to showcase and make the most of all the content we have.

A call to action

We'd like to engage further in the development of Linked Data and feel we have a role to play in supporting this important new view of how content is published and shared. The methods talked about here will become the basis for more and more of our content publishing and we fully appreciate the work many people are doing in this area to make this possible.

There is a vision for the future here, with more time spent on creating and sharing content and less on managing it. However, we have had to overcome many problems in getting this far, and many of them relate to organising and cleaning up data. Because of these technical and data challenges we have not yet been able to expose all our data as RDF, for example, though we will start doing this soon.

As more content has Linked Data principles applied to it (as , then these problems will become less significant and the vision of a Semantic Web moves closer. Importantly, what we have been able to show with the World Cup, is that the technology behind this is ready to deliver large scale products.

This is more than just a technical exercise - we have delivered real benefits back to the business as well as establishing a future model for more dynamic publishing which we think will allow us to make best use of our content and also use Linked Data to more accurately share this content and, a key goal for the 大象传媒.

We look forward to seeing the use of Linked Data grow as we move towards a more Semantic Web.

John O'Donovan is Chief Technical Architect, Journalism and Knowledge, 大象传媒 Future Media & Technology. Read the follow up post, 大象传媒 World Cup 2010 dynamic semantic publishing on the Internet blog.

Comments

Comment number 1.
At 9th Jul 2010, guyvalerio wrote:

Great summary John and congratulations to all concerned; this is an excellent effort.

Guy

Comment number 2.
At 9th Jul 2010, samsethi wrote:

Thank you 大象传媒. You are pioneers for making new tech mass market. I think RSS, Podcasts, Online Video, Backstage and much more. This move to support the semantic web and RDFa is massive.

"publishing content as assets which are then organised by the metadata dynamically into pages"

I would love to see more posts and detail about how this has helped the 大象传媒.

Comment number 3.
At 9th Jul 2010, Andy Mabbett wrote:

What about microformats?

hCard for teams/ players and (with Geo) venues
hCalendar for fixtures

Comment number 4.
At 9th Jul 2010, Seth Grimes wrote:

John, thanks for describing the content-publishing technology and workflow. This could make for a great presentation at a conference I'm organizing... if you can make it to New York. The Web site doesn't formally launch until Monday, but check it out: . Drop me a note or submit a speaking proposal if you'd be interested in speaking.

Seth, [Personal details removed by Moderator]

Comment number 5.
At 10th Jul 2010, Gareth Adams wrote:

This article is great, it's good to see the 大象传媒 embracing metadata so well. Have you looked at sharing the data via an open content database like Freebase?

My only complaint is your perpetuation of the myth of a "versioned" Internet. Please never say "Web 3.0" again ;)

Comment number 6.
At 10th Jul 2010, Anand CV wrote:

I appreciate 大象传媒's efforts to use the Semantic web technologies. There is a need for players like 大象传媒 to leverage this excellent technology. It is helping the creation of the next web.
Though it is actually helping organisations with large amount of data to benefit from it in the back end, the end users are still not having any perceivable benefits yet. It will be great if some applications are created which will clearly demonstrate the power of the Semantic Technologies to the end user. Which will be a turning point. I hope 大象传媒 team can play a big role in that.
Cheers and best wishes for the efforts.

Comment number 7.
At 12th Jul 2010, jod wrote:

Samsethi: There are more details being published today on how this works.

Andy Mabbett: We are more focused on RDFa at the moment though would consider microformats if there was enough demand. We still have a lot of work to do to expose the data.

Gareth Adams: I won't mention Web 3.0 again. Oh, whoops...

Comment number 8.
At 12th Jul 2010, Paul Murphy wrote:

There's a follow up post that's just been added on the blog 大象传媒 World Cup 2010 dynamic semantic publishing that you may also be interested in.

Comment number 9.
At 16th Jul 2010, Alpesh Doshi wrote:

Great, pioneering work from your teams! Congratulations for showing what is possible with semantics. It shows that you can gain business benefits from applying these technologies and paradigms. Would love to talk more with the team. We are doing similar work with publishers, media companies, corporates etc...

Comment number 10.
At 12th Aug 2010, Joenade wrote:

I'm not sure I fully understand the topic being discussed, I have some knowledge about microformats and I can understand the use of creating a semantically described web, but I followed through to this page via a 大象传媒 blog link which talked about essentially doing away with urls, but I don't see how that is really possible.

In any case, the semantic web is going to be the next general evolution and having sites like the 大象传媒 getting behind the effort will ensure that process is driven forward faster.


This entry is now closed for comments

About this blog

Staff from the 大象传媒's online and technology teams talk about 大象传媒 Online, 大象传媒 iPlayer, and the 大象传媒's digital and mobile services. The blog is reactively moderated. Posts are normally closed for comment after three months. Your host is Eliza Kessler.
