Archives for July 2010
Prototyping Weeknotes #24 (23/07/10)
The week starts with an all-team meeting with Matthew; it's our chance to give feedback on the department's plans and strategy. Some of us are spending this week defining the second screen work area once and for all. We're doing a research trawl for existing and new projects inside and outside the BBC, we're consolidating all the ideas we've had or been told about, we're gathering research requirements and we're drawing the tech landscape. Tony's back from his trip and straight away gets stuck into getting the mind-reading headset working; Theo is continuing with some wireframing for an iPad passive display idea. We've received more positive feedback on Twitter Zeitgeist, appropriately enough, in the form of tweets.
Prototyping Weeknotes #23 (16/07/10)
Monday. We really want to launch Zeitgeist but we’ve just got to get some final BBC clearance. Apart from that it’s working brilliantly and Theo’s blog post is ready to publish. Duncan’s having to fix up an old prototype because it’s being taken on by Audio & Music and they want to use our prototype as a guide. In the afternoon he’s installing our smart energy kit in the office and connecting it to his laptop, the printer, the water cooler and the fridge. Plus we’ve got a wireless doorbell. I’m writing some notes for Wednesday when I’ve agreed to speak on camera about innovation. Regretting it. Sean writes an XMPP bot to relay to the MythTV socket interface so it can change channel using IM, and Duncan converts his Strophe client to talk to the bot. So now we can remotely control MythTV from our iPhones and other web clients.
Zeitgeist - the most shared BBC links on Twitter
Zeitgeist is a prototype to highlight the most shared BBC webpages on Twitter, a digest to link people to the hottest BBC pages. The project is part of a larger area of exploration to see how the BBC can use real-time trending data to enrich user experiences. One of our recent projects shows how the artists played on BBC radio are trending on other music services.
We developed Zeitgeist as a simple information source for users and to provide insight into users' interests and behaviours for our production teams. There are some interesting commercial alternatives available that are worth checking out, but we had some specific requirements for our prototype.
The system uses a custom-built ingest chain to search for tweets containing a BBC URL. As it's running in real time, these links come and go depending on what Twitter users are talking about. Depending on the view you choose, you can see this 'liveness' or take a broader view.
Zeitgeist uses the web page's URL and metadata to determine where it comes from and assign it a category. These categories give links a context for the user and a means of navigating deeper.
The links are ranked by a tweet count (including retweets) for the chosen time period. Each entry details the page title, category, media type, short description and when it was first tweeted. The date of publication is indicated where available as it's not just new links that seem to get picked up on Twitter.
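As a rough illustration of ranking by tweet count over a chosen time period, here is a minimal Python sketch. It is not the Zeitgeist code: the function name `rank_links`, the tweet dictionaries with `time` and `urls` keys, and the example.org URLs are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def rank_links(tweets, now, window, top_n=10):
    """Rank URLs by how many tweets (retweets included) mention them
    in the `window` of time leading up to `now`."""
    cutoff = now - window
    counts = Counter()
    for tweet in tweets:
        if cutoff <= tweet["time"] <= now:
            # Count a URL once per tweet, even if the text repeats it.
            counts.update(set(tweet["urls"]))
    return counts.most_common(top_n)


# Hypothetical sample data: each retweet is simply another entry in the list.
now = datetime(2010, 7, 16, 12, 0)
tweets = [
    {"time": now - timedelta(minutes=5), "urls": ["http://example.org/a"]},
    {"time": now - timedelta(minutes=20), "urls": ["http://example.org/a"]},
    {"time": now - timedelta(minutes=30), "urls": ["http://example.org/b"]},
    {"time": now - timedelta(hours=3), "urls": ["http://example.org/b"]},
]

for url, count in rank_links(tweets, now, timedelta(hours=1)):
    print(count, url)
```

Widening the window to a few hours brings the older tweet for the second link back into the count, which is the kind of difference you would expect between a short-period and a longer-period chart.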
We have a different view for BBC employees (shown below), which allows us to see the tweet history of each page, a full list of tweets, the most retweeted messages, hashtags and keywords. We are unable to show this to everyone as the messages would need to be moderated.
We use the Twitter streaming API to access the Gardenhose sample stream, which provides a subset of the full Twitter message stream at a rate of about 100 messages per second, and to track "BBC" as a keyword. These messages are then fed into a pipeline of processes connected by queues provided by a fast and reliable messaging server.
These are the stages that each incoming tweet goes through:
- Twitter combines a retweet with its original tweet; these are split to deliver both messages to the pipeline
- A tweet from the API contains a lot of extraneous data which needs to be removed, such as the user's page background colour
- Links in the message are extracted and resolved, following redirections and expanding shortened links
- Only tweets containing links to BBC pages are kept. Automatically generated BBC tweets from automated accounts are filtered out, and links to some pages are also removed as they skew the results
- These are saved to the database
- The link category is determined by its domain and in-page metadata
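The first four stages above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the production pipeline: the function names are hypothetical, the tweet is a plain dictionary whose nested `retweeted_status` field mirrors how retweets appear in Twitter's payloads, redirects are looked up in a table rather than fetched over HTTP, and `example.org` stands in for the real domain. Saving to the database and categorising are left out.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")


def split_retweet(tweet):
    """Yield the tweet itself and, if it is a retweet, the original it wraps."""
    yield tweet
    original = tweet.get("retweeted_status")
    if original:
        yield original


def strip_tweet(tweet):
    """Keep only the fields that later stages need."""
    return {"id": tweet["id"], "text": tweet["text"]}


def extract_links(tweet):
    """Add a list of the URLs found in the tweet text."""
    urls = [u.rstrip(".,)") for u in URL_PATTERN.findall(tweet["text"])]
    return dict(tweet, urls=urls)


def resolve(url, redirects):
    """Follow redirects via a lookup table (a stand-in for real HTTP requests)."""
    seen = set()
    while url in redirects and url not in seen:
        seen.add(url)
        url = redirects[url]
    return url


def is_wanted(url, domain):
    """True if the URL's host is `domain` or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def process(raw_tweet, redirects, domain):
    """Run one incoming tweet through the split, strip, resolve and filter stages."""
    results = []
    for message in split_retweet(raw_tweet):
        message = extract_links(strip_tweet(message))
        urls = [resolve(u, redirects) for u in message["urls"]]
        urls = [u for u in urls if is_wanted(u, domain)]
        if urls:
            results.append(dict(message, urls=urls))
    return results


# Hypothetical input: a retweet carrying its original, plus profile data to discard.
raw = {
    "id": 2,
    "text": "RT @a: look http://short.example/x",
    "profile_background_color": "fff",
    "retweeted_status": {
        "id": 1,
        "text": "look http://short.example/x",
        "profile_background_color": "000",
    },
}
redirects = {"http://short.example/x": "http://news.example.org/story"}

for message in process(raw, redirects, "example.org"):
    print(message)
```

Here everything runs in one function for readability; in a queue-based design each of these functions would be a separate worker reading from one queue and writing to the next.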
We split these steps into separate processes for two reasons: it's easier to develop and test a process if it does only one thing; and more importantly, it allows us to balance different parts of the system depending on load. For example, there is only one process required to strip data out of tweets, but ten to resolve the URL. By load balancing this way, we can maintain a steady throughput of messages that does not get overloaded at any point.
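To make the load-balancing idea concrete, here is a small Python sketch with one worker for the cheap stage and ten for the slow one. It is only an analogy for the real system: it uses threads and in-memory queues from the standard library instead of separate processes and a messaging server, and the names `run_stage`, `stop_stage`, `strip` and `resolve` are hypothetical.

```python
import queue
import threading
import time

STOP = object()  # sentinel that tells a worker to exit


def run_stage(work, inbox, outbox, workers):
    """Start `workers` threads that each apply `work` to items from `inbox`."""
    def loop():
        while True:
            item = inbox.get()
            if item is STOP:
                break
            outbox.put(work(item))

    threads = [threading.Thread(target=loop) for _ in range(workers)]
    for thread in threads:
        thread.start()
    return threads


def stop_stage(inbox, threads):
    """Queue one sentinel per worker, then wait for all of them to finish."""
    for _ in threads:
        inbox.put(STOP)
    for thread in threads:
        thread.join()


def strip(item):
    """Cheap stage: drop fields we do not need."""
    return {"id": item["id"], "url": item["url"]}


def resolve(item):
    """Slow stage: the sleep stands in for network round trips."""
    time.sleep(0.01)
    return dict(item, url=item["url"].replace("short.example", "www.example.org"))


raw, stripped, resolved = queue.Queue(), queue.Queue(), queue.Queue()
strippers = run_stage(strip, raw, stripped, workers=1)
resolvers = run_stage(resolve, stripped, resolved, workers=10)

for i in range(20):
    raw.put({"id": i, "url": f"http://short.example/{i}", "colour": "fff"})

stop_stage(raw, strippers)       # sentinels sit behind the real items in the queue
stop_stage(stripped, resolvers)

results = sorted(resolved.get()["id"] for _ in range(20))
print(results)
```

Because each stage only talks to its neighbours through queues, the number of workers per stage can be changed without touching the stage's own code.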
To make Zeitgeist, we have had to handle large data sets at high speed. As a rough guide, the Zeitgeist ingest chain handles about 300,000 tweets an hour, of which 900 contain links, 500 of which link to the BBC. Finally, short lists work well as there's a steep drop-off of tweets lower down the chart and, as you might expect, the majority of links point to BBC News articles.
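For a sense of scale, the figures quoted above can be turned into per-second rates and proportions with a few lines of Python. The function name `summarise` is hypothetical; the three input numbers are the ones given in the text.

```python
def summarise(tweets_per_hour, with_links, to_bbc):
    """Convert the hourly figures into a rate and two proportions."""
    return {
        "tweets_per_second": tweets_per_hour / 3600,
        "share_with_links": with_links / tweets_per_hour,
        "share_of_links_to_bbc": to_bbc / with_links,
    }


stats = summarise(tweets_per_hour=300_000, with_links=900, to_bbc=500)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

The per-second rate comes out a little below the roughly 100 messages per second quoted for the sample stream, which is what you would expect from a rough hourly average.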
Zeitgeist is now up and running for a limited period and we trust that you'll find it an interesting resource. We think a system like this could feed into BBC Search as a ranking algorithm, as an additional real-time feed for News recommendations, or as a 'news on the move' mobile service. In any case it shows how audiences can help shape and prioritise content.
Visit the BBC prototype
Cross Post: Sports Metadata post on Internet Blog
The World Cup and a call to action around linked data
In the first post John O'Donovan outlines a call to action around the way that linked data can be used to drive the website, and not just this sports one; this really is a statement of intent to make the web work better at the BBC.
BBC World Cup 2010 dynamic semantic publishing
In the second post Jem Rayfield details the semantic capabilities of the new system, and the changes it makes to the workflow. He also does us the huge favour of including a glossary of semantic web terminology as used in our development work.
Recruiting: Lead Technologist (Audio)
This is an exciting opportunity for someone with a background in audio R&D and a proven track record in leading R&D teams to play a key role in the development of the new lab and to shape the BBC's future audio R&D work. The initial areas of research are likely to include periphony, spatial audio and Ambisonics, and the related areas of room acoustics, but could expand to include any aspects of media-related audio R&D.
As the BBC R&D team in our North lab builds out its capabilities and facilities we need a world class technical leader to focus our audio work. This role is critical to developing the excellence in audio we want at the heart of our operation, and it'll sit right in the nexus of industrial and academic partnerships that will span the region, the UK and the wider industry.
Prototyping Weeknotes #22 (09/07/10)
We discuss exactly what we're going to build for RadioDNS; the hardest bit looks to be the DNS parts and we've got a couple of options for that. Duncan and Tristan head over to W12 to meet Jigna and scope out the TV Power Meter project for an energy meter display for IP-connected set-top boxes. It looks pretty straightforward: Duncan will be doing all the development work, and Tristan is just helping out on the scoping.
Prototyping Weeknotes #21 (02/07/10)
The week starts with a team planning session. It's agreed that we'll finish off Zeitgeist and write it up, implement Theo's child-friendly designs for Digital Friendship and start on the communications material for that project. There are a lot of small bits of work happening - either new spikes or wrapping up old work - but probably only enough to keep people busy this week. George, Tris and I meet to do some longer term planning; it's our first in a while so the focus is on capturing all the projects and debating their status. There's some lively debate and we eventually reach a consensus on what we'll do next. Fortunately, the connectivity issues we were suffering with last week have improved; something somewhere has clearly been turned off, but we're not sure what.