The ever-growing amount of media content published each day makes it extremely challenging for human editors to consistently segment and annotate it. However, segmentation and labelling of media content are necessary to make short-form content available. For example, if someone is looking for a particular piece of news from a programme aired one month ago, some pre-segmentation and/or annotation of the news story in the show would be a massive help!
So how can we segment and annotate media content without direct human effort? It is probably no surprise that the answer is artificial intelligence (AI). PhD student Iacopo Ghinassi has been working with BBC R&D on ways to solve this problem as part of our Data Science Research Partnership.
I have been working on a fascinating project that uses AI to segment and annotate TV and radio programmes automatically. The project is part of my PhD, which the BBC sponsors. The BBC has provided valuable data and continuous support that allowed me and my supervisors (Dr Huy Phan and Prof Matthew Purver) to investigate new ways of automatically understanding the content of media.
'Understanding' is, in fact, crucial to solving the problem of segmenting an otherwise undivided piece of content, such as a news show or a podcast. We aim to segment content by topic, meaning that an automatic system needs to 'understand' when the topic changes. To achieve this, we turn to the branch of AI that is concerned with understanding human language. We also explore sounds and acoustic elements, but understanding language is crucial if we want to isolate a self-contained section of the programme on one topic and, eventually, label the segment with the topic itself.
In a sense, this is not too different from what a search engine does when trying to return results relevant to your query. That's why our research takes a different direction from previous research on the topic - by investigating models and techniques from AI that are closely connected to that kind of search. A general understanding of language like this could be a unique way to segment and label content - recognising different topics and the way they appear within the programme. If our algorithm has a good understanding of the content, we can then potentially adapt it for things like automatic summarisation at little or no cost!
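To make the idea of detecting a topic change concrete, here is a minimal sketch in Python. It is not the system described in this post: it swaps the language-understanding models for a simple word-overlap (cosine similarity) comparison between neighbouring sentences, and the stop-word list, the `threshold` value and the example transcript are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the project).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on",
              "for", "by", "after", "this", "at", "was", "were"}


def bag_of_words(sentence):
    """Count the content words in a sentence, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def segment_by_topic(sentences, threshold=0.1):
    """Start a new segment whenever two neighbouring sentences share
    too little vocabulary (similarity below the hypothetical threshold)."""
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            segments.append([cur])      # topic change: open a new segment
        else:
            segments[-1].append(cur)    # same topic: extend the current one
    return segments


# Made-up transcript with two stories: politics, then sport.
transcript = [
    "The government announced a new budget today.",
    "The budget raises spending on schools and hospitals.",
    "In football, the home team won the cup final.",
    "The final was decided by a late goal.",
]
segments = segment_by_topic(transcript)
for number, segment in enumerate(segments, start=1):
    print(number, segment)
```

On this toy transcript the sketch splits the politics story from the sport story. Word overlap breaks down as soon as a story is told with varied vocabulary, which is one reason a model with a more general understanding of language is attractive for real programmes.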
Early results were presented at an important academic workshop about the broadcasting industry’s use of data science, which led to a first paper. Another paper is on its way, documenting the latest system built with this approach, which managed to correctly segment a set of 270 news programmes from the BBC News Channel more than 90% of the time. This system has been adopted by R&D in a prototype news segmentation system called Yuzu, which will be used to explore potential applications for automatic segmentation.
Much more is yet to come, though! The potential that AI and data science have to help shape processes and media consumption is, if not limitless, very far-reaching. I’m glad to have had an opportunity to lay a (small) tile on that path.
- BBC Media Centre - BBC and UK universities launch major partnership to unlock potential of data
- BBC R&D - Artificial Intelligence & Machine Learning
- BBC R&D - Natural language processing
- BBC R&D - Developing automated user generated content filtering tools for news events
- BBC R&D - Creating automatic video summaries with text queries
- BBC R&D - Using Algorithms to Understand Content
- BBC R&D - Content Analysis Toolkit
- BBC R&D - Snippets