A new dataset to improve TV production using artificial intelligence and machine learning

A TV shoot designed to be edited by a machine learning program or artificial intelligence: we have made the resulting dataset available to researchers working on AI production and intelligent cinematography.

Stephen Jolly

Senior R&D Engineer
Published: 17 May 2022

In November 2021, the AI in Media Production team at BBC Research & Development carried out a television shoot with a difference. A typical shoot produces audio and video that are then edited together to make a programme.

This time, we intended to put that material into a dataset for academic researchers. Greater access to TV programmes' raw material would help many academic fields. One that is of particular interest to us is 'Intelligent Cinematography'. Its researchers are looking at how artificial intelligence and machine learning could help with production tasks like shot framing and editing.

Of course, the BBC makes TV programmes all the time, so it is reasonable to ask why we needed to do a special shoot. We have joined other BBC productions in the past and gained valuable material for our own research. This is a less useful approach when it comes to creating a dataset to share with others, though, for a few reasons:

  • We need to ensure that we have permission to share the content with others. We require consent from all the people involved in making the content - which can be complicated with material we don't own or commission ourselves.
  • We want to control the script and direction of the programme. We want it to contain scenes that are easy for AI systems to process, and some more challenging ones.
  • Most importantly, we needed to shoot the programme in a very different way to normal television.

To explain this last point - we wanted our dataset to support research into ways to frame shots. In a normal TV production, the director tells the camera operators how to frame the shots. This bakes the framing decisions into the recording, and it is not possible to revisit them.

Instead, we used four static ultra-high resolution cameras with wide-angle lenses. We set these up to record the whole scene at once. This approach allows the framing of the shots to happen later on by cropping. Using four cameras lets users and algorithms select different perspectives in post-production.
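
To make this concrete, the sketch below crops a fixed window from one of the ultra-high-resolution recordings to produce a conventionally framed shot. It is a minimal illustration in Python with OpenCV: the file name and crop coordinates are invented for the example, and a real intelligent cinematography system would choose and animate the crop window automatically rather than using fixed values.

```python
# Minimal sketch of reframing-by-cropping, assuming a hypothetical
# recording "camera1_uhd.mp4" from one of the static wide-angle cameras.
import cv2

reader = cv2.VideoCapture("camera1_uhd.mp4")
writer = None

# A fixed 1280x720 window standing in for a framing decision. An
# intelligent cinematography system would move and resize this window
# from frame to frame, for example to follow a speaker.
x, y, w, h = 1600, 400, 1280, 720

while True:
    ok, frame = reader.read()
    if not ok:
        break
    shot = frame[y:y + h, x:x + w]  # the crop *is* the framed shot
    if writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")
        fps = reader.get(cv2.CAP_PROP_FPS) or 25.0
        writer = cv2.VideoWriter("reframed_shot.mp4", fourcc, fps, (w, h))
    writer.write(shot)

reader.release()
if writer is not None:
    writer.release()
```

Because all four cameras capture the whole scene for the whole take, the same recording can yield a close-up, a two-shot or a wide shot simply by choosing different crop windows.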

[Image: camera crew wearing face masks while operating equipment]

The programme is a comedy about a celebrity quiz show, with a script by a professional writer. The quiz show host and contestants are played by professional actors. The participants in the show are school friends who have become famous in later life.

They carry out challenges and talk about their pasts and their relationships with each other. The early scenes are calm, making life easier for any algorithms trying to make sense of them. The action builds throughout the show, and the final scenes are chaotic. This is an intentional challenge for the algorithms that will analyse them.

We have included the following artefacts from the shoot in our dataset:

  • The video from every camera, with synchronised audio from on-camera microphones. We have provided the video in its original resolution and quality. We have also included a low-resolution, low-quality version. We hope this will be easier to store and review on less capable computers.
  • Audio from the microphones worn by the actors. We have provided this in unprocessed, processed and stereo down-mixed forms. (The processing comprises noise reduction, plosive suppression, loudness levelling and trimming to length.) The unmixed audio may be useful for identifying who is speaking so that they can be framed; the mixed audio can serve as the soundtrack for edited videos.
  • The script. (Users of the dataset should bear in mind that some dialogue was ad-libbed or improvised.)
  • A human-edited version of the programme for reference and benchmarking purposes.
  • Various useful kinds of metadata. One example is 'shot logging', which identifies the audio and video from each take and provides basic guidance about which takes to use. We have also included AV sync metadata to help align the audio and video. (A sketch of how this metadata might be used appears after this list.)
  • Documentation to help users better understand the material and the shoot.
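
As promised above, here is a sketch of how the shot-logging and AV sync metadata might be consumed. The dataset documentation defines the actual formats; the file name, column names and units below are assumptions made purely for illustration.

```python
# Illustrative only: reads a hypothetical CSV shot log and reports,
# for each recommended take, which files belong together and how far
# the audio needs shifting to bring it into sync with the video.
import csv

def load_shot_log(path):
    """One record per take: the associated AV files, whether the take
    is recommended, and the AV sync offset (assumed here to be in
    seconds, positive meaning the audio should be delayed)."""
    with open(path, newline="") as f:
        return [
            {
                "take": row["take_id"],
                "video": row["video_file"],
                "audio": row["audio_file"],
                "preferred": row["preferred"].strip().lower() == "yes",
                "av_offset_s": float(row["av_offset_s"]),
            }
            for row in csv.DictReader(f)
        ]

for take in load_shot_log("shot_log.csv"):
    if take["preferred"]:
        print(f"Take {take['take']}: use {take['video']} with "
              f"{take['audio']}, audio shifted {take['av_offset_s']:+.3f}s")
```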

We have compiled a summary of the terms of use under which we have released the dataset. If you would like to use the dataset in your own research, you must accept those terms by downloading and returning the form on that page. The first release of the dataset focuses on Intelligent Cinematography research, and the licence terms reflect this. In future, we would like to open it up to support researchers in other fields. We want to hear from people at universities, other reputable academic institutions and relevant public organisations who would find this data useful. If that's you, please email us at oldschool.dataset [at] bbc.co.uk and let us know.
