
Prototyping for Voice UI

Some of the work we've been doing to explore devices with conversational and spoken interfaces.

Henry Cooke

Senior Producer & Creative Technologist
Published: 4 November 2016
A woman is holding a phone in front of her mouth and is speaking into it.

Talking with Machines is a project exploring spoken interfaces – things like voice assistants and smart speakers. Devices whose primary method of interaction is voice, and which pull together data and content for users in a way that makes sense when you’re talking to a disembodied voice.

We started at the beginning: what are the elements we need to be able to construct stories for voice UI? We identified a few key elements that, when combined, would allow us to build some example scenarios.

Sticky notes on a wall with a variety of different values under columns Person, Context, Emotional State, Programme, Devices and Tone of Voice.

Some of these categories are familiar – people, context, devices. The really interesting ones when thinking about VUI are the emotional state of the person and the tone of voice being used by the device. Speech, unlike screen interaction, is a social form of interaction and comes with a lot of expectations about how it should feel.

Get it wrong, and you’re into a kind of social uncanny valley. In fact, while we were developing this method, more than once we ran into a ‘creepiness problem’ – some of the scenarios, when worked through, just felt, well, creepy. An attentive, friendly personal assistant, present throughout the day, eventually felt invasive and stifling.

A speaking toy for a child, connected to an AI over the internet, swiftly throws up all sorts of possible trickiness. Setting the context and tone for a conversation allows us to think more clearly about how it could unfold.

We filled out the categories with some useful sample elements and set about combining and recombining across categories to create some trial scenarios. This step is intended to be quick and playful – given the building blocks of a scenario, which ones can you mix and match into something that looks worth developing further?
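To make that mix-and-match step concrete, here’s a minimal sketch in Python of how combinations could be randomised across the categories. The element values are illustrative stand-ins, not the actual notes from our wall:

    import random

    # Illustrative stand-ins for the sticky notes on our wall -
    # one list of sample elements per category.
    ELEMENTS = {
        "person": ["commuter", "parent", "student"],
        "context": ["kitchen, morning rush", "car, long drive", "winding down at night"],
        "emotional state": ["relaxed", "stressed", "curious"],
        "programme": ["news briefing", "drama serial", "quiz show"],
        "device": ["smart speaker", "phone assistant", "talking toy"],
        "tone of voice": ["expert guide", "friendly companion", "teaching assistant"],
    }

    def random_scenario():
        """Pick one element from each category to seed a trial scenario."""
        return {category: random.choice(values) for category, values in ELEMENTS.items()}

    for _ in range(3):
        print(random_scenario())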

Script planning

The next step was to write example scripts for each scenario. This was another quick step: using different coloured Post-its for the parts of human users and talking devices, we constructed a few possible conversations on a board. We’re not copywriters by any stretch, but sticking the steps up allowed us to swiftly block out conversations and move or change things that didn’t look right.
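As an illustration (this is a hypothetical format, not necessarily the one we used), a blocked-out conversation boils down to an ordered list of speaker/line turns, which is easy to shuffle, edit and later feed to a text-to-speech tool:

    # Each turn is a (speaker, line) pair, mirroring the coloured Post-its.
    SCRIPT = [
        ("device", "Good morning. You have one reminder for today."),
        ("human", "What is it?"),
        ("device", "Your programme starts recording at eight this evening."),
        ("human", "Thanks. Remind me again at seven."),
    ]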

Text-to-speech testing

Once we’d mapped out our scripts, the next step was to test how they sounded; we needed to listen to the conversations to hear whether they flowed naturally. Tom created a tool using Python and the OS X system voices which allowed us to do just that. This tool proved extremely useful, allowing us to quickly iterate versions of the scripts and hear where the holes were – some things which looked fine in text just didn’t work when spoken.
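Tom’s tool isn’t published here, but as a rough sketch of the idea: on OS X you can drive the system say command from Python, giving each speaker in the script a different system voice. The voice names below are assumptions – run say -v '?' in a terminal to see what’s installed on your machine:

    import subprocess

    # Map each speaker to an OS X system voice. Names vary by install;
    # `say -v '?'` lists the voices available on your machine.
    VOICES = {"device": "Daniel", "human": "Samantha"}

    def speak(script, voices=VOICES):
        """Read a (speaker, line) script aloud via the `say` command."""
        for speaker, line in script:
            subprocess.run(["say", "-v", voices[speaker], line], check=True)

    speak([
        ("device", "Good morning. You have one reminder for today."),
        ("human", "What is it?"),
    ])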

Using text-to-speech does end up sounding a bit odd (like two robots having a conversation), but it meant that we didn’t have to get a human or two to read every tweaked iteration of the script and allowed us to listen as observers. It also made us think about the intent of the individual scenarios - is this how an expert guide would phrase this bit of speech? How about a teaching assistant?

The text-to-speech renderings of the scripts (auralisations?) were giving us a good sense of how the conversations themselves would sound, but we realised that in order to communicate the whole scenario in context, we’d need something more. Andrew developed a technique for rapidly storyboarding key scenes from the scenarios which allowed everyone in the team (not just those talented with a pencil) to draw frames of a subject. More on that technique in a later post.

Building on the storyboards and audio, we were able to create lo-fi video demos which simply and effectively communicate a whole scenario. Using video lets us illustrate how a scenario works over time, and allowed us to add extra audio cues which can be part of a VUI – things like sound effects, alerts and earcons.



Final thoughts

We’re really happy with the prototyping methods we’ve developed – they’ve given us the ability to quickly sketch out possible scenarios for voice-driven applications and flesh out those scenarios in useful ways. Some of these scenarios could be implemented on the VUI devices available today; some are a bit further into the future, and that’s great! We want a set of tools that helps us think about the things we could be building, as well as the things we could build now.

We think we’ve got the start of a rich and useful set of methods for thinking about, designing and prototyping VUIs and we’re keen to develop them further on projects with our colleagues around the ´óÏó´«Ã½. We’d also love to hear from anyone who finds them useful on their own projects!
