If you look at the Alexa Skill store, you'll find plenty of games. Most of them, however, are either simple trivia games or text-based adventure games from the 70s ported to a voice interface.

For some time, we had been wondering whether it's possible to build something beyond that.

When we attended the Talk to Me, Berlin hackathon last week, we came up with a concept (and a working prototype) that shows how voice user interfaces (VUIs) can fit into video gaming.

When Voice Works Best

Due to their invisible nature, VUIs are very limited compared to graphical interfaces.

One of the things we discovered when working with them is that they do a really great job when they add a hands-free experience to existing mobile or desktop apps.

This allows us to multitask and perform simple actions without shifting our whole attention to them.

Whether you're driving a car or making yourself a sandwich, it's usually more convenient to talk to a virtual assistant than to stop what you're doing and grab your phone.

And if the assistant fails to understand you... well, you can always fall back to the phone option.

Now, what would a hands-free experience for video games look like?

The Marshal Experience

A common theme for strategy games is commanding an army.

As a player, you almost always see the game's world through the eyes of a high-ranking army officer, such as a general.
You maintain full control over your units and buildings, and it's up to you which strategy you'll use.

But there's one thing that stops you from feeling like a general: the chain of command doesn't exist.
You have to manage all the units by yourself, with no way to delegate parts of the strategy to your subordinates.

We decided to "fix" this by turning Amazon Alexa into a smart in-game assistant.

To explore the idea further, we picked StarCraft II. There were a few good reasons for that:

  • It's futuristic, just like virtual assistants.
  • It's a strategy game that's focused on the military, so giving commands fits in perfectly.
  • It requires a lot of multitasking, with great opportunities for bringing in hands-free interactions.
  • And finally, it has an API that allows us to connect to the game.

Picking the Right Use Cases

The key to building a great voice user interface is focusing on building the right features. To keep it simple, we focused on one particular type of situation.

When fighting the enemy, the player needs to focus on the combat.
At the same time, they need to stay actively involved in what is happening at their base.
They need to keep training new soldiers, constructing new buildings and checking that workers are collecting the right resources.

Even though such multitasking is the domain of professional StarCraft players, VUIs offer a huge opportunity to let casual players focus on what's important.

Simple Actions vs Complex Tactics

During our research, we quickly learned that executing single in-game actions with voice, while it looks quite cool, is not very handy.
It's way faster to perform actions like selecting units or directing them to attack with a mouse and keyboard.

What made a lot more sense was using voice as a shortcut for more complex actions. Sounds a bit like Apple's new feature for Siri, doesn't it?

Example shortcuts that we came up with were:

Training more troops and sending them to the battlefield once ready. That's especially helpful when you realize that the army you sent into battle is too small. You can focus on the fight, knowing that more of your soldiers will show up soon.

Evacuating workers to the base. When you see the enemy marching towards you, you can quickly order the most vulnerable units to retreat to safety. This can also extend to moving marines into bunkers, to give them an advantage over the enemy's units.

With such shortcuts, the player can use voice to execute tactics without focusing on every single step.
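To make the idea of a shortcut more concrete, here is a minimal Python sketch of one spoken command expanding into several in-game steps. It is an illustration only, not the code from our prototype: the shortcut names, the Action type and the execute callback are all hypothetical, and nothing here talks to the real StarCraft II API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Action:
    """One low-level step the game client would carry out (hypothetical)."""
    name: str
    target: str


# Hypothetical shortcuts: one voice command maps to an ordered list of steps.
SHORTCUTS: Dict[str, List[Action]] = {
    "send_reinforcements": [
        Action("train", "marine"),
        Action("wait_until_ready", "marine"),
        Action("move", "battlefield"),
    ],
    "evacuate_workers": [
        Action("select", "workers"),
        Action("move", "base"),
    ],
}


def run_shortcut(name: str, execute: Callable[[Action], None]) -> int:
    """Run every step of a shortcut in order and return how many steps ran.

    `execute` is a stand-in for whatever would forward a step to the game.
    """
    steps = SHORTCUTS[name]
    for step in steps:
        execute(step)
    return len(steps)
```

For example, calling run_shortcut("evacuate_workers", log.append) with an empty list named log leaves two actions in it, in the order they would be sent to the game.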

Meet Kate Blackwater

Finding the right use cases is only a part of designing a good VUI.
Another aspect worth mentioning is designing the personality of our virtual assistant.

When people use your product on desktop or mobile, they perceive the product through its colors, graphics and copy.
With voice interfaces, it's all about how they speak, what words they use and what personality they have.
While such details aren't that important for utility voice applications, they turn out to be crucial for entertainment products.

Building the Right Persona

Alexa didn't fit well into the game universe. People see her as a warm and cheerful "person", which is exactly the opposite of what you'd expect from a military officer.

To better fit our Voice User Interface into StarCraft, we designed an alternative "virtual assistant" that replaces Alexa when you're controlling the game.


Kate Blackwater is your subordinate, a captain that you can delegate some tasks to.
She's stationed at your base, executing your tactics and keeping track of what's happening there.
All you have to do is to tell her what needs to be done.

Alexa, tell captain to send more soldiers
Alexa, tell captain to evacuate everyone
Alexa, ask captain what's the status

She's very professional in her job. She shows you the respect that you'd expect from a subordinate and doesn't try to be funny.
She's a bit cold and replies in a short and precise manner.

Roger that, Marshal
Affirmative
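Put together, phrases and replies like these could be wired up roughly as follows. This is a minimal sketch of a plain AWS Lambda handler that reads the standard Alexa request JSON and returns the standard response JSON. The intent names are hypothetical, because the skill's interaction model isn't shown in this post, and the status, fallback and launch replies are made up for illustration.

```python
# Hypothetical intent names mapped to Captain Blackwater's short replies.
REPLIES = {
    "SendReinforcementsIntent": "Roger that, Marshal",
    "EvacuateIntent": "Affirmative",
    "StatusIntent": "The base is secure, Marshal",  # made-up reply
}


def build_response(text):
    """Wrap plain text in the Alexa response envelope and end the session."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def lambda_handler(event, context):
    """Entry point: pick a reply based on the intent name in the request."""
    request = event.get("request", {})
    if request.get("type") != "IntentRequest":
        # For example a LaunchRequest, when the skill is opened without a command.
        return build_response("Awaiting orders, Marshal")
    intent_name = request.get("intent", {}).get("name")
    # Unknown intents get a short, in-character fallback (made up).
    return build_response(REPLIES.get(intent_name, "Say again, Marshal"))
```

A real handler would also trigger the in-game tactic before replying. That part is left out here, since it depends on how the skill connects to the game.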

Alexa's Voice is Too Warm

When we tested the initial prototype with other people, our most important finding was that Alexa's voice wasn't right for Captain Blackwater.
Because the character has such a distinct personality, people found it confusing when her replies sounded like warm and funny Alexa.

Fortunately, Alexa lets a skill play short recordings instead of reading the response text out loud in her own voice.
This allowed us to use other text-to-speech services and prepare responses that sound more like Kate Blackwater: cold and precise.
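In practice, a skill plays a recording by returning SSML with an audio tag instead of plain text. Here is a minimal sketch of building such a response in Python. The function name and the example URL are placeholders, and the pre-recorded MP3 is assumed to be hosted somewhere Alexa can reach over HTTPS.

```python
from xml.sax.saxutils import quoteattr


def build_audio_response(audio_url, end_session=True):
    """Return an Alexa response that plays a recording instead of Alexa's voice.

    `audio_url` is assumed to point at a short MP3 in a format Alexa accepts.
    """
    if not audio_url.startswith("https://"):
        raise ValueError("the recording must be served over HTTPS")
    # quoteattr escapes the URL and wraps it in quotes for use as an attribute.
    ssml = "<speak><audio src={}/></speak>".format(quoteattr(audio_url))
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": end_session,
        },
    }
```

Calling build_audio_response("https://example.com/roger-that.mp3") would make Alexa play that file as the whole reply, so the player never hears her default voice.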

Live Demo

Pretty awesome, isn't it?

If you're interested in implementation details, see how to build this Alexa Skill using Python and AWS Lambda.

Lessons Learned

To sum up our takeaways:

  1. Providing shortcuts for complex commands works a lot better than simple voice remote control. The latter can often be achieved faster in other ways.
  2. The right assistant personality can make a difference, especially in VUIs made for entertainment. Without visual cues, it's the major factor in how users will perceive your voice assistant.
  3. Multimodal experiences have huge potential. While voice assistants enable a lot of new possibilities, it's worth thinking of them as complementary to other types of interaction rather than as the only way to interact with a product.

Do you have any other inspirations worth sharing?
Feel free to post them in the comments!