The rising adoption of Amazon's Alexa and Google Assistant brings a lot of amazing possibilities for developers.

I'm going to show you the basic concepts of building voice user interfaces and how to build a simple Alexa skill.

And since there are plenty of "hello world" Alexa tutorials on the internet, we're going to build something more interesting: something that you can literally play with.

The Idea

While voice-only apps come with a lot of limitations that make them difficult to use, adding a voice interface to existing products can give amazing results.

A few weeks ago, we came up with the concept of connecting Alexa with a PC video game. You can see this article to learn more about how we designed a voice user interface for StarCraft.

Ever since we presented the working prototype, a lot of people have been asking to see the code.
Let's implement it!

What's covered:

  • building an Alexa skill using Alexa Skills Kit
  • building a backend for the skill using Python and hosting it on AWS Lambda
  • setting up infrastructure in the AWS cloud
  • connecting it to StarCraft using the python-sc2 library

What's not included:

  • to keep it short, we'll build it only for our own testing - publishing it in the Alexa store requires a bit more polish and more complex infrastructure
  • complex dialog flows, we're focusing on simple commands instead

Architecture

Amazon Alexa Skill Architecture

Alexa skills are built of two parts:

  • "Frontend" responsible for the initial processing of the user's input and translating it into certain actions in our code. This is handled by Alexa Voice Service, but we need to provide it with an interaction model and sample sentences.

Alexa Voice Services

  • "Backend" invoked by AVS when it recognizes an action to be performed. That's where the more complex processing happens. We're going to run it using AWS Lambda, for simplicity.

In order to connect to a StarCraft game, we'll also need an agent app running on our computer.
It's going to connect to a game using its API and execute commands as the player.
To connect the lambda function with the agent app, we're going to use Amazon's SQS. For demo purposes it will do more than fine.
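
The flow described above (the skill backend puts a command on a queue, the agent picks it up) can be sketched without any AWS resources. The snippet below is only an illustration: Python's in-memory queue.Queue stands in for SQS, and the function names backend_send and agent_poll are made up for this sketch. The payload shape mirrors the one used later in this article.

```python
import json
import queue

# In-memory stand-in for the SQS queue (assumption: the real code talks to SQS)
command_queue = queue.Queue()

def backend_send(command, count):
    """What the skill backend does: serialize a command and enqueue it."""
    command_queue.put(json.dumps({'command': command, 'count': count}))

def agent_poll():
    """What the agent does on every game step: take one command, if any."""
    try:
        return json.loads(command_queue.get_nowait())
    except queue.Empty:
        return None

backend_send('build_more_infantry', 5)
first = agent_poll()    # the command sent above
second = agent_poll()   # None, because the queue is empty again
print(first, second)
```

The point of the queue is that the backend never needs to reach our computer directly; the agent pulls commands whenever it is ready.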

Setting Up a New Skill in Amazon's Console

To build the skill, we're going to need an AWS Account and an Amazon Developer Account linked to it.
You can simply sign in to the Amazon Developer site with the same email address you're using for AWS and it will guide you through the initial setup.

Once logged in, go to the Alexa Skills Kit tab and click the "Create Skill" button.
The initial config is pretty straightforward - all you have to do is enter a name and confirm creating a new skill (make sure to keep the model set to "Custom Skill").

Creating an Alexa Skill in Alexa Skills Kit

Once created, you'll be redirected to the skill's editing page.

Skill Interaction Model

Setting up Invocation name

In order to recognize which skill it should invoke, Alexa requires the user to include the skill's "invocation name" when giving a command.

"Alexa, tell <> to train 5 more SCVs"
"Alexa, tell <> to order a pizza"

When the skill is recognized by invocation name, the remaining part of the input (the actual command) is interpreted in the context of the skill.
Side note: Alexa currently provides an experimental feature that lets the user drop the invocation name, but it's in early beta so we're going to skip it for now.

First, let's go into the Invocation tab on the left.
That's where we can set up the invocation name for our skill.
To keep it in the StarCraft universe, let's use "captain Kate" and save the model.

Setting invocation name for Alexa skill

Setting up Intents

Skills are built around intents, which are roughly actions that the user can perform.

For each intent, we need to provide a few sample sentences (called utterances) that the user can invoke it with.
Utterances don't need to exactly match what the user says, but the more of them we provide, the higher the chance that AVS matches the user's request to the right intent.

Intents can also contain named slots (parameters) that will be extracted from the sentence if possible and provided to the backend in a parsed form.

Let's create a sample intent for creating Marines and sending them out to the enemy's base (it will come in handy when you're attacking the enemy's base and you realize that your army is not big enough).
Click on the "Add" button next to "Intents" on the left.
We're going to create a Custom Intent called "BuildMoreInfantry" (that's the intent identifier that we're going to get in the backend).

Creating an intent for Alexa skill

Now, we need to build a list of sample sentences that can be used to invoke this intent.
We're going to allow the user to optionally provide the number of units to create.
In applicable utterances, we're going to include {count} in places where we expect the number of units.

Let's add a few.

send help
send backup
send {count} more marines
train {count} more marines

Also, in the "Intent Slots" section, make sure to add a slot called "count" and mark it as AMAZON.NUMBER. This will instruct Alexa to look for numbers in this slot and parse them as integers before passing to our backend.
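
For reference, the console keeps everything configured in this section as a JSON document (it has a JSON editor view for the model). The sketch below builds a roughly equivalent structure as a Python dict and prints it as JSON. Treat it as an approximation of the schema rather than an exact export; a real model also lists the built-in Amazon intents.

```python
import json

# Approximate interaction model for the skill configured above
interaction_model = {
    'interactionModel': {
        'languageModel': {
            'invocationName': 'captain kate',
            'intents': [
                {
                    'name': 'BuildMoreInfantry',
                    # The optional {count} slot, parsed by Alexa as a number
                    'slots': [{'name': 'count', 'type': 'AMAZON.NUMBER'}],
                    'samples': [
                        'send help',
                        'send backup',
                        'send {count} more marines',
                        'train {count} more marines',
                    ],
                },
            ],
        }
    }
}

model_json = json.dumps(interaction_model, indent=2)
print(model_json)
```

Keeping the model in a file like this makes it easier to review changes to utterances than clicking through the console.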

Adding utterances to Amazon Alexa Skill

For now, this will be enough.
Click the "Build model" button at the top to compile the interaction model.

Setting up AWS Lambda function

Now it's time to add a backend to our skill.
Sign in to the AWS Console and go into Services -> Lambda.

We're going to go with the "Author from scratch" option.
Name the function whatever you'd like and select the "Python 3.6" runtime.
When asked about "Role", select "Create a custom role" and create a new IAM role for the lambda function. We'll use it later to set up permissions for the lambda to access SQS.

Creating AWS Lambda function for Alexa Skill

Once in the lambda function editor, we can add "Alexa Skills Kit" as a trigger.
You can configure it to accept requests only from the skill we're building by filling the skill ID field with the ID you can get from Amazon's console.

Add Alexa Skills Kit as Lambda trigger

Once that's configured, we'll need to provide the function's ARN as an endpoint in Alexa Skills Kit.
Go back to our skill in https://developer.amazon.com, open the "Endpoints" tab on the left and enter the ARN into the "default region" field.

Setting up AWS Lambda endpoint for Alexa Skill

Setting up AWS SQS

To communicate with the agent app running on our computer, we're going to use Amazon SQS.
To create the queue, let's open up the AWS Console once again and go to Services -> SQS.
We don't need anything fancy, so we can just create a queue with default settings.

Then, save the queue URL for later use (available in the details section at the bottom of the page).

Setting up AWS Lambda permissions

Once we have AWS SQS and Lambda set up, we need to configure proper permissions for the lambda function to access SQS.

To do that, go to the IAM section inside the AWS console, select the role that you created together with the lambda function and click the "Attach Policy" button. Then, pick "AmazonSQSFullAccess" from the list and click save to apply the new permissions.
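
AmazonSQSFullAccess is convenient for a demo, but it grants far more than the function needs. A narrower alternative, sketched below, is an inline policy that only allows sending messages to our one queue. The queue ARN here is a made-up placeholder and the helper name send_only_policy is just for illustration; substitute the ARN shown in your own queue's details.

```python
import json

# Hypothetical ARN, for illustration only; use your own queue's ARN
QUEUE_ARN = 'arn:aws:sqs:us-east-1:123456789012:starcraft-commands'

def send_only_policy(queue_arn):
    """Build an IAM policy document that permits only sqs:SendMessage on one queue."""
    return {
        'Version': '2012-10-17',
        'Statement': [
            {
                'Effect': 'Allow',
                'Action': 'sqs:SendMessage',
                'Resource': queue_arn,
            }
        ],
    }

policy_json = json.dumps(send_only_policy(QUEUE_ARN), indent=2)
print(policy_json)  # can be pasted into the role's inline policy editor
```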

Building Lambda function

Now it's time to build a lambda function that will handle Alexa requests and transform them into messages sent to the queue.

Let's open up our lambda editor and start building the backend.
Alexa triggers the lambda function on multiple events, but we don't need to handle them all.

What's interesting for us is the situation when the user asks our skill for a certain action. That's when we'll get an IntentRequest:

def lambda_handler(event, context):
    request = event['request']

    if request['type'] == 'IntentRequest':
        intent = request['intent']
        if intent['name'] == 'BuildMoreInfantry':
            build_more_infantry(intent)
            return build_response('Roger that')

        # Default intents provided by Amazon
        # We need to implement them
        elif intent['name'] == 'AMAZON.FallbackIntent':
            return build_response('I didn\'t understand that')
        elif intent['name'] == 'AMAZON.HelpIntent':
            return build_response('Ask me to send more marines')
        elif intent['name'] in ('AMAZON.CancelIntent', 'AMAZON.StopIntent'):
            return build_response('Goodbye')
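
To make the handler easier to follow, here is roughly what the event looks like for "send 5 more marines". This is a trimmed-down sketch: real requests carry more fields (session, context, timestamps), and the helper name describe_request is only for illustration.

```python
# Trimmed-down example of an Alexa IntentRequest event (many fields omitted)
sample_event = {
    'version': '1.0',
    'request': {
        'type': 'IntentRequest',
        'intent': {
            'name': 'BuildMoreInfantry',
            'slots': {
                # Slot values are delivered as strings
                'count': {'name': 'count', 'value': '5'},
            },
        },
    },
}

def describe_request(event):
    """Return (request type, intent name or None, slot values by name)."""
    request = event['request']
    if request['type'] != 'IntentRequest':
        return request['type'], None, {}
    intent = request['intent']
    slots = {name: slot.get('value') for name, slot in intent.get('slots', {}).items()}
    return request['type'], intent['name'], slots

print(describe_request(sample_event))
print(describe_request({'request': {'type': 'LaunchRequest'}}))
```

A dict shaped like sample_event can also serve as a test event in the Lambda console, which lets you exercise the function without a device.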

We'll also define a utility function that builds a simple response using the schema required by Alexa. It instructs Alexa to speak the given text and end the session (meaning no further dialog, just execute the command).

def build_response(text):
    return {
        'version': '1.0',
        'response': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': text
            },
            'shouldEndSession': True
        }
    }

And finally, the function that passes the command to our message queue.
It also takes care of pulling the number of marines to be created (defaulting to 5 if not provided).

import boto3
import json

def build_more_infantry(intent):
    # Slot values arrive as strings; fall back to 5 when no number was given
    count = int(intent.get('slots', {}).get('count', {}).get('value', 5))
    request = {'command': 'build_more_infantry', 'count': count}

    boto3.client('sqs').send_message(QueueUrl='<<QUEUE_URL_HERE>>', MessageBody=json.dumps(request))
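
One caveat: slot values reach the backend as strings, and the slot has no value at all when the user says just "send help". A slightly more defensive way to read the count could look like the sketch below. The helper name parse_count and the upper bound of 20 are arbitrary choices for this example. The '?' case reflects the placeholder Alexa is commonly reported to send for a number it could not resolve, so verify it against your own requests.

```python
def parse_count(intent, default=5, maximum=20):
    """Read the 'count' slot as an int, falling back to a default."""
    slot = intent.get('slots', {}).get('count', {})
    raw = slot.get('value')
    try:
        count = int(raw)
    except (TypeError, ValueError):
        # Missing slot value, or something that is not a number (e.g. '?')
        return default
    # Keep the result within a sane range for a single order
    return max(1, min(count, maximum))

print(parse_count({'slots': {'count': {'name': 'count', 'value': '7'}}}))
print(parse_count({'slots': {'count': {'name': 'count'}}}))
print(parse_count({'slots': {'count': {'name': 'count', 'value': '?'}}}))
print(parse_count({'slots': {'count': {'name': 'count', 'value': '500'}}}))
```

Capping the value protects the game from a misheard "five hundred" queuing up far more units than you can afford.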

You can now save the function and go into Amazon Developer console to give your skill a try. If you have an Alexa-enabled device connected to the same Amazon account you're using for development, you can use it for testing.

Just say "Alexa, tell captain kate to send 5 more marines".
She should respond with "Roger that".

When you go into the SQS console, you should see messages arriving in the queue.
Now it's time to build the agent app and consume them.

Building the client application

StarCraft provides a protobuf-based API for connecting bots to the game.
We're going to use a very nice wrapper around it: https://github.com/Dentosal/python-sc2

pip3 install sc2 boto3

To run the game with the bot enabled, we're going to use features provided by the library.

import sc2
from sc2 import run_game, maps, Race, Difficulty
from sc2.player import Bot, Computer

# The bot class has to be defined before run_game() references it
class Assistant(sc2.BotAI):
    async def on_step(self, iteration):
        pass

run_game(maps.get("Abyssal Reef LE"), [
    Bot(Race.Terran, Assistant()),
    Computer(Race.Zerg, Difficulty.Easy)
], realtime=True)

To run it, you first have to install the Abyssal Reef LE map into your StarCraft directory.
You can download it from here, it's in the "Ladder 2017 Season 1" pack.
https://github.com/Blizzard/s2client-proto#downloads

When you run the assistant using:

python3 assistant.py

you're going to enter a game against the computer.

Our "bot" won't do much though. It's time to implement the logic for our intent.

import boto3
import json
import sc2
from sc2 import constants

class Assistant(sc2.BotAI):
    def __init__(self):
        super().__init__()
        self._sqs = boto3.client('sqs')
        self._queue_url = '<<QUEUE_URL_HERE>>'
        self._remaining_infantry_to_send = 0

    def _receive_command(self):
        # Poll the queue for one message; returns None when the queue is empty.
        # (A minimal sketch of this helper, using standard boto3 SQS calls.)
        response = self._sqs.receive_message(QueueUrl=self._queue_url, MaxNumberOfMessages=1)
        messages = response.get('Messages', [])
        if not messages:
            return None
        self._sqs.delete_message(QueueUrl=self._queue_url, ReceiptHandle=messages[0]['ReceiptHandle'])
        return json.loads(messages[0]['Body'])

    async def on_step(self, iteration):
        command = self._receive_command()
        # The lambda function sends the command name under the 'command' key
        if command is not None and command['command'] == 'build_more_infantry':
            await self._build_more_infantry(int(command['count']))

        if self._remaining_infantry_to_send > 0:
            await self._send_more_infantry()

    async def _build_more_infantry(self, count):
        # Find the player's barracks and schedule training of {count} infantry
        barracks = self.units(constants.BARRACKS).owned
        if not barracks.exists:
            return

        for i in range(count):
            await self.do(barracks.random.train(constants.MARINE))
        self._remaining_infantry_to_send = count

    async def _send_more_infantry(self):
        # If there are remaining soldiers requested by the player,
        # find the idle one nearest to the barracks and send it to the battlefield
        barracks = self.units(constants.BARRACKS).owned
        if not barracks.exists:
            return

        idle_infantry = self.units(constants.MARINE).owned.idle
        if idle_infantry.exists:
            idle_infantry_to_send = idle_infantry.closest_to(barracks.first)
            target = self.known_enemy_structures.random_or(self.enemy_start_locations[0]).position
            self._remaining_infantry_to_send -= 1
            await self.do(idle_infantry_to_send.attack(target))
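
If you'd like to check the command handling without touching AWS, the SQS client can be replaced with a small fake. The sketch below is a stand-in built on assumptions: FakeSqs is a hypothetical class that mimics only three client calls (send_message, receive_message, delete_message) and only the parts of their responses that are read here, and receive_command is an illustrative helper.

```python
import json

class FakeSqs:
    """In-memory stand-in for the few boto3 SQS client calls used here."""

    def __init__(self):
        self._messages = {}   # receipt handle -> message body
        self._next_id = 0

    def send_message(self, QueueUrl, MessageBody):
        self._next_id += 1
        self._messages[str(self._next_id)] = MessageBody
        return {'MessageId': str(self._next_id)}

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1):
        items = list(self._messages.items())[:MaxNumberOfMessages]
        if not items:
            return {}  # like SQS, no 'Messages' key when the queue is empty
        return {'Messages': [{'ReceiptHandle': h, 'Body': b} for h, b in items]}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self._messages.pop(ReceiptHandle, None)

def receive_command(sqs, queue_url):
    """Fetch, delete and decode one command; None when nothing is waiting."""
    messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1).get('Messages', [])
    if not messages:
        return None
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=messages[0]['ReceiptHandle'])
    return json.loads(messages[0]['Body'])

fake = FakeSqs()
fake.send_message(QueueUrl='fake-queue',
                  MessageBody=json.dumps({'command': 'build_more_infantry', 'count': 3}))
received = receive_command(fake, 'fake-queue')
leftover = receive_command(fake, 'fake-queue')
print(received, leftover)
```

Assigning a FakeSqs instance to the bot's _sqs attribute lets you queue commands from a script instead of speaking to Alexa.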

Now just run it from the terminal again

python3 assistant.py 

and have fun playing with your Alexa :)

Running StarCraft game together with Amazon Alexa

That's just a skeleton that you can easily extend with more features.

Do you have any other VUI inspirations worth sharing?
Feel free to post them in comments!