Interaction Scenario Design and Implementation

It is often easier to reach the “Oh!” moment by watching somebody else do it first. For the visual learners out there, this tutorial breaks down the process of creating and implementing a conversation-based interaction scenario for a pizza delivery app. Make your app development stand out with a conversational component.

Here are the steps that we cover:

1) Outline an example dialog with a user

2) Populate relevant entities

3) Set up single-intent recognition (single query and response)

4) Use context to support conversation and remember parameters
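For concreteness, here is a rough sketch of what steps 1 through 3 might produce for the pizza delivery app. All of the entity names, values, and intent names below are illustrative, not prescriptive:

```python
# Step 1: outline an example dialog with a user.
example_dialog = [
    ("user", "I'd like to order a large pepperoni pizza"),
    ("app", "Sure - delivery or pickup?"),
    ("user", "Delivery, to my office"),
    ("app", "Got it. One large pepperoni pizza, delivered to your office."),
]

# Step 2: populate the entities the dialog above relies on.
entities = {
    "pizza-size": ["small", "medium", "large"],
    "pizza-topping": ["pepperoni", "mushroom", "margherita"],
    "delivery-method": ["delivery", "pickup"],
}

# Step 3: a single query resolves to one intent with filled parameters.
parsed = {
    "intent": "order.pizza",
    "parameters": {"pizza-size": "large", "pizza-topping": "pepperoni"},
}
```

Step 4 then carries those parameters forward so a follow-up like “make it a medium instead” can be resolved against the remembered order.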

Whether you’re an iOS or Android developer, an app with a voice interface is a great way to thrill your users.

A Speech Interface in 3 Steps

Even a capable developer can get tripped up by the components of a full speech interface. Some might think, “Voice recognition sounds good enough,” but that’s just the first step. Voice recognition can fulfill your dictation needs, but when applied to use cases that call for a full voice interface, it’s not only inefficient, it’s the leading cause of users screaming and cursing at their technology. It’s our privilege and sacred duty to set the record straight.

The overall processing flow of a comprehensive speech interface consists of 3 key components:

Speech recognition, speech to text, voice recognition, voice to text and automatic speech recognition (ASR) all refer to the same function: transcribing your spoken words. This first step converts your speech to text so that it can be processed (e.g. dictating an email or text to your mom). Devices need that component before anything else can happen.

(Side note: When working with API.AI you have a couple of different options…)

Option 1) You can use our ASR. This technology is based on statistical and acoustic data from over 2.5 billion processed requests to date. Once our clients have an implementation with API.AI, we can provide custom language models based on domain knowledge. Dynamic language models can also be generated or selected based on conversation and dialog focus.

Option 2) You can use any third-party ASR provider.

The second and most overlooked step is leveraging natural language understanding (NLU) technology. NLU cultivates natural interactions between you and a device, rather than limiting you to very basic voice commands or making you learn the magic words to make it work. Let’s break it down:

	Intent Recognition: Needed to understand your user’s intent or meaning.

	Context Management: Needed for multi-query cases.

	Conversation Management: Needed to support back-and-forth dialogue.

The speech that has been transcribed into text is still foreign to your technology; it won’t know what to do with it. Step 2 takes your text, which is your natural language input, and turns it into a structured object that your product can understand. In other words, it uses Intent Recognition to understand what the user is asking for.

(Side note: Through advanced machine learning techniques, API.AI can predict variations of what the user might say and still translate the meaning into the appropriate structured object.)

For example, a user could say, “Set thermostat to 72 degrees,” or “Thermostat to 72,” and the words are transcribed and then translated into a structured object that the device can easily read.
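As a sketch, both utterances might resolve to the same structured object along these lines (the field names here are illustrative; the exact schema depends on your platform):

```python
import json

# Two surface forms of the same request...
utterances = ["Set thermostat to 72 degrees", "Thermostat to 72"]

# ...both resolve to one machine-readable structured object.
structured = json.loads("""
{
    "intent": "thermostat.set",
    "parameters": { "temperature": 72, "unit": "F" }
}
""")
```

Once the request is in this form, the device no longer cares which phrasing the user chose.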

That’s an example of a single-query interaction scenario. So what if you want to have some back-and-forth conversation with your device? You’d need Context Awareness. A user might say, “Set alarm for 7am,” and their device would respond, “Alarm set for 7am.” When they realize how unpleasant the 7am hour is, they could say, “That’s too early. Set it for an hour later.” And the device would know what the user was asking and respond, “Ok, alarm set for 8am.”

(Side note: API.AI has a sophisticated context management toolset and engine implementation that supports both internal and external contextual data (e.g. GUI, gestures, diagnostics, etc.).)
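A minimal sketch of the bookkeeping behind that alarm exchange (a real context engine does far more, but the core idea is remembering the last parameters so follow-ups can refer to them):

```python
context = {}

def handle(query):
    """Toy handler: remembers the last alarm time so follow-ups can adjust it."""
    if query.startswith("Set alarm for"):
        hour = int(query.split("for ")[1].rstrip("am"))
        context["alarm_hour"] = hour  # remember the parameter for later turns
        return f"Alarm set for {hour}am"
    if "an hour later" in query:
        # Resolve the follow-up against the remembered parameter.
        context["alarm_hour"] += 1
        return f"Ok, alarm set for {context['alarm_hour']}am"
    return "Sorry?"
```

With the context in place, `handle("That's too early. Set it for an hour later.")` can answer correctly even though the query never mentions an alarm or a time.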

Conversation Management allows for the engine to seamlessly switch between conversation topics, while remembering what you were talking about (just like you wish your girl/boyfriend would). Users can also have clarifying conversations with your product in the same session / conversation across devices.

Now the device can transcribe your words, translate them into a readable structured object so that it can understand the user’s intent and context, and support back-and-forth dialogue – what’s missing? Fulfillment, implementation, follow-through. After your device has a clear understanding of what you’re asking, it may need to take action to fulfill your request. A speech interface without fulfillment would look something like this:

	User: Turn on TV.
	*Nothing happens*
	Device: TV is on.
	User: …

So ask yourself, “Am I looking for transcription technology (speech to text)? Or for a back-and-forth dialogue interface that can deliver on my users’ requests?” Because after all, who wants to live in a world without fulfillment?

Now you are equipped with an understanding of what a full speech interface requires and why it would crumble without each component. Our only hope is that you will take this knowledge and share it with those who think, “Voice recognition. That sounds good enough to me.”

Onboarding Users in the 21st Century

You spend months developing a great app for users to fall in love with. You went the extra mile and made a Getting Started Tutorial that goes beyond the typical UX design to help onboard users. But then those overly anxious users skip the helpful tips, get lost in the options, and complain that the app was too difficult to use or didn’t work properly.

When it comes to new, exciting technology, who has the patience to read through a tutorial? Our instincts are to explore first and ask later. So why not onboard new users with a they-ask-a-question-they-get-an-answer interface? Here’s what onboarding new users with a full voice interface looks like:

1) Person asks app how to do X. We turn that into actionable data in the form of a JSON object.

	resolvedQuery: "How do I create a new task",
	parameters: { "helpTopic": "new tasks" }

2) App tells person how to do X

3) Person wants to know more, they ask.
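The three steps above boil down to a help-topic lookup on the structured object. A minimal sketch (the topics and answers here are made up for illustration):

```python
help_answers = {
    "new tasks": "Tap the + button, type a title, then hit Save.",
    "reminders": "Open a task and choose 'Remind me' to pick a time.",
}

def answer(structured_query):
    """Step 2: look up the how-to for the topic extracted in step 1."""
    topic = structured_query["parameters"]["helpTopic"]
    return help_answers.get(topic, "Sorry, I don't have a tip for that yet.")

# The structured object produced from "How do I create a new task".
q = {
    "resolvedQuery": "How do I create a new task",
    "parameters": {"helpTopic": "new tasks"},
}
```

Step 3 is just running the same loop again with the user’s next question.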

Imagine: your user turns on their new device, they are greeted by a virtual assistant that allows for an interactive question and answer learning session. They ask just enough to get started and then they’re off – enjoying your creation! If they get stuck later on, that virtual assistant is always there to answer with how-to’s and tips.

The assistant can also be updated at any time with new answers as the app or device changes. That way you never lose a user when adding new functionality or updating the UX.

What’s the best part? You don’t have to imagine – the technology for an intuitive, affordable speech interface, for any app, device, or website, is here at API.AI. Raise your user acquisition and customer loyalty today.

User Defined Entities

As an API.AI super user, you already understand the power behind the platform. Your implementation is reaching its final stage, and you have started to ask for more advanced capabilities. Most frequently, developers just like you are interested in being able to customize entities on the user/request level.

Let’s take a look at a quick scenario. You are a developer with an application to control all of the devices and settings in your smart home. Out of the box, a user can refer to their objects in generic ways like “turn the lights on in my office”. But what if the user wants a more personalized experience? In that case, the end user must be able to add their own names for specific objects. For example, rather than referring to a generic office entity, they could say “turn on the lights in my mancave”.

In order to work with user entities:

1) Create your entities (e.g. @houselocation and @device for a smart home app) and populate them with default values (e.g. @houselocation has Kitchen, Living Room, and Bedroom; @device has Lights)

2) Create an intent that uses these entities (e.g. “Turn on the @device in the @houselocation”)

3) Utilize the new entities parameter in the query endpoint. The entities object accepts an array of entities, following the format shown in the curl example below.

The submitted entities will replace entities defined in the developer console just for this one request.

Here’s the curl example:
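(The request below is a sketch of the shape described above; the access token, session ID, and API version parameter are placeholders to replace with your own, and the exact field names should be verified against the current API reference.)

```shell
curl -X POST "https://api.api.ai/v1/query?v=20150910" \
  -H "Authorization: Bearer $CLIENT_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "query": "turn on the lights in my mancave",
        "lang": "en",
        "sessionId": "some-unique-session-id",
        "entities": [
          {
            "name": "houselocation",
            "entries": [
              { "value": "mancave", "synonyms": ["mancave", "man cave"] }
            ]
          }
        ]
      }'
```

For this one request, “mancave” is recognized as a @houselocation value even though it was never added in the developer console.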

The specified entity needs to already exist in the developer console. The UI does not allow creating an empty entity, so you must have at least one default value there.

On the Android SDK level: to use the SDK, add the dependency directive below instead of pulling it from the repo:
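(The coordinates and versions below are illustrative; check the SDK’s README for the current ones.)

```groovy
dependencies {
    // Pull the SDK from Maven instead of building it from the repo.
    compile 'ai.api:sdk:2.0.7@aar'
    // Transitive dependencies the aar does not bundle.
    compile 'com.google.code.gson:gson:2.3'
    compile 'commons-io:commons-io:2.4'
}
```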

Usage example:
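(A sketch following the SDK’s general request pattern; the class and method names here are assumptions to verify against the SDK source before use.)

```java
final AIConfiguration config = new AIConfiguration("CLIENT_ACCESS_TOKEN",
        AIConfiguration.SupportedLanguages.English,
        AIConfiguration.RecognitionEngine.System);
final AIDataService dataService = new AIDataService(context, config);

// Build the per-request entity override described above.
final Entity houseLocation = new Entity("houselocation");
houseLocation.addEntry(new EntityEntry("mancave",
        new String[] {"mancave", "man cave"}));

final AIRequest request = new AIRequest("turn on the lights in my mancave");
request.setEntities(Collections.singletonList(houseLocation));

// The override applies to this one request only.
final AIResponse response = dataService.request(request);
```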

Finer-grained controls for entities are coming soon.

Speech Interface: The Remote You Can’t Lose

There’s a word for having a remote control for every possible electronic in your life – convenient. Whether you’re settling down to finish some work or tucked in for the night, remote controls add even more convenience to the technology they’re paired with. Until you lose them, their batteries die, or one system requires 3+ remotes just to turn it on. Remotes were the solution for the 20th century, but now we have something better: speech interface.

What could be simpler than saying what you want to happen and then it happening? Let’s look at the (inevitable) smart home use case. We like the different control panels and various apps for each type of appliance we want to control, but what we would love, what would thrill our socks off, is to just say, “turn off the lights and lock the doors,” from our beds.

This is not a science fiction pitch for future generations. You can enjoy this first-class service today with products like the Ubi and Smarter Shade. Imagine walking into your home and saying, “Turn off the garage light, preheat the oven to 375, and turn on the SF Giants game,” and having the relevant electronics comply with your requests. That would be the real smart tv we’ve been waiting for. Speech interface is beyond convenient – it’s efficient and offers a whole new peace of mind. Try out today to make your all in one remote that’s handsfree and convenient.