65% of people who own an Amazon Echo or Google Home can’t imagine going back to the days before they had a smart speaker



Breaking the code on voice experience terminology

by | Oct 14, 2018

With any new wave of technology and its adoption by the market comes a slew of terminology. Often different groups use different terms while intending the same meaning. Sometimes it’s a land grab: a bid to become the Kleenex of a space, to make one name the default association for a new experience.

With voice, two players are dominant: Google and Amazon. Though Microsoft and Apple both have their own experiences, Google and Amazon are leading the market, with Google rapidly gaining market share.

This article is meant to bring some clarity, or at least some understanding, to those who are growing their knowledge of the space and of the terms that drive planning and execution in the voice channel.

The base is a virtual assistant (Siri, Cortana, Alexa, Google Assistant); without one, many of these devices are rendered useless, or at least significantly less impactful.

The method of interaction is the way in which a user engages with the virtual assistant. Some devices are capable of several interaction types. These include:

  • Text – words on a screen
  • Voice – oral response from the virtual assistant
  • Images – visual displays on devices
  • Video – motion displays on devices

Devices are the physical things that virtual assistants engage through, using whatever methods the device itself supports. Though devices tend to be specific to one virtual assistant (and are often how the platforms generate revenue), they can be generalized into a few key categories:

  • Smart Speaker – Echo, Dot, Google Home, Bose headphones, etc.
  • Smart Display – Amazon Echo Show, Google Home Hub, Lenovo Smart Display
  • Computer Operating System – Microsoft, Google, Apple
  • Mobile Operating System – Microsoft, Google, Apple
  • Mobile Apps independent of OS – Google Assistant on iOS, Alexa on iOS, etc.
  • Appliances – microwaves, refrigerators, ovens, thermostats
  • Cars – Echo Auto in VW, Toyota, and Lexus, for instance
  • Wearables – Wear OS devices
  • Instant Messaging – Facebook Messenger, SMS, Skype, web bots, etc.

Beyond this terminology is each platform’s specific language for planning and executing on it. We focus on the two primary market players, Google and Amazon, in this article, though other virtual assistants maintain their own lexicons.

Below is a table with one-to-one mappings of platform-specific terms.

| Definition (Google) | Google term | Amazon term | Definition (Amazon) |
|---|---|---|---|
| An interaction you build for the Google Assistant that supports a specific intent and has a corresponding fulfillment that processes the intent. | Action | Skill | A robust set of actions or tasks that are accomplished by Alexa. Alexa provides a set of built-in skills (such as playing music), and developers can use the Alexa Skills Kit to give Alexa new skills. A skill includes both the code (in the form of a cloud-based service) and the configuration provided on the developer console. |
| Input that the user provides when interacting with a surface. | User query | Utterance | The words the customer says to Alexa to convey what they want to do, or to provide a response to a question Alexa asks. For custom skills, you provide a set of sample utterances mapped to intents as part of your custom interaction model. For smart home skills, the smart home skill API message reference provides a predefined set of utterances. |
| A goal or task that users want to do, such as ordering coffee or finding a piece of music. In Actions on Google, this is represented as a unique identifier and the corresponding user queries that can trigger the intent. When using Dialogflow, this refers to the intent mappings you define in your agent. | Intent | Intent | A representation of the action that fulfills a customer’s spoken request. Intents can have further arguments called slots that represent variable information. For example, in “Alexa, ask History Buff what happened on June third,” the phrase “what happened on June third” maps to a specific intent that can be handled by a particular Alexa ability. This tells Alexa that the user wants the skill History Buff to get historical information on a specific date. |
| An invocation where users utter an Action phrase without an Actions project name. | Implicit invocation | Partial intent invocation | A customer’s request that contains the customer’s intent, but is missing a required slot. |
| The act of starting an interaction with an Action by the user. | Invocation | Invocation | The act of beginning an interaction with a particular Alexa ability. |
| When using Dialogflow, this refers to a single turn of a dialog, which consists of a single user query and an agent’s response. | Dialogue turn | Turn | A single request to or a response from Alexa. |
| Any device that provides users with access to the Google Assistant, including Wear OS devices, Assistant-enabled headphones, Chromebooks, Android TV, Android phones and tablets, smart displays and speakers, and iPhones. | Surface | Smart home endpoint | An endpoint identifies the target for a smart home directive and the origin of an event from a smart home skill. A smart home endpoint can represent a physical device, virtual device, group or cluster of devices, or a software component. |
| When using Dialogflow, you can attach a follow-up intent to an intent when you expect some specific user input (for example, “yes”, “no”, or “cancel”) after the parent intent’s response. When Dialogflow receives one of these expected user inputs, it automatically triggers the corresponding follow-up intent. | Follow-up intent | Prompt | A string of text that should be spoken to the customer to ask for more information. You include the prompt text in your response to a customer’s request. Types of prompts include: open ended, menu style, re-prompt, and implicit confirmation (landmarking). |
| An invocation where users use the Actions project name. | Explicit invocation | Full intent invocation | A customer’s request that contains all information Alexa needs to make the request actionable. |
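The core relationship the table describes — an utterance (Google’s user query) being mapped to an intent, with slots carrying the variable pieces — can be sketched in a few lines. This is a toy illustration only, not either platform’s actual API: real assistants use trained language-understanding models rather than regular expressions, and the intent names and the `match_intent` helper below are hypothetical.

```python
import re

# Toy interaction model: each hypothetical intent maps to utterance
# patterns whose named groups become slot values.
INTENTS = {
    "GetHistoricalEvent": [
        r"what happened on (?P<date>.+)",
        r"tell me about (?P<date>.+)",
    ],
    "PlayMusic": [
        r"play (?P<song>.+)",
    ],
}

def match_intent(utterance):
    """Map a spoken utterance to (intent, slots); (None, {}) if unmatched."""
    for intent, patterns in INTENTS.items():
        for pattern in patterns:
            m = re.fullmatch(pattern, utterance.lower())
            if m:
                return intent, m.groupdict()
    return None, {}

intent, slots = match_intent("What happened on June third")
print(intent, slots)  # GetHistoricalEvent {'date': 'june third'}
```

In the table’s vocabulary: the string passed in is the utterance, `"GetHistoricalEvent"` is the intent it resolves to, and `{'date': 'june third'}` is the slot that makes the request actionable — an utterance that matched an intent but filled no required slot would be a partial intent invocation.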
How all these terms for interaction and experience translate into today’s technology can be confusing.

One of the many benefits of leveraging Voicify as your voice experience platform is being able to focus on the experience, not on the devices, interactions and technical requirements.

Voicify’s power is being able to take your content – whether stored in our system or referenced from another – and deliver it across the virtual assistants and the devices they run on. Deciphering the matrix of device, assistant, method, content, and engagement is not where we think marketers should be spending their time; let us do that. You work on making the experience valuable and exciting for your audience, and we’ll make it work everywhere.

We thrive on establishing your voice experience.