How Voice UI’s Entity Extraction Removes Cognitive Friction in the Digital Experience

by | Jan 12, 2023 | Brand marketing, Content management, Insights, Voicify News

Companies spend millions of dollars to understand how customers think so they can design digital experiences that are simple and intuitive to use. This was the case when desktop web was the dominant screen and even more so after mobile overtook the desktop in 2014.

When the screen size is reduced, the thought put into the user experience design needs to increase.

Until recently, the only consideration for the desktop or mobile interface has been graphic. In 1983, Apple introduced one of the first commercial personal computers with a GUI, a significant enhancement over the DOS command line of early computers.

Yes, 1983. For 40 years, digital experience has been dominated by the graphic user interface. And while GUIs have become elegant and generally easy to use (or we have all been trained to use them), they aren’t always the most efficient or intuitive option.

A voice interface, however, is intuitive and natural. Speaking is one of the oldest forms of human communication, if not the oldest. And now, technology (like Voicify) is bringing it to the digital experience.

So what is entity extraction?

According to

Entity extraction, also known as entity name extraction or named entity recognition (NER), is an information extraction technique that identifies key elements from text then classifies them into predefined categories. This makes unstructured data machine-readable (or structured) and available for standard natural language processing (NLP) actions such as retrieving information, extracting facts and answering questions. 


So how does it work with voice?

When the user speaks to the digital asset (website, mobile, kiosk, telephony), their words are turned into text (a process called STT, or speech-to-text). Once this process is completed, named entity extraction is applied to the resulting text.

Consider the process of adding grocery items to your digital cart. Whether you use Instacart or a specific grocery brand’s app, the process is tedious. This isn’t to say the GUI hasn’t been thought through, only that there are limited options when you employ a graphic user interface.

 Using your voice, you can say things like:

 “Add five ripe bananas, JIF crunchy peanut butter, a pound of Tillamook unsalted butter and a box of rigatoni to my cart.”

Without entity extraction, this statement cannot be utilized by software in a meaningful way. But entity extraction is able to map these terms to a predefined natural language model and put the utterance to use, illustrated below.
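To make the mapping concrete, here is a minimal sketch of dictionary-based entity extraction applied to the utterance above. It is a toy illustration, not Voicify’s model or API: the entity types, the `ENTITY_MODEL` lists, and the function names are all assumptions made up for this example, and a production system would typically rely on a trained language model rather than simple phrase matching.

```python
import re

# Hypothetical entity model: each entity type lists the phrases that signal it.
ENTITY_MODEL = {
    "brand": ["jif", "tillamook"],
    "product": ["peanut butter", "bananas", "butter", "rigatoni"],
    "attribute": ["ripe", "crunchy", "unsalted"],
    "unit": ["pound", "box"],
}
# Hypothetical spoken quantities; "a" and "an" count as one.
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def split_items(utterance):
    """Strip the 'add ... to my cart' wrapper and return one phrase per item."""
    text = utterance.lower().strip(" .")
    text = re.sub(r"^add\s+", "", text)
    text = re.sub(r"\s+to my cart$", "", text)
    return [part.strip() for part in re.split(r",\s*|\s+and\s+", text) if part.strip()]


def extract_entities(phrase):
    """Map one item phrase onto the entity types in ENTITY_MODEL."""
    entities = {"quantity": 1}  # assume one unit when no quantity is spoken
    words = phrase.split()
    if words and words[0] in NUMBER_WORDS:
        entities["quantity"] = NUMBER_WORDS[words[0]]
    elif words and words[0].isdigit():
        entities["quantity"] = int(words[0])
    for entity_type, phrases in ENTITY_MODEL.items():
        # Try longer phrases first so "peanut butter" wins over "butter".
        for candidate in sorted(phrases, key=len, reverse=True):
            if re.search(rf"\b{re.escape(candidate)}\b", phrase):
                entities[entity_type] = candidate
                break
    return entities


def parse_cart_request(utterance):
    """Turn a spoken cart request into a list of structured line items."""
    return [extract_entities(phrase) for phrase in split_items(utterance)]


if __name__ == "__main__":
    request = (
        "Add five ripe bananas, JIF crunchy peanut butter, a pound of "
        "Tillamook unsalted butter and a box of rigatoni to my cart."
    )
    for item in parse_cart_request(request):
        print(item)
```

Each spoken item becomes a small structured record (quantity, brand, product, attribute, unit) that cart software can act on, which is the step a raw transcript alone cannot provide.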


For the user, the process of stating what they need is simpler using speech, which removes much of the cognitive friction. It is far faster and more convenient for the user to speak their needs than to navigate a UI like this:



So a natural language model is needed?


Yes, it is. And the good news is most brands have the beginnings of this already. For instance, when I do a text search on Home Depot’s site or mobile app for a refrigerator, I am provided with over 2,000 results. I then need to use filters to narrow the options further. These are the filters, and they are a lot to process cognitively:



This well-thought-out structure can be translated into a natural language model easily (especially in Voicify).



But what if people don’t use these exact terms?

Synonyms will likely be needed for some or all of the base entities in the model, but those, too, can be easily added or modified over time. For instance, one might model the cost range of $0-$100 as ‘cheap’ or ‘inexpensive’, so when that term is used, it is understood.
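As a rough sketch of how synonyms can resolve to base entities, the example below maps each spoken term to a canonical value. The ‘cheap’ and ‘inexpensive’ terms come from the example above; the `finish` entity, the `MODEL` layout, and the function names are hypothetical and do not reflect how Voicify stores its models.

```python
import re

# Hypothetical model: entity type -> canonical value -> synonyms.
MODEL = {
    "price_range": {"$0-$100": ["cheap", "inexpensive"]},
    "finish": {"stainless steel": ["stainless"]},  # assumed entity for illustration
}


def build_lookup(model):
    """Flatten the model so every canonical value and synonym points to its entity."""
    lookup = {}
    for entity_type, values in model.items():
        for canonical, synonyms in values.items():
            for term in [canonical, *synonyms]:
                lookup[term.lower()] = (entity_type, canonical)
    return lookup


def resolve(utterance, lookup):
    """Return the canonical entity values mentioned in the utterance."""
    text = utterance.lower()
    found = {}
    # Check longer terms first so a full phrase beats a shorter synonym.
    for term in sorted(lookup, key=len, reverse=True):
        if re.search(rf"(?<!\w){re.escape(term)}(?!\w)", text):
            entity_type, canonical = lookup[term]
            found.setdefault(entity_type, canonical)
    return found


if __name__ == "__main__":
    lookup = build_lookup(MODEL)
    print(resolve("Show me an inexpensive stainless fridge", lookup))
```

Because synonyms live in data rather than in code, adding or changing a term later means editing one list, which matches the idea of refining the model over time.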

The data to complete the model can be drawn from search analytics or even customer interviews. Some models already exist in Voicify, and pre-made lists or data sets can be found or purchased and then imported.

The process isn’t complex.

And if the user doesn’t provide enough information to do anything with?

Another UX strategy, progressive disclosure, can be applied to voice: simply ask for the data you need, when you need it, and not before. You can read about this approach in a recent post titled “Voice UI + Progressive Disclosure Maximizes Multi-Modality and Speeds Order Building Processes for QSRs.”
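Progressive disclosure in a voice flow can be sketched as asking only for the first piece of information that is still missing. The slot names, their order, and the prompt wording below are assumptions for illustration, not Voicify configuration.

```python
# Hypothetical required slots for an order, in the order they should be asked.
REQUIRED_SLOTS = {
    "product": "What would you like to order?",
    "size": "What size would you like?",
    "quantity": "How many would you like?",
}


def next_prompt(filled_slots):
    """Return the question for the first missing slot, or None when the order is complete."""
    for slot, question in REQUIRED_SLOTS.items():
        if slot not in filled_slots:
            return question
    return None


if __name__ == "__main__":
    order = {"product": "coffee"}  # entities extracted from the user's first utterance
    print(next_prompt(order))
    order.update({"size": "large", "quantity": 2})
    print(next_prompt(order))
```

The user is never asked for something they already said, and never asked for everything up front; the assistant fills in gaps one question at a time.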

It’s worth noting that when the configuration is complete, layering the voice UI and digital assistant into channels like the web or mobile is also very simple. Voicify’s SDKs and APIs allow you to take the voice application you built and deploy it anywhere it makes sense. For some industries, web and mobile are the primary endpoints. For others, drive-thru, telephony, or even the metaverse make the most sense. At Voicify, we are agnostic about where the application runs and passionate about making sure it does.

Keep in touch

Please subscribe to our monthly newsletter if you’d like to keep up with the work Voicify is doing and our most current content and events. If you have questions about how Voicify can help you deliver custom voice experiences, please don’t hesitate to contact us for a no-cost, no-commitment conversation.

Time to talk? We’d love to.