
For tech, 50 Million is a special number and voice has blown through it. This is why it’s important.

by | Mar 14, 2019

Last week Voicify and released the 2019 Smart Speaker Adoption Report.  There are tons of metrics and analysis for you to sift through, and I highly recommend it. Among other salient insights, we report that at the end of 2018 there were 66.4 million smart speakers in use across the United States, which means that over 25% of American adults have access to a smart speaker.

Why is 50 million an important number with technology?

The race to 50 million has long been a metric of technology adoption across the board. The metric is rooted in Metcalfe’s law, which states that the effect of a network is proportional to the square of its total number of users.

That means with 50M users you have a connected network of 2.5 quadrillion, or 2,500,000,000,000,000 possible connections.
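The arithmetic behind that figure can be checked in a few lines. This is only a sketch: `effect` is the users-squared number quoted above, while `pairs` is a stricter count of distinct user-to-user links, n(n-1)/2, added here for comparison and not taken from the report. The variable names are illustrative.

```python
# Metcalfe's law as stated above: network effect grows with the square of the users.
users = 50_000_000

# The n^2 figure quoted in the text.
effect = users ** 2

# Assumption for comparison only: distinct user-to-user links, n * (n - 1) / 2.
pairs = users * (users - 1) // 2

print(f"{effect:,}")  # 2,500,000,000,000,000
print(f"{pairs:,}")   # 1,249,999,975,000,000
```

Either way you count it, the number lands in the quadrillions, which is why the 50M threshold gets so much attention.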

It stands to reason that physical goods are more difficult to distribute than digital goods.

For example, it took the telephone 50 years to reach 50M. (It took 3 years to reach 500k!) The automobile crossed 50M 62 years after its invention.  Television took 22 years, the computer 14 years, and the cell phone 12 years.

What all these have in common is that they act as ‘middleware’ for something else.  A TV without programming and a viewer is just a complicated box.  The same is true of a phone, a computer and a car.

With the advent of digital goods, the distribution became simpler and speed applied to Metcalfe’s law allows for faster adoption and use.  So, while it took the iPhone 45 months to capture 50M users it took Angry Birds 35 days.  While it took 7 years for the internet (www) to reach 50M, it took, 19 days.

So when you think of messaging, communications, utility, commerce, organizing and influence, having 50 million people all using the same ‘thing’ is a powerful threshold. 


Why are smart speakers an important segment when voice assistants are available on billions of devices?


Feeding on the power of 50 million, once you have the hardware, distribution of software is considerably simpler.  So, when Google announces that Google Assistant would be on over a billion devices by the end of 2018 and Amazon touts over 100 million devices where Alexa is pre-installed, the 50 million number is overshadowed, no?

Actually, no.

First, consider Bixby, which is Samsung’s horse in the Assistant War.  Samsung has the power to release Bixby on 500 million devices annually – and yet, Bixby is relegated to the ‘other’ category in most reports on voice assistant market share.

So the punchline is this: Smart Speakers are devices specifically designed as a vehicle for Voice Assistants.  Without the Voice Assistant, a smart speaker is useless (whereas a mobile device can do lots without an assistant, and a connected fridge still keeps food cold).

The adoption of smart speakers is a direct representation of voice assistant acceptance and use.  And because the two are joined at the hip, smart speaker users form a control group from which insights and analysis on voice assistants can be isolated.

It is both a hardware and a software story. 

If smart displays are on the rise, doesn’t that support existing screens versus voice?


The conversation about voice takes a lot of different directions.  The first smart speakers were just that, speakers.  The output met the input: voice.  Ask for something orally, receive something aurally. 

Even connected homes (the societal training ground for voice assistants) offer an aural response when completing a voice command: say “Alexa, turn the AC to 72 degrees” and you hear “OK, Jason, the AC is turned to 72.”

But the cornerstone of voice assistants is the user’s voice, not that of the assistant.  In the beginning there was a single way of having a conversation: voice to voice.

However, humans use more than just their sense of hearing to have a conversation. We see, smell and often touch in order to determine meaning and intent.  Digitally, we call these modalities. (See Understanding Multimodal for more detailed information.) Leveraging our sense of sight to communicate meaning and intent is a critical advance for voice assistants.

The ecosystem of visual devices is growing quickly.  Google Home Hub launched in October 2018.  By the end of the year it had 1.2% of the market. (!!!!) Overall the smart display segment grew an astonishing 558%. 

Alexa can now be accessed with Firestick, turning your TV into a giant virtual assistant.  At CES, Capstone showcased their Connected Mirror, which not only brings an assistant with you into the bathroom, but also makes it invisible when not in use.

Expect a market shift and spike in further adoption (can it get steeper?) when Google Assistant is brought to Chrome Browser and home manufacturing has more widespread focus on embedding assistants as part of the edifice itself. 


If first party skills/actions are the most used, why do we need third party apps?


With the close relationship the assistant and smart speaker have with one another, it isn’t surprising that the most popular ways of using them were originally configured by the assistant developer themselves.  Setting a timer, alarm, controlling a device – these are native functions to the various assistants.  It makes perfect sense.  You can’t ship a voice assistant unless it does something.

I relate this to the beginning of the internet and websites.  You had this browser and there were limited things you could do with it.  There weren’t millions of sites, there were thousands.  We often had to rely on someone who had a website to communicate about the ‘thing’ we wanted to know about, versus a more trusted source or subject owner.

For giggles, here is Apple’s website in 1997.  (Kudos for their early adoption of content localization)


Amazon and Google don’t want to own every element of the voice experience. They know they are responsible for core functionality (analogous to the browser), but the nuanced elements of niche function (where niche could mean millions versus billions of users) are left to others.  Right now the Google, Cortana, Amazon & Bixby ecosystems are filled with thousands of ‘apps’ (skills, actions, capsules) by hobby developers and global brands alike.

The assistants are helping solve the discoverability challenges (while Voicify is solving the creation, distribution and management challenges) and leaning on third party voice apps to bring richness and value to their platforms.

And yet, we remain in the beginning of the voice (r)evolution.  Consumers are telling us with their purchasing behaviors that they want, like and will use voice assistants.

So, unless you trust the assistant to represent your brand, product, experience, reputation and communications on your behalf, third party apps are the way you keep control.

Go ahead, ask an assistant a couple of questions about your brand or the subjects your brand is reputable in.  Do the responses meet your expectations? Worse yet, did your competitor get mentioned? It’s time to act.  Voicify can help.




