The 3 UX quantum leaps that smart speakers need to take

In one of my recent articles, I listed five flaws, shared by both Amazon Alexa and Google Assistant, that I consider nothing short of scandalous. I consider them scandalous because: (a) they negatively impact the user experience of smart speakers in significant ways, getting in the way of delivering delight and of enabling a much wider adoption of voice assistants, and (b) they have existed for almost seven years now, with neither Amazon nor Google sending any signal that they are taking them seriously, let alone working to solve them.
To provide a concrete vision that mere words on paper cannot do justice to, I have put together a video to specifically illustrate three of the five scandals — the three most scandalous of the scandals.
Here is the video.
Now, if you are not a regular user of the Amazon Echo, or if you own an Echo but don’t really use it, the video will seem utterly unremarkable to you. “So?” is probably what your reaction would be. “What’s the big deal?” Which is actually the right reaction to the video, because what the video shows is how a human being should obviously interact with a smart speaker.
However, if you are a regular user of the Echo (or of any other smart speaker, for that matter), you will notice that the video shows a human being doing three things that they currently cannot do with any deployed smart speaker.
Namely:
(1) They can talk for a long time (more than ten seconds or so at a time) without the smart speaker unilaterally taking the turn back (as in: “Ok, Human — that’s enough talking!”).
(2) They can speak haltingly, pause for a couple of seconds here and there, and stumble over their words, without the smart speaker taking the turn back (as in: “Ok, Human — I don’t have time for your dithering”).
(3) They don’t need to speak the wake word every time they want to get the attention of the smart speaker. If you…