Voice assistant

How do Alexa, Siri and Co. work?

In more and more households, a smart voice assistant from Amazon, Google, Apple or another manufacturer accompanies the occupants through the day. It completes tasks, plays their favourite music on request and switches the lights on and off when asked. It can also translate foreign words, help you find the right recipe or tell you about the expected traffic on the way to work. But have you ever wondered how the technology behind it works?

Why do I need an internet connection?

For Alexa and Co. to carry out voice commands, the device must be connected to the internet, usually via Wi-Fi. The reason is that the commands are not processed on the voice assistant itself but on the servers of Google, Amazon and others.
Without an active internet connection, your smart speaker is at best a reasonably good-sounding Bluetooth speaker: it can no longer skip tracks by voice command or play your favourite music on demand. If it bothers you that Alexa is permanently online, you are better off not using one at all.

How does a voice assistant work in detail?

Amazon's Echo, the smart speaker behind the most popular voice assistant, Alexa, has seven microphones built in under the bonnet. Strictly speaking, the Echo is nothing more than a speaker with built-in microphones. These microphones are essential to how it works: they record the words you speak, filter out ambient noise as well as possible and then upload the recording to the cloud at lightning speed.

This is because Alexa's brain is located online rather than in the Echo itself. The spoken words are recognised on Amazon's servers and converted into text. A service called "Alexa Voice Service" then analyses the command and reacts to certain keywords: in the command "Alexa, do I need an umbrella tomorrow?" these are the words "tomorrow" and "umbrella". These terms are forwarded to the relevant apps and services so that you receive an answer as quickly as possible. For the time between command and answer to stay at just a few tenths of a second, every step has to mesh perfectly and run extremely fast, because only then does using a voice assistant feel like a natural experience. Another advantage is that Alexa is able to learn.
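To make the keyword step more concrete, here is a minimal Python sketch of how a server could pick out words such as "umbrella" and "tomorrow" from an already transcribed command and decide which service should answer. It assumes the speech-to-text step has already happened. The keyword table, the service names and the route function are hypothetical and purely illustrative; production assistants rely on far more sophisticated, machine-learned language understanding rather than a fixed word list like this.

```python
import re

# Hypothetical keyword table: each service name maps to the trigger words
# that should send a request to it. Purely illustrative, not Amazon's logic.
SERVICE_KEYWORDS = {
    "weather": {"umbrella", "rain", "weather", "temperature"},
    "music": {"play", "song", "skip", "track"},
    "traffic": {"traffic", "commute", "route"},
}

# Hypothetical words that narrow down *when* the request applies.
TIME_KEYWORDS = {"today", "tomorrow", "tonight"}


def extract_keywords(transcript: str) -> list[str]:
    """Lower-case the transcribed text and split it into words."""
    return re.findall(r"[a-z']+", transcript.lower())


def route(transcript: str) -> dict:
    """Pick a service based on the keywords found in the transcript."""
    words = extract_keywords(transcript)
    # Drop the wake word: it only activates the device.
    words = [w for w in words if w != "alexa"]
    # Remember the first time word, if any (e.g. "tomorrow").
    found_time = next((w for w in words if w in TIME_KEYWORDS), None)
    # Return the first service whose trigger words appear in the command.
    for service, triggers in SERVICE_KEYWORDS.items():
        matched = sorted(triggers.intersection(words))
        if matched:
            return {"service": service, "keywords": matched, "when": found_time}
    # No known keyword: the assistant could not map the command to a service.
    return {"service": None, "keywords": [], "when": found_time}


if __name__ == "__main__":
    print(route("Alexa, do I need an umbrella tomorrow?"))
```

Running the script prints `{'service': 'weather', 'keywords': ['umbrella'], 'when': 'tomorrow'}`: the word "umbrella" selects the (hypothetical) weather service, and "tomorrow" tells that service which day to look up.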