How does Alexa work? The tech behind Amazon’s virtual assistant, explained

Edgar Cervantes / Android Authority

We’ve got quite a few guides to using Amazon Alexa on Android Authority, but you may be curious about the voice assistant’s underlying technology. Here’s a brief explanation of how Alexa works, from its overall structure to how it hears and responds to voice commands.

How Alexa works: An overview

The base components of Alexa, from a user perspective, are an Amazon account and an internet-connected Alexa-enabled device, usually a smart speaker or display. The account lets you build a profile, save software and hardware settings, and link compatible devices, services, and accessories. Alexa devices listen for voice commands, upload them to Amazon servers for interpretation, then deliver results in the form of audio or video. Some models also serve as Thread or Zigbee hubs for compatible smart home products.

All voice commands begin with a wake word that tells a device to listen. The default, of course, is “Alexa,” but using the assistant’s app for Android or iPhone/iPad, you can change this to “Amazon,” “Computer,” “Echo,” or (in some regions) “Ziggy.” In fact, the app is effectively a third base component, since it’s needed for device setup and for linking things to your Amazon account.

There are many, many possible Alexa commands, so we won’t dive too far here, but these are natural-language voice requests covering everything from general knowledge questions through media playback and smart home control. For instance:

Some functions require enabling “skills,” whether through Amazon’s website or the Alexa app. Using the commands above as examples, the middle one wouldn’t work without a skill linking your Spotify account, and thermostat control would require an appropriate brand skill such as Ecobee or Nest.

The Alexa app also enables routines, which is just another word for automations. You can learn more about them in our routines guide. The short version is that they’re user-created, and trigger actions based on voice commands or various conditions, such as location, accessory status, or the time of day. A Good Morning routine for example might turn on your lights, play NPR news, and warm up your coffee maker via a smart plug when you say “Alexa, start my day.”
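To make the trigger-and-actions idea concrete, here is a minimal Python sketch of the Good Morning example. It is only an illustration: the `Routine` class, the `handle` function, and the action strings are hypothetical names invented for this sketch, not Amazon's actual data model, and only voice triggers are modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Routine:
    """A hypothetical routine: one trigger phrase mapped to an ordered list of actions."""
    name: str
    trigger_phrase: str
    actions: List[Callable[[], str]] = field(default_factory=list)

    def matches(self, utterance: str) -> bool:
        # Compare case-insensitively, ignoring surrounding spaces and end punctuation.
        cleaned = utterance.strip().lower().rstrip(".!?")
        return cleaned == self.trigger_phrase.lower()

    def run(self) -> List[str]:
        # Run every action in order and collect a description of what happened.
        return [action() for action in self.actions]


def handle(utterance: str, routines: List[Routine]) -> List[str]:
    """Run the first routine whose trigger phrase matches, or do nothing."""
    for routine in routines:
        if routine.matches(utterance):
            return routine.run()
    return []


# Stand-in actions: each lambda just reports what a real device call would do.
good_morning = Routine(
    name="Good Morning",
    trigger_phrase="alexa, start my day",
    actions=[
        lambda: "lights: on",
        lambda: "speaker: playing NPR news",
        lambda: "smart plug (coffee maker): on",
    ],
)

print(handle("Alexa, start my day", [good_morning]))
```

A routine triggered by location, accessory status, or time of day would swap the phrase comparison in `matches` for a different condition while keeping the same ordered list of actions.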

To be controlled by Alexa, smart home accessories must specifically support the platform, at least until the universal Matter standard goes live in fall 2022. Just about every type of accessory is available, though. Aside from plugs, thermostats, and smart bulbs, you can get everything from air purifiers to robot vacuums. These are paired using the Alexa app, regardless of whether they connect via skills, Thread, or Zigbee.

How does Alexa hear?

Dhruv Bhutani / Android Authority

While all Alexa-equipped devices have at least one microphone, smart speakers and displays more often have two or more. This makes it easier to isolate voices from ambient noise, since the small differences between what each microphone picks up create directional data that can be compared and filtered through signal processing algorithms. There are limits, of course: you can’t stand next to a loud TV or dishwasher and expect an Echo speaker to understand.
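The article doesn’t say which algorithms Amazon uses, so the following is only a textbook illustration of why two microphones help. It sketches delay-and-sum processing in plain Python: estimate how much later a sound reaches the second microphone, shift that recording back, and average the two. The function names and the toy sample values are assumptions made up for this sketch.

```python
from typing import List, Sequence


def estimate_lag(mic_a: Sequence[float], mic_b: Sequence[float], max_lag: int) -> int:
    """Return how many samples later the sound reaches mic_b than mic_a.

    Tries every shift in [-max_lag, max_lag] and keeps the one where the two
    recordings line up best (highest cross-correlation).
    """
    def correlation(lag: int) -> float:
        return sum(
            mic_a[i] * mic_b[i + lag]
            for i in range(len(mic_a))
            if 0 <= i + lag < len(mic_b)
        )

    return max(range(-max_lag, max_lag + 1), key=correlation)


def delay_and_sum(mic_a: Sequence[float], mic_b: Sequence[float], lag: int) -> List[float]:
    """Shift mic_b back by `lag` samples and average it with mic_a."""
    combined = []
    for i, sample in enumerate(mic_a):
        j = i + lag
        if 0 <= j < len(mic_b):
            combined.append((sample + mic_b[j]) / 2)
        else:
            combined.append(float(sample))  # no overlapping sample from mic_b
    return combined


# Hypothetical toy data: the same short "voice" burst reaches mic B two samples late.
voice = [0, 0, 1, 3, 1, 0, 0, 0]
mic_a = voice
mic_b = [0, 0] + voice[:-2]

lag = estimate_lag(mic_a, mic_b, max_lag=3)
aligned = delay_and_sum(mic_a, mic_b, lag)
print(lag, aligned)
```

The estimated lag is the directional data: its sign and size depend on where the sound came from. Averaging after alignment reinforces the voice, while noise arriving from other directions lines up poorly and is partly cancelled.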

Contrary to what you may have been told, Alexa isn’t constantly recording everything you say. It is continually listening for its wake word, and subsequent audio (ending after you stop talking) is normally sent to Amazon for interpretation. We say normally because Amazon is increasingly pushing towards offline processing. You need recent devices like the 4th gen Echo or Echo Show 10, however, which have the company’s AZ Neural Edge processor. The feature must also be enabled manually, and devices will still upload transcripts.
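The difference between listening and recording can be sketched as a small state machine. This is a simplified illustration, not Amazon’s implementation: it assumes the device already sees a stream of words (real hardware works on raw audio), uses `None` to stand for a pause, and the `capture_commands` name is hypothetical.

```python
from typing import Iterable, List, Optional

# Wake words named in this article.
WAKE_WORDS = {"alexa", "amazon", "computer", "echo", "ziggy"}


def capture_commands(frames: Iterable[Optional[str]]) -> List[str]:
    """Return the utterances that would be sent off for interpretation.

    Each frame is a word the device heard, or None for a stretch of silence.
    """
    uploads: List[str] = []
    current: Optional[List[str]] = None  # None means idle: nothing is being captured

    for frame in frames:
        if current is None:
            # Idle: everything is discarded unless it is a wake word.
            if frame is not None and frame.lower() in WAKE_WORDS:
                current = []
        elif frame is None:
            # Silence ends the capture; empty captures are not uploaded.
            if current:
                uploads.append(" ".join(current))
            current = None
        else:
            current.append(frame)

    # The stream ended mid-command: treat the end as silence.
    if current:
        uploads.append(" ".join(current))
    return uploads


heard = ["nice", "weather", "today", None,
         "Alexa", "set", "a", "timer", None,
         "so", "anyway"]
print(capture_commands(heard))
```

Only the words between the wake word and the next pause end up in the result; the conversation before and after is heard but never kept.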

Amazon says it encrypts uploaded audio recordings, but saves them by default and analyzes “an extremely small sample” of anonymized clips to improve Alexa’s performance. Recordings have been used in criminal cases, and some sounds or phrases can be misinterpreted as wake words. If you’re concerned about privacy, you’ll want to opt out of saving recordings or regularly delete your voice history. Read our smart home privacy guide for more details and comparisons.

How does Alexa respond?

Alexa has been utterly dependent on the cloud until recently because of the demands of natural language processing. Each command is broken down into individual speech units called phonemes, and those units are then compared with a database to find the closest word matches. On top of that, software has to identify sentence structure, as well as terms relevant to different subsystems. If you say “set the thermostat to cool,” Alexa knows to forward that to a smart home API (application programming interface).
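The two steps described here, finding the closest word matches and then routing by relevant terms, can be sketched in a few lines of Python. This is a rough stand-in under stated assumptions: it compares spellings with the standard library’s `difflib` rather than real phonemes, and the tiny `VOCABULARY`, the `SUBSYSTEM_KEYWORDS` table, and the subsystem names are all invented for the example.

```python
import difflib
from typing import List

# Hypothetical word database; a real one would hold phoneme sequences, not spellings.
VOCABULARY = ["set", "the", "thermostat", "to", "cool", "play", "music", "timer"]

# Hypothetical mapping from subsystem to the terms that signal it.
SUBSYSTEM_KEYWORDS = {
    "smart_home": {"thermostat", "lights", "plug"},
    "media": {"play", "music"},
    "timers": {"timer"},
}


def closest_words(heard: List[str]) -> List[str]:
    """Replace each heard token with its closest vocabulary entry, if one is close enough."""
    resolved = []
    for token in heard:
        match = difflib.get_close_matches(token.lower(), VOCABULARY, n=1, cutoff=0.6)
        resolved.append(match[0] if match else token.lower())
    return resolved


def route(words: List[str]) -> str:
    """Pick the subsystem whose keywords overlap most with the utterance."""
    best, best_hits = "general_knowledge", 0
    for subsystem, keywords in SUBSYSTEM_KEYWORDS.items():
        hits = len(keywords & set(words))
        if hits > best_hits:
            best, best_hits = subsystem, hits
    return best


# A slightly garbled hearing of "set the thermostat to cool".
words = closest_words(["set", "the", "thermostadt", "to", "kool"])
print(words, "->", route(words))
```

Anything that matches no subsystem falls through to a general knowledge default in this sketch, which is a design choice of the example rather than a documented Alexa behavior.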

This is the main reason Alexa can distinguish between accents and dialects. There are unique databases for each language Amazon supports, including regional variations, and users need to select them in the Alexa app if their device doesn’t ship with them preloaded. An American Echo speaker won’t understand German out of the box, as anyone who’s asked for songs by Nachtmahr can attest.

Machine learning plays an equally critical role, since context and history give Alexa a better shot at guessing your intentions. It’s why Amazon is so invested in analyzing recordings from real-world customers. Humans tend to use context and history to gauge meaning in conversation, and with only computer logic, Alexa might interpret something like “play music by Chvrches” (the Scottish synthpop band) as a request to hear music by church choirs. Alexa can and does make mistakes, but the seas of data Amazon has available mean that the assistant evolves over time.
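One simple way to picture history tipping the balance is to add a listening-history bonus to each candidate interpretation. The sketch below is an illustrative assumption, not how Alexa’s models actually work: the scores, the `history_weight` value, and the `pick_interpretation` function are all made up for the Chvrches example.

```python
from collections import Counter
from typing import Dict


def pick_interpretation(
    acoustic_scores: Dict[str, float],
    history: Counter,
    history_weight: float = 0.5,
) -> str:
    """Choose the candidate with the best combined sound-match and history score."""
    total_plays = sum(history.values())

    def score(candidate: str) -> float:
        # Share of past requests that went to this candidate (0 if never played).
        prior = history[candidate] / total_plays if total_plays else 0.0
        return acoustic_scores[candidate] + history_weight * prior

    return max(acoustic_scores, key=score)


# Hypothetical numbers: on sound alone, "church choirs" is the slightly better match.
candidates = {"church choirs": 0.55, "Chvrches": 0.45}

no_history = Counter()
synthpop_fan = Counter({"Chvrches": 6, "Depeche Mode": 4})

print(pick_interpretation(candidates, no_history))
print(pick_interpretation(candidates, synthpop_fan))
```

With no history the sound match wins and the request goes to church choirs; for a listener whose history is mostly synthpop, the same audio resolves to the band.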

Frequently asked questions

Effectively, Alexa needs an internet connection. While some devices may allow offline voice control of volume and hub-linked smart home accessories, or checking and canceling things like timers and reminders, just about everything else requires communicating with Amazon servers and/or linked third-party services. Even devices that can process audio locally are still uploading transcripts.

Yes, Alexa is always listening, assuming you haven’t muted a device’s microphone(s). It has to listen in order to react to its wake word.

Crucially though, it’s not recording everything. Recording is only triggered after a wake word is detected, and ends once you stop talking (or Alexa thinks you have, anyway). If you’re worried about privacy, you’ll need to opt out of these recordings being saved or regularly delete voice history.

Alexa counts as artificial intelligence according to some definitions. It’s capable of learning and problem solving, for instance interpreting voice commands it hasn’t been pre-programmed for.

That said, it doesn’t display the same flexibility or adaptability as a human or animal mind. You can’t have a genuine conversation, and its learning happens incrementally rather than on the fly. It’s certainly nowhere near sentient, no matter how difficult that might be to define.
