Amazon.com Inc. employs thousands of people around the world to help improve the Alexa digital assistant powering its line of Echo speakers. The team listens to voice recordings captured in Echo owners’ homes and offices. The recordings are transcribed, annotated and then fed back into the software as part of an effort to eliminate gaps in Alexa’s understanding of human speech and help it better respond to commands.
The Alexa voice review process, described by seven people who have worked on the program, highlights the often-overlooked human role in training software algorithms. In marketing materials Amazon says Alexa “lives in the cloud and is always getting smarter.” But like many software tools built to learn from experience, humans are doing some of the teaching.
The team comprises a mix of contractors and full-time Amazon employees who work in outposts from Boston to Costa Rica, India and Romania, according to the people, who signed nondisclosure agreements barring them from speaking publicly about the program. They work nine hours a day, with each reviewer parsing as many as 1,000 audio clips per shift, according to two workers based at Amazon’s Bucharest office, which takes up the top three floors of the Globalworth building in the Romanian capital’s up-and-coming Pipera district. The modern facility stands out amid the crumbling infrastructure and bears no exterior sign advertising Amazon’s presence.
Sometimes they hear recordings they find upsetting, or possibly criminal. Two of the workers said they picked up what they believe was a sexual assault. When something like that happens, they may share the experience in the internal chat room as a way of relieving stress. Amazon says it has procedures in place for workers to follow when they hear something distressing, but two Romania-based employees said that, after requesting guidance for such cases, they were told it wasn’t Amazon’s job to interfere.
“We take the security and privacy of our customers’ personal information seriously,” an Amazon spokesman said in an emailed statement. “We only annotate an extremely small sample of Alexa voice recordings in order [to] improve the customer experience.
In Alexa's privacy settings, the company gives users the option of disabling the use of their voice recordings for the development of new features. A screenshot reviewed by Bloomberg shows that the recordings sent to the Alexa auditors don’t provide a user’s full name and address but are associated with an account number, as well as the user’s first name and the device’s serial number.
The Intercept reported earlier this year that employees of Amazon-owned Ring manually identify vehicles and people in videos captured by the company’s doorbell cameras, an effort to better train the software to do that work itself.
A recent Amazon job posting, seeking a quality assurance manager for Alexa Data Services in Bucharest, describes the role humans play: “Every day she [Alexa] listens to thousands of people talking to her about different topics and different languages, and she needs our help to make sense of it all.” The want ad continues: “This is big data handling like you’ve never seen it. We’re creating, labeling, curating and analyzing vast quantities of speech on a daily basis.”
Some Alexa reviewers are tasked with transcribing users’ commands, comparing the recordings to Alexa's automated transcript, say, or annotating the interaction between user and machine. What did the person ask? Did Alexa provide an effective response?
Others note everything the speaker picks up, including background conversations—even when children are speaking. Sometimes listeners hear users discussing private details such as names or bank details; in such cases, they’re supposed to tick a dialog box denoting “critical data.” They then move on to the next audio file.
According to Amazon’s website, no audio is stored unless Echo detects the wake word or is activated by pressing a button. But sometimes Alexa appears to begin recording without any prompt at all, and the audio files start with a blaring television or unintelligible noise. Whether or not the activation is mistaken, the reviewers are required to transcribe it. One of the people said the auditors each transcribe as many as 100 recordings a day when Alexa receives no wake command or is triggered by accident.