AI voice assistants promise to revolutionize human-machine interaction, but their performance falls short of expectations
Gu Jie (pseudonym) wakes up each working day to the sound of Xiao Ai’s voice. While she gets dressed, Xiao Ai reports the weather forecast and latest news for her. When Gu returns home after work, Xiao Ai greets her at the door: “Welcome home! Xiao Ai has been waiting for you,” or “You must have brought some good stuff home again—remember to leave some for Xiao Ai!”
But Xiao Ai is not one of Gu’s family members. “She” is the voice assistant associated with the Mi AI Speaker made by Xiaomi, one of China’s top internet companies.
“I bought the speaker mainly to listen to music and try voice control on my two smart plugs,” Gu, a 38-year-old engineer from Shanghai, recalls to TWOC. Since she purchased the speaker three years ago, Gu has become so enamored with having a voice assistant that, after purchasing her own apartment in early 2019, she couldn’t stop buying “smart” home devices.
Today, Gu’s apartment is filled with smart devices ranging from ceiling lights to air purifiers, TVs to air-conditioners. Most were manufactured by Xiaomi, and all can be controlled by giving Xiao Ai voice commands. “After having experienced the convenience, life cannot return to the days without smart speakers,” Gu concludes.
Over the last few years, voice-based smart speakers have grown increasingly popular among tech-savvy consumers like Gu. China’s leading technology companies are engaged in fierce competition for market share, and have promoted the gadgets as a “gateway” to the coming “Internet of Things” era, when people, devices, and systems can all be connected over the internet.
According to market consultancy All View Cloud, 15.56 million units of smart speakers were sold in China in the first half of 2019, up 233 percent from the previous year. Sales revenue totaled 3.01 billion RMB. China has overtaken the US to become the world’s largest market for smart speakers.
However, it remains to be seen whether the gadgets will become as indispensable to Chinese consumers as smartphones, or become the command hub for Chinese families that leading tech players expect.
The international smart speaker market emerged in 2014 with the launch of Amazon’s Echo, powered by the voice assistant Alexa, followed by Google Home in 2016. Apple, Microsoft, and Facebook have all made their own devices. Chinese startups such as Xiaozhi and Rokid had also been working in this sector since 2014, and Linglong Tech, a joint venture by China’s e-commerce giant JD and leading AI company iFlytek, released China’s first smart speaker brand, DingDong, in August 2015.
The domestic market, though, only started taking off in the second half of 2017, when leading Chinese technology companies, including Alibaba, Xiaomi, and Baidu, rolled out their own products. Announced in July 2017 at 299 RMB apiece, Xiaomi’s Mi AI Speaker sold out all its stock within 23 seconds upon its official release that September. On the “Singles’ Day” shopping festival on November 11, 2017, sales of Alibaba’s Tmall Genie, offered at a heavily discounted price of 99 RMB instead of the original 499 RMB, hit over 1 million units.
Competitors followed with discounts of their own; Xiaomi’s Mi AI Speaker Mini and Baidu’s Xiaodu smart speaker are both priced at under 100 RMB. But while the ongoing price war drives the explosive growth of the business in China, the products themselves still suffer from technical issues.
Mr. Shi, a 36-year-old owner of a Tmall Genie in Zhengzhou, Henan province, who wanted to be identified by his surname only, derisively calls smart speakers “chicken ribs”—an object of little value or interest, except as a sound system and clock. “We all know that smart speakers and other smart devices have nothing to do with intelligence, nor can they truly understand people and their needs,” he says. “[It’s] like a toy one can play with sometimes.”
Using smart speakers to control home appliances requires the speaker to be compatible with appliances of contracted third parties, but Shi says even with such appliances, the control operation is “embarrassing,” because the smart speaker can only help users turn the devices on or off, but offers no greater operability.
“For instance, to start my rice cooker, I have to select the cooking mode and set the time to start manually, and then command my Tmall Genie to turn it on,” Shi explains. Even Gu, who uses a smart speaker and other smart home devices manufactured by Xiaomi, only turns her Mi TV on or off with her smart speaker, and does not use it to change channels or browse shows because of the complicated setup required.
As the majority of Chinese users use smart speakers as a sound system, limited access to copyrighted songs has disappointed many and discouraged potential users. The gadgets’ voice recognition performance is another target of criticism: Beijing IT engineer Cheng Shidong tells TWOC that his Mi AI Speaker often starts on its own abruptly when he is on the phone and in conversation with his wife. Gu’s father can sometimes activate Xiao Ai by calling it Xiao Ben (“Little Idiot”) or simply Xiao Xiao instead of using its “wake” word; meanwhile, it’s hard to activate the smart speakers in noisy surroundings, such as when many people are talking or when the TV is on.
Moreover, smart speakers’ lack of dialect recognition has limited access for many people who do not speak Mandarin well, especially seniors in rural areas. “My parents often have to repeat themselves several times to wake Xiao Ai or get it to understand, because they mainly speak Hunanese, and this really frustrates them,” Cheng’s wife Zhou Li complains. Currently, Tmall Genie only supports Sichuanese besides Mandarin, while Xiao Ai supports seven northern Mandarin dialects similar to standard Putonghua.
Privacy and information security is another major concern. In 2019, Amazon was reported to have employees monitor and analyze recordings of user interactions with Alexa to improve the voice assistant’s speech understanding capability. “[Amazon’s practice] was not a secret in the industry,” an insider of the smart speaker industry, identified by the pseudonym Zhang Sicheng, told Chinese tech news outlet All Weather TMT in August 2019.
According to Zhang, however, less than 1 percent of user-device interactions are heard by a human, and these are mainly difficult questions that the system cannot answer. These may be sent to the human employees to be analyzed, but as the technology improves, this will happen less and less.
Yet, although Zhang asserted that smart speakers are “too stupid” to extract sensitive information, and that it is too costly for companies to do so, he did not deny the privacy risks. “In the IoT and AI era, we have no privacy and no place to hide,” he said. “Even without smart speakers, companies have mastered people’s personal information, hobbies, and other information via phones and computers.”
Mr. Wang, the 32-year-old owner of a Xiaodu smart display in Shenzhen, Guangdong province, who wished to be identified by his surname only, is more concerned with the security of his visual and biometric information than his voice interactions. Integrated with a touch screen and camera, Xiaodu’s smart display can start automatically, scan his face, and report weather forecasts each time he walks by. Through his phone, it can help him check up on his home, including on his 3-year-old child, when he is away. “Those [services] are considerate, but I’m worried whether the camera would start on its own and record my family,” Wang says.
Zhang told reporters he has already taken precautionary measures: he keeps only camera-less smart speakers, uses them only in the living room, and will “never buy smart speakers or other devices with cameras or place them in the bedroom.”
But die-hard fans like Gu are unperturbed. “I don’t have any privacy to lose,” she asserts. “Smart speakers will be indispensable for smart homes of the future.”
Additional reporting by Yang Tingting (杨婷婷)