Until recently, voice recognition technology progressed slowly through a series of apparent false starts, unable to gain traction beyond Silicon Valley. Though the brains behind the voices of Siri, Alexa, and others may still be less intelligent than the average fifth grader, their rapidly improving software is more capable than ever of supporting a meaningful rapport between humans and machines.
Innovative new voice control software is driving massive changes in the way we interact with the Internet, with other “smart” devices and systems in our lives, and even with one another. Intelligent chat assistants that can understand natural language are now poised to finally deliver on the promise of years of previous research. By drawing on “lessons” learned from deep troves of neural network data, the software can better discern the intent behind a human verbalization, and semantic understanding increases. Many prominent tech firms are racing to develop the “best” voice control software, but there’s more to understanding human speech than simply feeding input into an algorithm that then produces plain text. Everyone has his or her own individual speech characteristics, and heavy accents and slang expressions pose special difficulties.
For instance, VocalZoom is designing a solution that will employ optical sensors while conversations are taking place. The optical sensor extracts the voice from the facial vibrations of the person speaking. By combining audio input with this vibration data, VocalZoom has already shown that its sensor can achieve high-fidelity results among many speech recognition leaders.
Apple is pushing state-of-the-art boundaries in voice recognition by way of Siri, the virtual assistant. “Her” first incarnation left much to be desired, but successive refinements have increased Siri’s ability to accurately process and fulfill verbal requests. Now users are adding the functionality of Siri to their smart home systems with Apple’s HomeKit infrastructure.
The producers of the iPhone don’t by any means have a monopoly on this market. Amazon’s Echo is equipped with the Alexa virtual assistant, and it can order groceries, adjust lighting levels, search the Internet and more when given spoken commands. Google has also thrown its hat into the ring with Google Home, which was announced in May 2016. As millions of consumers eagerly chat with their new home assistants, voice data is sent back to developers, who then use it to adjust and enhance performance for the next generation of hardware and software.
Car makers are installing speech recognition units in their automobiles so that drivers can ask for directions, control the climate and perform similar tasks without diverting their attention from the road. Existing automotive voice platforms can be frustrating to use because they often misunderstand what the speaker is saying and sometimes pause for a few seconds before responding. SoundHound has been working on software that eliminates these problems and can correctly parse complex or strangely worded driver requests.
Banks and other financial institutions are incorporating voice identification features into their security precautions. Requiring account holders to verify their identities through voice recognition essentially serves as a form of two-factor authentication when combined with more traditional passwords. This is a type of biometrics, i.e., a way of using difficult-to-counterfeit biological information to make sure that people attempting to log into the system are actually who they say they are. Other biometric frameworks use retinal scans, fingerprints and even typing rhythm to figure out who’s who, but it’s difficult to argue with the simplicity of sophisticated voice control.
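To make the two-factor idea concrete, here is a minimal sketch in Python of a login check that passes only when both a password and a voice sample match what is on file. Everything in it is an assumption made for illustration, not how any particular bank does it: the `verify_login` function, the layout of the stored record, the 0.85 similarity threshold, and the short lists of numbers standing in for voiceprints. In a real system, a speaker recognition model (not shown here) would turn recorded audio into those numbers.

```python
import hashlib
import hmac
import math
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def cosine_similarity(a, b) -> float:
    """Return the cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_login(password, voice_sample, record, threshold=0.85) -> bool:
    """Pass only if BOTH factors match (hypothetical helper for illustration)."""
    # Factor 1: something the user knows. Compare hashes in constant time.
    password_ok = hmac.compare_digest(
        hash_password(password, record["salt"]), record["password_hash"]
    )
    # Factor 2: something the user is. Compare the new voice sample with the
    # voiceprint captured at enrollment; the threshold is an assumed value.
    voice_ok = cosine_similarity(voice_sample, record["voiceprint"]) >= threshold
    return password_ok and voice_ok


# Toy enrollment record. The voiceprint numbers are made up for this sketch.
salt = os.urandom(16)
record = {
    "salt": salt,
    "password_hash": hash_password("correct horse", salt),
    "voiceprint": [0.21, 0.88, 0.42],
}

same_speaker = [0.20, 0.90, 0.40]
other_speaker = [0.90, 0.10, -0.30]

print(verify_login("correct horse", same_speaker, record))
print(verify_login("correct horse", other_speaker, record))
print(verify_login("wrong guess", same_speaker, record))
```

Only the first attempt should succeed: the second has the right password but the wrong voice, and the third has the right voice but the wrong password. Requiring both is what makes this two-factor rather than a choice between alternatives.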
Yesterday’s keyboards, mice, touch pads, levers, knobs and dials are beginning to seem like steampunk relics in their absurdity when compared to the concept of speaking aloud to an “intelligent” computer. Indeed, the variety of interfaces we now use with our machines may come to be regarded as quaint and archaic in a future powered by voice. We’re getting a taste of such a world right now, and we can expect voice recognition technology to become more commonplace with each passing year.