Hearables and the real on-the-go Assistant we’ve been waiting for

For many of us, voice assistants have earned a well-established place in our lives. In our homes, they take the form of smart speakers like the Google Home and Amazon Echo. In our cars, they speak through automotive voice assistants and platforms like CarPlay and Android Auto. And now they're in our ears, through a new category of devices called hearables. Hearables are personal audio devices designed for a range of entertainment, fitness, informational, and communication purposes. They're becoming increasingly technologically advanced and represent the latest manifestation of the concept of ubiquitous computing.


The tiny wireless in-ear headphones have quickly become standard fare in major urban centres around the world and a major new revenue stream for companies like Apple. The category spans beyond in-ear headphones as well, with new formats and devices like the Bose Frames, a pair of connected wireless sunglasses from Bose.

Bose Frames

The Frames act as your personal sound system, allowing you to listen to music and to invoke and interact with your voice assistant of choice, all while your ears and hearing remain open to the world around you. This is done with highly directional sound drivers narrowly focused on the wearer's ears. With the volume set to a moderate level, they remain discreet and difficult for someone nearby to detect.


Although the Frames have recently been discontinued by Bose, they are important to discuss because they introduced the world to a novel technology that Bose called Bose AR, the tech behind which may have big implications for the future of voice assistants and hearables alike.

Bose AR is a form of audio augmented reality. While the visual augmented reality we're generally familiar with overlays our world with georeferenced visual graphics and information, audio AR overlays our world with georeferenced spatial audio, and can similarly be used to provide information, communication, and entertainment. Imagine a voice assistant that understands the context of where you are and what you're looking towards, and that can render audio spatially to provide feedback and information about your world.

To accomplish audio AR, two things need to take place: first, geospatial data (GPS coordinates) and head-position data need to be combined to establish location and positional context; second, the audio (voice assistants, entertainment, earcons, etc.) needs to be rendered spatially for the user. This is done through slight temporal delays that trick our brain into perceiving the sound on a 360-degree soundstage. With head-tracking positional data, this sound can be rendered in real time and hold positional context as the user moves through space.
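Those "slight temporal delays" are known as interaural time differences (ITDs). As a rough illustration of the math involved, here is a small TypeScript sketch of Woodworth's classic spherical-head approximation for the delay; the head radius and speed of sound below are assumed average values, not constants from any particular hearables SDK.

```typescript
// Interaural time difference (ITD): the tiny arrival-time gap between our
// two ears that the brain uses to localize sound on the horizontal plane.
// Woodworth's spherical-head approximation; both constants are assumptions.
const HEAD_RADIUS_M = 0.0875;   // average human head radius, in metres
const SPEED_OF_SOUND_M_S = 343; // speed of sound in air at roughly 20 °C

function itdSeconds(azimuthDeg: number): number {
  // Clamp the azimuth to [0°, 90°] off-centre, where the formula applies.
  const theta = (Math.min(Math.abs(azimuthDeg), 90) * Math.PI) / 180;
  return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + Math.sin(theta));
}
```

A source dead ahead produces no delay, while one at 90° off-centre arrives at the far ear roughly 0.65 milliseconds late; a renderer recomputes this continuously as head-tracking data streams in.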

Unfortunately, Bose's developer support for Bose AR has ended with the discontinuation of the Frames. This is not the last we'll hear of audio AR, however: only a month after its cancellation, Apple announced a new feature for their AirPods Pro headphones, spatial audio. Amazingly, it was added through an over-the-air firmware update. We're beginning to see that, unlike typical headphones, hearables can evolve over time with system updates.


While Apple isn’t making the direct comparison to Bose AR or calling this technology Audio AR, the similarity in their ambition and the implications of the technology are clear.

When Bose introduced the world to their new development platform at SXSW in 2019, they ran a demo where festival-goers could wear a pair of Frames while walking down the main strip. The Bose AR technology allowed them to discover restaurants and bars, as well as their reviews and ratings, just by looking at them and tapping on the side of the Frames.

From this demo, it's immediately evident that while the ability to render audio spatially is impressive, what may be even more important, as far as its implications for product developers go, is the addition of head position to a voice assistant's contextual awareness. This additional context opens up a wide array of potential new use cases for voice assistants: imagine a digital tour guide giving you a contextually aware tour of a foreign city in your native language, or a botany assistant answering your questions about the plants, flowers, and trees you encounter as you explore a nature reserve.

Now that Apple has brought spatial audio to their AirPods line, they might begin to consider future Siri capabilities that leverage the additional context of head position and directional gaze. Will Siri be able to direct our attention with spatial audio cues? Imagine navigating through an airport by following the sound of a church bell. And how long until Google answers Apple and the Pixel Buds receive a similar update? Google has promised a 'feature drop' over-the-air update for the Pixel Buds in the fall; could they similarly add spatial audio support?

The additional context of geospatial and positional data, combined with the ability to render audio spatially, is an untapped but powerful combination for voice product developers. Up until this point, it's been difficult to get location data on the Google Actions or Amazon Skills platforms, and rendering audio spatially hasn't been possible before the sensor-laden hearables we're seeing take over the headphone market.

We can begin to see both of these technologies converging, though. As we get accustomed to a world overlaid with digital information, we'll likely be introduced to these contextual geospatial experiences through the voice assistants Siri, Google Assistant, and Alexa, and later through extensions and applications within those ecosystems. Voice product developers should begin to anticipate the adoption of audio AR (spatial audio plus location and gaze contextual awareness) by the major voice assistant platforms in the near future. To get ahead, it's a great idea to look at what can be built today, what tools are already available, and how to begin exploring the use cases that this additional data will soon unlock.

Toolkit to help you begin building for Hearables

The majority of hearables on the market today can access the existing VPAs (virtual personal assistants) and the development platforms for Google Actions and Alexa Skills. Google and Amazon both have existing location APIs, and there are also some great resources from Google and Mozilla to help you begin exploring and experimenting with spatial audio today.

Google Assistant Actions

For developers of Google Actions (Google's platform for voice products), the location API can provide a user's longitude and latitude. Currently, Google requires the Action to make an explicit request to the user for permission, or to be marked as requiring location.
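To make that flow concrete, here's a hedged TypeScript sketch of reading coordinates off the conversation object once the user has agreed. The `conv.device.location` shape follows the actions-on-google Node.js client library, but the minimal interfaces below are stand-ins, and the exact field names should be treated as assumptions.

```typescript
// Sketch, not a definitive implementation. In an actions-on-google fulfillment
// the permission request itself would look roughly like:
//   conv.ask(new Permission({ context: 'To find places near you',
//                             permissions: ['DEVICE_PRECISE_LOCATION'] }));
// Once granted, coordinates appear on conv.device.location.

interface Coordinates {
  latitude: number;
  longitude: number;
}

// Minimal stand-in for the conversation object (assumption, not the full type).
interface Conv {
  device?: { location?: { coordinates?: Coordinates } };
}

function extractCoordinates(conv: Conv): Coordinates | null {
  const coords = conv.device?.location?.coordinates;
  if (!coords) return null; // permission not granted, or no location available
  return { latitude: coords.latitude, longitude: coords.longitude };
}
```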

Alexa Skills

Amazon has location services with similar requirements for user permissions and similar constraints. They are a bit farther along in allowing universal location access once permissions have been granted, but with significant protections for privacy.
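As a hedged sketch of what consuming that data looks like, the snippet below pulls coordinates out of an Alexa Skill request envelope. The `context.Geolocation` shape follows Alexa's geolocation interface, but the interfaces here are minimal stand-ins and the field names should be treated as assumptions.

```typescript
// Minimal stand-ins for the parts of the request envelope we read
// (assumptions, not the full ASK SDK types).
interface GeoCoordinate {
  latitudeInDegrees: number;
  longitudeInDegrees: number;
  accuracyInMeters: number;
}

interface RequestEnvelope {
  context?: { Geolocation?: { coordinate?: GeoCoordinate } };
}

// The Geolocation context is only present when the user has granted the
// skill location permission and the device has a recent fix.
function getAlexaCoordinates(envelope: RequestEnvelope) {
  const coord = envelope.context?.Geolocation?.coordinate;
  if (!coord) return null; // permission missing or no recent location fix
  return {
    latitude: coord.latitudeInDegrees,
    longitude: coord.longitudeInDegrees,
    accuracyMeters: coord.accuracyInMeters,
  };
}
```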

Google Resonance SDK

Google’s open-source library Resonance Audio enables real-time rendering of spatial audio on popular game development platforms as well as iOS, Android, and the Web.
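A minimal sketch of what using it looks like on the Web, assuming the resonance-audio package's `ResonanceAudio` class with its `output` node, `createSource()`, and `setPosition()` methods; the constructor is injected as a parameter so the sketch doesn't depend on a browser environment.

```typescript
// Minimal stand-ins for the Resonance Audio Web SDK surface we touch
// (assumptions based on the library's documented API).
interface ResonanceSource {
  setPosition(x: number, y: number, z: number): void;
}

interface ResonanceScene {
  output: { connect(node: unknown): void };
  createSource(): ResonanceSource;
}

type ResonanceCtor = new (ctx: { destination: unknown }) => ResonanceScene;

// Wire a binaural rendering scene into a Web Audio graph and place one
// source two metres in front of the listener (coordinates in metres).
function createSpatialScene(
  audioContext: { destination: unknown },
  Resonance: ResonanceCtor,
) {
  const scene = new Resonance(audioContext);
  scene.output.connect(audioContext.destination); // scene -> speakers
  const source = scene.createSource();
  source.setPosition(0, 0, -2); // straight ahead, 2 m away
  return { scene, source };
}
```

In a real app you would connect an audio buffer or media element into `source` and update its position from head-tracking data each frame.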

Mozilla Web Audio API

The Web Audio API provides a powerful and versatile system for controlling audio on the Web, allowing developers to choose audio sources, add effects to audio, create audio visualizations, apply spatial effects (such as panning), and much more.
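Those spatial effects are exactly what an audio AR prototype needs, and the standard `PannerNode` with HRTF panning gets you surprisingly far. Below is a minimal TypeScript sketch; the narrow context interface is a stand-in so the code doesn't require browser typings, but `createPanner`, `setPosition`, and the `panningModel`/`distanceModel` values are part of the standard Web Audio API.

```typescript
// Narrow stand-ins for the Web Audio nodes we use, so this sketch
// type-checks outside a browser (assumptions limited to these members).
interface PannerLike {
  panningModel: string;
  distanceModel: string;
  setPosition(x: number, y: number, z: number): void;
  connect(node: unknown): void;
}

interface AudioContextLike {
  destination: unknown;
  createPanner(): PannerLike;
}

// Route a source node through an HRTF panner so the browser renders it
// at a 3D position relative to the listener (coordinates in metres).
function spatialize(
  ctx: AudioContextLike,
  sourceNode: { connect(node: unknown): void },
  x: number,
  y: number,
  z: number,
): PannerLike {
  const panner = ctx.createPanner();
  panner.panningModel = 'HRTF';     // head-related transfer function panning
  panner.distanceModel = 'inverse'; // volume falls off with distance
  panner.setPosition(x, y, z);      // listener sits at the origin
  sourceNode.connect(panner);       // source -> panner -> speakers
  panner.connect(ctx.destination);
  return panner;
}
```

Pair this with device orientation or head-tracking data to update the position each frame, and you have the skeleton of a browser-based audio AR experiment.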