I initially came up with this idea years before returning to explore it more deeply. I was working at Connected and involved in a labs project when it became relevant to our work and I was reminded of it. In that research work, we were tasked with exploring potential use cases for on-device ML (machine learning). Apple and Google had both released mobile SDKs for iOS and Android to support the development of on-device machine learning applications, and we were looking for novel mobile use cases that these new SDKs (Core ML and ML Kit) might unlock. It was also during this project that I worked on another experiment that used object recognition to support people with visual impairments; you can read about it here.
As a group, we brainstormed mobile experiences that could benefit from on-device ML. To help us evaluate our ideas, I developed criteria based on what I had learned were the key benefits of on-device ML.
Latency
When the round-trip delay of sending data to the cloud for predictive analysis is too slow for good usability.
Privacy
Experiences that require more ambient or ongoing analysis of sensitive data, like ‘always listening/seeing’ devices.
Bandwidth
When significant amounts of audio or video would need to be analyzed over costly, capped mobile data connections.
Our work on this machine learning project brought this old idea back to the forefront of my mind, and I realized how much it would benefit from on-device ML, for both speech recognition and natural language understanding. Since the idea centred on ambient AI and search, I had always imagined it as a Google product and a feature of Google Assistant. I named this hypothetical product Google Campfire.
Introducing Campfire
Campfire helps to augment a conversation by providing additional context and information at the moment it’s relevant, without requiring an interruption and a Google search query. It does this by leveraging automatic speech recognition and natural language understanding.
Ambient and agentive search applications like Campfire meet all three of the evaluation criteria I had developed for our ‘Pocket Pal’ project. Let’s consider Campfire through each of these lenses:
Latency
For people to use Campfire to augment their conversation, the ASR needs to transcribe the conversation as close to real time as possible. Having the ASR on-device allows us to accomplish this while also reducing cloud computing expenses.
Privacy
Although a Campfire session is explicitly initiated, it is easy to imagine it becoming completely ambient. Ongoing analysis of sensitive data, as with ‘always listening/seeing’ devices, should strive to keep that data on the edge, i.e. on-device, rather than in the cloud. This helps build confidence and trust with users.
Bandwidth
A Campfire needs to run ASR and NLU predictive models for much longer stretches than a typical one-off voice command. As such, the bandwidth costs of sending that much audio to the cloud would be prohibitive and create significant viability risks.
Project brief
In order to explore this use case, I submitted a brief and had our design and engineering team managers sign off on the project. I worked over the course of four weeks with a rotating team of three engineers to develop a functional prototype of the application that would help validate the use case for desirability and feasibility.
Proof of concept prototype
Before going further, I wanted to get a proof of concept working to validate usability and feasibility before investing time in optimizing our solution and exploring the UX/UI.
To do this we developed an MVP, a minimum viable prototype: a small application that would listen, transcribe, and use NLU to extract the relevant subject matter being discussed. To speed this process along, I asked the software engineer I was working with to use ASR and NLU services from AWS or GCP. I did a quick audit of the subject analysis and speech recognition offerings and recommended that we use GCP; after reviewing them, my teammate came to the same assessment.
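For the entity analysis half, the call to GCP is simple enough to sketch. Here is a minimal Kotlin example assuming the Cloud Natural Language client library, rather than whatever wrapper the prototype actually used:

```kotlin
import com.google.cloud.language.v1.Document
import com.google.cloud.language.v1.LanguageServiceClient

// Minimal sketch: send a transcribed snippet to GCP's Natural Language API
// and print the entities it detects, ranked by salience. Requires GCP
// credentials to be configured; run off the main thread on Android.
fun analyzeSnippet(transcript: String) {
    LanguageServiceClient.create().use { client ->
        val doc = Document.newBuilder()
            .setContent(transcript)
            .setType(Document.Type.PLAIN_TEXT)
            .build()
        client.analyzeEntities(doc).entitiesList
            .sortedByDescending { it.salience }
            .forEach { entity ->
                println("${entity.name} (${entity.type}) salience=${entity.salience}")
            }
    }
}
```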
My teammate then raised an initial roadblock: the audio wasn’t streamed to GCP; it had to be sent to the API as a contained package. Doing entity analysis on audio in near real time and on a continuous basis was not a feature built into the service, which makes sense given that the context of a word within a sentence helps to determine its importance. To overcome this, I came up with a very simple solution that ended up working. We would measure the time between transcribed words, have the system register a pause of 1200 ms as the end of an audio snippet, send the recorded audio to the service, and then start a new recording. This would continue until the Campfire session was paused or ended. It is similar to detecting a full stop in the grammar of the transcription, although we refrained from making that assumption; instead, to indicate the cut-off, we would create a line break in the transcription.
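As a sketch of that rule (the class and callback names here are illustrative, not the prototype’s actual code):

```kotlin
// Hypothetical sketch of the snippet-breaking rule: if more than 1200 ms
// passes between transcribed words, close the current snippet, hand it off
// for entity analysis, and start a new one.
class SnippetSegmenter(
    private val pauseThresholdMs: Long = 1200,
    private val onSnippetComplete: (String) -> Unit
) {
    private val buffer = StringBuilder()
    private var lastWordAtMs: Long = 0

    fun onWordTranscribed(word: String, nowMs: Long = System.currentTimeMillis()) {
        if (buffer.isNotEmpty() && nowMs - lastWordAtMs > pauseThresholdMs) {
            // Pause detected: treat it as the end of the snippet (shown as a
            // line break in the transcription) and send the text for analysis.
            onSnippetComplete(buffer.toString().trim())
            buffer.clear()
        }
        buffer.append(word).append(' ')
        lastWordAtMs = nowMs
    }

    fun endSession() {
        if (buffer.isNotEmpty()) onSnippetComplete(buffer.toString().trim())
        buffer.clear()
    }
}
```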
Proof of concept demo
A simple proof of concept shows the automatic snippet breaking and the returned entity analysis from GCP
Android development
At this point the software engineer I had been working with was allocated to a client engagement and I was partnered with two software engineers who were Android developers. We decided to move forward with a functional prototype on an Android mobile device.
Knowledge Graph
GCP’s entity analysis was doing a good job of surfacing the main subject matter within our transcriptions, but now we needed to connect these entities to a knowledge graph that could provide content for the intended info cards.
I did a bit of research and figured out that we could query Google’s knowledge graph through its API. The Google Knowledge Graph is a knowledge base used by Google and its services to enhance its search engine’s results with information gathered from a variety of sources. I discovered we could query Wikipedia’s knowledge graph as well, so we combined both to help us surface topical subjects during a Campfire session. In the future, additional APIs could be integrated to give us a wider variety of sources.
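A minimal sketch of the Google side of that lookup, assuming a plain HTTPS call to the public Knowledge Graph Search API with an illustrative API key; the prototype’s actual request and response handling may have differed:

```kotlin
import java.net.HttpURLConnection
import java.net.URL
import java.net.URLEncoder
import org.json.JSONObject

// Sketch: look up an entity name returned by GCP's entity analysis in the
// Knowledge Graph Search API and pull a short description for an info card.
// Run off the main thread on Android.
fun lookUpEntity(entityName: String, apiKey: String): String? {
    val query = URLEncoder.encode(entityName, "UTF-8")
    val url = URL(
        "https://kgsearch.googleapis.com/v1/entities:search?query=$query&limit=1&key=$apiKey"
    )
    val connection = url.openConnection() as HttpURLConnection
    return try {
        val body = connection.inputStream.bufferedReader().readText()
        val results = JSONObject(body).optJSONArray("itemListElement") ?: return null
        if (results.length() == 0) return null
        // The detailed description is what would populate the card body.
        results.getJSONObject(0)
            .getJSONObject("result")
            .optJSONObject("detailedDescription")
            ?.optString("articleBody")
    } finally {
        connection.disconnect()
    }
}
```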
Interface design
The UX flow was straightforward: a few buttons for session control and a real-time transcription that quickly implied to the user that Campfire was listening, and which also served as a quick note-taking feature for later review and synthesis. The information would be communicated with cards resembling Google’s knowledge panels in Google Search. Knowledge panels are the information boxes that appear on Google when you search for entities (people, places, organizations, things) that are in the Knowledge Graph; they are meant to give you a quick snapshot of information on a topic based on Google’s understanding of available content on the web. I aimed to echo these knowledge panels in a clean, minimalist interface.
I also decided during development to highlight the matched entity within the transcription, hypothesizing that it would aid review and synthesis.
Lastly, GCP’s entity analysis returns a salience score for each entity between 0.0 and 1.0, which let us set a sensitivity threshold for our subject matching. We mapped this value to a settings slider, which allows for quick adjustments. You can see our first iteration of Campfire here with the sensitivity slider added. When it’s turned up, the application displays more cards to the user.
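As a hypothetical sketch of that mapping (the names here are illustrative, not the prototype’s code): the slider’s sensitivity value becomes a salience cut-off, so turning the sensitivity up lowers the threshold and lets more entities through as cards.

```kotlin
// Entities below the salience threshold are not turned into info cards.
data class AnalyzedEntity(val name: String, val salience: Float)

fun entitiesToShow(entities: List<AnalyzedEntity>, sliderValue: Float): List<AnalyzedEntity> {
    // sliderValue is the sensitivity in [0.0, 1.0]; higher sensitivity
    // means a lower salience cut-off, so more cards are displayed.
    val threshold = 1.0f - sliderValue.coerceIn(0.0f, 1.0f)
    return entities.filter { it.salience >= threshold }
}
```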
After using Campfire, I kept thinking about the transcript it was generating and how it might be useful as a note-taking tool. The intention was to allow for review of these recordings, and in our second week of iteration I gave some thought to how this interaction would work. When scrolling through either the presented cards or the transcription itself, both columns scroll in a synchronized fashion. We called this feature synchroscroll; you can see it in action in this video of our second iteration.
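A rough sketch of how synchroscroll could be wired up between two columns backed by RecyclerViews; this is an assumption about the implementation, not the prototype’s actual code:

```kotlin
import androidx.recyclerview.widget.RecyclerView

// Sketch: when either column (cards or transcript) scrolls, scroll the other
// proportionally so both reach the end of their content together. A simple
// flag guards against the two listeners triggering each other in a loop.
class SynchroScroll(cards: RecyclerView, transcript: RecyclerView) {
    private var syncing = false

    init {
        link(cards, transcript)
        link(transcript, cards)
    }

    private fun link(source: RecyclerView, target: RecyclerView) {
        source.addOnScrollListener(object : RecyclerView.OnScrollListener() {
            override fun onScrolled(rv: RecyclerView, dx: Int, dy: Int) {
                if (syncing) return
                syncing = true
                // Scale the scroll delta by the ratio of the two content heights.
                val ratio = target.computeVerticalScrollRange().toFloat() /
                    rv.computeVerticalScrollRange().coerceAtLeast(1)
                target.scrollBy(0, (dy * ratio).toInt())
                syncing = false
            }
        })
    }
}
```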
A metaverse campfire
After creating our functional demo of Campfire I decided to explore other potential use cases for this kind of ambient and agentive search. I created a mock-up of what this experience might be like for a collaborative brainstorming session in VR.
Additional applications
Additional applications could include a Zoom extension or a Chromecast variation, allowing for a visual aid, increased accessibility, and a note-taking service.