An ML experiment that posits scenarios and use cases for voice assistants with vision. Using CoreML and on-device Speech-to-text, we built a proof of concept and tested it on an iPhone 8. This research was part of my practice time at Connected, and over 3 weeks I led a team of two engineers through our idea generation, evaluation design and the building of our minimum viable prototype.
Pocket Pal Concept
After some initial research, opportunity framing and brainstorming, we identified object and scene recognition for people with visual impairments as a prime use case for on-device machine learning.
We mapped out some use cases for this visual assistant and created user journey storyboards to think about the interactions, contexts and user flows.
Functional Prototype development
To help us understand the capabilities of CoreML and validate the technical feasibility of our concept, we built a functional prototype. We didn't have the chance to validate this concept with visually impaired users, so we remain unsure how desirable or usable our solution is.
We sourced existing trained vision models and converted them to run on-device. When we tested them for accuracy, we found that some of them were better at identifying objects in particular domains. We came up with a simple solution: running four models at the same time and choosing whichever result had the highest confidence score. This significantly increased the accuracy of our results.
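The selection step can be sketched in a few lines. This is a platform-neutral illustration in Python, not the prototype's actual code: the `Prediction` type, the model names and the confidence values are all hypothetical, and it assumes each model has already returned its top label with a confidence between 0 and 1.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Prediction:
    """Top result from one vision model (hypothetical structure)."""
    model: str         # which model produced the result
    label: str         # what the model thinks it sees
    confidence: float  # the model's own score, assumed to lie in [0, 1]


def pick_most_confident(predictions: Sequence[Prediction]) -> Optional[Prediction]:
    """Return the prediction with the highest confidence, or None if there are none."""
    if not predictions:
        return None
    return max(predictions, key=lambda p: p.confidence)


# Illustrative values only: four models looking at the same photo.
predictions = [
    Prediction("general_objects", "cup", 0.41),
    Prediction("scenes", "kitchen", 0.58),
    Prediction("food", "espresso", 0.87),
    Prediction("household", "mug", 0.63),
]

best = pick_most_confident(predictions)
print(f"{best.label} ({best.model}, {best.confidence:.2f})")
```

One caveat with this approach is that confidence scores from different models are not necessarily comparable, so a model that tends to be overconfident could win more often than it should.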
Initially we had our Pocket Pal assistant running all the time, narrating what was being detected. We quickly understood that this provided a poor user experience, drained the battery and caused the phone to heat up. We also wanted a more seamless and accessible interaction and didn't want the user to have to look at or tap on the phone. We came up with a novel solution that triggers the application to take a photo whenever the phone's accelerometer detects two taps on the back of the phone in quick succession. This allows our user to keep the phone in a shirt pocket without needing to hold it or use its screen.
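The double-tap trigger can be illustrated with a small sketch of the detection logic. This is written in Python for readability and is not the prototype's implementation: the function name, the threshold and the timing windows are assumptions chosen for illustration, and the input is assumed to be a stream of timestamped acceleration magnitudes (in g, with gravity removed) such as an accelerometer callback might provide.

```python
def detect_double_tap(samples, threshold=1.5, min_gap=0.08, max_gap=0.4):
    """Return True if two separate spikes occur in quick succession.

    samples:   iterable of (timestamp_seconds, magnitude_g) pairs, in time order
    threshold: magnitude that counts as a tap (assumed value)
    min_gap:   spikes closer together than this are treated as one tap ringing
    max_gap:   longest pause allowed between the two taps
    """
    last_tap = None
    for timestamp, magnitude in samples:
        if magnitude < threshold:
            continue  # ordinary movement, not a tap
        if last_tap is None:
            last_tap = timestamp  # first tap seen
            continue
        gap = timestamp - last_tap
        if gap < min_gap:
            continue  # still the same tap vibrating
        if gap <= max_gap:
            return True  # second tap arrived in time
        last_tap = timestamp  # too slow: treat this as a new first tap
    return False


# Illustrative sample streams (timestamp in seconds, magnitude in g).
double_tap = [(0.00, 0.1), (0.10, 2.0), (0.12, 1.6), (0.20, 0.2), (0.35, 2.1)]
single_tap = [(0.00, 0.1), (0.10, 2.0), (0.12, 1.6), (0.50, 0.2)]
slow_taps = [(0.00, 2.0), (1.00, 2.0)]

print(detect_double_tap(double_tap))  # True
print(detect_double_tap(single_tap))  # False
print(detect_double_tap(slow_taps))   # False
```

In an app, a positive result would be the point at which the camera captures a photo and hands it to the vision models. The thresholds would need tuning on a real device, since a phone in a shirt pocket also picks up spikes from walking and other movement.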
Pocket Pal demo
Google announces Lookout
About a year after we had worked on this short research project, Google announced Lookout, an app designed to help people who are blind or visually impaired gain more independence by giving real-time spoken cues about objects, text and people around them. It's pretty amazing how closely the Lookout app resembles Pocket Pal. Even the double tap on the back of the device to invoke the assistant is there.