Exploring the in-car voice experience  (pt 1)

A global leader in audio electronics created a Bluetooth speaker with voice assistant capability. They approached us to explore and validate how their new technology might live (and win) within the auto-tech space. This was a formative project for me: it was the first project I led as a UX designer and the first time I used the value proposition framework.

I worked with a product manager and, occasionally, a second designer over six months across two phases of work; in the final phase, two engineers joined to support our prototype and MVP development. I’ve documented the project in two separate case studies. In this first phase, we worked to understand our customer and their needs, identify opportunities, and define a value proposition through iterative prototyping, refined through hypothesis-driven validation.

Project Kickoff

We were told that others within the organization might already be working on some of the problems our project was trying to solve. In the days after kickoff, we therefore spoke with 12 stakeholders across different departments. These conversations gave us a deeper view of parallel initiatives and differing perspectives on the work, a clearer sense of where to focus, and a picture of who we could collaborate with.

Benchmarking

To help us better understand the competitive landscape of aftermarket HMI, we did some benchmarking. We purchased the JBL Trip, Logi Zero Touch and Navdy and recorded their onboarding flows, and I spent a day driving around with one of my colleagues while using these products.

We also tested software-only solutions like Google Maps, Waze, and Android Auto. After testing each one we reported on their strengths and weaknesses.


Initial user interviews

In our preliminary user interviews, we aimed to better understand drivers’ current needs and behaviours. We interviewed six participants and established the following insights.

  1. Users see Android Auto and CarPlay as an extension of their phone, giving them better access to their music.
  2. Music, podcasts and audiobooks: entertainment takes priority over productivity.
  3. Built-in radio: lean-back listening makes discovery easier and provides variety.
  4. Having multiple VPAs was well understood; each has its own strength and purpose.
  5. Voice-to-text for communication is desirable for driver safety.
  6. Voice interface problems: conversational interfaces need to make things easier, not harder.
  7. Proactive interactions are controversial; however, users responded positively when given control over which proactive notifications they would receive.
  8. Users have specific preferences for what their ideal voice experience should look like.

Internal ideation

We recruited our colleagues for a structured ideation workshop to help us generate a high quantity of concepts within the opportunity areas we had identified.

internal ideation workshop

Prototype development

I knew right away that Simili, an internally developed voice prototyping tool that I had worked on, was the right tool for this project.

It's a very flexible voice design and testing platform that leverages the Wizard of Oz testing methodology to allow for rapid, iterative prototyping. You can read more about our development of this tool here.

We developed 5 of our voice assistant concepts into prototypes for validation testing:

  • Core music control
  • Proactive notifications (short journey)
  • Proactive notifications (long journey)
  • Routine (morning commute)
  • Routine (travel guide)

Simili - internally developed voice prototyping tool
transcripts generated by Simili of user testing

Validation Testing

We had a small budget for building a compelling setup for our user testing. We considered a couple of options, such as field research, but for ethical, safety and insurance reasons we decided against testing in a real car.

Conducting successful user testing was a big challenge, mainly because we had never used a driving simulator before. Although we weren't testing in an actual car, we still wanted to achieve a level of experiential fidelity and simulate some of the cognitive load required to drive.

We tested and interviewed 11 participants in 60-minute sessions and we were able to gather a lot of insights into their various jobs, pains and gains as drivers.

We brought them through several scenarios, driving through a virtual city while using a VPA (virtual personal assistant). In this iteration we tested a voice-only interface, controlled with our internally developed CUI design tool, Simili.

demand validation testing
driving simulator

Synthesis

We observed a limited set of things people wanted to be able to do in the car. They can be summarized in the following five jobs:

  • Communication
  • Control (music, media etc.)
  • Navigation
  • Staying informed
  • All while being kept safe

The job of safety cuts across all other jobs and informs how they are delivered.

At the same time, within that small set of jobs, there exists a multitude of different paths each participant expected the concept to take when delivering against them.

No two participants were alike, except in one key area: voice in the car is predominantly a utility.

There is a baseline of functionality that today’s market has struggled to meet. When the prototypes veered into more personal or colourful interactions, participants were less enthused than when the interactions were more utilitarian.

results from card sort tabulation indicating job prioritization

Card Sort Exercise

We asked our participants to sort 12 potential features and rank them from Least to Most Interested.

💡
Top Ranked Features
  1. Traffic Updates
  2. Turn-by-Turn Navigation
  3. Weather Updates
  4. Low on Gas
  5. Contextual Routine

Key Insights

Our initial interviews and user testing led us to some great insights and helped us both validate and invalidate some of our early hypotheses. These insights helped guide our design decisions going forward. Here are just a few of the takeaways from this stage of our user research; if you’d like a deeper look, you can find the rest of our key insights here:

researchinsights_ph1pt1.pdf (54.8 KB)

Design principles

Having learned so much in our initial voice-first testing, we wanted to integrate these insights into a new prototype that we could continue testing and learning from. We wanted to further validate our assumptions and hypotheses by iteratively testing multimodal, gesture-based interfaces that work in concert with a voice assistant to control music playback, navigation and communication.

We used the following design principles to help guide the design of our interface:

Contextually Aware

Be sensitive to when information is presented, so as not to add to the conflict drivers experience when forced to choose between paying attention to the road and attending to a notification.

Deep Personalization

Drivers are willing to defer some of the control they enjoy outside the car, but how much, when and where is different for everyone.

Elegant Presentation

The interface does not make assumptions, does not intrude unnecessarily, and ultimately communicates in a clear and predictable manner.

Effective Aggregation

In order to keep drivers from switching back to radio, in-car content needs to be comprehensive.

Hybrid Interface

A multimodal interface with glanceable information that is complementary to voice.

Ideation

Having learned a lot in our first experiments, we went back to the drawing board and started ideating on what a multimodal experience could look like. After a lot of ideation and sketching, we developed several hybrid interface prototypes that users could interact with through both touch and voice; each was designed to help us validate our hypotheses and define our value proposition. I used the sketches below during concept development to communicate some of my stronger ideas for solutions grounded in our design principles and the needs of drivers.


Value Proposition Design

I used a large-format laser printer to give myself a physical version of our value proposition canvas for this project. It was extremely useful for referencing and updating our value map as we learned through our research activities.

We used this framework to help us identify critical assumptions that our value proposition was making so we could work to validate them through experimentation and testing.


Hypothesis Driven Validation

There were many hypotheses and assumptions we wanted to validate in our testing, but the following two were of particular interest to us as they were core to our value proposition. They were prioritized in our test design process to ensure we were maximizing our learning and de-risking our assumptions.

We believe that drivers want information presented in a helpful and safe way, appropriate to the context of the moment.
We believe that multimodal hybrid voice, touch, and visual interfaces are preferable to voice alone, allowing drivers to modality switch as needed.

Progress tracking

All of our hypotheses were then prioritized by evaluating how critical they were and how confident or uncertain we were of their veracity. We tracked them against the progress of our prototypes and additional tests to ensure that what we were building would help us test these hypotheses.
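
The canvas and tracker themselves were physical artifacts, but to make the prioritization logic concrete, here is a minimal sketch of how criticality and uncertainty could be combined into a single "risk" score so the riskiest assumptions get tested first. The 1-5 scales, example entries and scoring formula are my illustration, not the actual tracker we used.

```typescript
// Hypothetical sketch: rank hypotheses by criticality x uncertainty so the
// riskiest assumptions are de-risked earliest. Scales and entries are illustrative.

interface Hypothesis {
  statement: string;
  criticality: number; // 1 (nice to know) .. 5 (core to the value proposition)
  uncertainty: number; // 1 (high confidence) .. 5 (pure assumption)
}

const hypotheses: Hypothesis[] = [
  { statement: "Multimodal interfaces are preferable to voice alone", criticality: 5, uncertainty: 4 },
  { statement: "Drivers want contextually timed, safe notifications", criticality: 5, uncertainty: 3 },
  { statement: "Drivers will adopt a new gesture vocabulary", criticality: 3, uncertainty: 4 },
];

// Critical AND uncertain hypotheses float to the top of the testing queue.
const prioritized = [...hypotheses].sort(
  (a, b) => b.criticality * b.uncertainty - a.criticality * a.uncertainty
);

prioritized.forEach((h, i) =>
  console.log(`${i + 1}. [risk ${h.criticality * h.uncertainty}] ${h.statement}`)
);
```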


An Intuitive Gesture Language

Tap targets on on-screen interfaces are too small to interact with safely while driving. For our prototypes, we introduced a gesture language that allowed the whole screen surface to be used for input, without the need to look at and tap on small targets, and ensured that broad, imprecise gestures would still register successfully.
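
We never implemented gesture recognition in software during this phase (a human operator interpreted gestures, as described in the Wizard of Oz setup below), but the sketch that follows illustrates the intent of the gesture language: only the direction of a broad swipe matters, not where it lands on the screen. The threshold value and function names are hypothetical.

```typescript
// Hypothetical sketch of position-independent swipe classification from raw
// touch deltas. In the actual tests, an operator interpreted the gestures.

type Gesture = "swipe-left" | "swipe-right" | "swipe-up" | "swipe-down" | "tap";

const SWIPE_THRESHOLD_PX = 60; // anything shorter is treated as a tap

function classifyGesture(dx: number, dy: number): Gesture {
  if (Math.abs(dx) < SWIPE_THRESHOLD_PX && Math.abs(dy) < SWIPE_THRESHOLD_PX) {
    return "tap";
  }
  // Only the dominant axis matters, so sloppy diagonal swipes still register.
  if (Math.abs(dx) >= Math.abs(dy)) {
    return dx > 0 ? "swipe-right" : "swipe-left";
  }
  return dy > 0 ? "swipe-down" : "swipe-up";
}

console.log(classifyGesture(140, 35)); // "swipe-right", even though it drifted vertically
console.log(classifyGesture(10, -8));  // "tap"
```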


Prototypes

To help us validate our hypotheses around multimodality, gesture, aggregation and contextual awareness, we designed the following five prototypes to put in front of users.

Seek & Scan

Swipe between multiple sources of content

Participants were presented with a phone mounted in front of them and given the option to use voice and gestures to access their content. Swiping up and down switches between sources. Swiping left and right switches between pieces of content.

This navigation pattern is inspired by the seek and scan functionality found in car radios.
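
To make the navigation model explicit, here is a small sketch of how vertical swipes could map to "seek" (change source) and horizontal swipes to "scan" (change content within a source). The source names are illustrative placeholders; in the study, the content was real media controlled by our operators.

```typescript
// Hypothetical sketch of the Seek & Scan model: vertical swipes change the
// source, horizontal swipes change content within the current source.

const sources = [
  { name: "Podcasts", items: ["Episode A", "Episode B"] },
  { name: "Radio", items: ["Station 1", "Station 2", "Station 3"] },
  { name: "Playlists", items: ["Morning mix", "Road trip"] },
];

let sourceIndex = 0;
let itemIndex = 0;

function onGesture(g: "swipe-up" | "swipe-down" | "swipe-left" | "swipe-right") {
  if (g === "swipe-up" || g === "swipe-down") {
    // Seek: change source and start from its first piece of content.
    sourceIndex = (sourceIndex + (g === "swipe-down" ? 1 : sources.length - 1)) % sources.length;
    itemIndex = 0;
  } else {
    // Scan: change content within the current source.
    const items = sources[sourceIndex].items;
    itemIndex = (itemIndex + (g === "swipe-right" ? 1 : items.length - 1)) % items.length;
  }
  console.log(`${sources[sourceIndex].name}: ${sources[sourceIndex].items[itemIndex]}`);
}

onGesture("swipe-down");  // next source
onGesture("swipe-right"); // next item within that source
```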


Routines

Passive consumption of customized content

Personalized streams that automatically cycle through a set of fresh, pre-populated content with the additional ability to skip through with a simple swipe, or redirect with voice.


Proactive Prompts

Push notifications at moments of change

While driving, participants were presented with notifications during deltas (major changes: red lights, arrivals, end of a podcast). Participants would receive phone calls, traffic updates, messages and content recommendations at these moments.

They could use voice or tap on the screen to respond.
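
To make the "delta" idea concrete, here is a minimal sketch of how prompts could be queued and only released at moments of change rather than interrupting the driver mid-task. The event names and queueing behaviour are assumptions for illustration; in our tests, the operators timed the prompts manually.

```typescript
// Hypothetical sketch: proactive prompts are held until a "delta" (a natural
// moment of change) instead of being surfaced the instant they arrive.

type DeltaEvent = "red-light" | "arrival" | "podcast-ended";

interface Prompt {
  message: string;
}

const pending: Prompt[] = [];

function queuePrompt(p: Prompt) {
  // Nothing is surfaced immediately; prompts wait for a delta.
  pending.push(p);
}

function onDelta(event: DeltaEvent) {
  // A moment of change: surface at most one queued prompt to keep load low.
  const next = pending.shift();
  if (next) {
    console.log(`[${event}] ${next.message}`);
  }
}

queuePrompt({ message: "Traffic ahead - reroute adds 4 minutes?" });
onDelta("red-light"); // the prompt is delivered only once the car stops
```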


Gesture Only

Gesture control without the screen.

We asked participants to imagine that this cardboard box was a portable speaker and mic with a touch-capacitive surface. They could interact with it using voice and gestures.

Participants started their morning routine at home with this device and continued using it in the car.


Wizard of Oz setup

For the driver's setup, we purchased driving simulation hardware, including foot pedals and a force feedback steering wheel, and used driving simulator software called City Driving Simulator. I set up an iPhone on a dash mount for our drivers to help us test our multimodal experiences. All three of us acted as WoZ (Wizard of Oz) operators of the prototypes: I controlled audio feedback using both Simili and a soundboard, one of my partners controlled the visuals with InVision and our earcons (sound UI) with a drum machine, and another controlled audio content via Spotify, TuneIn Radio, YouTube and Apple Podcasts.

participant setup

Detecting gesture input

The gestures we wanted to test were not supported by any no-code / low-code prototyping software, and we did not have engineers on our team. However, in our test design we came up with a hacked-together solution.

operator WoZ setup

We used an overhead camera to record and transmit video of the participant's hands to the operators. One operator monitored the participant's gestures and responded with the corresponding key presses in an InVision project, effectively controlling the on-screen interface for the participant. This solution worked better than expected, albeit far from perfect; for our purposes of learning quickly and failing fast, it was good enough. In fact, this technique allowed us to run a test where our prototype was a mounted cardboard box. Participants were told it was a trackpad and that they could use it with the same gestures they had been using on screen.

User Testing

We tested a total of 22 participants in 90-minute sessions across the five prototypes. Each prototype was designed to help us validate our early hypotheses and value assumptions. We screened all of our users to ensure we had the right type of participants: licensed drivers who were somewhat tech-savvy, had some knowledge of or experience with voice assistants, and were regular users of audio streaming services.


Our testing protocol started with a short opening interview to help us build rapport and get to know our participants. We then took them through each of the five prototypes, soliciting feedback after each one, and concluded each session with a debrief interview and a card sorting exercise.

  • Routines
  • Gesture only
  • Proactive notifications
  • Seek and Scan
  • Voice only

Card sorting

We finished our sessions with a card sorting exercise in which participants were asked to rank Jobs, Pains, and Gains in order of importance. Participants could also discard a card if it wasn’t important to them. We gave the most important card the greatest weight, and each subsequent card’s score was reduced by one. Any discarded card received a penalty of negative three.
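
A minimal sketch of the scoring rule described above, assuming per-participant scores are simply summed to produce the aggregate ranking; the card names are illustrative, not the actual deck.

```typescript
// Hypothetical sketch of the card-sort scoring: the top-ranked card gets the
// highest weight, each subsequent card scores one less, discards score -3.

function scoreCardSort(ranked: string[], discarded: string[]): Map<string, number> {
  const scores = new Map<string, number>();
  const topScore = ranked.length; // the most important card gets the greatest weight
  ranked.forEach((card, i) => {
    scores.set(card, (scores.get(card) ?? 0) + (topScore - i));
  });
  discarded.forEach((card) => {
    scores.set(card, (scores.get(card) ?? 0) - 3);
  });
  return scores;
}

// One (hypothetical) participant's sort; aggregate ranking sums these per card.
const participant = scoreCardSort(
  ["Keep eyes on the road", "Navigate", "Control music", "Stay informed"],
  ["Dictate emails"]
);
console.log([...participant.entries()]);
```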

participant card sorts of jobs, pains, gains

Synthesis

Key Insights

💡
Notifications have a hierarchy: not all notifications are weighted the same by drivers.
💡
Effortless browsing is important: the ability to fluidly switch between lean-in and lean-back selection modes lightens the cognitive load.
💡
Switching between apps and content is challenging: most mobile apps and services aren't designed for the car.
💡
The screen is distracting no matter the circumstances: seeing the interface isn’t as important as being able to control it.
💡
Wake words make small control tasks cumbersome: voice is best for major changes, gesture is better for small changes.
💡
Gestures have their place in the car: swiping left and right was universally accepted.
💡
Visual navigation and radio scanning are the competition: drivers are looking for a digital equivalent to the lean-back experience of AM/FM radio, built around the core visual experience of navigation.

Needs prioritization

Ranking our customers' needs helped us define a value proposition that addresses the things customers really care about. Keeping Eyes on the Road is the top job and something we think our product can help with in a variety of ways. Once our card sorting scores were calculated, these were our users' highest priority needs. Note how practical the top of the list is versus how emotional the bottom half is; this supports one of our initial insights: the basic expectations for tech in the car have yet to be met.


Competitive Analysis

I worked with my partner to develop this 2x2 matrix to help us with our competitive analysis. We ranked competing products from lean-in to lean-back: lean-in experiences require our attention but are best suited for top-of-mind requests, while lean-back experiences support browsing and scanning behaviours. We also evaluated whether each product supports voice-only interaction, touch-only interaction, or a hybrid of both.


Through this mapping, we realized that, with the exception of standard AM/FM radio, there is a gap in the market for products that offer drivers more lean-back experiences. This creates an opportunity for us to serve the market with experiences that offer more proactive, aggregated, browsable, and ultimately more lean-back content consumption.

Value Proposition Design

With a clearer understanding of our customers' most pressing needs and a lot of great feedback from participants during validation testing, we further refined our value proposition statement. Through two rounds of rapid, iterative prototyping and user testing, we were able to invalidate the initial hypothesis that voice alone was sufficient in the car and validate the importance of lean-back control and multimodality.


Car Mode Demo

To help us share our product concept, features and value proposition with our stakeholders we filmed a ‘product vision’ video for the purposes of demoing the solution.

Next steps

Through user testing and iterative, low-fidelity prototyping, our team was able to rapidly generate risk-reducing insights and identify a strategic path forward. Our research identified out-of-scope items and user “jobs to be done,” most notably demonstrating that a pure voice-only experience wasn’t viable given the high cognitive load voice places on the driver. This finding persuaded the client to pivot toward a multimodal interface incorporating gestural touch inputs.

The client then selected us to begin a new phase of the project that included high-fidelity prototyping, further iteration and refinement of a gestural vocabulary, and the design of an in-depth onboarding flow.

In phase 2, the focus of Detroit is to validate, through a functional prototype, the effectiveness of gesture interactions, specifically for keeping the driver's eyes on the road.

You can read about the subsequent project work right here.