Exploring the in-car gesture experience (pt 2)

This project was preceded by a research, ideation and prototyping project that you can read about here. The team for this phase shuffled a bit: the product manager and my fellow product designer were swapped for a new product manager and product designer. This meant I was responsible for onboarding them and kicking off this phase of the project, in which we ran two more rounds of prototyping and iteration to help us converge on the requirements for a minimum viable product.

Due to the inherent limitations of WoZ (Wizard of Oz) prototyping, our focus in this next phase was to further validate the effectiveness of gesture interactions, specifically for keeping the driver's eyes on the road, through a functional prototype. Only a functional prototype would have the responsiveness we needed to accurately assess the usability and desirability of gestural input.

Key hypothesis to validate

We believe that gesture interactions will help drivers keep their eyes on the road while managing media content.

Prototyping Approach

In our last round of validation testing, we wanted to reach a high degree of experiential fidelity. Along with having functional gesture input, we also wanted to ensure that the library, recommendations and content aggregation were tailored to each individual. This was important because we had minimal ability to do this with our InVision prototypes and were therefore restricted in how realistic the testing experience could feel.

Configurable All prototypes leverage the same gesture and content settings, which were designed to be customizable in response to feedback, allowing us to change the gesture mapping on the fly.

Personalized Tailored to each participant across all content types and populated with the artists, music genres, news topics, podcasts, and radio that each participant listens to, in order to simulate a realistic experience.

Aggregation A comprehensive experience that includes music, podcasts, news and radio from multiple sources by leveraging available APIs from Spotify, TuneIn and Audioburst.

Value Proposition Design

We used the Value Proposition Canvas as a living document that we iterated on throughout our discovery process. Since we were shifting our hypothesis-driven validation away from questions around voice interaction to focus instead on the desirability of aggregation and discovery, as well as the usability and desirability of gesture interactions, we began this iteration by removing all jobs, pains, and gains directly related to voice and rewrote our value proposition statement to articulate a focus on gesture input, media browsing, and effective aggregation.

Value Proposition Statement

Our car-mode media player helps drivers keep their eyes on the road and improves control over app content and music services.


Articulating our value proposition helped us identify the key assumptions we needed to focus our validation efforts on in our next round of prototyping and experimentation.

Some of our high-risk hypotheses were as follows:

  • We believe that drivers feel that above all, the road requires their eyes and attention
  • We believe that drivers will be able to remember a basic and advanced set of gestures
  • We believe that drivers prefer more passive lean-back browsing to lean-in directed control
  • We believe that drivers want to listen to content they haven't discovered before
  • We believe that drivers want access to dynamic, fresh content when their playlists get stale
  • We believe that drivers will value an agentive curation based on context


I started this next phase with some preliminary benchmark research. After identifying the need for more lean-back interactions, we looked at living room experiences, since the living room context demands lean-back interactions that are similar in some respects to the car environment.

We were surprised and encouraged to see many similar interaction patterns, in particular the media matrix and top-line recommendations of Android TV.

Recommendation row on Android TV Nougat


In order to validate our hypotheses, we built four prototypes that all employed gesture as the primary input and did not include voice input. While we recognize the high value of voice for drivers, we had already validated it for top-of-mind queries and requests, so there was no need to duplicate that work. Our goal here was to gain deeper insight into the value of gesture control for media playback and validate it for usability and desirability.

We used a Lean-In/Lean-Back scale to ensure we represented various degrees of user engagement. We avoided highly lean-in experiences because they aren't well suited to the automobile and, as we had learned, voice input can typically address those kinds of top-of-mind requests.



On the right of this spectrum, we have the Routines prototype, which required the least effort and cognitive load from users because it dynamically aggregates content based on context and a driver's preferences. While straightforward to initiate and control, it provides the least amount of choice to the user.

On the far left, we have voice interaction alone, which in our early testing required a significant amount of forethought and cognitive load; we were not testing voice further, having already validated its importance in the car. Next to it sits the Seek and Scan prototype, which gave users the most choice but required a significant amount of attention and effort to use.

Seek & Scan prototype

Swipe between multiple sources of content

While driving, participants could access multiple music sources, stacked vertically as playlists, whose tracks could be browsed by swiping left and right.

  • Swiping up and down switches between playlists. Swiping left and right switches between tracks in a playlist.
  • This navigation/organization pattern inspires familiarity
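The two-axis navigation described above can be sketched as a small model. This is purely illustrative: the types, names and wrap-around behavior are my assumptions, not the prototype's actual code.

```typescript
// Illustrative sketch of the Seek & Scan model: playlists stack vertically,
// tracks run horizontally. All names here are hypothetical.
type Swipe = "up" | "down" | "left" | "right";

interface Library {
  playlists: string[][]; // each playlist is a list of track ids
}

interface Cursor {
  playlist: number;
  track: number;
}

// Apply a swipe to the current cursor, wrapping at the edges so the
// driver never hits a dead end mid-gesture (an assumed design choice).
function navigate(lib: Library, cur: Cursor, swipe: Swipe): Cursor {
  const wrap = (i: number, n: number) => ((i % n) + n) % n;
  switch (swipe) {
    case "up":
      return { playlist: wrap(cur.playlist - 1, lib.playlists.length), track: 0 };
    case "down":
      return { playlist: wrap(cur.playlist + 1, lib.playlists.length), track: 0 };
    case "left":
      return { ...cur, track: wrap(cur.track - 1, lib.playlists[cur.playlist].length) };
    case "right":
      return { ...cur, track: wrap(cur.track + 1, lib.playlists[cur.playlist].length) };
  }
}
```

Keeping the navigation logic pure like this would also make it trivial to reuse across the later prototypes, which share the same gesture vocabulary.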

Seek & Scan demo

Stations prototype

Generative content across news, music, and podcasts

While driving, participants could access multiple music stations based on their preferences for news sources and music genres.

  • Swiping up and down switches between genres, news sources, or topics. Swiping left and right switches between tracks in a genre.
  • This navigation/organization pattern inspires discovery 

Stations demo

Routines prototype

Passive consumption of customized content

While driving, participants had access to the VPA, which provided contextual drive information and tailored, aggregated content. Routines are customized and may be suggested based on the driver, time of day, route or other contextual cues.

  • Routines exist in the uppermost row and can be played through, or skipped forward or backward.
  • This navigation/organization pattern inspires safety

Routines demo

Blank screen prototype

Swipe between multiple sources of content

Evolving the cardboard box prototype, we created a functional interface that includes all primary and secondary gestures from Seek and Scan and Stations, without any graphical interface elements.

  • Swiping up and down switches between playlists. Swiping left and right switches between tracks in a playlist.
  • Could be implemented on a device without a screen
  • This navigation/organization pattern inspires safety

Blank screen demo

User Testing

Our testing setup was much more straightforward than in our previous iteration because we were no longer operating a three-person WoZ prototype. We also went from three monitors for our driving simulation to one, since some of our participants had experienced motion sickness and we wanted to ensure everyone was comfortable moving forward.


Gesture configuration

Our four prototypes were code-based and functional, each with a configuration menu exposing configurable gesture settings and parameters.

  • We were able to assign app functionality to a gesture on the fly using a configuration menu
  • We were able to identify gestures that worked best by experimenting with different gestures for each action.
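The on-the-fly reassignment above amounts to a mutable gesture-to-action map that the configuration menu edits at runtime. A minimal sketch, with hypothetical gesture and action names, might look like:

```typescript
// Hypothetical sketch of a remappable gesture configuration: actions are
// resolved through a mutable map, so a facilitator can rebind a gesture
// between (or during) sessions without redeploying the prototype.
type Gesture = "swipeUp" | "swipeDown" | "swipeLeft" | "swipeRight";
type Action = "prevPlaylist" | "nextPlaylist" | "prevTrack" | "nextTrack";

const gestureMap = new Map<Gesture, Action>([
  ["swipeUp", "prevPlaylist"],
  ["swipeDown", "nextPlaylist"],
  ["swipeLeft", "prevTrack"],
  ["swipeRight", "nextTrack"],
]);

// Called from the configuration menu to rebind a gesture on the fly.
function rebind(gesture: Gesture, action: Action): void {
  gestureMap.set(gesture, action);
}

// Called by the gesture recognizer to look up what to do.
function resolve(gesture: Gesture): Action | undefined {
  return gestureMap.get(gesture);
}
```

The indirection is the whole point: the recognizer never hard-codes an action, so experimenting with alternative mappings becomes a data change rather than a code change.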



To better understand the value of personalization and effective aggregation, we wanted to ensure that our prototypes were personalized to each participant's music preferences and interests, and would include talk media such as news and podcasts. To add talk media, we explored additional streaming sources to integrate with and expanded our initial offering to include genres and recommendations from Spotify (in addition to playlists).

We were also using APIs from different content providers, and participants were able to log into their personal Spotify accounts.
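As a rough illustration of this kind of integration, here is a minimal sketch of querying Spotify's Web API recommendations endpoint with genre seeds. It assumes an OAuth access token is already available; the seeds, limit and error handling are illustrative, not our production code.

```typescript
// Build a request URL for Spotify's recommendations endpoint, seeded with
// the genres a participant told us they listen to. Values are illustrative.
function recommendationsUrl(seedGenres: string[], limit = 20): string {
  const params = new URLSearchParams({
    seed_genres: seedGenres.join(","),
    limit: String(limit),
  });
  return `https://api.spotify.com/v1/recommendations?${params.toString()}`;
}

// Fetch personalized recommendations; `token` is an assumed OAuth
// access token obtained after the participant logs into Spotify.
async function fetchRecommendations(token: string, seedGenres: string[]) {
  const res = await fetch(recommendationsUrl(seedGenres), {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Spotify API error: ${res.status}`);
  return res.json(); // response body includes a `tracks` array
}
```

A real integration would also need token refresh and the appropriate OAuth scopes, but even this much is enough to populate a prototype with content a participant actually cares about.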


I also found Audioburst, an Israeli startup that was digitizing AM/FM radio broadcasts and breaking them down into snackable 30-40 second clips organized by topic. This allowed us to identify the interests and hobbies of our testing participants in the onboarding interview and tailor some of our prototypes to suit their interests on the fly.

Adding topics of interest to our Stations prototype


Key insights

After a round of testing with 9 participants, we synthesized our research into user insights that would further inform our next iteration. Some of these insights directly impacted our MVP design, while others we felt were relevant to the product roadmap. Included for each insight were ‘next steps’ which were incorporated into our final MVP requirements.

You can read the full report here:


  • People want to explore new content and listen to what they already love.
  • Context and mood are the main factors that guide what users want to listen to.
  • Primary gestures are natural. Secondary gestures add value.
  • Voice feedback is useful, but brevity, fidelity and efficiency in responses are essential.
  • Interfaces must be responsive to user input (touch and voice).
  • Graphical elements are supplemental but not core to the experience.

UI Insights

Along with our user needs insights, we documented insights about the gesture interfaces themselves and our observations of users' interactions with our prototypes. We summarized these observations to help inform our MVP requirements.

You can read more here:



For drivers, their number one job-to-be-done in the car is to keep their eyes on the road. We believe that an interface should be highly glanceable in order to facilitate this need.


There is a hierarchy to the information users need when they are using the interface, and designing with this hierarchy in mind makes the GUI more glanceable.


When a user interacted with the prototype, the slightest delay in response would cause confusion, distraction, and user error. It's important to make the interface extremely responsive to every user input for maximum control and understanding.

Intuitive Gestures

Our primary gestures were further validated in this phase; they were perceived as useful, natural and easy to remember by all of our users. Secondary gestures were seen as useful but will require some onboarding.

Final Design & MVP Development

To finish our work on this project, we were tasked with incorporating everything we'd learned into a 'Car mode' for the client's core music streaming app, focusing on gesture-based aggregation and source switching, all with an understanding that voice control is coming over the next horizon. We incorporated our findings from user testing into our final requirements, which we then designed and developed into a polished minimum viable product.

Introducing: Gesture Mode

The car mode player also has significant value outside the car, as it enables quicker browsing and lean-back listening. Mobile users are also growing increasingly accustomed to gestures on iOS and Android, so rather than prescribe its use with the name Car Mode, we felt the name Gesture Mode lets the user decide when they want to use it. Moving forward, this mode could become the standard player in order to create a more focused experience, as having more than one player in the app would be confusing for the user.


Our users saw a lot of value in the various prototypes we tested that explored alternative approaches to content aggregation. Rather than pursue these avenues further, we narrowed our focus to the core UX/UI of the gesture experience, but significant value remains in exploring both contextually-aware routines and better facilitation of music discovery.


Value Proposition

Our gesture-based player helps users keep their eyes on the road and improves control over app content and music services.


Our Gesture Mode interface creates a distraction-free experience. The use of large text (e.g. the track title) allows for quick glanceability, while track images and hinting provide a degree of tangibility that suggests swiping interactions. With the exception of the press-and-hold buttons in the top left and right corners, there are no discrete tap targets; the rest of the interface is one large trackpad. This allows for interaction while a driver (or cyclist) keeps their eyes where they should be: on the road.

Adapting the IA

It’s important that the mental model of the Gesture Player is supported and reinforced by a matching information architecture. This presents a challenge when integrating into an existing design: the information architecture of the music streaming application differed from our prototypes', introducing the notion of categories as an additional layer. To integrate the gesture player seamlessly, the elements are arranged vertically in a way that follows the mental model of the app (category -> playlist -> track).
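One way to reconcile the two hierarchies is to flatten the app's extra category layer into the gesture player's vertical axis. The shape below is a sketch under assumed names, not the app's actual data model:

```typescript
// Sketch of adapting the app's IA (category -> playlist -> track) to the
// gesture player, whose vertical axis is a flat list of rows.
interface Track { id: string; title: string }
interface Playlist { name: string; tracks: Track[] }
interface Category { name: string; playlists: Playlist[] }

// Flatten categories into vertical rows while preserving the app's
// ordering, so swiping down still follows the existing mental model.
function toRows(categories: Category[]): Playlist[] {
  return categories.flatMap((c) => c.playlists);
}
```

Because the flattening preserves category order, a user swiping vertically still traverses the app's hierarchy in the order they already know it.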


Gesture language

We further refined our gesture language by delineating between what we called 'basic' gestures and 'advanced' gestures. The basic gestures were intuitive enough that, throughout our testing, the majority of participants were able to use them without instruction.


Music Player toggle

We ideated on different ways to integrate our gesture player with the existing mobile application and decided on a simple toggle that allows users to easily switch between the legacy player and our new gesture player.


Onboarding flow

We realized that without the capacity to onboard individuals in testing sessions, we'd need a method to effectively convey our gesture language to new users. We designed an onboarding flow to help introduce this new interface to existing users of the mobile app.

Basic Gestures

Our final MVP includes important changes based on our most recent insights, including a complete UX/UI overhaul. Since this was going to be deployed in a large internal release, we concentrated our efforts during the last few weeks on polishing things up visually and paying attention to every detail.

Ambient display

The “blank screen” prototype was well received by users who felt it reduced distraction. However, some more tech-savvy users with more driving experience saw value in the screen because it added more control and understanding. To satisfy both user types, the player UI can fade into an ambient state after a number of seconds, showing only the track and artist. The full screen returns immediately when the user interacts with the phone in any way.
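The fade-to-ambient behavior reduces to an idle timeout over the last interaction time. A minimal sketch, with an assumed delay value:

```typescript
// Illustrative model of the ambient display: the full UI fades to an
// ambient state after a period with no interaction. The 8 s delay is an
// assumption for illustration, not the shipped value.
const AMBIENT_DELAY_MS = 8000;

type DisplayState = "full" | "ambient";

// Pure function of the clock: any interaction updates lastInteractionMs,
// which immediately returns the display to the "full" state.
function displayState(lastInteractionMs: number, nowMs: number): DisplayState {
  return nowMs - lastInteractionMs >= AMBIENT_DELAY_MS ? "ambient" : "full";
}
```

Modeling the state as a pure function of timestamps keeps it easy to test and avoids timer bookkeeping in the UI layer.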

Track scrubbing

When listening to spoken word content like podcasts, the user might want to skip forward/back 15 seconds.

Here, a two-finger swipe left or right scrubs the track by 15 seconds.
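The scrub itself is a simple clamped jump of the playhead. A sketch, where the 15-second step matches the behavior described above but the function names are mine:

```typescript
// Sketch of the two-finger scrub: jump the playhead by ±15 s, clamped to
// the bounds of the track so it never scrubs past the start or end.
const SCRUB_STEP_S = 15;

function scrub(
  positionS: number,
  durationS: number,
  direction: "left" | "right"
): number {
  const next =
    direction === "right" ? positionS + SCRUB_STEP_S : positionS - SCRUB_STEP_S;
  return Math.min(Math.max(next, 0), durationS);
}
```

Clamping rather than rejecting out-of-range scrubs keeps the gesture forgiving: a swipe near the start of a podcast simply lands at 0:00.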


Interaction hints

We wanted our gesture interactions to be completely free of tap targets, but for infrequent interactions, like exiting the player, this presented a challenge.

We opted for an onscreen element and tap target that requires a long press to activate. To help novice users, an onscreen hint is presented whenever the target receives a single tap.

Next Steps

The client's most recent mobile app release was branched to fit our Gesture Mode interface and scheduled for an internal release. While the user currently selects the gesture player with a toggle, our hope is to validate this new approach, gather analytics, and iterate on our design, with the end goal of unifying the two players and eliminating the need for a toggle.

Future Roadmap

Some of our learnings throughout this project seemed to have implications for potential new hardware formats, which led us to ask: where could this lead beyond mobile?

Voice everywhere

We validated in the first phase of this work the importance of voice in the car. But beyond the car, conversational interfaces and smart speakers are the fastest-growing categories of consumer technology and voice is being integrated into everything. Consumers will expect access to their VPA of choice and desire a connected experience where access is ubiquitous across their products.

  • Smart speakers
  • Wearables
  • Automotive interfaces
  • Headphones

A new gesture language

Through this project, I gained an appreciation for how valuable gesture input could be in multimodal conversational interfaces. There were so many small interactions involved in browsing and controlling a diverse library of media that gestures made a lot of sense for these smaller actions. Voice just wasn't the best input for something like changing the volume or skipping a track.


While this gesture vocabulary was tested and validated as part of a mobile media streaming app, it can also be applied to new hardware formats like wearables, headphones and smart speakers. This insight came to me while we were exploring our blank screen prototype, first as the cardboard box and later as its blank screen UI. I realized it might encourage engagement where touchscreens don't fit, in the new hardware formats and wearables our client is involved in developing. After attaching (with tape) a small iPhone to a pair of wireless headphones, it was immediately evident that this gesture language applies to a range of new hardware formats. This is particularly true in emergent categories such as smart speakers, wearables and connected headphones, as well as established categories such as mobile devices and automobiles.

A rough prototype consisting of an iPhone taped to a pair of Bluetooth headphones helped to test the usability of headphone-based gestures


A unified language seemed important to establish so users could seamlessly move between devices. To demonstrate this vision, we adapted our interface for both real and imagined devices across our client's hardware ecosystem.


  • Common gesture interface that translates across product catalogue
  • Seamlessly connected ecosystem
  • Potential to increase attach rate per customer


These products have simpler interfaces that present new challenges to the established interface patterns of both keyboards and precision tap targets. A hybrid gesture and voice interface offers an opportunity to provide the lean-back and lean-in controls users are looking for.