Flexible virtual agents using GPT2 as a fallback response

During our exploration of body language and implicit invocation, we uncovered a blended approach to building conversational agents that are more social and flexible, while still retaining functional utility and purpose. This blended approach is novel and exciting: until now, agents have done one or the other — imbued sociality or served functional needs. A step towards an agent that does both means we're that much closer to building conversational interfaces that are truly responsive to human needs.

To understand why this is so important, we need to examine conversational interfaces past and present. We've all experienced frustration when faced with a conversational interface that just isn't understanding us (which is, to be honest, many of them). As users, we're meant to follow predetermined dialogue flows, and when we stray from these paths, we're typically given a canned response when an 'intent' isn't matched to our 'utterance'.

“Can you repeat that? I don’t understand.”

Unfortunately, when the illusion is broken and the curtain is pulled back, it becomes harder for agents to truly engage with people and hold their attention. These edges might remind us of telephone banking IVRs (interactive voice response systems): the rage-inducing, button-smashing systems that force us through endless decision trees. Not the most well-designed interface, to say the least.

“I’m sorry, I don’t understand what you mean. Did you want an indoor plant or an outdoor plant?”


At this point in our project, we had already developed a conversational agent — the "Shopkeeper," who was embodied in an avatar and existed in a virtual reality simulation of a flower shop. I had already established the agent's dialogue flows, and now we wanted to see if we could incorporate an open-domain language model to allow for more flexible conversational interactions.


A virtual persona

In order to ground our experience and provide context for our agent's conversational domain, I decided on the setting of a virtual flower shop with a shopkeeper as our agent.


Development

Typically, a fallback response is used for conversation repair — it's the agent's reply when no intent is matched to a user's utterance (what they say). Generally, fallback responses are intentionally designed to steer people back to the 'happy path' of the conversational flow. Instead, we called our webhook with the captured utterance — the input required to get a generated response from our GPT2 instance.
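To make the fallback path concrete, here is a minimal sketch of what such a webhook handler might look like. The payload shape follows the Dialogflow-style `queryResult.queryText` / `fulfillmentText` convention; the `GPT2_ENDPOINT` URL and the model service's JSON response shape are illustrative assumptions, not the project's actual services.

```python
# Hedged sketch of a fallback webhook: forward the unmatched utterance to a
# language-model service and return its generation as the agent's reply.
import json
import urllib.request

# Hypothetical GPT2 text-generation service (assumption, not the real deployment)
GPT2_ENDPOINT = "http://localhost:5001/generate"

def generate_reply(utterance: str) -> str:
    """POST the unmatched utterance to the model service and return its text."""
    req = urllib.request.Request(
        GPT2_ENDPOINT,
        data=json.dumps({"prompt": utterance}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        # Assumed response shape: {"text": "..."}
        return json.loads(resp.read())["text"]

def handle_fallback(body: dict, generate=generate_reply) -> dict:
    """Build the webhook response for an utterance no intent matched."""
    # Dialogflow-style payload: the raw user utterance lives in queryResult
    utterance = body["queryResult"]["queryText"]
    return {"fulfillmentText": generate(utterance)}
```

Injecting the `generate` function keeps the routing logic testable without a live model behind it.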

This method allows our shopkeeper agent to make small talk and converse about subjects completely outside of her predetermined domain, while still recognizing when the conversation steers back into the realm of the flower shop, at which point she seamlessly switches back to her predetermined dialogue flows.
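The switching behavior described above can be sketched as a small router: try the scripted intents first, and only hand the utterance to the open-domain model when nothing matches. The real project used an NLU service for intent matching; the keyword matcher and sample intents below are stand-in assumptions for illustration.

```python
# Hedged sketch of closed/open-domain routing. SCRIPTED_INTENTS and the
# keyword matcher are illustrative stand-ins for a real NLU intent matcher.

SCRIPTED_INTENTS = {
    "buy_plant": (["plant", "flower", "bouquet"],
                  "We have roses, tulips, and ferns in stock."),
    "hours": (["open", "hours", "closing"],
              "The shop is open 9 to 5."),
}

def match_intent(utterance: str):
    """Return (intent_name, reply) for the first intent whose keywords appear."""
    text = utterance.lower()
    for name, (keywords, reply) in SCRIPTED_INTENTS.items():
        if any(k in text for k in keywords):
            return name, reply
    return None

def respond(utterance: str, open_domain_generate) -> str:
    """Scripted flow when an intent matches; fall back to the model otherwise."""
    matched = match_intent(utterance)
    if matched:
        return matched[1]  # stay on the predetermined dialogue flow
    return open_domain_generate(utterance)  # improvise with the language model
```

Because the scripted intents always win when they match, the agent can drift into small talk and still snap back to the flower-shop flow the moment the user mentions plants.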


Around this time, Facebook Research released Blenderbot, another open-domain language model, this one optimized for engaging conversational dialogue. In our initial testing it was slightly faster and produced positive results, so we swapped GPT2 for Blenderbot in our architecture.

Here is a diagram of the architecture of our AWS and GCP web services:

Diagram of architecture for shop keeper agent

Demo

Now to show you our shopkeeper in action. The first half of the conversation is an example of our agent's flexible small talk: she is improvising with Blenderbot. When the conversation shifts to plant shopping, however, the dialogue manager lets the predetermined flow take over. Even with the faster model, there is a significant delay in our agent's responses, since we're not running the model locally. Still, the potential for hybrid closed/open-domain chatbots is clear. Open-domain language models are here, and getting really good, really fast. Conversation designers and product developers should anticipate a future where they shape the contours of their agents rather than writing every single line of their story.