Intelligent Virtual Agents Get Smarter Thanks to Google Contact Center Speech & NLP Enhancements

Inference Studio Customers Get Automatic Access to New Advancements

Last week, Google product managers Dan Aharon and Shantanu Misra announced new speech and NLP enhancements to Cloud Speech-to-Text and Dialogflow that can improve recognition accuracy by up to 40% and deliver better virtual agent performance.

These enhancements are described in detail in their blog post and include:

·     Auto Speech Adaptation in Dialogflow (beta)

·     Speech recognition baseline model improvements for IVRs and phone-based virtual agents in Cloud Speech-to-Text

·     Richer manual Speech Adaptation in Dialogflow and Cloud Speech-to-Text (beta)

·     Endless streaming in Cloud Speech-to-Text (beta)

·     MP3 file format support in Cloud Speech-to-Text

Be assured that we are testing these new features and will add them to our platform so that tasks built within Studio get automatic and immediate access to the capabilities that are of value to you. 

You might be wondering why it’s so easy. 


It’s because we developed Inference Studio as an application development layer that sits on top of speech and NLP services like those provided by Google. This was done by design, so that our customers would always have access to the latest enhancements released by any of our underlying speech providers. With our drag-and-drop service creation environment, you can easily build self-service applications, either from scratch or by using a pre-built application from our task library. Spoken and text-based dialogs are created by assembling a series of nodes. Our Cloud Speech-to-Text node uses Google Cloud Speech-to-Text to convert human speech into text, and our Dialogflow node works with Google Dialogflow for NLP. As a result, any applications that use these nodes immediately benefit from Google’s updates.

What improvements can customers expect?

Our customers will get the greatest benefits from the following enhancements:

1.       Google announced Speech-to-Text baseline model improvements for IVRs and IVAs. According to Google:

“In April 2018, we introduced pre-built models for improved transcription accuracy from phone calls and video. We followed that up last February by announcing the availability of those models to all customers, not just those who had opted in to our data logging program. Today, we’ve further optimized our phone model for the short utterances that are typical of interactions with phone-based virtual agents. The new model is now 15% more accurate for U.S. English on a relative basis beyond the improvements we previously announced.”

Our upcoming release, Studio 6.3, will add support for these pre-built models. Inference Studio applications that use our Speech-to-Text node automatically benefit from these improvements.
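For readers who call Cloud Speech-to-Text directly rather than through Studio, the minimal sketch below shows how a recognition request body might select the phone-optimized model. The `model` and `useEnhanced` fields follow Google’s public REST API as we understand it; the helper name, sample rate and bucket URI are illustrative assumptions, and this is not how the Studio node is configured internally.

```python
import json


def build_phone_call_request(audio_uri: str, language_code: str = "en-US") -> dict:
    """Build a JSON body for a Speech-to-Text recognize call (hypothetical helper).

    The sample rate of 8000 Hz is an assumption typical of telephony audio.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 8000,
            "languageCode": language_code,
            # Ask for the pre-built model tuned for phone audio.
            "model": "phone_call",
            # Request the enhanced variant of that model where available.
            "useEnhanced": True,
        },
        "audio": {"uri": audio_uri},
    }


if __name__ == "__main__":
    # The bucket and file name are placeholders, not real resources.
    body = build_phone_call_request("gs://example-bucket/call.wav")
    print(json.dumps(body, indent=2))
```

The sketch only builds the request body; sending it requires credentials and an HTTP or client-library call that is left out here.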

2.       They also announced a beta release of Auto Speech Adaptation in Dialogflow. The purpose of this update is to help virtual agents more accurately understand what a caller is asking for. By giving the virtual agent a better grasp of the context of the inquiry, the update can increase the accuracy with which caller utterances are understood throughout the conversation. Google uses the following example:

“…if the Dialogflow agent knew the context was “ordering a burger” and that “cheese” is a common burger ingredient, it would probably understand that the user meant “cheese” and not “these”. Similarly, if the virtual agent knew that the term “mail” is a common term in the context of a product return, it wouldn’t confuse it with the words “male” or “nail”.”

Google reports that use of Auto Speech Adaptation can improve virtual agent accuracy by up to 40%. If you have developed your own Dialogflow agents, simply turn on Auto Speech Adaptation and you can start benefiting from the improvements.
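To make the “cheese” versus “these” example concrete, here is a toy sketch of context biasing. It is not Google’s implementation: the scores, the bonus value and the function name are all invented for illustration. It only shows the general idea that words expected in the current context can be nudged ahead of similar-sounding alternatives.

```python
def pick_word(candidates: dict, context_terms: set, bonus: float = 0.2) -> str:
    """Return the best candidate after adding a bonus to in-context words.

    candidates maps each possible word to a made-up recognition score.
    """
    adjusted = {
        word: score + (bonus if word in context_terms else 0.0)
        for word, score in candidates.items()
    }
    return max(adjusted, key=adjusted.get)


if __name__ == "__main__":
    heard = {"these": 0.55, "cheese": 0.45}  # invented scores
    burger_context = {"cheese", "bun", "pickles"}

    print(pick_word(heard, set()))           # no context supplied
    print(pick_word(heard, burger_context))  # burger-ordering context
```

With no context the higher raw score wins; with the burger-ordering terms supplied, the in-context word is chosen instead.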


3.       Richer manual speech adaptation tuning in Cloud Speech-to-Text. Earlier this year we added support for phrase hints within Studio. Phrase hints are a list of phrases that act as "hints" to boost the probability that words or phrases will be recognized. Phrase hints are supplied through Google’s Speech Context parameter. With the latest updates, Google has introduced three enhancements, which are now in beta and will be supported this year within Studio:

  • Speech Context Classes - Using classes lets developers tune ASR for a whole list of words at once, instead of adding them one by one. Classes are now supported within our Cloud Speech-to-Text and Dialogflow nodes.

  • Speech Context Boost - The new “boost” feature lets you set a speech adaptation strength based on your use case. This is also now available within Studio.

  • Speech Context Expanded Phrase Limits - Google announced that “The maximum number of phrase hints per API request has now been raised by 10x, from 500 to 5,000, which means that a company can now optimize transcription for thousands of jargon words (such as product names) that are uncommon in everyday language.” We have raised the limit within our Studio nodes to match, so Studio now supports the expanded phrase limits automatically.
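The three speech context enhancements above can be sketched as a request fragment. This is a minimal sketch, assuming the `speechContexts`, `phrases` and `boost` fields of Google’s beta REST API and the `$MONTH` class token; the helper names, the boost value and the example phrases are our own illustrative choices, and the 5,000 limit is taken from Google’s announcement quoted above.

```python
MAX_PHRASES_PER_REQUEST = 5000  # limit quoted in Google's announcement


def speech_context(phrases, boost=None) -> dict:
    """Build one speech context entry (hypothetical helper)."""
    context = {"phrases": list(phrases)}
    if boost is not None:
        # Higher boost means stronger biasing toward these phrases.
        context["boost"] = float(boost)
    return context


def build_adaptation_config(contexts, language_code: str = "en-US") -> dict:
    """Combine contexts into a config, enforcing the per-request phrase limit."""
    total = sum(len(c["phrases"]) for c in contexts)
    if total > MAX_PHRASES_PER_REQUEST:
        raise ValueError(f"{total} phrases exceeds {MAX_PHRASES_PER_REQUEST}")
    return {"languageCode": language_code, "speechContexts": list(contexts)}


if __name__ == "__main__":
    # "$MONTH" is a class token: one entry covers a whole list of words.
    dates = speech_context(["$MONTH"])
    # Product names are example jargon; the boost value 15 is arbitrary.
    products = speech_context(["Inference Studio", "Dialogflow"], boost=15)
    print(build_adaptation_config([dates, products]))
```

Checking the phrase count before sending keeps an oversized hint list from failing at request time.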

4.       Google increased session streaming time. With our 6.0 release earlier this year, we added support for Google’s real-time streaming. By streaming and simultaneously interpreting audio from callers in real time, we were able to greatly enhance the over-the-phone customer experience, removing the “awkward pauses” and misunderstandings often associated with traditional speech recognition. Google announced that streaming audio increments have been increased from 1 minute to 5 minutes to improve support for longer-running transcriptions. Studio applications now benefit from this increase.
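A little arithmetic shows what the longer streaming increment means for a long call. The sketch below only splits a call duration into windows of at most the stated limit; the 12-minute call is an invented example, the function name is hypothetical, and it does not describe how Google or Studio manage streams internally.

```python
def session_windows(call_seconds: float, limit_seconds: float) -> list:
    """Split a call into consecutive (start, end) windows no longer than the limit."""
    if limit_seconds <= 0:
        raise ValueError("limit_seconds must be positive")
    windows = []
    start = 0.0
    while start < call_seconds:
        end = min(start + limit_seconds, call_seconds)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    call = 12 * 60  # a hypothetical 12-minute call, in seconds
    print(len(session_windows(call, 60)))   # windows needed at the old 1-minute increment
    print(len(session_windows(call, 300)))  # windows needed at the new 5-minute increment
```

Fewer windows means fewer points in a call where a stream has to be restarted.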


As a Google Cloud partner, we receive access to pre-release versions of AI software updates. It will continue to be our mission to review new enhancements and add support for those updates that benefit our customers. Our goal is also to maintain a platform that gets smarter and more capable based not only on our own advancements but also on those of our technology partners.


Callan Schebella