Customer service is script-driven. In fact, the human brain relies on scripts and domains, which help it disambiguate spoken communication. At a fast food counter, for example, the person taking orders will not expect questions about train schedules or stock quotes. The domain at the counter is food, drink, and dessert ordering. As a result, even sloppy or slurred speech can still be understood, because the person taking orders maps it to a very small set of expected words and phrases. Furthermore, they follow a clearly defined script of questions to ask (or “semantic slots to fill”) before they can successfully process the order.
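To make the idea of mapping unclear speech onto a small domain concrete, here is a minimal Python sketch. It is only an illustration, not how a speech recognizer actually works: the phrase list, the `map_to_domain` helper, and the 0.6 similarity cutoff are all assumptions made up for this example, and plain string similarity from the standard library stands in for acoustic matching.

```python
import difflib

# Hypothetical domain vocabulary for the fast food counter (an assumption).
EXPECTED_PHRASES = [
    "hamburger",
    "caesar salad",
    "vanilla shake",
    "for here",
    "to go",
]


def map_to_domain(heard, cutoff=0.6):
    """Map a possibly garbled phrase to the closest expected phrase.

    Returns None when nothing in the domain is similar enough, which is
    what happens with out-of-domain requests.
    """
    matches = difflib.get_close_matches(
        heard.lower(), EXPECTED_PHRASES, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

With this small vocabulary, a garbled “hamburgr” still resolves to “hamburger”, while an out-of-domain phrase such as “train schedule” matches nothing and returns `None`.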
When designing an IVR system for customer self-service on the phone, VUI (voice user interface) designers follow various guiding principles, among them caller-centricity, conciseness, and natural (human) language patterns. The goal is to design a spoken interface that answers the caller’s question while requiring the caller to deviate as little as possible from common, well-practiced patterns of spoken communication, without attempting to fool the caller into believing they are talking to a human.
What’s interesting is that concepts applied in VUI design, such as multi-slot speech recognition (the ability to provide multiple fields of information in a single utterance), don’t actually work well in some human communication situations. I challenge you to try the following: Go to the fast food chain of your choice and place an order with more than three items of information. Example: “I’d like a large hamburger meal with extra BBQ sauce, a Caesar salad, and a vanilla shake, for here.” More than once, the first response I got was: “Is this for here or to go?” Uh… I just told you! (By the way, I certainly never order the above. It is just an illustration to make my point. I prefer strawberry banana smoothies.) I have learned that fast food chain employees actually prefer hearing as little information as possible in your first request, or at least they ignore any “data fields” they didn’t expect in it.
When designing VUIs, you certainly strive to mimic human understanding capabilities, but it is easy to overshoot the mark, as the above example shows. Design your prompts so that they elicit utterances of two or (at most) three items at a time, and follow up with a concise set of directed dialog questions to complete the transaction. When it comes to designing and developing the “understanding” side of the equation, though, make sure to accommodate as much variety in the user response as possible without letting your grammar over-generate (i.e., allow responses that are very unlikely to be uttered), as that will hurt recognition accuracy over the phone. When it comes to dialog design, Siri, with its integration with Fandango for movie tickets and OpenTable for restaurant reservations, among others, is doing a good job with this. Due to technical restrictions, however, the interface is not yet as fast as one might like. In the end, since I am familiar with the script and domain of the interaction, I already know the follow-up questions, so they could be more to the point.
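The combination of multi-slot understanding and directed follow-up questions can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a real IVR grammar: the slot names, the regular expressions, the prompts, and the helpers `fill_slots` and `next_prompt` are all hypothetical, and typed text stands in for recognized speech.

```python
import re

# Hypothetical slot grammar for a fast food order. The patterns are kept
# deliberately narrow so the "grammar" does not over-generate.
SLOT_PATTERNS = {
    "size": re.compile(r"\b(small|medium|large)\b"),
    "item": re.compile(r"\b(hamburger meal|caesar salad|vanilla shake)\b"),
    "dining": re.compile(r"\b(for here|to go)\b"),
}

# Directed dialog prompts, asked only for slots that are still empty.
FOLLOW_UPS = {
    "size": "What size would you like?",
    "item": "What would you like to order?",
    "dining": "Is this for here or to go?",
}


def fill_slots(utterance, slots=None):
    """Fill every slot the utterance mentions, keeping earlier values."""
    filled = dict(slots or {})
    text = utterance.lower()
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match and name not in filled:
            filled[name] = match.group(1)
    return filled


def next_prompt(slots):
    """Return the follow-up for the first missing slot, or None if done."""
    for name, prompt in FOLLOW_UPS.items():
        if name not in slots:
            return prompt
    return None
```

A caller who says “I’d like a large hamburger meal, for here” fills all three slots at once, so `next_prompt` returns `None` and the system never asks “Is this for here or to go?” again. A caller who only says “A hamburger meal please” is asked for the size next, then for the dining option.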
The post Multi-Slot Speech Recognition at the Fast Food Counter – IVR vs the Human Brain appeared first on Aspect Blogs.