
Why designing for a VUI is more difficult than designing for a GUI

May 11, 2009

Posted by HubTechInsider in Mobile Software Applications, VoIP, VUI Voice User Interface, Wireless Applications.

Many automated telephony and IVR vendors advertise that their web-based SaaS offerings make developing, testing, deploying and maintaining an IVR application easy and straightforward. The confidence this places in the VUI design abilities of untrained, non-technical business analysts and enterprise services managers is woefully misplaced. A software tool may be easy to use (even though these web-based SaaS vendors provide VUI tools with horrific interfaces and GUI designs, such as reliance on stone-age Java applets), but that does not mean more than cursory thought, if any at all, has been invested in how these untrained resources should use that tool. This can and often does lead to catastrophic results.

I frequently encounter the mistaken notion that designing a VUI consists of nothing more than taking a GUI and “simplifying it” for use on the telephone. As the thinking goes, we can all talk on the telephone; not all of us can navigate a complex forms-based web site. This impression is perpetuated by IVR and automated telephony vendors, by many software development teams within them, and by their clients, but some basic realities shatter it: people can read faster than they can listen with comprehension, speak faster than they can type, and talk much more quickly than a listener can process the meaning behind the spoken words. So even though designing an effective VUI might at first seem easier than designing a first-rate GUI, the opposite is true: designing a great VUI is far more difficult than designing a GUI.

A VUI is inextricably linked with Time

When a user is navigating a GUI, they can read text at any location on the web page or application screen. The user can skip ahead visually to the section they are interested in. With a VUI, the user is a “prisoner” of the VUI design. The attention is captive: they must listen with (or without) patience to each word before they can hear the one that follows it. With this in mind, some best practices for VUI design emerge:

1. Long prompts are Bad: The longer the prompt, the more the user’s patience is being taxed. Introductory or “tutorial” prompts explaining how the system works may be required for an outbound IVR application, or provided for the benefit of novice users. However, they should not be forced upon returning visitors or upon outbound IVR call recipients who have received similar IVR communications in the past.

2. Long VUI menus are Bad: Again to use the GUI as a contrasting example, on a web page you can present many menu options to the user, even hiding numerous options in a drop-down menu. A VUI menu, on the other hand, should never exceed five or six items at the most.

3. Get to the gist of the communication quickly: Forcing your captive “audience” to listen through introductory marketing copy written into an outbound IVR or inbound VRU script will become annoying very quickly to the user. Script your important information into the beginning of your prompts.

4. Allow ‘barge-in’: Expert users who know how to use the system and know what they want to do desire the ability to speed up the automated interaction with the system. Allow them to issue their commands to the system without forcing them to wait for the system to finish talking.

5. Give expert users global hotwords: Global “hotwords”, or application-level shortcuts, allow users to “cut to the chase”, enabling them to cut through menus and enjoy the feeling of enablement that a responsive VUI system can provide.

6. Allow the user to pause the interaction: The GUI has another crucial advantage over the VUI: the ability to stop and start again exactly where you left off after an indeterminate interval. Providing the exact same level of interaction control to the user is impossible in a VUI, but you can come close where it matters. If your VUI asks the user for a membership number in a COB (Coordination of Benefits) automated telephony call for a health care provider, asks for an account number in an inbound VRU application, or wants the user to write down a confirmation code or other information, then design it so that the call recipient or caller can get their pencil and paper ready, find their membership card, and say “continue” when they are ready.
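To make practices 2 and 5 a little more concrete, here is a minimal Python sketch. It is not tied to any IVR platform: the function names, the hotword table and the state names such as MAIN_MENU are all hypothetical, made up for illustration. It splits a long menu into spoken pages of at most five items and checks each utterance against a table of global hotwords before any menu-specific handling.

```python
from typing import Dict, List, Optional

# Ceiling taken from the article: "five or six items at the most".
MAX_MENU_ITEMS = 5

# Hypothetical global hotwords mapped to hypothetical dialog state names.
GLOBAL_HOTWORDS: Dict[str, str] = {
    "main menu": "MAIN_MENU",
    "operator": "TRANSFER_TO_AGENT",
    "repeat": "REPEAT_LAST_PROMPT",
}


def page_menu(options: List[str], max_items: int = MAX_MENU_ITEMS) -> List[List[str]]:
    """Split a long option list into spoken pages of at most max_items entries.

    Every page except the last ends with "more options", so a page never
    exceeds the limit even after that navigation entry is added.
    """
    if max_items < 2:
        raise ValueError("max_items must be at least 2")
    if len(options) <= max_items:
        return [list(options)]
    pages: List[List[str]] = []
    remaining = list(options)
    while len(remaining) > max_items:
        pages.append(remaining[: max_items - 1] + ["more options"])
        remaining = remaining[max_items - 1:]
    pages.append(remaining)
    return pages


def resolve_hotword(utterance: str) -> Optional[str]:
    """Return the target state if the caller said a global hotword, else None."""
    return GLOBAL_HOTWORDS.get(utterance.strip().lower())
```

In a real deployment the hotword check would run on the recognizer's result for every prompt, with barge-in enabled so the caller does not have to wait for the prompt to finish. How that is switched on depends on the platform, so it is left out here.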

The One-way Temporal Flow of the User

Of course, the spoken word is not only temporally linear, but also one-way. In the same manner in which time is a “one way street”, so is speech a “one way medium”. When you are listening to a prerecorded voice prompt, you can’t hit the nonexistent rewind button on your telephone. A VUI is not like watching a ball game on your DVR or TiVo, either: you can’t easily go back and listen to the prompt again. This is in stark contrast to the GUI world, where the user can jump back and forth within the text on the page or screen. Three simple techniques can help to alleviate this conundrum:

1. Always let the user ask to have the system repeat the prompt: Perhaps the most elementary technique to mitigate the one-way temporal flow of the user is to have the system offer to repeat the last prompt. The user must be made aware of the fact that they can have any prompt repeated to them at any time during the IVR interaction.

2. Make Help available to the user: Information or instructions presented at the beginning of the interaction that are crucial to the caller’s or call recipient’s ability to complete the task must remain available at any point in the IVR interaction. Offer help to the user not only at the beginning of the call but also at moments where the user seems to have arrived at an impasse in the interaction. The need to offer help to the user is acute at “no input”, “Out of Grammar (OOG)” or “no match” states.

3. Present a summation of the gathered data: In form-filling dialogs or IVR interactions where the caller is being asked to provide information to the system, a marvelous approach to overcome the one-way temporal flow nature of the IVR interaction is to offer the call recipient or caller a summation of the data that has been gathered from them during the course of the IVR interaction so far.
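The three techniques above fit naturally into one small piece of dialog state. The following Python sketch is an illustration only: the FormDialog class, its method names and the spoken wording are my own assumptions, not any vendor's API. It remembers the last prompt so it can be repeated, keeps the help text available at every step, escalates to help after repeated “no input” or “no match” events, and reads back the data gathered so far.

```python
from typing import Dict, Optional


class FormDialog:
    """Hypothetical dialog state for a form-filling IVR call."""

    def __init__(self, help_text: str) -> None:
        self.help_text = help_text
        self.last_prompt = ""
        self.collected: Dict[str, str] = {}
        self.failures = 0  # consecutive no-input / no-match events

    def say(self, prompt: str) -> str:
        """Remember the prompt so it can be repeated on request."""
        self.last_prompt = prompt
        return prompt

    def record(self, field: str, value: str) -> None:
        """Store a recognized answer and reset the failure counter."""
        self.collected[field] = value
        self.failures = 0

    def summary(self) -> str:
        """Read back everything gathered from the caller so far."""
        if not self.collected:
            return "I don't have any information from you yet."
        parts = [f"{field}: {value}" for field, value in self.collected.items()]
        return "So far I have " + "; ".join(parts) + "."

    def handle(self, utterance: Optional[str]) -> str:
        """Return the next thing to speak.

        None stands for "no input"; any text other than the three
        commands is treated as "no match" (out of grammar).
        """
        text = (utterance or "").strip().lower()
        if text == "repeat":
            return self.last_prompt
        if text == "help":
            return self.help_text
        if text == "summary":
            return self.summary()
        self.failures += 1
        if self.failures >= 2:
            # The caller seems stuck: offer help, then re-ask.
            return self.help_text + " " + self.last_prompt
        return "Sorry, I didn't catch that. " + self.last_prompt
```

A call flow would route recognized answers to record() and everything else through handle(). The threshold of two failures before help is offered is an arbitrary choice for the sketch.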

Persistence in a VUI is not visible to the user as in a GUI

Callers or call recipients perhaps show the most frustration when they feel they have lost track of “where they are” in the course of traversing a scripted IVR inbound or outbound interaction. Aggravation mounts as the user becomes increasingly unsure of what to do next, and of what the system expects them to do next. A web page or application screen typically provides a multitude of visual cues, such as a menu tree or a “breadcrumb” navigation path, but even something as simple and effective as the URL address bar of a browser is unavailable in the VUI world. Some approaches for mitigating these factors are available to the experienced VUI designer:

1. Auditorily “Announce” the user’s position in the IVR exchange: In the same manner that a properly designed web page or application screen will tell the user where they are in terms of navigating a site or application, so should a well-designed voice interface let the caller or call recipient know their exact position in the IVR interaction. A simple and effective technique for providing the user with such “mental markers” is to use a word or two to announce this position to the user: “Main Menu”, “Here are the drugs in your prescription refill:”, etc.

2. Audio breadcrumbs: The VUI version of the “breadcrumb navigation” trails that are featured so prominently on web sites in the GUI world can be emulated in the VUI world, where they prove no less useful. Each “voice page” that requires interaction with the user can be associated with a “position page” that announces the user’s position within the dialog tree. “Prescription, Reorder, Address”, as an example, would very nicely indicate to the user that they chose “Prescription”, then “Reorder”, and are now confirming their prescription reorder address on file with the system. A “Go Back” provision or option should be offered to users at these “position page” states.

3. Audio Icons: Auditory icons, or “earcons”, are VUI equivalents of the GUI’s icons. These audio icons can be extremely useful to both the VUI designer and the call recipient or caller, either by announcing to the user that a particular action is about to be undertaken or by positioning the user within an IVR menu structure or transaction path. “Wait audio”, or sounds played to the user to indicate that the system is busy performing a record lookup or other function, can prevent the user from interpreting an extended silence as a system crash or as the end of the IVR interaction.
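The audio breadcrumb idea can be sketched in a few lines of Python. The DialogPosition class below is hypothetical and not part of any IVR toolkit. It keeps the path the user has taken through the dialog tree, announces it as a short “position page”, and supports “Go Back”.

```python
from typing import List


class DialogPosition:
    """Hypothetical tracker for the user's place in the dialog tree."""

    def __init__(self) -> None:
        self._path: List[str] = []

    def enter(self, label: str) -> None:
        """Record that the user moved one level deeper."""
        self._path.append(label)

    def go_back(self) -> bool:
        """Step up one level; return False when already at the main menu."""
        if not self._path:
            return False
        self._path.pop()
        return True

    def announce(self) -> str:
        """Text for the "position page", spoken before the next prompt."""
        return ", ".join(self._path) if self._path else "Main Menu"
```

For the example above, entering “Prescription”, “Reorder” and “Address” in turn makes announce() return “Prescription, Reorder, Address”. An earcon could be played just before that announcement, but audio playback is platform-specific and not shown.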

GUIs present one fundamental advantage over VUIs: the user navigating a web page or an application screen has control over the medium, the message, and the interaction itself. Although a poor GUI can make the user feel helplessly confused, a VUI faced with the challenges outlined above has to be near-perfect to prevent the user abandoning the IVR interaction entirely by the simple and universal act of hanging up the telephone. VUI designers should always be aware of the significant differences between designing an effective and useful GUI and VUI. It would be ill-advised to enter into a VUI design task or project of any size while carrying into the endeavor the familiar GUI design assumptions.

Want to know more?

You’re reading Boston’s Hub Tech Insider, a blog stuffed with years of articles about Boston technology startups and venture capital-backed companies, software development, Agile project management, managing software teams, designing web-based business applications, running successful software development projects, ecommerce and telecommunications.

About the author.

I’m Paul Seibert, Editor of Boston’s Hub Tech Insider, a Boston focused technology blog. You can connect with me on LinkedIn, follow me on Twitter, even friend me on Facebook if you’re cool. I own and am trying to sell a dual-zoned, residential & commercial Office Building in Natick, MA. I have a background in entrepreneurship, ecommerce, telecommunications and software development. I’m the Director, Technical Projects at eSpendWise, and I’m a serial entrepreneur and the co-founder of Tshirtnow.net.

