PG 1
Improving Speech Applications with Usability Surveys
How does Nortel measure the ‘Usability Pulse’ of Self Service?
Judith Sherwood, Sales Engineer
Nortel Self Service Solutions

PG 2
Many Usability Test Methods
[Chart: usability test methods plotted by Number of Participants against Data from Each Participant. Methods shown: Live Pilot, Usability Survey, Employee Test Calls, Follow-Up (call-back) Surveys, Focus Groups]

PG 3
What is a Usability Survey?
• Usability testing is an evaluation of a customer touch-point from a user perspective
• Typically conducted using small focus groups (12-20 subjects) in a controlled studio environment

PG 4
What is a Usability Survey?
• Methodology: Usability Survey using 200 to 500+ panelists to reveal problems affecting fewer callers
o Chosen from a large pool of over 80,000 panelists with known demographics

PG 5
Size Matters!
• Traditional Usability Testing Methods have sample size limitations
• If a problem affects only 5% of the users:
o 10-call sample has a 40% chance of finding it
o 100-call sample has a 99% chance of finding it
• If a problem affects only 1% of the users:
o 100-call sample has a 63% chance of finding it
o 500-call sample has a 99% chance of finding it

PG 6
Methodology
• Panelist places call, completes a task, and then fills out a questionnaire on an internet website
• Entire call conversation is recorded for analysis
• Survey ties the individual caller experience and questionnaire response to the call recording
[Diagram labels: Design Tasks & Questionnaire, Target Service, Panelist Pool, Call-in Platform, Post-Questionnaire, Analysis & Recommendations Report]

PG 7
Methodology
• More efficient, less expensive, faster execution, broader feedback
• Panelist recruitment and the call-in campaign are outsourced
• Analysis provides recommendations for service improvement
• Listen to problem calls to suggest ways to fix them
[Diagram repeated from PG 6]

PG 8
Usability Survey Grading Process
• Percentile-based letter grading system
to compare against other speech applications
• Raw Scores are based on:
o Caller Satisfaction (% very satisfied - % dissatisfied - % very dissatisfied)
o Task Completion (% who finish task in one call)
o Consistency (variability in call length)

PG 9
Lessons Learned: A Case Study
• “Acme”: A regional Managed-Care Health Insurance Company. Customer Service available for Members and Healthcare Providers
• Available Tasks in Self-Service:
o Members:
  • Check Co-Pay amount and Physician Name
  • Order a replacement ID card
o Providers:
  • Check Claim Status
  • Verify Member Status and Co-Pay
• Initial Usability Survey, then Tuning, followed by 2nd Survey

PG 10
Initial Results
1. How easy was it for you to accomplish your objective in this call?
o Very easy: 16% (85 Panelists)
o Easy: 27% (146 Panelists)
o Neither easy nor difficult: 15% (79 Panelists)
o Difficult: 21% (110 Panelists)
o Very difficult: 4% (22 Panelists)
o I did not accomplish my objective in this call: 17% (93 Panelists)
• Call Completion Grade = D
• Call Completion Score = % accomplished objective in one-call only
• Call Completion Score = (100–17) x (0.83) = 69%
2. How satisfied were you with your overall experience?
o Very satisfied: 12% (66 Panelists)
o Satisfied: 33% (174 Panelists)
o Neither satisfied nor dissatisfied: 14% (73 Panelists)
o Dissatisfied: 29% (155 Panelists)
o Very dissatisfied: 12% (66 Panelists)
o No Response: 0.2% (1 Panelist)
• Satisfaction Grade = D
• Satisfaction Score = %VS – %D – %VD
• Satisfaction Score = +12 – 29 – 12 = –29%

PG 11
After Tuning
1. How easy was it for you to accomplish your objective in this call?
o Very easy: 52% (285 Panelists)
o Easy: 25% (136 Panelists)
o Neither easy nor difficult: 8% (44 Panelists)
o Difficult: 6% (35 Panelists)
o Very difficult: 1% (6 Panelists)
o I did not accomplish my objective in this call: 7% (37 Panelists)
o No Response: 1% (3 Panelists)
• Call Completion Grade = C
• Call Completion Score = (100–7) x (0.86) = 80%
2. How satisfied were you with your overall experience?
o Very satisfied: 50% (275 Panelists)
o Satisfied: 30% (165 Panelists)
o Neither satisfied nor dissatisfied: 8% (46 Panelists)
o Dissatisfied: 8% (46 Panelists)
o Very dissatisfied: 2% (13 Panelists)
o No Response: 0.2% (1 Panelist)
• Satisfaction Grade = A
• Satisfaction Score = +50 – 8 – 2 = +40%

PG 12
UI Overall Improvement After Tuning
• How easy was it for you to accomplish your objective in this call?
o Response: Initial Results | After Tuning
o Very easy: 16% (85 Panelists) | 52% (285 Panelists)
o Easy: 27% (146 Panelists) | 25% (136 Panelists)
o Neither easy nor difficult: 15% (79 Panelists) | 8% (44 Panelists)
o Difficult: 21% (110 Panelists) | 6% (35 Panelists)
o Very difficult: 4% (22 Panelists) | 1% (6 Panelists)
o I did not accomplish my objective in this call: 17% (93 Panelists) | 7% (37 Panelists)
o No Response: not listed | 1% (3 Panelists)
o Call Completion: 69% | 80%
• How satisfied were you with your overall experience?
o Response: Initial Results | After Tuning
o Very satisfied: 12% (66 Panelists) | 50% (275 Panelists)
o Satisfied: 33% (174 Panelists) | 30% (165 Panelists)
o Neither satisfied nor dissatisfied: 14% (73 Panelists) | 8% (46 Panelists)
o Dissatisfied: 29% (155 Panelists) | 8% (46 Panelists)
o Very dissatisfied: 12% (66 Panelists) | 2% (13 Panelists)
o No Response: 0.2% (1 Panelist) | 0.2% (1 Panelist)
o Satisfaction Score: –29% | +40%

PG 13
Best Practice 1: Catch Recognition Problems Quickly
• Look for the Red Flags of Voice Recognition
o Low satisfaction and call-completion scores
o Low voice-recognition rating scores
o Check complaints in free responses
o Check for “can’t hear” problems among low-score panelists
• Work with Developer and IT support
o Is speech level strong enough in the IVR?
Check switch gain levels
o Then check speech detector parameters
o Then check recognition confidence thresholds and grammars

PG 14
Voice Recognition Issues
• How well or poorly did the system recognize your responses when you spoke the answer to questions?
o Response: Initial Results | After Tuning
o Very well: 11% (59 Panelists) | 47% (257 Panelists)
o Well: 21% (110 Panelists) | 31% (167 Panelists)
o Neither well nor poorly: 12% (66 Panelists) | 8% (46 Panelists)
o Poorly: 34% (183 Panelists) | 10% (57 Panelists)
o Very poorly: 22% (116 Panelists) | 3% (16 Panelists)
o No Response: 0.2% (1 Panelist) | 1% (3 Panelists)
• How quickly or slowly did the system respond to your spoken answers?
o Response: Initial Results | After Tuning
o Very quickly: 16% (88 Panelists) | 46% (249 Panelists)
o Quickly: 44% (238 Panelists) | 41% (225 Panelists)
o Neither quickly nor slowly: 21% (113 Panelists) | 8% (46 Panelists)
o Slowly: 13% (72 Panelists) | 3% (16 Panelists)
o Very slowly: 4% (23 Panelists) | 1% (8 Panelists)
o No Response: 0.2% (1 Panelist) | 0.4% (2 Panelists)
• Fix: Increased digital gain from host switch

PG 15
Best Practice 2: Spot Prompt Clarity Confusions
• Look for the Red Flags of Prompt Confusion
o Low satisfaction and call-completion scores
o Low ‘What-to-Speak’ scores
o Listen for caller hesitations for low-score panelists
• Work with Dialog Designer
o Let callers know ahead of time that they can speak
o Reword prompts; Callers appreciate clear choices
o Give Touch-Tone options when reprompting
o Coach your voice actor

PG 16
Voice-Prompt Issues
• How appropriate or inappropriate was the speaking style and voice of this service?
o Response: Initial Results | After Tuning
o Very appropriate: 32% (172 Panelists) | 58% (319 Panelists)
o Appropriate: 52% (276 Panelists) | 36% (198 Panelists)
o Neither appropriate nor inappropriate: 11% (61 Panelists) | 4% (20 Panelists)
o Inappropriate: 3% (17 Panelists) | 1% (5 Panelists)
o Very inappropriate: 1% (7 Panelists) | 0.2% (1 Panelist)
o No Response: 0.4% (2 Panelists) | 1% (3 Panelists)
• Was it clear what you needed to select or say at each step of the call?
o Response: Initial Results | After Tuning
o Clear for all steps: 27% (142 Panelists) | 58% (315 Panelists)
o Clear for almost all steps: 27% (147 Panelists) | 29% (160 Panelists)
o Clear for some steps: 24% (129 Panelists) | 8% (45 Panelists)
o Clear for only a few steps: 16% (88 Panelists) | 4% (21 Panelists)
o Clear for no steps: 5% (28 Panelists) | 1% (4 Panelists)
o No Response: 0.2% (1 Panelist) | 0.2% (1 Panelist)

PG 17
Voice-Prompt Issues
• Fix: Clarify prompt choice wording
o Initial: “Are you a member, provider, or a nonmember looking for information?”
o After Tuning: “First, tell me who you are: a member, a provider, or a non-member.”
• Fix: It’s not just what you say, but how you say it.
o Coach your voice talent for proper inflections.

PG 18
Best Practice 3: Spot Call Flow Frustrations
• Look for the Red Flags of Call-Flow Frustration
o Low satisfaction and call-completion scores
o Low Enough-Choices scores
o Check complaints about inflexible systems
o Listen for caller frustrations for low-score panelists
• Work with Dialog Designer
o Look for easy ways to complete repetitive tasks
o Provide an easy exit strategy
o Leverage the spoken language instinct

PG 19
Call-Flow Issues
• When you were given menu choices, were you given too many, just enough, or too few?
o Response: Initial Results | After Tuning
o Too many choices: 11% (61 Panelists) | 10% (55 Panelists)
o Just enough choices: 74% (395 Panelists) | 85% (466 Panelists)
o Too few choices: 15% (78 Panelists) | 4% (22 Panelists)
o No Response: 0.2% (1 Panelist) | 1% (3 Panelists)

PG 20
Call-Flow Issues
• Fix: To clarify options, Provide Anchor, Split Main Menu choices, and Add Grammar Synonyms
o Initial: “Alright, I’m going to tell you the things I can help you with. When you hear the right one, just say it… Verify member status and office co-pays. Get status of a claim. Get member ID cards. Change PCP. Or, order forms and literature.”
o After Tuning: “Alright, Main Menu. Please say one of the following options at any time … Verify member status. Check PCP co-pay. Get claim status. Order ID cards. Change PCP. Order forms. Or, order literature …”

PG 21
Best Practice 4: Spot Task Differences
• Look for the Red Flags of Task Differences
o Low satisfaction and call-completion scores for Specific Tasks
o Check complaints about complicated tasks
o Listen for caller confusion in low-scoring tasks
• Work with Dialog Designer
o Look for easy ways to complete repetitive tasks
o Make sure task instructions are clear
o Provide an easy operator exit strategy

PG 22
Task Differences
• Relative Task Satisfaction can change after tuning
o Can’t Hear problem can swamp other UI task issues
o Claim Status satisfaction
is low for business reasons (no shortcut for multiple claims)

PG 23
Task Differences
• Fixes
o Improve recognition to shift all scores up and reveal other UI issues
o Future: Give clearer exit menu options
o Future: For Claim Status, offer streamlined repeat-claim function
[Chart: Task Satisfaction, BEFORE and AFTER tuning, for Claim Status, Verify Status, ID Cards, and Office Co-Pay. Axis: Satisfaction Score, ticks at -75, -25, 25, 75]

PG 24
Conclusions
• The Usability Survey method is very effective for tuning applications prior to pilot production
o Collect hundreds of calls and analyze results efficiently
• Recognition/Prompt/Call-Flow issues are revealed quickly in a Usability Survey
o Listen to calls for which various ratings are low
• Task differences show the effect of Task Consistency, Complexity, and Business Rules on Usability Quality
o Longer tasks require more call-flow efficiency and friendly hand-holding.
o Transfer to Agent for business reasons can sometimes lower satisfaction if callers must first spend a long time in Self-Serve, or if transfer takes place with no explanation.

PG 25
Thank You!
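
The detection odds on the “Size Matters!” slide and the raw scores on the case-study slides can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original deck. It assumes each call independently hits a problem with probability p, so a sample of n calls finds it with probability 1 - (1 - p)^n. All function names are hypothetical. The deck does not define the second factor in the Call Completion Score (0.83 and 0.86 on the slides); the sketch assumes it is the fraction of successful panelists who needed only one call.

```python
def detection_probability(problem_rate: float, sample_size: int) -> float:
    """Probability that at least one of `sample_size` calls hits the problem.

    Assumes each call independently encounters the problem with
    probability `problem_rate` (an assumption, not stated on the slides).
    """
    if not 0.0 <= problem_rate <= 1.0:
        raise ValueError("problem_rate must be between 0 and 1")
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    return 1.0 - (1.0 - problem_rate) ** sample_size


def satisfaction_score(very_satisfied: float, dissatisfied: float,
                       very_dissatisfied: float) -> float:
    """Satisfaction Score = %VS - %D - %VD, all in percentage points."""
    return very_satisfied - dissatisfied - very_dissatisfied


def call_completion_score(pct_not_accomplished: float,
                          one_call_fraction: float) -> float:
    """(100 - % not accomplished) x one_call_fraction, as computed on the slides.

    `one_call_fraction` is a hypothetical name: the deck gives the values
    0.83 and 0.86 without defining them.
    """
    return (100.0 - pct_not_accomplished) * one_call_fraction


if __name__ == "__main__":
    # Sample-size figures from the "Size Matters!" slide.
    for rate, calls in [(0.05, 10), (0.05, 100), (0.01, 100), (0.01, 500)]:
        chance = detection_probability(rate, calls)
        print(f"{rate:.0%} problem, {calls}-call sample: {chance:.0%}")

    # Case-study scores, initial survey then after tuning.
    print(satisfaction_score(12, 29, 12))          # -29
    print(satisfaction_score(50, 8, 2))            # 40
    print(round(call_completion_score(17, 0.83)))  # 69
    print(round(call_completion_score(7, 0.86)))   # 80
```

Run as a script, the loop prints 40%, 99%, 63%, and 99%, matching the slide. The same formula shows why a 12-20 subject focus group is unlikely to surface a problem that affects only 1% of callers.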