Fault and Performance Management for Next Generation IP Communication
Alan Clark, Telchemy

Outline
• Problems affecting VoIP performance
• Tools for Measuring and Diagnosing Problems
• Protocols for Reporting QoS
• Performance Management Architecture
• What to ask for/ integrate?

Enterprise VoIP Deployment
[Diagram labels: IP Phones, IP VPN, Branch Office, Teleworker, Gateway]

VoIP Deployment - Issues
[Diagram labels: ROUTE FLAPPING, LINK FAIL; CODEC DISTORTION; LAN CONGESTION, DUPLEX MISMATCH, LONG CABLES….; ECHO; ACCESS LINK CONGESTION]

Call Quality Problems
• Packet Loss
• Jitter (Packet Delay Variation)
• Codecs and PLC
• Delay (Latency)
• Echo
• Signal Level
• Noise Level

Packet Loss and Jitter
[Diagram labels: IP Network, Jitter Buffer, Codec, Distorted Speech; Packets lost in network; Packets discarded due to jitter]

Routers, Loss and Jitter
[Diagram labels: Arriving packets; Input queue (Queuing delay); Prioritize/ Route (Processing delay); Output queue (Queuing delay); Serialization delay]
• Packet loss due to buffer overflow or RED
• Voice packet delayed by one or more data packets

Queuing Delays
[Figure: Max delay (mS) vs Transmission speed (kbits/s), with curves for 1 x, 2 x and 3 x 1500 byte MTU]
• Added delay due to wait for data packets to be sent = Jitter

Jitter
[Figure: Delay (mS) vs Time (Seconds)]
• Average jitter level (PPDV) = 4.5mS
• Peak jitter level = 60mS

WiFi can also cause jitter
[Figure: Delay (mS) & RSSI vs Time]

Effects of Jitter
• Low levels of jitter absorbed by jitter buffer
• High levels of jitter
o lead to packets being discarded
o cause adaptive jitter buffer to grow - increasing delay but reducing discards
• If packets are discarded by the jitter buffer as they arrive too late they are regarded
as “discarded”
• If packets arrive extremely late they are regarded as “lost”, hence sometimes “lost” packets actually did arrive

Packet Loss
[Figure: 500mS average packet loss rate (%) vs Time (seconds)]
• Average packet loss rate = 2.1%
• Peak packet loss = 30%

Packet Loss is bursty
• Packet loss (and packet discard) tends to occur in sparse bursts - say 20-30% in density and one second or so in length
• Terminology
o Consecutive burst
o Sparse burst
o Burst of Loss vs Loss/Discard

Example Packet Loss Distribution
[Figure: Burst weight (packets) vs Burst length (packets)]

Loss and Discard
• Loss is often associated with periods of high congestion
• Jitter is due to congestion (usually) and leads to packet discard
• Hence Loss and Discard often coincide
• Other factors can apply - e.g. duplex mismatch, link failures etc.

Example Loss/Discard Distribution
[Figure: Burst weight (packets) vs Burst length (packets)]

Leads To Time Varying Call Quality
[Figure: Bandwidth (kbit/s) for Voice and Data, and MOS, vs Time; periods of high jitter/ loss/ discard marked]

Packet Loss Concealment
[Diagram label: Estimated by PLC]
• Mitigates impact of packet loss/ discard by replacing lost speech segments
• Very effective for isolated lost packets, less effective for bursty loss/discard
• But isn’t loss/discard bursty?
• Need to be able to deal with 10-20-30% loss!!!
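
The burst terminology above (burst length, burst weight, density) can be made concrete with a short sketch. This is a simplified illustration, not the algorithm used by any particular product: it assumes a per-packet list of loss/discard flags and an assumed gap threshold `gmin` of 16 packets, in the spirit of the burst/gap definition in RFC 3611. The function names `find_bursts` and `burst_density` are hypothetical.

```python
from typing import List, Tuple

GMIN = 16  # assumed threshold: this many consecutive good packets end a burst


def find_bursts(lost: List[bool], gmin: int = GMIN) -> List[Tuple[int, int]]:
    """Return one (length, weight) pair per burst.

    lost[i] is True if packet i was lost or discarded.
    length = packets from the first to the last loss in the burst.
    weight = number of lost/discarded packets inside the burst.
    Isolated single losses are treated as gap losses and not reported.
    """
    bursts: List[Tuple[int, int]] = []
    start = last = -1  # indices of first and most recent loss in the burst
    weight = 0

    def close() -> None:
        # Only report sequences containing at least two losses.
        if weight >= 2:
            bursts.append((last - start + 1, weight))

    for i, is_lost in enumerate(lost):
        if not is_lost:
            continue
        if weight and i - last - 1 >= gmin:
            # gmin or more good packets since the previous loss: burst ended
            close()
            weight = 0
        if weight == 0:
            start = i
        last = i
        weight += 1
    if weight:
        close()
    return bursts


def burst_density(length: int, weight: int) -> float:
    """Fraction of packets within a burst that were lost or discarded."""
    return weight / length


if __name__ == "__main__":
    # 20 good packets, a sparse burst (3 losses in 6 packets),
    # 20 good packets, one isolated loss, 20 good packets.
    seq = [False] * 20 + [True, False, False, True, False, True] \
        + [False] * 20 + [True] + [False] * 20
    for length, weight in find_bursts(seq):
        print(length, weight, burst_density(length, weight))
```

A "consecutive burst" in this representation is simply a burst whose weight equals its length; a "sparse burst" has a density below 1.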
Effectiveness of PLC
[Figure: ACR MOS vs Packet Loss/Discard Rate for G.711 no PLC, G.711 PLC and G.729A; annotations: Codec distortion; Impact of loss/ discard and PLC]

Effect of Delay on Conversational Quality
[Figure: MOS Score vs Round trip delay (milliseconds), with curves for 55dB and 35dB Echo Return Loss]

Causes of Delay
[Diagram labels, sending side: Accumulate and encode; Echo Control, CODEC, RTP, IP, UDP, TCP]
[Diagram labels, network: Network delay]
[Diagram labels, receiving side: Jitter buffer, decode and playout; RTP, IP, UDP, TCP, CODEC, Echo Control]
[Diagram label: External delay]

Cause of Echo
[Diagram labels: Gateway, IP, Echo Canceller; Acoustic Echo; Line Echo; Round trip delay - typically 50mS+]
• Additional delay introduced by VoIP makes existing echo problems more obvious
• Also - “convergence” echo

Echo problems
• Echo with very low delay sounds like “sidetone”
• Echo with some delay makes the line sound hollow
• Echo with over 50mS delay sounds like…. Echo
• Echo Return Loss
o 55dB or above is good
o 25dB or below is bad

Signal Level Problems
[Diagram: signal level scale marked at 0 dBm0 and -36 dBm0]
• Amplitude Clipping occurs -- speech sounds loud and “buzzy”
• Temporal Clipping occurs with VAD or Echo Suppressors -- gaps in speech, start/end of words missing

Noise
• Noise can be due to
o Low signal level
o Equipment/ encoding (e.g.
quantization noise)
o External local loops
o Environmental (room) noise
• From a service provider perspective - how to distinguish between
o room noise (not my problem)
o Network/equipment/circuit noise (is my problem)

Measuring VoIP performance
• Active Test - Measure test calls
o VoIP Specific: VQmon, ITU G.107
o Analog signal based: ITU P.862 (PESQ)
• Passive Test - Measure live calls
o VoIP Specific: VQmon, ITU P.VTQ
o Analog signal based: ITU P.563

“Gold Standard” - ACR Test
• Speech material
o Phonetically balanced speech samples 8-10 seconds in length
o Test designed to eliminate bias (e.g. presentation order different for each listener)
o Known files included as anchors (e.g. MNRU)
• Listening conditions
o Panel of listeners
o Controlled conditions (quiet environment with known level of background noise)

Example ACR test results
[Figure: Votes vs Opinion Score]
• Extract from an ITU subjective test
• Mean Opinion Score (MOS) was 2.4
• Opinion scale: 1=Unacceptable, 2=Poor, 3=Fair, 4=Good, 5=Excellent

Packet based approaches
[Diagram, active: Test Call between two VoIP Test Systems across IP; Measure call]
[Diagram, passive: Live Call between two VoIP End Systems across IP; Passive Test using VQmon, G.107, P.VTQ]

Packet based approaches
• ITU G.107: R = Ro - Is - Ie - Id + A
o Really a network planning tool
o Missing many essential monitoring features
• VQmon
o ITU G.107 + ETSI TS 101 329-5 Annex E +…….
o Proprietary but widely used (Superset of G.107 & P.VTQ)
• ITU P.VTQ
o Available late 2005, very limited functionality

Extended E Model - VQmon
[Diagram labels: Arriving packets; Jitter buffer; Discarded; CODEC; Loss/ Discard events; 4 State Markov Model; Signal level, Noise level, Echo level; Metrics Calculation; Call Quality Scores; Diagnostic Data]
• Gather detailed packet loss info in real time

Modeling transient effects
[Figure: Ie(burst), Ie(gap) and Ie(VQmon) vs Time (seconds); Measured Call quality compared with User Reported Call quality]

VQmon - computational model
[Diagram labels, inputs: Burst loss rate, Gap loss rate, Signal level, Noise level, Echo, Delay]
[Diagram labels, stages: Ie mapping; Perceptual model; Recency model; Calculate Ro, Is; Calculate Id]
[Diagram labels, outputs: Calculate R-LQ, MOS-LQ; Calculate R-CQ, MOS-CQ]
[Diagram labels, sources: ETSI TS101 329-5; ITU-T G.107]

Accuracy: Non-bursty conditions
[Figure: Comparison of VQmon vs ACR MOS - ILBC 15.2k; MOS Score vs Packet Loss Rate (%); series: ACR MOS, VQmon MOS-LQ]
[Figure: Comparison of VQmon vs PESQ - ILBC 15.2k; PESQ Score vs Packet Loss Rate (%); series: PESQ, VQmon MOS-PQ]

Accuracy: Bursty conditions
• G.107
o Well established model for network planning
o No way to represent jitter
o Few codec models
o Inaccurate for bursty loss
o Conversational Quality only
• VQmon
o Extended G.107
o Transient impairment model
o Wide range of codec models
o Narrow & Wideband
o Jitter Buffer Emulator
o Listening and Conversational Quality
[Figure: Estimated MOS vs ACR MOS. Comparison of VQmon and E Model for severely time varying conditions]

Signal based approaches
• P.862 is an Active Test Approach
[Diagram labels: Test Call; VoIP End System; P.862 Tester; IP; VoIP End System]
• P.563 is a Passive Test Approach
[Diagram labels: VoIP End System; IP; VoIP End System; P.563 Tester]

ITU P.862 - Active testing
[Diagram labels: Tested segment of connection; IP; Audio files; PESQ: Time align, FFT…, Compare; PESQ Score]

ITU P.862 - Active testing
• Send speech file
• Takes typically 50-100 MIPS per call
• Compare
received file with original using FFT
• MOS-like score in the range 0.5 to 4.5
• Widely used within the industry
[Figure: PESQ Scores vs Packet Loss Rate. Results for G.729A codec for a set of speech files (i.e. for each packet loss rate the only thing changed is the speech source file)]

ITU P.563 - Passive monitoring
• Analyses received speech file (single ended)
• Produces a MOS score
• Correlates well with MOS when averaged over many calls
• Requires 100MIPS per call
[Figure: ACR MOS vs P563 Score. Comparison of P.563 estimated MOS scores with actual ACR test scores. Each point is average per file ACR MOS with 16 listeners compared to P.563 score]

Performance Monitoring - Passive Test
[Diagram labels: Embedded Monitoring Function; RTCP XR; SIP QoS Report]

SLA Monitoring - Active Test
[Diagram labels: Test call; Active Test Functions]

Active or Passive Testing?
• Active testing
o works for pre-deployment testing and on-demand troubleshooting
• But!!!!
o IP problems are transient
• Passive monitoring
o Monitors every call made - but needs a call to monitor
o Captures information on transient problems
o Provides data for post-analysis
• Therefore - you need both

VoIP Performance Management Framework
[Diagram labels: Network Management System; Call Server and CDR database; Signaling Based QoS Reporting; Network Probe, Analyzer or VQ Router; VoIP Endpoint; SNMP Reporting; VoIP Gateway; RTP stream (possibly encrypted); Embedded Monitoring (VQ); Media Path Reporting (RTCP XR)]

VoIP Performance Management Framework
• Embedded monitoring function in IP phones, residential gateways….
o Close to the user
o Least cost + widest coverage
• Protocol support developed
o RTCP XR (RFC3611), SIP, MGCP, H.323, Megaco
o Draft SNMP MIB
• Works in encrypted environments
• Already being deployed by equipment vendors

The role of RTCP XR
RTCP XR (RFC3611)
1. Provides a useful set of metrics for VoIP performance monitoring and diagnosis
2.
Supports both real time monitoring and post-analysis
3. Extracts signal level, noise level and echo level from DSP software in the endpoint
4. Exchanges info on endpoint delay and echo to allow remote endpoint to assess echo impact
5. Provides midstream probes/ analyzers access to analog metrics if secure RTP is used
6. Goes through firewalls………

RFC3611 - RTCP XR
[Diagram: report block fields, one row per line]
Loss Rate, Discard Rate, Burst Density, Gap Density
Burst Duration (mS), Gap Duration (mS)
Round Trip Delay (mS), End System Delay (mS)
Signal level, RERL, Noise Level, Gmin
R Factor, Ext R, MOS-LQ, MOS-CQ
Rx Config, Jitter Buffer Nominal
Jitter Buffer Max, Jitter Buffer Abs Max

SIP Service Quality Reporting Event
PUBLISH sip:collector@example.com SIP/2.0
Via: SIP/2.0/UDP pc22.example.com;branch=z9hG4bK3343d7
………
Content-Type: application/rtcpxr
Content-Length: ...

VQSessionReport
LocalMetrics:
TimeStamps=START:10012004.18.23.43 STOP:10012004.18.26.02
SessionDesc=PT:0 PD:G.711 SR:8000 FD:20 FPP:2 PLC:3 SSUP:on
CallID=1890463548@alice.uac.chicago.com
………
Signal=SL:2 NL:10 RERL:14
QualityEst=RLQ:90 RCQ:85 EXTR:90 MOSLQ:3.4 MOSCQ:3.3 QoEEstAlg:VQMonv2.1
DialogID:38419823470834;to-tag=8472761;from-tag=9123dh311

RTCP XR MIB
[Diagram labels: Session table; History table; Basic parameters; Call quality metrics; Alerting]

Passive Monitoring Framework
[Diagram labels: embedded VQ monitoring in IP Phones, Branch Office, Teleworker and Gateway across the IP VPN; SNMP; SIP QoS Report; NMS]

What to Implement/ Ask For
• Embedded monitoring functionality in IP Phones and Gateways (e.g. VQmon)
• RTCP XR for mid-call data exchange between endpoints
• SIP Service Quality Events for reporting end of call quality
• RTCP XR MIB for SNMP support

Summary
• Problems affecting VoIP performance
• Tools for Measuring and Diagnosing Problems
• Protocols for Reporting QoS
• Performance Management Architecture
• What to ask for/ integrate?
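
Appendix sketch: the G.107 relationship quoted under packet based approaches, R = Ro - Is - Ie - Id + A, and the R Factor and MOS fields carried in RTCP XR can be tied together with a small worked example. The code below is a minimal illustration and not VQmon or any vendor implementation: `r_factor` just evaluates the sum, and `r_to_mos` uses the standard R-to-MOS conversion published with the ITU-T G.107 E-model. The input values in the demo are made-up illustrative numbers, not measurements from this presentation.

```python
def r_factor(ro: float, i_s: float, i_e: float, i_d: float, a: float = 0.0) -> float:
    """Evaluate R = Ro - Is - Ie - Id + A.

    ro  : basic signal-to-noise term
    i_s : simultaneous impairments
    i_e : equipment impairment (codec, packet loss)
    i_d : delay impairment
    a   : advantage factor
    """
    return ro - i_s - i_e - i_d + a


def r_to_mos(r: float) -> float:
    """Convert an R factor to an estimated MOS using the G.107 mapping."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # Illustrative (assumed) impairment values only.
    r = r_factor(ro=94.8, i_s=1.4, i_e=10.0, i_d=0.2)
    print(f"R = {r:.1f}, estimated MOS = {r_to_mos(r):.2f}")
```

Because the mapping is non-linear, the same increase in Ie costs more MOS in the middle of the R range than near the top, which is one reason burst periods with high Ie dominate perceived quality.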