VoIP Testing

VoIP Testing—A How-to Session for Performance and Functional Test Methodologies
Chris Bajorek, Director, CT Labs
Before We Start
• Every year we consult with many companies, helping
them perform many different types of VoIP-oriented tests
• This provides a unique industry perspective on the
market readiness of a wide range of VoIP products
• I’m pleased to have this opportunity to share our test
experiences with you today
VoIP Products by Market Area
• Residential (Voice over Broadband)
– Analog terminal adapters, VoIP softphones,
residential routers
• Enterprise
– IP PBXs, IP Contact Centers, VoIP phones &
softphones, firewalls/ALGs, intrusion prevention
devices, media servers (conferencing, voice mail, IVR)
• Next-Gen Network Carriers and Service Providers
– Session border controllers, softswitches, media servers,
proxies, media gateways, VQ enhancement processors
Building VoIP Networks:
IMS is here and it needs testing.
• Key elements of IMS:
– Enables innovative new applications
– High levels of network complexity
– Modules from multiple vendors
must peacefully coexist
– High rate of carrier adoption
– Global deployments
– Standards based
– Exploits strengths of IP+SIP
IMS Basics: Functions by Layer
• Services / Application Layer
– Application servers,
Media servers
• Control / Switching Layer
– HSS (Home Subscriber Server)
– CSCF (Call Session Control
Function)
– BGCF (Breakout Gateway
Control Function)
– MGCF (Media Gateway
Control Function)
– MRFC (Media Resource Function Control)
– etc.
• Transport / Access Layer
– IP/MPLS, PSTN/PLMN, Cellular, SONET/SDH, ATM, Satellite
Risks of Inadequate Testing
• From the CT Labs VoIP project files:
– VoIP terminal adapters that act unreliably and "emulate"
an occasional bad Internet connection
– IP PBXs that drop calls when subjected to
only certain types of call loads
– VoIP soft clients that distort the caller audio
– High-end enterprise firewalls that grind to
a standstill under certain denial-of-service attacks
– Session border controllers that degrade voice quality at
traffic levels below rated maximums
Test Automation—Reaping the Benefits of Shorter Test Cycles
Why you should consider using it sooner, not later
Test Automation—The Benefits
– Tightly controlled test environment
• All aspects of the test setup can be controlled and coordinated
by testing scripts
– Repeatable results
• Key to resolving issues that arise during testing
• Includes ability to exactly reproduce product settings and test
conditions
– Faster test execution
• Weeks of manual testing can literally be executed in hours
– Increased accuracy of results reporting
– All of the above resulting in:
• Lower testing costs over product’s lifetime
• Greater product and delivered-service reliability
• Fewer field failures, fewer customer-reported issues
Challenges Using Live Callers in Tests
– The exact timing and sequence of caller actions is not
synchronized or repeatable
– Ability to distinguish and describe nuances of results varies
widely from person to person
• i.e. reliability of reported results can be low
– Ability to correlate assessment of voice quality and
anomalies across multiple listeners is typically poor
• Unless you just happen to know how to run ITU-T P.800 MOS
tests
– Call arrival profiles difficult to control when using large
numbers of callers for “load tests”
– In other words, don’t expect more than coarse results
Conference Test, via Automation
[Diagram: three conference bridges ("Conf #1 Ready", "Conf #2 Ready", "Conf #3 Ready"), each with a host and several participant phones]
• Then go through a defined sequence of talker/listener subtests while under real call loads…
• …while monitoring actual voice quality for each listener
Automation-based VoIP Testing Goals
1. Verify call-handling performance
2. Verify voice quality
– With a wide variety of caller and noise environments
3. Verify performance under real-world traffic and network impairment conditions
4. Verify performance under malicious attack conditions
5. Verify service reliability
– i.e. availability of service under extended test run durations
6. Verify interoperability and feature interaction
7. Verify quality of access to enhanced services
– Applications such as voice mail, conferencing, IVR, etc.
Real-world automation testing:
The 3-phase approach
• Phase 1: Test with minimal stress in a sterile
environment
– i.e. no WAN impairments or network traffic, light call loads
– This establishes an important performance baseline
• Phase 2: Test with realistic network traffic and call
load conditions
• Phase 3: Test to rated device call loads
“Rules of Thumb” that simply
do not work
• “I tested it with 50 calls and the CPU only went to
25%, so we know the device can scale to 200
calls”
– Not quite. Our experience shows that most VoIP devices
exhibit nonlinear performance thresholding effects that are
very hard to predict: once a certain load or capacity limit is
reached, the device can fail catastrophically.
– If you don’t test to full rated capacity, you are playing
Russian Roulette with your customers.
“Rules of Thumb” that simply
do not work
• “We don’t need to test voice quality because
we are OEM’ing the software that does that
part.”
– Dangerous assumption. OEM software typically has
many interface points and configuration options and is
hardly in and of itself a guarantee of performance. The
“glue” code around these objects can still cause voice
quality issues.
Emulation of Network Impairments
• Perfectly clean networks are not the real world
• Real networks corrupt the flow of packets in the following
time-varying ways:
– Packet loss (especially burst loss), packet duplication, and
out-of-order packets
– Latency and jitter
– Restricted bandwidth
• If you test while inducing these conditions, your product or
service will be the cause of far fewer post-deployment issues
• You can perform both static and dynamic emulation of
impairment conditions
– Both have value depending on nature of the VoIP device
– e.g. IP phone that renegotiates codec type or codec mode
when network degrades in mid-call
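Burst loss in particular is hard to reproduce with a simple uniform drop rate. One common way to emulate it is a two-state Gilbert-Elliott model; the sketch below shows the idea in Python. The transition probabilities and per-state loss rates are illustrative assumptions, not values from our test plans:

```python
import random

class GilbertElliott:
    """Two-state (Good/Bad) Markov model for emulating bursty packet loss."""
    def __init__(self, p_gb=0.01, p_bg=0.30, loss_good=0.0, loss_bad=0.8, seed=None):
        self.p_gb = p_gb            # P(Good -> Bad) per packet (assumed value)
        self.p_bg = p_bg            # P(Bad -> Good) per packet (assumed value)
        self.loss_good = loss_good  # drop probability while in the Good state
        self.loss_bad = loss_bad    # drop probability while in the Bad state
        self.state = "good"
        self.rng = random.Random(seed)

    def drop(self):
        """Return True if the next packet should be dropped, then advance the state."""
        p = self.loss_bad if self.state == "bad" else self.loss_good
        lost = self.rng.random() < p
        if self.state == "good":
            if self.rng.random() < self.p_gb:
                self.state = "bad"
        elif self.rng.random() < self.p_bg:
            self.state = "good"
        return lost

model = GilbertElliott(seed=1)
drops = sum(model.drop() for _ in range(10000))
print(f"overall loss rate: {drops / 10000:.1%}")
```

With these parameters the long-run loss rate works out to only a few percent, but the drops cluster into bursts, which stresses jitter buffers and loss concealment very differently than uniformly random loss.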
Emulation of Network Impairments
[Diagram: a TDM call generator and a VoIP call generator drive calls through a network emulator to DUTs such as a firewall/SBC]
• The Network Emulator can be controlled via test script for dynamic WAN condition changes
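As one concrete way to script dynamic WAN condition changes, a Linux-based emulator can be driven through the `tc netem` qdisc. The sketch below builds `tc` command lines from test-script parameters; the interface name and impairment values are illustrative, and the actual execution is commented out since it requires root and a real interface:

```python
import subprocess

def netem_cmd(iface, delay_ms=0, jitter_ms=0, loss_pct=0.0, change=False):
    """Build a Linux `tc netem` command for (re)configuring WAN impairments.
    change=True modifies an existing qdisc mid-test for dynamic conditions."""
    cmd = ["tc", "qdisc", "change" if change else "add", "dev", iface, "root", "netem"]
    if delay_ms:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms:
            cmd += [f"{jitter_ms}ms"]
    if loss_pct:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd

def apply(cmd):
    # Requires root and the iproute2 tools; here we only print for illustration.
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)

# Degrade the WAN mid-call: 80 ms +/- 20 ms delay plus 1% loss.
apply(netem_cmd("eth0", delay_ms=80, jitter_ms=20, loss_pct=1.0, change=True))
```

A test script can issue a sequence of such `change` commands on a timer to exercise mid-call codec renegotiation as described above.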
Adding “Internet Mix” Network Traffic
• The goal: see the DUT’s impact on VoIP calls when
subjected to network traffic at rated capacity
• Product examples:
– Firewalls, intrusion prevention devices, IP phones with
integrated switch ports, session border controllers, etc
• What we do: Generate real session-based “Internet
Mix” traffic and measure throughput performance of
VoIP calls and IMIX traffic
– e.g. http, ftp, P2P, SMTP, POP3, etc
– Open source tool: “D-ITG” http://www.grid.unina.it/software/ITG/
– Notable vendor: Shenick (www.shenick.com)
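For a rough sense of what "Internet Mix" means, the classic "simple IMIX" weights small, medium, and full-size packets roughly 7:4:1 (64, 576, and 1500 bytes), giving an average packet size near 354 bytes. A minimal sampler, assuming those commonly cited weights (real traffic generators use richer, session-based models):

```python
import random

# Classic "simple IMIX" distribution (approximate, commonly cited):
# 7 parts 64-byte, 4 parts 576-byte, 1 part 1500-byte packets.
IMIX = [(64, 7), (576, 4), (1500, 1)]

def imix_sizes(n, seed=None):
    """Return n packet sizes drawn from the IMIX weights."""
    rng = random.Random(seed)
    pool = [size for size, weight in IMIX for _ in range(weight)]
    return [rng.choice(pool) for _ in range(n)]

pkts = imix_sizes(100000, seed=42)
avg = sum(pkts) / len(pkts)
print(f"average packet size: {avg:.0f} bytes")  # theoretical mean ~354 bytes
```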
Voice and Video Quality Assessment—Automated Testing Techniques
Voice Quality Test Techniques
• Automated VQ measurement techniques are designed to
estimate the way humans perceive voice quality
– MOS live listener tests done per ITU-T P.800
• Active versus Passive VQ monitoring
– Passive E-Model via packet inspection
– Active end-to-end VQ measurement “to the audio wires”
– Both techniques have their benefits
[Diagram: passive E-Model VQ via packet inspection versus active end-to-end PESQ VQ "to the audio wires"]
Active vs Passive VQ Testing
• Active voice quality testing
– Involves evaluation of “received” audio signals as compared
to known references
• i.e. you drive real 2-way calls through the VoIP network
– PESQ P.862 (2001)
• High correlation with standard MOS-LQ subjective tests
– Benefits: More accurate, uses mature standards (PESQ) for
automated quality assessment
– Negatives: Consumes VoIP network resources
Active vs Passive VQ Testing
• Passive voice quality testing
– Involves passive evaluation of call-based packet flows
• ITU-T G.107 E-Model
• Can return estimated MOS-LQ and MOS-CQ scores
(Listening versus Conversational)
– Benefits: Can be embedded into products and test
equipment with relatively low resource footprint
– Negatives: Ignores (or models) VoIP endpoint-specific
behaviors to network conditions. Vendor implementations
can vary.
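The passive approach boils the measured packet statistics down to an E-Model rating factor R; ITU-T G.107 then maps R to an estimated MOS. The mapping itself is simple enough to sketch:

```python
def r_to_mos(r):
    """Convert an E-Model rating factor R to an estimated MOS
    (the standard ITU-T G.107 R-to-MOS mapping)."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

# A default, unimpaired narrowband call has R ~= 93.2 -> MOS ~= 4.41
print(f"R=93.2 -> MOS {r_to_mos(93.2):.2f}")
```

Vendor implementations differ mainly in how they estimate the impairment terms that produce R, not in this final mapping, which is one reason passive scores from different monitors can disagree.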
How PESQ works
• Computes a voice quality score by comparing degraded
received audio with a “reference” speech prompt
– Reference prompts are actual speech clips played during an
active test call
– Quality scores relate only to the time during the test call when
the reference prompts were played with far-end audio being
captured
[Diagram: an 8-second reference prompt played within a 60-second test call]
– The calculation is not just comparing the reference and
degraded waveforms, it is using a human perceptual model to
ultimately compute a quality score (1=bad to 4.5=excellent)
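Before the perceptual model runs, the degraded recording must be time-aligned with the reference prompt. The toy sketch below shows only that alignment step, via brute-force cross-correlation on plain sample lists; it is not PESQ itself, whose perceptual scoring stage is far more involved:

```python
def align_offset(reference, degraded, max_lag):
    """Find the delay (in samples) that best aligns the degraded signal to the
    reference, via brute-force cross-correlation. This is only the alignment
    step; PESQ (ITU-T P.862) then applies a perceptual model to score quality."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(degraded) - lag)
        score = sum(reference[i] * degraded[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Toy check: a "network" that delays the prompt by 3 samples and attenuates it.
ref = [0, 0, 1, 2, 3, 2, 1, 0, 0, 0]
deg = [0, 0, 0] + [0.5 * x for x in ref]
print(align_offset(ref, deg, max_lag=5))  # expect 3
```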
What PESQ VQ Testing is
designed for
 PESQ is a way to quickly and cost-effectively
estimate the effects of one-way speech distortion and
noise on speech quality
 PESQ is "endpoint-agnostic" – it can be used for VoIP-to-VoIP calls, VoIP-to-PSTN calls, etc.
• Strengths
– Provides excellent estimate of voice quality
– Tests can be performed quickly
– Tests are very repeatable
Passive versus Active VQ: A Real Example
• From an actual CT Labs project
• In this example, the phone had quality issues that the passive test did not see
• Being aware of the difference in scoring techniques is critical when debugging reported VQ issues
[Chart: "Voice Quality Test of VoIP Phone—Passive versus Active Scores"; speech quality score (0.00 to 5.00) across varying test conditions, comparing E-Model MOS against true end-to-end PESQ]
Copyright © 2006 CT Labs
Video Quality Test Techniques
• Automated Video quality measurement techniques
estimate the way humans perceive picture quality
– Live viewer tests done per ITU-T BT.500
• Three classes of objective video quality algorithms
– Full reference, partial reference, and zero reference
• Full reference techniques
– PSNR (most used), VIM, SSIM. See ITU-T J.144.
– Compute intensive, not useful for real time measurements
– Software suite available at: http://www.compression.ru
• Zero reference techniques
– Best suited for in-service monitoring
– Standards activity continues
– Encompasses quality tests for picture, audio, multimedia, and the network's ability to carry streams
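PSNR, the most used full-reference metric, is straightforward to compute once frames are aligned; the sketch below scores one frame given as a flat list of pixel values:

```python
import math

def psnr(ref, deg, max_val=255):
    """Peak signal-to-noise ratio (dB) between a reference and a degraded frame,
    each given as a flat list of pixel values. Higher is better; identical
    frames give infinity."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, deg)) / len(ref)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_val ** 2 / mse)

ref = [100, 120, 140, 160]
deg = [101, 118, 141, 158]
print(f"{psnr(ref, deg):.1f} dB")
```

The compute cost noted above comes from running this (or a perceptual refinement of it, such as SSIM) over every pixel of every frame, which is why full-reference metrics are rarely used for real-time, in-service monitoring.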
Load and Stress Testing
Load and Stress Testing
• What it is
– Verifying the DUT’s performance at rated call and traffic
loads
– Verify those “theoretical” specs on the data sheet
• How many simultaneous sessions? It’s all relative
– A full load & stress test on a 2-line VoIP terminal adapter will
require 2 simultaneous calls
– A full load & stress test on a carrier-grade session border
controller may require upwards of 150,000 simultaneous
SIP calls with media (or more)
– The key is this: if you want to be assured of acceptable
performance at your spec sheet limits, you cannot linearly
scale a partial load test’s results
Load / Stress Testing: Helpful Hints
• Use call rates and call ramp profiles that emulate the
actual call environment e.g. burst, ramp, etc.
• Monitor and log DUT platform resources during test
– CPU, memory, disk I/O, network I/O can all provide clues
as to why a test failed
– Capture a periodic snapshot into logfile for post-test run
analysis: Windows Perfmon, Linux (various utilities)
• Synchronize system clocks on DUT and test
equipment devices before a test run
– Allows failure events to be correlated from logs
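Call rate and ramp profiles are usually expressed to the call generator as a calls-per-second schedule. A minimal sketch of two such profiles; the function names and shapes are illustrative and not tied to any particular generator:

```python
def ramp_profile(target_cps, ramp_seconds):
    """Calls-per-second schedule that ramps linearly from 0 to target_cps."""
    return [round(target_cps * (t + 1) / ramp_seconds, 2) for t in range(ramp_seconds)]

def burst_profile(burst_cps, burst_len, idle_len, cycles):
    """Alternate bursts at burst_cps with idle periods, repeated `cycles` times."""
    return ([burst_cps] * burst_len + [0] * idle_len) * cycles

print(ramp_profile(50, 5))          # [10.0, 20.0, 30.0, 40.0, 50.0]
print(burst_profile(100, 2, 3, 2))  # [100, 100, 0, 0, 0, 100, 100, 0, 0, 0]
```

A burst schedule like the second one is what exposes the threshold effects described earlier, which a gentle linear ramp alone can mask.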
Load / Stress Testing: Pitfalls
1. Temptation is to do high volumes of "simple" calls
– Problem with this: it will not exercise internal resources in a real-world way
– Example: conference bridge load test
• The wrong way: calls with a simple one-dimensional "can you hear me?" test
• The right way: multiple conferences of varying sizes with real talker-listener exchanges
2. Not running tests long enough
3. Not testing during DUT housekeeping periods
4. Leaving verbose DUT logging enabled, which can consume significant resources
Functional Testing
Functional Testing
• What it is
– Verify that the DUT can execute all features and functions
correctly (positive stimulus/response testing)
– Verify that DUT responds properly to negative stimuli
• Very often ignored, to the detriment of product stability in field
• How many simultaneous sessions to test?
– Depends on device: one or a few as required to verify all
features
– Quick examples of functional tests
• Application servers: Conferencing
– Verify all host and listener TUI commands and DUT responses
• VoIP endpoint devices: Terminal Adapters (TAs)
– Verify all call features against softswitch/feature server
environments
• Question: Does verifying voice quality belong in a functional
test?
Functional Testing: A Few Hints
• Test script synchronization with DUT is key
– DTMF or MF handshaking
• Typically involves “tagging” voice prompts with numeric
sequences
– Speech recognition
– Delays
• Automation-based functional tests allow:
– Much faster test cycles
• TA functional test plan comparison:
– 150 test cases verified against 4 different softswitch platforms
– Good idea: functional test suite can be turned into a
performance test suite
• If the tests are designed on a flexible call generator platform
• Can mix call traffic from functional and load generator platforms
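DTMF handshaking for script synchronization typically means detecting which digit "tag" is present in the received audio. The Goertzel algorithm is the standard lightweight way to do that; a pure-Python sketch (sample rate and block length chosen for illustration):

```python
import math

DTMF_LOW = [697, 770, 852, 941]     # row frequencies (Hz)
DTMF_HIGH = [1209, 1336, 1477, 1633]  # column frequencies (Hz)
KEYS = ["123A", "456B", "789C", "*0#D"]

def goertzel_power(samples, freq, fs=8000):
    """Signal power at one frequency via the Goertzel algorithm."""
    coeff = 2 * math.cos(2 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect_digit(samples, fs=8000):
    """Return the DTMF digit whose row/column tones dominate the block."""
    row = max(range(4), key=lambda i: goertzel_power(samples, DTMF_LOW[i], fs))
    col = max(range(4), key=lambda i: goertzel_power(samples, DTMF_HIGH[i], fs))
    return KEYS[row][col]

# Synthesize digit "5" (770 Hz + 1336 Hz, 50 ms at 8 kHz) and detect it.
fs, n = 8000, 400
tone = [math.sin(2 * math.pi * 770 * t / fs) + math.sin(2 * math.pi * 1336 * t / fs)
        for t in range(n)]
print(detect_digit(tone, fs))  # expect "5"
```

Tagging each voice prompt with a short digit sequence like this lets the test script confirm, in-band, exactly which prompt the DUT played.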
Test Automation Setups
Session Border Controller/Firewall
Automation Setup Goals
• Verify call-handling performance and advertised
specifications at real-world high density VoIP loads
• Verify voice quality under different codec, frame
packing, and other configuration settings
• Verify call-handling performance when subjected to
different call rate profiles, e.g. burst, ramp, etc.
• Verify thru-SBC registration performance under burst
registration conditions
• Verify ability to survive and handle legitimate VoIP call
loads while under various types of DoS “attacks”
• Verify long-term call handling reliability
SBC / Firewall Automation Test Setup
[Diagram: on the unprotected network, VoIP call generators (with voice quality option), a SIP attack generator, a registration generator, and a network emulator; the firewall/SBC DUT bridges to a SIP proxy on the protected network]
Terminal Adapter (TA) Real-World Network Model
[Diagram: terminal adapters bridging the VoIP network and PSTN/analog endpoints]
Automated Feature Test Suite Goals
• Automate as much of the Terminal Adapter
interoperability feature regression test as possible
– i.e. Verify call features of TA devices against core VoIP
service architectures
• Support input configuration files, event and error log
files
• Support multiple TA devices and PSTN access lines
in setup
Automation Feature Test Solution
[Diagram: off-net PSTN caller access via a TDM call generator through a PSTN/PBX gateway; a feature test framework driving multiple TA/voice gateways over the LAN/WAN through a router/edge device; on-net subscriber access via a VoIP call generator to a softswitch + media server]
Automation Feature Test
Framework Details
• Supports 140+ feature tests
– Including 2-way and 3-way calls; features such as
hold/park/transfer, 911/411, and voice mail; plus voice quality
checking
• Test run results captured in easily analyzed logs
• Custom reports are generated
• Individual test case scripts easily changed
Setting up for IMS Tests
• Emulation of IMS devices in a QA lab setting will be
critical… unless you plan to purchase, support, and
maintain a wide variety of third-party IMS devices in
your lab, a costly and time-consuming proposition.
IMS Product Tests and the Corresponding IMS Function Tests:
• Application Server: App Server, OSA-SCS, IM-SSF, SCIM, CCCF
• Gateway Controllers / Call Agents: BGCF, MGCF, I-BCF
• Border Elements / Gateways: A-BGF, T-MGF, SGF, IWF, I-BGF
• Subscriber Databases: HSS, SLF
• Media Servers: MRFC, MRFP
• Policy and Resource Function: PDF, NASS, RACS
• Session Controllers: S-CSCF, I-CSCF, P-CSCF
Setting up for IMS Tests
[Diagram: the IMS device under test connects through a WAN emulator to IMS device emulation, (Vo)IP endpoint emulation, TDM endpoint emulation, and mixed endpoint emulation; a network analyzer monitors the setup; the SS7/TDM network reaches TDM users and NGN devices, while the IP network reaches VoIP clients]
VoIP Security Testing—Issues to Consider
VoIP Vulnerabilities/Threats
• The bad news: VoIP systems are vulnerable
– Platforms are vulnerable
– VoIP-specific attacks are becoming more common
• The good news: The threat is still developing
– VoIP handsets are still in minority “out there”
– Vast majority of VoIP is company-internal
Courtesy: Mark Collier, CTO SecureLogix
• VoIP networks share the same vulnerabilities that
plague data networks, PLUS some specific
additional threats
VoIP Product Vulnerabilities
[Diagram: attack surfaces by layer. Voice applications: toll fraud, SPIT. VoIP protocols: protocol attacks, SIP floods, RTP floods (a major area of focus at CT Labs). Services (database, web server): Slammer worm, SQL attacks. Network stack(s) (IP, UDP, TCP, RTP, …): SYN floods, etc. (many more). Servers: OS attacks (Windows worms, viruses). Telephony devices and network devices. Physical infrastructure (power, wiring): physical hacking.]
DoS Attack Testing
• Generate SIP-specific attacks (send “fuzzed” and other types of
SIP protocol packet floods) while also sending legitimate SIP
calls
• Measure call performance (dropped, blocked, delayed calls),
voice quality with security measures in place
• Test calls sent with media (real speech) to verify true voice
quality via PESQ while under attack
[Diagram: VoIP call generators at endpoints A and B exchange RTP (a reference prompt is sent and the received prompt is scored) through routers/switches and the device(s) under test, while SIP/RTP attacks are injected into the path]
SIP-Specific Attacks to Launch
– i.e. in addition to well-known lower-layer DoS attacks
• Blast packets from these scenarios at up to line rates:
– Malformed and Torture Test floods
• Using SIP packets from open source Protos test suite
– INVITE, REGISTER, and Response floods
– Spoofed variations for above
• i.e. Spoofing the IP address and port of legitimate devices,
or spoofing the Via or AoR of legitimate users
– RTP attacks
• Rogue / Random RTP Fraud and Floods
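A minimal sketch of what an INVITE flood generator does, for use only against devices you own in an isolated lab. The addresses use the reserved TEST-NET range, and the message is a deliberately minimal request rather than a fully RFC 3261-compliant one:

```python
import socket

def build_invite(target_ip, target_port, from_user="attacker", call_id="1@test"):
    """Assemble a minimal SIP INVITE for lab flood testing."""
    uri = f"sip:{target_ip}:{target_port}"
    return (
        f"INVITE {uri} SIP/2.0\r\n"
        f"Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK{call_id}\r\n"
        f"From: <sip:{from_user}@192.0.2.10>;tag=42\r\n"
        f"To: <{uri}>\r\n"
        f"Call-ID: {call_id}\r\n"
        "CSeq: 1 INVITE\r\n"
        "Max-Forwards: 70\r\n"
        "Content-Length: 0\r\n\r\n"
    ).encode()

def flood(target_ip, target_port, count):
    """Blast `count` INVITEs at the target over UDP (lab use only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(count):
        msg = build_invite(target_ip, target_port, call_id=f"{i}@test")
        sock.sendto(msg, (target_ip, target_port))
    sock.close()

msg = build_invite("192.0.2.50", 5060)
print(msg.decode().splitlines()[0])  # INVITE sip:192.0.2.50:5060 SIP/2.0
```

Spoofed variations simply rewrite the Via, From, or source address fields with values copied from legitimate devices; fuzzed variations (as in the Protos suite) mutate the message bytes themselves.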
SIP-Specific Attacks:
What to expect
• Run each variation for 10-15 minutes
– In the presence of varying levels of legitimate VoIP traffic
– Monitoring DUT resources (CPU, memory), call completion
rates, and voice quality of completed calls
• It’s typical to see threshold failure effects
– i.e. above certain levels of legitimate SIP calls + attack
packets, service takes a major hit. Below that threshold
normal calls may be handled fine.
• DUT often shows weakness within seconds of test
start
• DUT may exhibit hard or soft crashes
• Voice quality may show early warning of
catastrophic failure
Good resources on
VoIP Security
• NIST – National Institute of Standards and Technology
– Publication 800-58: “Security Considerations for VoIP Systems” (99 pgs, free)
– http://csrc.nist.gov/publications/nistpubs
• VoIPSA – Voice over IP Security Alliance
– Promoting education & awareness, research, testing methodologies & tools
– Extensive membership: vendors, VoIP providers, researchers, security
vendors, test tool vendors
– www.voipsa.org
• PROTOS group - University of Oulu in Finland
– Using protocol fuzzing to discover a wide variety of DoS and
buffer overflow vulnerabilities
– Have exposed HTTP, LDAP, SNMP, WAP, and VoIP vulnerabilities
– www.ee.oulu.fi/research/ouspg/protos/index.html
• Mu Security
– Manufacturers of a powerful protocol mutation tester (Mu-4000)
– www.musecurity.com
Feel free to call
if you have any questions
Chris Bajorek
chris@ct-labs.com
916-577-2110 (direct line)