Understanding VoIP
Dr. Jonathan Rosenberg
Chief Technology Strategist
Skype
What is this course about?



Getting “under the hood” and understanding
how VoIP works
An exploration of the protocols and
technologies behind VoIP
Conveying an understanding of the various
problems that need to be solved for VoIP to
work
What this course is not about



A general introduction to telephony
A detailed cookbook or deployment guide to
VoIP
A product survey of VoIP and IP telephony
products

In particular, Cisco or Skype products are not
discussed except in passing
Ground Rules





Ask Questions ANY TIME!
I will be bored if this is a one way
conversation
No question is too stupid
Laughing at or mocking anyone's questions is
unacceptable
Please ask off-the-wall or exploratory
questions – there is a lot that is not in
here!
Agenda







Breaking up the problem
Voice and Video coding
Voice and Video Transport
Quality of Service
Signaling
Security
NAT Traversal
Non-Agenda






Programming APIs
Emergency Services, Lawful Intercept
Numbering, Routing, Naming (ENUM, TRIP)
PSTN Interworking
Billing, Provisioning, OAM
Conferencing, IVR, Applications
Breaking Up the Problem
Directories
Databases
Accounting
Billing
LDAP,
ENUM
IP
RADIUS
DIAMETER
Application
Server
SIP
Signaling
Servers
Presence
Servers
Media
Servers
OAM
SIP, H.323,
MGCP,H.248 IP Network
Endpoint
SIMPLE,
XMPP
RTP
Endpoint
Voice Coding
Voice Endpoint Model
No Speech
+
Hybrid
DTMF/
Tone
Detection
Nonlinear
Processing
Echo
Canceller
2-wire interface
Packetizer
Speech
Decoding
Unpacker
Silence
Detection
Loss
Admin
DTMF/
Tone
Generation
Speech
Encoding
Comfort
Noise
Generation
Speech
Codecs

Waveform codecs:



Directly encode speech in an efficient way by
exploiting temporal and/or spectral
characteristics
Attempt to reproduce input signal’s waveform
by minimizing error between input and coded
signals
Source codecs / vocoders:

Estimate and efficiently encode a parametric
representation of speech
CELP

Minimizes perceptually
weighted error




similar to waveform coders
Short-term predictor is LP
(vocal tract) filter
Excitation is obtained
from codebook and longterm pitch predictor
Closed-loop search is
MIPS intensive
Codec Comparison
Codec     Sampling                  Bitrate           Latency   Comments
G.711     8 kHz                     64 kbps           125 us    PSTN Codec
G.729     8 kHz                     8 kbps            10 ms     CS-ACELP
G.723.1   8 kHz                     5.3/6.3 kbps      37.5 ms
AMR       8 kHz                     4.75 – 12 kbps    25 ms     GSM codec
G.722.1   16 kHz                    24/32 kbps        40 ms     Polycom SIREN
AMR-WB    16 kHz                    6.6-23.85 kbps    25 ms     GSM Wideband – encumbered
SILK      8, 12, 16, 24 kHz (SWB)   6-40 kbps         25 ms     Skype codec
Listen at: http://www.voiceage.com/listeningroom.php
Echo Cancellation




ERL: Echo Return
Loss (dB)
ERLE: Echo Return
Loss Enhancement
Double-talk
Convergence time
Analog
+ ERLE Non-Linear
Processor
Reflection
ERL
2-4-wire
Hybrid
Echo
Path
Estimation
Packet
Network
Echo Canceller
Digital
This echo canceller cancels
‘local’ echoes from the hybrid
reflection
Echo Canceller Specifics




The voice echo path is like an electrical circuit
  If a ‘break’ (cancellation) is made anywhere in the ‘circuit’, you will eliminate the echo
  The easiest place to make the break is with a canceller ‘looking into’ the local analog/digital telephony network, NOT the packet network (which has much longer and variable delays)
The echo canceller at the other end of the call eliminates the echoes that YOU hear, and vice versa
Echo canceller coverage (e.g. 32 ms) is the maximum length of echo impulse response that can be cancelled from the local analog/digital network (the packet network delay does not matter)
The non-linear processor is used to ‘clean up’ any residual echo left over from the canceller
Voice Activity Detection
Speech Magnitude (dB)
Speech Detected
Speech Detected
Hang-Over
Hang-Over
Typically fixed
at 200 ms
Sentence 1
Signal-to-Noise
Threshold
Sentence 2
Noise Floor
time
Front-end
Speech Clipping
Front-end
Speech Clipping
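The hang-over behaviour in the figure above can be sketched as follows. This is a minimal illustration, not a production detector: the function name, the per-frame level input, and the 20 ms frame size are assumptions made for the example. Only the 200 ms hang-over and the signal-to-noise threshold idea come from the slide.

```python
def vad_with_hangover(levels_db, noise_floor_db, snr_threshold_db, hangover_frames):
    """Return one boolean per frame: True while speech is treated as active.

    A frame counts as speech when it exceeds the noise floor by the
    signal-to-noise threshold. After speech stops, the detector stays
    active for `hangover_frames` more frames (the hang-over period).
    """
    active = []
    remaining = 0  # hang-over frames still to be held open
    for level in levels_db:
        if level - noise_floor_db >= snr_threshold_db:
            remaining = hangover_frames
            active.append(True)
        elif remaining > 0:
            remaining -= 1
            active.append(True)
        else:
            active.append(False)
    return active


# Assumed 20 ms frames, so a 200 ms hang-over is 10 frames.
frames = [-60, -30, -28] + [-60] * 12
flags = vad_with_hangover(frames, noise_floor_db=-60,
                          snr_threshold_db=15, hangover_frames=10)
```

In this run the two loud frames are detected, the following ten quiet frames are still sent because of the hang-over, and the last two are suppressed.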
Comfort Noise Generation

Silence isn’t golden…it’s annoying


Simple techniques:



When speech stops…what do you play to the
listener?
Play white/pink noise
Replay last received packet over and over
Fancier technique:



Transmitter measures local “noise environment”
Transmitter sends special “comfort noise” packet
as last packet before silence
Receiver generates noise based on the CN packet.
Voice Quality:
Mean Opinion Scores
Source
Channel Simulation
Impairment
Codec ‘X’
1
2
3
4
5
1
2
3
4
5
“Nowadays, a chicken leg is
a rare dish”
Rating  Speech Quality  Distortion
5       Excellent       Imperceptible
4       Good            Just perceptible but not annoying
3       Fair            Perceptible and slightly annoying
2       Poor            Annoying but not objectionable
1       Unsatisfactory  Very annoying and objectionable
MOS of 4.0 = Toll Quality
Clear Channel MOS’s
[Bar chart: Mean Opinion Score (scale 1 to 5) by codec]
G.711 (64 kbit/s PCM): 4.1
G.726 (32 kbit/s ADPCM): 3.8
G.723.1 (6.4 kbit/s MP-MLQ): 3.9
G.729 (8 kbit/s CS-ACELP): 3.9
IS-54 (8 kbit/s NA Digital Cellular): 3.4
MOS Under Varying Conditions
G.729
Avg Speech Level (-20 dBm0)
Low Input Level (-30 dBm0)
2 Tandem codings
3 Tandem codings
1% Frame Erasure Rate
5% Bit Error Rate
5% FER
10% FER
20% FER
3.85
3.54
3.46
2.68
3.24
3.02
Video Coding
Key Terms
Term          Description
Frame         An individual picture in a sequence that makes up the video
Frame Rate    The number of frames per second in video. 30 is excellent (TV quality)
Resolution    The number of horizontal and vertical pixels. VGA=640x480.
Interlacing   A mechanism for transmitting video by splitting a frame into two fields, one field representing the odd lines and one the even lines. This is the “i” in 1080i
Progressive   As opposed to interlaced, a method for transmitting video by sending each frame as a whole.
HD            High Def resolutions – 720p is 1280x720 with 60fps. 1080i is 1920x1080 at 30fps
Key Concept: Macroblocks
Rectangular block in
an image which is
a basic unit of
compression. Typically
16x16 pixels.
Key Concept: Inter-Frame Prediction
Encode
Predict information in the current frame by looking at previous frames,
possibly taking into account motion.
Key Concept: Discrete Cosine
Transform (DCT)
Increasing vertical frequencies
Increasing horizontal frequencies
A technique for representing a
macroblock by its component
frequencies. Discarding the higher
frequencies throws away the finer
details without losing the core image.
Video Encoder Block Diagram
Key Codec Comparisons
Codec       Timeline   Applications
H.261       1990       ISDN at multiples of 64kbps
H.263       1996       Early Flash using Sorenson Spark implementation. Original RealVideo codec. Required in IMS.
H.264 AVC   2003       YouTube, iTunes, Blu-ray; most modern video conferencing. The current primary video codec for real-time. Typical VGA 15fps bitrate = 500kbps
H.264 SVC   2007       “Layered” video that provides improved quality and resilience; ideal for multiparty video conferencing.
VP7         2005       On2 Technologies codec; Skype, successor to H263 in Flash
Voice and Video Transport:
RTP
RTP: What is it?


Real Time Transport Protocol
RFC 3550




product of avt working group
1996 proposed standard –
RFC1889
2004 full standard
What does it do




e2e transport of real time media
optimized for multicast
provides sequencing, timing,
framing, loss detection
provides feedback on reception
quality

What does it do (cont)



provides information on
group members
provides data to correlate
audio and video and
other media
Works with any codec


need payload format for
each codec
Flexible
RTP: What isn’t it?

Doesn’t guarantee quality of
service




doesn’t reserve network
resources
doesn’t guarantee no loss or
bounded delay
can work with QoS protocols
(RSVP)
Doesn’t provide signaling

other protocols must be used
to set up RTP (like SIP or
H.323)

Not a specific protocol
type



Does not run directly
on top of IP
Runs on top of UDP
No fixed port number
RTP Stack
RTP
RTCP
UDP
IP
Big Picture: RTP, SDP and SIP
c=IN IP4 123.1.2.3
m=audio 1122 RTP/AVP 0 1
m=video 1130 RTP/AVP 98
a=rtpmap:98 h263
SIP w/ SDP
Proxy
Proxy
End
End
User
IP Network
User
RTP
RTP Components: Data + Control

Data aka RTP



very confusing
Usually on an even UDP
port (NATs change this –
later)
Provides






sequencing
timing
framing
content labeling
User identification


Control = Real Time
Control Protocol (RTCP)
Same address as data,
but one higher port
usually
Provides




reception quality
sender statistics
participant information
(multicast)
synchronization
information
Real Time Data Transport

Originator breaks stream into
packets (segmentation)



application layer framing
(ALF)!!!
RTP Source
Packets sent; network may
lose, delay, reorder packets
Must, at receiver:





reorder
recover
resegment
resynchronize
clock synchronization!
RTP
Packets
RTP Sink
Transport System

Source




Digitize Audio from mike
Silence Suppression
Echo cancellation
Compress Audio





G.711: 64 kbps
G.729: 8 kbps
G.723.1: 5.3/6.3 kbps
Packetize Audio in RTP
Send

Sink








Receive packets
Un-packetize
decompress
comfort noise
generation
reorder
recover loss
jitter buffer
D/A conversion to
speakers
Jitter Buffer





Packets delayed
differently
Must play them out
periodically
pkts
Packets may arrive after
designated playout time
-> loss
Insert extra delay to
compensate
May need to adapt this
amount
time
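A fixed jitter buffer can be sketched as below. This is a simplified illustration under stated assumptions: the function name, the 20 ms frame interval and the millisecond arrival times are all made up for the example, and a real receiver would adapt the delay over time, as the slide notes.

```python
def playout_schedule(arrivals_ms, frame_ms=20, buffer_ms=40):
    """Decide which packets make their playout deadline.

    arrivals_ms maps sequence number -> arrival time in ms.
    The first packet is held for `buffer_ms`; every later packet is due
    one frame interval after its predecessor. Packets that arrive after
    their deadline are counted as lost.
    """
    first_seq = min(arrivals_ms)
    base = arrivals_ms[first_seq] + buffer_ms  # playout time of first packet
    result = {}
    for seq, arrival in sorted(arrivals_ms.items()):
        deadline = base + (seq - first_seq) * frame_ms
        result[seq] = "played" if arrival <= deadline else "late"
    return result


# Packet 2 is delayed by the network and arrives after packet 3.
arrivals = {0: 100, 1: 125, 2: 190, 3: 165}
small_buffer = playout_schedule(arrivals, buffer_ms=40)
large_buffer = playout_schedule(arrivals, buffer_ms=60)
```

With 40 ms of buffering packet 2 misses its deadline and is treated as lost; with 60 ms every packet is played, at the cost of 20 ms more end-to-end delay.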
RTP Packet Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
RTP Header Fields





Version: 2
P: indicates padding (for
encryption)
X: extension bit
CSRC count: for mixers
(later)
M: Marker Bit: indicates
framing


audio codecs: first packet
in talkspurt
video: last packet in frame

Payload Type: indicates
encoding


in RTP packet allows
changes per-packet
Useful for:







adaptation
DTMF codec
silence codecs
SN: defines ordering of
packets
Timestamp: when packet
was generated
SSRC: identifier
CSRC: list of mixed users
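The header fields listed above can be pulled out of a packet with a few bit operations. The following is a minimal sketch using the fixed 12-byte layout from RFC 3550. The function name and the returned dictionary keys are illustrative choices, and header extensions (the X bit) are reported but not parsed.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed RTP header and any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("RTP header needs at least 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F  # CSRC count
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet too short for its CSRC list")
    csrcs = list(struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc]))
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": cc,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
    }


# A made-up packet: V=2, marker set (first packet of a talkspurt), PT 0.
sample = struct.pack("!BBHII", 0x80, 0x80, 4711, 160, 0x1234ABCD) + b"\x00" * 160
header = parse_rtp_header(sample)
```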
RTP Timestamp

Tick units are
dependent on codec





For speech: 125
microseconds (standard
8 kHz sampling rate)
For video: 90 kHz
For audio: 44.1 kHz (CD
rate)
Gaps in TS, but not in
SN mean silence
Initial value random for
security

Video



Timestamp represents
time at beginning of
frame
Many packets may
have same timestamp
Speech


Time per packet may
vary
Depends on
packetization: 20-100 ms typical
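The relation between timestamp ticks, packetization time and silence can be sketched as follows. This assumes packets arrive in order with no loss and ignores timestamp and sequence number wraparound; the function names are illustrative.

```python
def ticks_per_packet(clock_rate_hz, packet_ms):
    """Timestamp increment between consecutive packets of continuous media."""
    return clock_rate_hz * packet_ms // 1000


def silence_gaps(packets, clock_rate_hz=8000, packet_ms=20):
    """Find silence periods: timestamp jumps with no sequence number gap.

    packets is a list of (sequence_number, timestamp) tuples.
    Returns (sequence_number, silence_ms) for each packet that follows
    a silent period.
    """
    step = ticks_per_packet(clock_rate_hz, packet_ms)
    gaps = []
    for (seq0, ts0), (seq1, ts1) in zip(packets, packets[1:]):
        if seq1 == seq0 + 1 and ts1 - ts0 > step:
            gaps.append((seq1, (ts1 - ts0 - step) * 1000 // clock_rate_hz))
    return gaps


# 8 kHz speech in 20 ms packets: 160 ticks per packet.
stream = [(1, 0), (2, 160), (3, 1920), (4, 2080)]
gaps = silence_gaps(stream)
```

Here packet 3 follows packet 2 in sequence but its timestamp is 1600 ticks later than expected, which corresponds to 200 ms of suppressed silence.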
Payload Formats


Each codec needs a way to
be encapsulated in RTP
RFC3551 defines
mechanisms for many
common codecs



G.711, G.729, G.723.1,
G.722, etc.
Some simple video
More complex codecs have
their own payload format
documents


MPEG
H.263 and H.261

Payload format defines


How to break frame into
packets
extra fields needed below
main RTP header
Advanced Topics

DTMF and Tones




RFC 2833
Special codecs for
encoding touch tones
(DTMF) and other
signals
Can send either the
waveform (frequency,
amplitude)
Or the actual signal (#,
8, 0)

Compressed RTP




RFC 2508
For dialup links
Don’t send header, just
send index
Far side uses index to
retrieve header, and
then increments certain
fields
Quality of Service
Quality of Service
The problem we are trying to solve is to give
“better” service to some at the expense of giving
worse service to others - QoS fantasies to the
contrary, it’s a zero sum game
- Van Jacobson
Quality of Service
So, what’s the problem?
[Figure: Usability of Voice Circuit as a Function of End-to-End Delay. Utility (0.0 to 1.0) plotted against Time (0 to 800 msec). Labeled regions: Toll Quality; Satellite Zone; CB Zone; Fax Relay, Broadcast. Curves: Private Network VoFR & VoIP Technology; Early I-Phone Technology.]
Improving I-Phone means:
• Lower PC Delay
• Lower Network Latency
• Tighten Network Jitter

Delay Budget











Device sample capture
Encode delay (algorithmic delay + processing delay)
Packetization/framing
Move to output queue/queueing delay
Access (up) link transmission
Backbone network transmission
Access (down) link transmission
Input queue to application
Jitter buffer
Decode processing delay
Device playout delay
Some Techniques to Improve “Network
QoS”






RED: Random Early Drop (or “Detect”)
WFQ: Weighted Fair Queuing
Intserv/RSVP: ReSerVation Protocol
IP Precedence -> DiffServ
CRTP: Compressed Realtime Transport Protocol
MCML: Multi-Class Multi-Link PPP
Random Early Detect (RED)
this is Basic Hygiene!

Objectives




Keep average queue size
low – good for voice
Fairness – bigger streams
punished more
Avoid synchronization
Only works with loss
responsive transport
protocols
Algorithm – probabilistic
dropping of packets
Drop Probability

1
Min
Max
Queue Size
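The drop-probability curve sketched in the figure can be written as a small function. This is an illustrative sketch only: the function name and the `max_p` parameter (the probability reached just below the Max threshold, as in the classic RED description) are assumptions, and a real implementation also maintains a moving average of the queue size.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p=1.0):
    """Probability of dropping an arriving packet under RED.

    Below `min_th` nothing is dropped, at or above `max_th` everything
    is dropped, and in between the probability rises linearly to `max_p`.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Thresholds in packets, chosen arbitrarily for the example.
p_low = red_drop_probability(5, min_th=10, max_th=30)
p_mid = red_drop_probability(20, min_th=10, max_th=30)
p_full = red_drop_probability(30, min_th=10, max_th=30)
```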
Poll: Will RED Help Voice?
Yes
• Voice not loss responsive
• Mixing voice and data in
same queue bad
• Voice queues usually not
congested
No
Weighted Fair Queueing


Each flow “sees” a
dedicated amount of
bandwidth Bj
A packet arriving at
time t is transmitted at
time t+size/Bj
B
B1
B2
B3
B = B1 + B2 + B3
What's the Problem?

WFQ is unrealizable
because



Variable packet sizes
Causality
Example:
Link speed 100Kbps
Flow 1: 10Kbps
Flow 2: 90Kbps
[Figure: a 1500 packet and a 100 packet sharing the link. Theory: 8.8ms. Actual: 128ms.]
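The two numbers in the example can be reproduced with a short calculation. This sketch assumes the packet sizes are in bytes and that the 1500-byte packet from Flow 1 has just started transmission when the 100-byte packet from Flow 2 arrives; neither assumption is stated on the slide.

```python
def tx_time_ms(size_bytes, rate_bps):
    """Time to serialize a packet of `size_bytes` at `rate_bps`."""
    return size_bytes * 8 * 1000 / rate_bps


LINK_BPS = 100_000
FLOW2_BPS = 90_000

# Ideal WFQ: Flow 2 behaves as if it had a dedicated 90 kbps link.
theory_ms = tx_time_ms(100, FLOW2_BPS)

# Real link: a packet cannot be preempted, so the small packet waits for
# the 1500-byte packet and is then sent at the full link speed.
actual_ms = tx_time_ms(1500, LINK_BPS) + tx_time_ms(100, LINK_BPS)
```

The ideal figure comes out at about 8.9 ms (shown as 8.8 ms on the slide) and the real figure at 128 ms, which is why exact WFQ cannot be realized with variable packet sizes.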
Approximations of WFQ


Many PhDs written with
approximate and
implementable algorithms
Algorithms differ in their
delay bound


How much worse than
perfect WFQ is this?
Delay bounds a function of
bandwidth, number of
queues, other params
Algorithms
SCFQ: Self-Clocked Fair Queueing
WF2Q: Worst-Case Fair Weighted Fair Queueing
FBFQ: Frame-Based Fair Queueing
PGPS: Packet-by-packet Generalized Processor Sharing
DRR: Deficit Round Robin
WFQ Voice Configuration

How to pick allocated bandwidth?

Consider G.711, 30ms framing (74.6Kbps)





If Bi = 74.6kbps, delay is at least 30ms
If Bi = 149.2Kbps, delay at least 15ms
Must set voice queue bandwidth at least 2x actual
voice usage to keep delays down!
Unused bandwidth will go to data
Need an accurate WFQ Implementation
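The figures above follow from the packet size and the WFQ rule that a packet is sent at time t + size/Bi. A minimal sketch, assuming 40 bytes of IP/UDP/RTP headers per packet and no link-layer overhead (function names are illustrative):

```python
def voice_packet(codec_bps, frame_ms, header_bytes=40):
    """Return (packet size in bytes, on-wire rate in bps) for one voice flow."""
    payload = codec_bps * frame_ms // 8000  # bytes of codec data per frame
    size = payload + header_bytes
    rate = size * 8 * 1000 / frame_ms
    return size, rate


def wfq_delay_ms(packet_bytes, allocated_bps):
    """Scheduling delay of one packet in a queue served at `allocated_bps`."""
    return packet_bytes * 8 * 1000 / allocated_bps


size, rate = voice_packet(64_000, 30)       # G.711 with 30 ms framing
delay_1x = wfq_delay_ms(size, rate)         # allocation equal to usage
delay_2x = wfq_delay_ms(size, 2 * rate)     # allocation twice the usage
```

The packet is 280 bytes, the flow uses about 74.7 kbps, and doubling the allocation halves the scheduling delay from 30 ms to 15 ms.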
Priority Queueing



Emulates the familiar
“elite airport line”
experience
Voice and data packets
in separate queues
If there are any packets
in the voice queue, they
are serviced first
Server
Voice
Data
Priority Queueing Considerations





Easy to configure – no bandwidth values
required
Main problem – data starvation
Need to police voice queue
Doesn’t work as well when there is other non-voice high priority traffic (video)
Head-of-Line Blocking from data queue
Intserv: Integrated Services





Guaranteed Service (RFC 2212)
  Mathematically provable bounds on end-to-end datagram queuing delay/bandwidth
Controlled Load Service (RFC 2211)
  Approximate QoS from an unloaded network for delay/bandwidth
Describe traffic with a “TSPEC”
  r = token bucket rate
  b = token bucket depth
  p = peak transmission rate
  m = minimum (policed) packet size
  M = maximum packet size
Describe endpoints with a “FlowSpec”
  Source/Destination IP addresses, ports, protocol
RSPEC/FSPEC provides the policy to the queuing/scheduling algorithms
RSVP Design






Signaling distinct from routing (modularity,
deployability, evolvability)
Soft state (robustness, simplicity)
Transparent operation across non-RSVP routers
(deployability)
Support shared and distinct reservations
Applies to unicast & multicast applications
Simplex & receiver-oriented.
RSVP protocol
[Figure: path messages travel from Src to Dest.; resv messages travel back from Dest. to Src]
PATH: Source -> Destination
  Traffic parameters of source
  Collects info on network capabilities
  Detects current route
RESV: Source <- Destination
  Receiver selected Int-Serv service
  Traffic parameters of receiver selected reservation
  Follows route detected by PATH
  Reservation actually nailed in network
RSVP messages carried over IP
  Can also be carried over UDP but few people do that
RSVP: Admission Control
Flow Request
Routing
Routing
Protocol
Routing Database
Switching
Packets In
Reservation
Protocol
Admission
Control
Resource Utilization
Database
Interface 1
Packet Scheduler
Queuing Policy
Database
Packets Out
Route Selection
Interface N
Packet Scheduler
Packets Out
Intserv/RSVP Acceptance
Enthusiasm
Intserv/RSVP will solve
the world’s QoS
Cool thing to say:
“RSVP does not scale”
vBNS RSVP over ATM
transparently transport RSVP
Real
value
RSVP for VoIP in Enterprise
Today
ISP
Today
Enterprise
Time
IP Precedence & Diffserv


“Poor man’s” approach to QoS
Set IP Precedence/DSCP higher on voice packets
  This puts them in a different queue, resulting in isolation from best effort traffic
  Can be done by endpoint, proxy, or in routers through heuristics
Scales better than RSVP
  Keeps QoS control “local”
  Pushes work to the edges and boundaries
  Can provide bulk QoS by customer or network
No admission control
  Too much high-precedence traffic can still swamp the network
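Marking at the endpoint can be sketched with the standard socket API. This is a hedged example: the helper names are made up, the `IP_TOS` socket option is available on most Unix-like systems but not everywhere, and whether the network honours the marking is a matter of local policy. The DSCP occupies the upper six bits of the former TOS byte, and 46 is the value recommended for Expedited Forwarding.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding code point


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP in the upper bits of the old IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return dscp << 2


def open_marked_udp_socket(dscp=EF_DSCP):
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


ef_tos = dscp_to_tos(EF_DSCP)
```

An RTP sender would call `open_marked_udp_socket()` once and then send voice packets on the returned socket as usual.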
Diffserv Architectural Model

Clouds: regions of relative homogeneity:
  Administrative control
  Technology
  Bandwidth
Within a cloud, QoS managed by local rules
Hard work confined to boundaries of clouds:
  Classification
  Conditioning/Policing
QoS information exchange limited to boundaries
  Bi-lateral, not multi-lateral
  Not necessarily symmetric
[Figure: clouds labeled Me, Not Me, Also Not Me, Far Away]
Diffserv Scalability

Fundamental assumptions:
  Relatively small number of feasible queuing/scheduling algorithms for high link speeds
  Number of individual flows is large
  Many different rules, often policy driven
Group packets explicitly by the “Per-hop behavior (PHB)” they are to get
  Queue service
  Shaping/policing
Nodes in the middle of a cloud only have to deal with traffic aggregates
Diffserv Forwarding via PHBs

PHBs map to DSCPs (Diffserv Code
Points)



Values chosen for backward-compatibility with
IPv4 TOS byte including IP Precedence (RFC
2474)
Packets with different DSCPs may be reordered
Forwarding resources partitioned by
PHB/DSCP
Assured Forwarding PHB
(AF*)


Four independent classes
Within each class, three levels of drop
precedence


A congested AF node discards packets with
higher drop precedence first
Packets with lowest drop precedence must be
within the subscribed profile
*RFC2597
Expedited Forwarding PHB
(EF*)



Targeted at VoIP and “virtual leased lines”
Roughly equivalent to priority queuing,
with a safety measure to prevent
starvation
Implications:
  No more than 50% of a link can be EF
    see RFC3247,3248 for interesting mathematical analyses
  Worst case jitter at each hop is max of:
    number of EF microflows in the aggregate, or
    a single MTU packet of some other aggregate
*RFC3246
Diffserv Traffic Conditioner
Meter
Shaped
Packets




Classifier
Marker
Shaper /
Dropper
Dropped
Classifier: selects a packet in a traffic stream based on the
content of some portion of the packet header
Meter: checks compliance to traffic parameters (e.g. Token
Bucket) and passes result to marker and shaper/dropper to
trigger particular action for in/out-of-profile packets
Marker: writes/rewrites DSCP
Shaper: delays some packets so that they comply with
the profile
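The meter's token bucket check can be sketched as below. This is a minimal illustration with made-up class and method names; time is passed in explicitly so the behaviour is reproducible, and the meter only reports in-profile or out-of-profile, leaving the marking, shaping or dropping decision to the caller.

```python
class TokenBucketMeter:
    """Token bucket with rate r (bits per second) and depth b (bytes)."""

    def __init__(self, rate_bps, depth_bytes):
        self.rate = rate_bps / 8.0       # refill rate in bytes per second
        self.depth = depth_bytes
        self.tokens = float(depth_bytes)  # bucket starts full
        self.last = 0.0

    def in_profile(self, now_s, size_bytes):
        """Return True if the packet conforms, consuming tokens if so."""
        elapsed = now_s - self.last
        self.last = now_s
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


meter = TokenBucketMeter(rate_bps=64_000, depth_bytes=400)
burst = [meter.in_profile(0.0, 200) for _ in range(3)]  # third exceeds depth
later = meter.in_profile(0.02, 150)                     # 20 ms refills 160 bytes
```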
Diffserv Acceptance
Enthusiasm
Diffserv will solve
the world’s QoS
Diffserv Engineering?
Diffserv SLA ?
Internet e2e SLA?
Real
value
Inter-SP Diffserv and end-to-end
Internet QoS need further
standardisation and commercial
arrangements
Diffserv Design & Deployment
intra Domain
today
Time
Mixing Intserv & Diffserv:
Aggregation


Host signals with RSVP
Edge or transit domains


Edge
In transit domains



Aggregate reservations mark
packets using DSCP
Blindly transfer end to end
reservations using another IP
Protocol Number - change at
edge
Routers detect egress of
reservation (deaggregation) on
transfer from an interior or
aggregator interface to an
exterior (deaggregating)
interface
Aggregate reservation size
varies with load
Backbone
Edge
RTP Compression


20ms @ 8kbit/s yields
20 byte payload
IP header 20; UDP
header 8; RTP header
12



Twice size of
payload!
Header compression:
40 bytes to 2-4 most
of the time
Hop-by-hop: use only
on the slow links
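The overhead arithmetic above can be checked with a short calculation. The function names are illustrative, and link-layer framing is left out for simplicity.

```python
def payload_bytes(codec_bps, frame_ms):
    """Codec bytes carried in one packet."""
    return codec_bps * frame_ms // 8000


def on_wire_bps(payload, header, frame_ms):
    """Bandwidth consumed by payload plus headers at one packet per frame."""
    return (payload + header) * 8 * 1000 / frame_ms


payload = payload_bytes(8_000, 20)          # 8 kbit/s codec, 20 ms packets
uncompressed = on_wire_bps(payload, 20 + 8 + 12, 20)  # IP + UDP + RTP
compressed_best = on_wire_bps(payload, 2, 20)
compressed_worst = on_wire_bps(payload, 4, 20)
```

An 8 kbit/s codec therefore occupies 24 kbit/s with full headers, and between 8.8 and 9.6 kbit/s when the 40 header bytes compress to 2-4 bytes.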
Sample Delay Budget (G.711 - 64kbps)
Delay Source (G.711)                                  Budget (ms)
Device Sample Capture                                 0.1
Encode Delay (Algorithmic Delay + Processing Delay)   2.5
Packetization/Framing                                 10
Move to Output Queue/Queue Delay                      0.5
Access (up) Link Transmission                         30
Backbone Network Transmission                         5
Access (down) Link Transmission                       10
Input Queue to Application                            0.5
Jitter Buffer                                         35
Decode Processing Delay                               0.5
Device Playout Delay                                  0.5
Total                                                 94.6
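The G.711 total can be reproduced by summing the individual contributions. The dictionary below simply restates the budget values for G.711; the variable names are illustrative.

```python
G711_BUDGET_MS = {
    "Device Sample Capture": 0.1,
    "Encode Delay": 2.5,
    "Packetization/Framing": 10,
    "Move to Output Queue/Queue Delay": 0.5,
    "Access (up) Link Transmission": 30,
    "Backbone Network Transmission": 5,
    "Access (down) Link Transmission": 10,
    "Input Queue to Application": 0.5,
    "Jitter Buffer": 35,
    "Decode Processing Delay": 0.5,
    "Device Playout Delay": 0.5,
}

total_ms = round(sum(G711_BUDGET_MS.values()), 1)
largest = max(G711_BUDGET_MS, key=G711_BUDGET_MS.get)
```

The jitter buffer and the access uplink dominate the budget, which is why they are the first places to look when trimming end-to-end delay.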
Sample Delay Budget (G.729 - 8kbps)
Delay Source (G.729)                                  Budget (ms)
Device Sample Capture                                 0.1
Encode Delay (Algorithmic Delay + Processing Delay)   17.5
Packetization/Framing                                 20
Move to Output Queue/Queue Delay                      0.5
Access (up) Link Transmission                         30
Backbone Network Transmission                         5
Access (down) Link Transmission                       10
Input Queue to Application                            0.5
Jitter Buffer                                         35
Decode Processing Delay                               5
Device Playout Delay                                  0.5
Total                                                 124.1
Signaling: SIP
SIP is one of Many

ITU H.323
  Originally for video conferencing
  The first standard protocol for VoIP
  Still in wide usage, but negative growth
MGCP
  Dumb phones controlled by smart server
  “Softswitch” – PSTN emulation view
Megaco/H.248
  Standard version of MGCP
Core SIP Functions






Establishment of peer to peer sessions
Management of peer to peer sessions
  Keepalives
  Graceful and Non-graceful termination
Rendezvous
  Forking
  Search
Policy Based Routing
Loose Routing
Mobility
  Limited terminal mobility
  Device Mobility
Core SIP Functions






Secure User Identification
Exchange and Management of Media
Session data
User registration
Capability declaration
Capability query
Reliability
SIP Technology Community
[Figure: SIP (RFC3261) surrounded by related technologies: RTP, SDP, ROHC, STUN, O/A (3264), Events (3265), SIMPLE, MIDCOM, DNS (3263), ENUM, Rel (3262), SigComp, SIP Extensions]
SIP Design Philosophy

Patterned after other
Successful Internet
Standards






HTTP
Don’t Reinvent the PSTN
General Purpose
Functionality
Do Not Dictate
Architectures or Services
It needs to work on any IP
Network
Leverage the Best of
Existing Standards





URLs
MIME
RFC822
Scalability
Push state to the edge
Basic Design




Request/Response Protocol
SIP is a Peer Protocol – all
entities send requests and
receive requests
Modelled after HTTP
Each request invokes
method


Main purpose of request
Messages contain bodies
request
Agent
Agent
response
Transactions

Fundamental unit of
messaging exchange






Request
Zero or more provisional
responses
Usually one final response
Maybe ACK
All signaling composed of
independent transactions
Identified by CSeq


Sequence number
Method tag
INVITE
100
200
Cseq: 1
ACK
First Transaction
BYE
200
Second Transaction
Cseq: 2
Session Independence


Body of SIP message
used to establish call
describes the session
Session could be




Audio
Video
Game
SIP operation is
independent of type of
session

SIP Bodies are MIME
objects



MIME = Multipurpose
Internet Mail Extensions
Mechanisms for
describing and carrying
opaque content
Used with HTTP and
email
Protocol Components

User Agent
  End systems
  Hard and soft phones
  PSTN Gateways
  Phone Adaptors
  Media Servers
  Anything that originates or terminates SIP calls
Proxy
  SIP server responsible for relaying and processing requests between user agents
  Main job: where to send request next?
Back-to-Back User Agent (B2BUA)
  SIP server that terminates and re-originates SIP
  SBCs, Call Agents, etc.
SIP Addressing


SIP addresses are URLs
URL contains several
components







Scheme (sip)
Username
Hostname
Optional port
Parameters
Headers and Body
SIP allows any URI type




tel URIs
http URLs for redirects
mailto URLs
leverage vast URI
infrastructure
sip:jdrosen@cisco.com:5061;user=host?Subject=foo
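The URL components listed above can be separated with plain string operations. This is a simplified sketch, not a full RFC 3261 parser: the function name is made up, and it does not handle passwords, IPv6 literals, or percent-escapes.

```python
def parse_sip_uri(uri):
    """Split a SIP URI into scheme, user, host, port, parameters, headers."""
    scheme, _, rest = uri.partition(":")
    rest, _, header_part = rest.partition("?")
    rest, *param_items = rest.split(";")
    user, _, hostport = rest.rpartition("@")
    host, _, port = hostport.partition(":")
    # partition("=")[::2] yields (name, value) and drops the separator
    params = dict(item.partition("=")[::2] for item in param_items)
    headers = dict(item.partition("=")[::2]
                   for item in header_part.split("&") if item)
    return {
        "scheme": scheme,
        "user": user or None,
        "host": host,
        "port": int(port) if port else None,
        "params": params,
        "headers": headers,
    }


parsed = parse_sip_uri("sip:jdrosen@cisco.com:5061;user=host?Subject=foo")
```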
The SIP Trapezoid
b.com
a.com
SIP
RTP
SIP Methods


INVITE
  Invites a participant to a session
  idempotent - reINVITEs for session modification
BYE
  Ends a client’s participation in a session
CANCEL
  Terminates a search
OPTIONS
  Queries a participant about their media capabilities, and finds them, but doesn’t invite
ACK
  For reliability and call acceptance
REGISTER
  Informs a SIP server about the location of a user
SIP Architecture
[Figure: SIP architecture spanning sp.com, a.com (with Corp DB) and b.com; legend: Request, Response, Media; steps numbered 1 to 14; address shown: 14089023077@b.com]
SIP Message Syntax


Many header fields
from http
Payload contains a
media description

SDP - Session
Description Protocol
INVITE sip:+17327654321@example.com SIP/2.0
From: J. Rosenberg <sip:+14082321122@example.com>;tag=76ah
Subject: Conference Call
To: John Smith <sip:+17327654321@example.com>
Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
Call-ID: 1997234505.56.78@1.2.3.4
Content-Type: application/sdp
CSeq: 4711 INVITE
Content-Length: 187

v=0
o=user1 53655765 2353687637 IN IP4 1.2.3.4
s=Sales
c=IN IP4 1.2.3.4
t=0 0
m=audio 3456 RTP/AVP 0
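Because the syntax is modelled on HTTP, a request like the one above splits into a start line, header fields and a body. The following is a minimal sketch with an illustrative function name. It keeps only the last value of a repeated header (real messages often carry several Via headers) and does not expand compact header forms.

```python
def parse_sip_request(text):
    """Return (method, request_uri, version, headers, body)."""
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    last = None
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and last:
            # continuation of a folded header line
            headers[last] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip().lower()
        headers[last] = value.strip()
    return method, request_uri, version, headers, body


message = "\r\n".join([
    "INVITE sip:+17327654321@example.com SIP/2.0",
    "From: J. Rosenberg <sip:+14082321122@example.com>;tag=76ah",
    "To: John Smith <sip:+17327654321@example.com>",
    "Call-ID: 1997234505.56.78@1.2.3.4",
    "CSeq: 4711 INVITE",
    "Content-Type: application/sdp",
    "",
    "v=0",
    "m=audio 3456 RTP/AVP 0",
])
method, request_uri, version, headers, body = parse_sip_request(message)
```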
SIP Address Fields

Request-URI
  Contains address of next hop server
  Rewritten by proxies based on result of Location Service
To
  Address of original called party
  Contains optional display name
From
  Address of calling party
  Optional display name
INVITE sip:+17327654321@example.com SIP/2.0
From: J. Rosenberg <sip:+14082321122@example.com>;tag=76ah
Subject: Conference Call
To: John Smith <sip:+17327654321@example.com>
Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
Call-ID: 1997234505.56.78@1.2.3.4
Content-Type: application/sdp
CSeq: 4711 INVITE
Content-Length: 187

v=0
o=user1 53655765 2353687637 IN IP4 1.2.3.4
s=Sales
c=IN IP4 1.2.3.4
t=0 0
m=audio 3456 RTP/AVP 0
SIP Responses

Look much like requests
  Headers, bodies
Differ in top line
  Status Code
  Reason Phrase
Status Code
  Numeric, 100 - 699
  Meant for computer processing
  Protocol behavior based on 100s digit
  Other digits give extra info
Reason Phrase
  Text phrase for humans
  Can be anything
Status Code Classes
  100 - 199 (1XX): Informational
  200 - 299 (2XX): Success
  300 - 399 (3XX): Redirection
  400 - 499 (4XX): Client Error
  500 - 599 (5XX): Server Error
  600 - 699 (6XX): Global Failure
Two groups
  100 - 199: Provisional
    Not reliable
  200 - 699: Final, Definitive
Example
  200 OK
  180 Ringing
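Since protocol behavior depends only on the hundreds digit, classifying a response takes one integer division. A minimal sketch with illustrative names:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_status(code):
    """Return (class name, is_final) for a SIP status code."""
    if not 100 <= code <= 699:
        raise ValueError("SIP status codes range from 100 to 699")
    return STATUS_CLASSES[code // 100], code >= 200


ringing = classify_status(180)
ok = classify_status(200)
busy = classify_status(486)
```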
Example SIP Response


Note how only difference is top line
Rules for generating responses
  Call-ID, To, From, CSeq are mirrored in response
  Branch parameter used as transaction ID
  Tag added to To field to identify dialog
SIP/2.0 200 OK
From: J. Rosenberg <sip:+14082321122@example.com>;tag=76ah
To: John Smith <sip:+17327654321@example.com>;tag=112
Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
Call-ID: 1997234505.56.78@1.2.3.4
Content-Type: application/sdp
CSeq: 4711 INVITE
SIP Transport




SIP Messages over UDP or TCP/TLS or SCTP
Reliability mechanisms defined for UDP
UDP More Widely Used
  Faster
  No connection state
TCP preferred these days
  NAT
  Larger SIP messages
Reliability mechanisms depend on SIP request method
  INVITE
  anything except INVITE
  Reason: optimized for phone calls
Registrations


REGISTER creates mapping in server from one URI to another
  sip:89023077@example.com to sip:89023077@1.2.3.4
REGISTER properties
  UA location in Contact
  Registrar identified in Request URI
  Identifies registered user in To and From field
  Expires header indicates desired lifetime
    Can be different for each Contact
Registrations are soft-state
REGISTER sip:example.com SIP/2.0
To: sip:89023077@example.com;user=phone
From: sip:89023077@example.com;user=phone
Call-ID: 1997234505.56.78@1.2.3.4
CSeq: 123 REGISTER
Contact: sip:89023077@1.2.3.4
Expires: 3600
Registration Handling


Registrar is logical
function handling
REGISTER
Registrar steps:





Authenticate
Authorize
Add Binding
Lower expiration
Return all currently
registered UA (can be
more than one)
SIP/2.0 200 OK
To: sip:89023077@example.com;user=phone
From: sip:89023077@example.com;user=phone
Call-ID: 1997234505.56.78@1.2.3.4
CSeq: 123 REGISTER
Contact: sip:89023077@1.2.3.4;expires=3600
Contact: sip:89023077@5.6.7.8;expires=524
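The binding and expiry steps can be sketched as a small in-memory table. This is an illustration only: the class and method names are made up, authentication and authorization are omitted, and time is passed in as plain seconds so the soft-state expiry is easy to follow.

```python
class Registrar:
    """Keeps address-of-record -> contact bindings with expiry times."""

    def __init__(self, max_expires=3600):
        self.max_expires = max_expires
        self.bindings = {}  # aor -> {contact: absolute expiry time}

    def register(self, aor, contact, expires, now):
        """Add or refresh a binding; return all current contacts."""
        granted = min(expires, self.max_expires)  # registrar may lower it
        self.bindings.setdefault(aor, {})[contact] = now + granted
        return self.lookup(aor, now)

    def lookup(self, aor, now):
        """Return {contact: seconds remaining}, dropping expired bindings."""
        contacts = self.bindings.get(aor, {})
        for contact in [c for c, exp in contacts.items() if exp <= now]:
            del contacts[contact]
        return {c: exp - now for c, exp in contacts.items()}


registrar = Registrar()
aor = "sip:89023077@example.com"
registrar.register(aor, "sip:89023077@5.6.7.8", expires=3600, now=0)
reply = registrar.register(aor, "sip:89023077@1.2.3.4", expires=3600, now=3076)
after_timeout = registrar.lookup(aor, now=3700)
```

The second registration returns both contacts, one with 3600 and one with 524 seconds left, as in the 200 OK above. Once the older binding times out without a refresh it disappears on its own, which is what soft state means here.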
Forking

A proxy may have more than one address for a user
  Happens when more than one SIP URL is registered for a user
  Can happen based on static routing configuration
In this case, proxy may fork
Forking is when proxy sends request to more than one destination at once
First 200 OK that is received is forwarded upstream
All other unanswered requests cancelled
[Figure: INVITE for 89023077@a.com forked by the proxy]
Routing of Subsequent Requests



Initial SIP request sent through many proxies
No need per se for subsequent requests to go through proxies
Each proxy can decide whether it wants to receive subsequent requests
  Inserts Record-Route header containing its address
For subsequent requests, users insert Route header
  Contains sequence of proxies (and final user) that should receive request
[Figure: UA1 and UA2 connected through three proxies; INVITE and BYE paths shown]
Setting up the Session


INVITE contains the Session
Description Protocol (SDP)
in the body
SDP conveys the desired
session from the caller's
perspective



Session consists of a
number of media streams
Each stream can be audio,
video, text, application, etc.
Also contains information
needed about the session


codecs
addresses and ports

SDP also conveys other
information about session





Time it will take place
Who originated the
session
subject of the session
URL for more information
SDP origins are multicast
sessions on the mbone

Originator of INVITE is not
originator of session
Anatomy of SDP

SDP contains informational headers
  version (v)
  origin (o) - unique ID
  information (i)
  Time of the session
Followed by a sequence of media streams
Each media stream contains an m line defining
  port
  transport
  codecs
Media Stream also contains c line
  Address information
v=0
o=user1 53655765 2353687637 IN IP4 128.3.4.5
s=Mbone Audio
i=Discussion of Mbone Engineering Issues
e=mbone@somewhere.com
t=0 0
m=audio 3456 RTP/AVP 0 78
c=IN IP4 1.2.3.4
a=rtpmap:78 G723
m=video 4444 RTP/AVP 86
c=IN IP4 1.2.3.4
a=rtpmap:86 H263
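The per-stream structure described above can be extracted with a short line-by-line scan. This is a simplified sketch with an illustrative function name: it reads only `m=`, media-level `c=` and `a=rtpmap` lines, and ignores session-level connection data and all other attributes.

```python
def parse_sdp_media(sdp):
    """Return one dict per m= line with port, transport, formats, address."""
    streams = []
    for line in sdp.splitlines():
        kind, _, value = line.strip().partition("=")
        if kind == "m":
            media, port, proto, *formats = value.split()
            streams.append({"media": media, "port": int(port), "proto": proto,
                            "formats": formats, "address": None, "rtpmap": {}})
        elif streams and kind == "c":
            # c=<nettype> <addrtype> <connection-address>
            streams[-1]["address"] = value.split()[2]
        elif streams and kind == "a" and value.startswith("rtpmap:"):
            payload_type, name = value[len("rtpmap:"):].split(None, 1)
            streams[-1]["rtpmap"][payload_type] = name
    return streams


offer = """v=0
o=user1 53655765 2353687637 IN IP4 128.3.4.5
s=Mbone Audio
t=0 0
m=audio 3456 RTP/AVP 0 78
c=IN IP4 1.2.3.4
a=rtpmap:78 G723
m=video 4444 RTP/AVP 86
c=IN IP4 1.2.3.4
a=rtpmap:86 H263
"""
audio, video = parse_sdp_media(offer)
```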
Negotiating the Session


Called party receives SDP offered by caller
Each stream can be
  accepted
  rejected
Accepting involves generating an SDP listing same stream
  port number and address of called party
  subset of codecs from SDP in request
Rejecting indicated by setting port to zero
Resulting SDP returned in 200 OK
Media can now be exchanged
v=0
o=user2 16255765 8267374637 IN IP4 4.3.2.1
t=0 0
m=audio 3456 RTP/AVP 0
c=IN IP4 4.3.2.1
m=video 0 RTP/AVP 86
c=IN IP4 4.3.2.1
Audio stream accepted, PCMU only.
Video stream rejected
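The accept/reject rule can be sketched as a function that builds the `m=` lines of the answer. This is an illustration under stated assumptions: the function name and its input shapes are made up, only RTP/AVP streams are considered, and the `c=` lines and other session fields are left out.

```python
def answer_media_lines(offered, supported, local_ports):
    """Build answer m= lines from an offer.

    offered:     list of (media, [payload types]) in offer order
    supported:   dict media -> set of payload types the callee can handle
    local_ports: dict media -> port the callee will receive on
    A stream with no common codec is rejected by setting its port to zero.
    """
    lines = []
    for media, formats in offered:
        common = [pt for pt in formats if pt in supported.get(media, ())]
        if common:
            lines.append("m=%s %d RTP/AVP %s"
                         % (media, local_ports[media], " ".join(common)))
        else:
            lines.append("m=%s 0 RTP/AVP %s" % (media, " ".join(formats)))
    return lines


answer = answer_media_lines(
    offered=[("audio", ["0", "78"]), ("video", ["86"])],
    supported={"audio": {"0"}},   # PCMU only, no video
    local_ports={"audio": 3456},
)
```

The result keeps one line per offered stream, in the same order, which is what lets the caller match each answer line to the stream it offered.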
Changing Session Parameters


Once call is started, session can be
modified
Possible changes








Add a stream
Remove a stream
Change codecs
Change address information
Call hold is basically a session
change
Accomplished through a re-INVITE
Same session negotiation as
INVITE, except in middle of call
Rejected re-INVITE - call still
active!
INVITE
200
ACK
INVITE
200
reINVITE
ACK
Hanging Up
How to hang up depends on when and who
After call is set up
  either party sends BYE request
From caller, before call is accepted
  send CANCEL
  BYE is bad since it may not reach the same set of users that got INVITE
  If call is accepted after CANCEL, then send BYE
From callee, before accepted
  Reject with 486 Busy Here
[Figure: message ladder between C and S: INVITE, 100, Hangup/CANCEL, 200 OK, Accept/200 OK, ACK, BYE, 200 OK]
Call Flow for basic call: UA to proxy to UA

Call setup
  100 trying hop by hop
  180 ringing
  200 OK acceptance
Call parameter modification
  re-INVITE
  Same as initial INVITE, updated session description
Termination
  BYE method
[Figure: message ladder UA to proxy to UA: INVITE, 100 Trying, 180 Ringing, 200 OK, ACK, RTP, INVITE, BYE, 200 OK]
Privacy and Identity



RFC 3325: A Private Extension for Asserted
Identity in Trusted Networks
RFC 3323: A Privacy Mechanism for SIP
RFC 4474: SIP Identity
RFC3325 Asserted Identity
[Figure: proxy in the Trust Domain authenticates caller and verifies identity, then adds PAID]
INVITE
  P-Asserted-Identity: sip:+14089023077@a.com
RFC3323 – SIP Privacy
[Figure: Anonymous Caller and Trust Domain, with three INVITE messages]
INVITE
  P-Asserted-Identity: sip:+14089023077@a.com
  From: anonymous
INVITE
  Privacy: id
  From: anonymous
INVITE
  From: anonymous
RFC 4474: SIP Identity
[Figure: proxy authenticates caller, verifies identity and signs request; receiving side verifies signature]
INVITE
  From: sip:joe@example.com
INVITE
  From: sip:joe@example.com
  Identity: asd87f7as66sda8z
Only useful for user@domain addresses!
Transfers and Dialog Movement: REFER (RFC 3515)
[Figure: Alice, Joe and Bob, steps numbered 1 to 4. Messages: REFER with Refer-To: Bob, and INVITE Bob with Referred-By: Joe]
Third Party Call Control (3pcc): RFC 3725
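[Figure: controller between two parties, steps numbered 1 to 6. Messages: INVITE (no SDP), 200 (SDP A), INVITE (SDP A), 200 (SDP B), ACK (SDP B); RTP then flows between the two parties]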
SIP and Quality of Service


RFC 3312: Integration of Resource Management with SIP
Problem
  How to make sure phone doesn’t ring unless resources are reserved
Solution
  SIP does not do resource reservation!
  SIP INVITE tells far side not to ring
  Both sides do regular QoS reservations
    RSVP
    PDP context activation
  UPDATE to change state
[Figure: message sequence INVITE w. Preconditions, 183 Progress, QoS Reservations, UPDATE w. Preconditions, 180 Ringing, 200 OK, ACK]
Security
VoIP Security
The only totally secure system I know of is a rock
- Tony Lauck, circa 1985
But Even Rocks can be Insecure..
It Had a Great User Interface
But it had a serious security vulnerability…
VoIP Attacks
Attack                                                       Solution
Free Calls aka Toll Fraud                                    User Authentication
Impersonation                                                User Authentication, Secure Caller ID
Learning Private Information (calling patterns, PIN codes)   SIP Encryption, Media Encryption
Steal Calls                                                  SIP Encryption, Media Encryption
DoS                                                          ICE, Others
SIP User Authentication
RTP
We want this SIP server to authenticate this user
and this SIP server to authenticate this user
SIP Digest Authentication
Client: Hi, I’d like to SIP REGISTER
Server: 401: OK, try again. Nonce=a7szh1
Client computes Digest = Hash(joe, a7szh1, myPassword) = z0v88a6
Client: REGISTER Nonce=a7szh1 Username=joe Digest=z0v88a6
Server computes Digest = Hash(joe, a7szh1, myPassword) and compares
Server: OK, done!
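The slide's Hash(user, nonce, password) is a simplification. As a sketch, the basic RFC 2617 Digest calculation (without the qop extension) looks like this in Python; the realm, URI and helper names below are assumptions for the example, not values from the slides.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """RFC 2617 response without qop: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret part
    ha2 = md5_hex(f"{method}:{uri}")                 # request part
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Client side: answer the 401 challenge (realm and URI are hypothetical)
client = digest_response("joe", "a.com", "myPassword",
                         "REGISTER", "sip:a.com", "a7szh1")

# Server side: recompute from the stored password and compare
server = digest_response("joe", "a.com", "myPassword",
                         "REGISTER", "sip:a.com", "a7szh1")
authenticated = (client == server)
```

The password itself never crosses the wire; only the nonce and the resulting digest do.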
Offline Dictionary Attack
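Client computes Digest = Hash(joe, a7szh1, alligator)
Client: REGISTER Nonce=a7szh1 Username=joe Digest=z0v88a6
Server computes Digest = Hash(joe, a7szh1, alligator)
Server: OK, done!
Eavesdropper hashes dictionary words against the captured nonce until the digest matches
Word        Hash(joe, a7szh1, word)
Aardvark    9z8v77a
Abacus      lkf88z7
Abate       8z77x
...
Alligator   z0v88a6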
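The attack needs nothing but one captured REGISTER and a word list. A minimal sketch, assuming the slide's simplified Hash(user, nonce, password) is an MD5 over the three values joined by colons (the function names and the word list are made up for the example):

```python
import hashlib


def toy_digest(username, nonce, password):
    """Stand-in for the slide's Hash(user, nonce, password); an assumption."""
    data = f"{username}:{nonce}:{password}".encode("utf-8")
    return hashlib.md5(data).hexdigest()


def dictionary_attack(username, nonce, observed_digest, wordlist):
    """Try each word offline; no further messages to the server needed."""
    for word in wordlist:
        if toy_digest(username, nonce, word) == observed_digest:
            return word
    return None


# What the eavesdropper saw on the wire
observed = toy_digest("joe", "a7szh1", "alligator")

words = ["aardvark", "abacus", "abate", "alligator"]
recovered = dictionary_attack("joe", "a7szh1", observed, words)
```

A password that is not in the word list survives this attack, which is why weak passwords are the real problem here.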
Solution: Digest over TLS
[Figure: the same Digest exchange, Digest = Hash(joe, a7szh1, alligator), carried inside a TLS connection ("TLS Armor")]
This is how Web Security works!
Even Stronger: Mutual TLS for Devices
[Figure: phone with MAC 8x7a6 connects to a.com inside a TLS connection ("TLS Armor")]
Phone has a Certificate which identifies it
SIP Encryption
RTP
We want each SIP hop to be encrypted so only the SIP servers and endpoints see the signaling.
SIP Encryption: TLS
a.com
RTP
b.com
Mutual TLS
Authentication
Media Encryption



Countermeasure against:
  Eavesdropping
  Barge-in
  Modification
Two useful techniques
  IPsec
  SRTP
Complications
  Key management
  Legal intercept (who has the keys)
  Firewall and NAT issues (covered later)
Alternative: Secure RTP

Authentication and encryption of RTP and RTCP packets
[Figure: SRTP packet layout]
  V | P | X | CC | M | PT | sequence number
  timestamp
  synchronization source (SSRC) identifier
  contributing source (CSRC) identifiers ...
  RTP extension (optional)
  RTP payload
  SRTP MKI: 0 bytes for voice
  Authentication tag: 4 bytes for voice
Encrypted portion: RTP payload
Authenticated portion: RTP header through RTP payload
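To make the layout concrete, here is a sketch that splits a packet into the fields above using Python's `struct` module. It assumes no MKI and a 4-byte authentication tag (the voice case on the slide) and ignores the optional RTP extension; the function name and the sample values are made up for the example.

```python
import struct


def parse_srtp(packet, auth_tag_len=4):
    """Split an SRTP packet into header fields and protected portions."""
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    header_len = 12 + 4 * csrc_count          # fixed header plus CSRC list
    csrcs = struct.unpack(f"!{csrc_count}I", packet[12:header_len])
    tag_start = len(packet) - auth_tag_len
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),      # extension body not parsed here
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
        "encrypted": packet[header_len:tag_start],   # RTP payload
        "authenticated": packet[:tag_start],         # header + payload
        "auth_tag": packet[tag_start:],
    }


# Hypothetical packet: V=2, PT=0 (PCMU), 160-byte payload, 4-byte tag
header = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234)
packet = header + bytes(160) + b"\xaa\xbb\xcc\xdd"
fields = parse_srtp(packet)
```

The only growth over plain RTP in this case is the 4-byte tag, which is where the "very little bandwidth overhead" claim comes from.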
SRTP

Advantages
  Provides both privacy via encryption and authentication via message integrity check
  Very little bandwidth overhead
  Uses modern strong crypto suites: AES counter mode for encryption and HMAC for message integrity
  Does not break header compression schemes like cRTP
  For very low-rate channels (e.g. cellular) can sacrifice authentication and have no packet expansion
Disadvantages
  Needs key management
  End-to-end versus hop-by-hop trust tradeoffs in protecting keys
  Yet another security mechanism to ensure is implemented and deployed correctly
NAT Traversal
What is NAT?

Network Address Translation (NAT)
  Creates address binding between internal private and external public address
  Modifies IP Addresses/Ports in Packets
Benefits
  Avoids network renumbering on change of provider
  Allows multiplexing of multiple private addresses into a single public address ($$ savings)
  Maintains privacy of internal addresses
[Figure: Client sends IP packet with S: 10.0.1.1:6554, D: 67.22.3.1:80; the NAT rewrites it to S: 1.2.3.4:8877, D: 67.22.3.1:80]
Binding Table
Internal -> External
10.0.1.1:6554 -> 1.2.3.4:8877
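The binding table can be modeled in a few lines. This is a toy sketch, not how any particular NAT product behaves: it assumes one external port per internal address and port, allocated sequentially, with no filtering or timeouts. The class and method names are invented for the example.

```python
class Nat:
    """Toy NAT: one binding per internal (ip, port), no expiry."""

    def __init__(self, public_ip, first_port):
        self.public_ip = public_ip
        self.next_port = first_port
        self.internal_to_external = {}
        self.external_to_internal = {}

    def outbound(self, src, dst):
        """Rewrite the source of a packet leaving the private network."""
        if src not in self.internal_to_external:
            external = (self.public_ip, self.next_port)
            self.next_port += 1
            self.internal_to_external[src] = external
            self.external_to_internal[external] = src
        return self.internal_to_external[src], dst

    def inbound(self, src, dst):
        """Rewrite the destination of a returning packet, or drop it."""
        internal = self.external_to_internal.get(dst)
        if internal is None:
            return None  # no binding: packet is dropped
        return src, internal


nat = Nat("1.2.3.4", 8877)
sent = nat.outbound(("10.0.1.1", 6554), ("67.22.3.1", 80))
reply = nat.inbound(("67.22.3.1", 80), ("1.2.3.4", 8877))
stray = nat.inbound(("67.22.3.1", 80), ("1.2.3.4", 9999))
```

Note that a binding only exists after the client has sent something out, which is exactly what makes unsolicited inbound media a problem.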
Problem: Getting SIP Through NATs
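[Figure: client behind NAT sends the INVITE below; the far side then tries to send RTP to 10.0.1.1, a private address it cannot reach]
INVITE sip:12345@b.com
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1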
Solution Space





Application Layer Gateways (ALGs)
Session Border Controllers (SBC)
Simple Traversal of UDP Through NAT (STUN)
Traversal Using Relay NAT (TURN)
Interactive Connectivity Establishment (ICE)
Application Layer Gateway
[Figure: NAT with ALG between client and far side; RTP to 10.0.1.1]
INVITE as sent by client:
INVITE sip:12345@b.com
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1
INVITE after NAT with ALG:
INVITE sip:12345@b.com
m=audio 1234 RTP/AVP 0
c=IN IP4 19.1.3.2
NAT also modifies SIP messages to fix them up!
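A sketch of the fix-up an ALG performs on the SDP body, assuming the NAT already holds a binding for the media address and port. The function name and the `bindings` mapping are assumptions for the example; a real ALG also has to touch SIP headers and the Content-Length.

```python
def alg_rewrite(sdp_lines, bindings):
    """Replace private media addresses in SDP with their NAT bindings.

    bindings maps (private_ip, private_port) -> (public_ip, public_port).
    Only IPv4 media-level c= lines following an m= line are handled.
    """
    out = list(sdp_lines)
    prefix = "c=IN IP4 "
    for i, line in enumerate(out):
        if not line.startswith("m="):
            continue
        media, port, rest = line[2:].split(" ", 2)
        # Look for the c= line that belongs to this media section
        for j in range(i + 1, len(out)):
            if out[j].startswith("m="):
                break
            if out[j].startswith(prefix):
                private = (out[j][len(prefix):], int(port))
                if private in bindings:
                    public_ip, public_port = bindings[private]
                    out[i] = f"m={media} {public_port} {rest}"
                    out[j] = prefix + public_ip
                break
    return out


inside = ["m=audio 3456 RTP/AVP 0", "c=IN IP4 10.0.1.1"]
outside = alg_rewrite(inside, {("10.0.1.1", 3456): ("19.1.3.2", 1234)})
```

This only works while the ALG can read and modify the message, which is why it breaks once the signaling is encrypted or integrity-protected.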
ALG Benefits and Drawbacks

Drawbacks
  Doesn’t work when security turned on
  Hard to diagnose problems
  Requires network upgrade to support new app
  Frequent implementation problems (lack of expertise)
  Incentives mismatched
Benefits
  No change to clients or servers
Session Border Controller
[Figure: client behind NAT, SBC at 9.8.7.6. SBC relays RTP back to source]
INVITE as sent by client:
INVITE sip:12345@b.com
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1
INVITE after SBC:
INVITE sip:12345@b.com
m=audio 3225 RTP/AVP 0
c=IN IP4 9.8.7.6
RTP to 9.8.7.6
SBC Benefits and Drawbacks

Drawbacks
  Expensive media relaying
  Interferes with some SIP extensions
  Breaks more advanced SIP security
Benefits
  No change to clients or NATs
  Works with basic SIP security mechanisms
  Easier to diagnose
Simple Traversal of UDP Through NAT (STUN)
[Figure: client behind NAT (public address 1.2.3.4) queries STUN Server at 9.8.7.6]
Client: What is my IP address and port please?
STUN Server: It’s 1.2.3.4:3472
INVITE sip:12345@b.com
m=audio 3472 RTP/AVP 0
c=IN IP4 1.2.3.4
RTP to 1.2.3.4
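On the wire, "what is my address?" is a Binding request and the answer carries the mapped address. The sketch below uses the message encoding from the later STUN revision (RFC 5389: magic cookie plus XOR-MAPPED-ADDRESS), which is an assumption relative to the original protocol named on the slide. It builds and parses messages in memory only; the function names are invented for the example.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """20-byte header: type, length, magic cookie, 96-bit transaction ID."""
    tid = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def build_binding_success(transaction_id, ip, port):
    """What a server would send back (IPv4 only), for testing the parser."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


def parse_mapped_address(response):
    """Return (ip, port) from XOR-MAPPED-ADDRESS, or None if absent."""
    _msg_type, length, _cookie = struct.unpack("!HHI", response[:8])
    pos = 20
    while pos + 4 <= 20 + length:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            _reserved, _family, xport, xaddr = struct.unpack("!BBHI", value)
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    return None


request = build_binding_request()
response = build_binding_success(request[8:20], "1.2.3.4", 3472)
mapped = parse_mapped_address(response)
```

The client then places the mapped address and port into its SDP, as in the INVITE above.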
STUN Benefits and Drawbacks

Drawbacks
  Doesn’t always work (e.g., through symmetric NATs)
Benefits
  No change to servers or NATs
  Works with all SIP security mechanisms
  Can support non-VoIP apps (e.g., games)
Traversal Using Relay NAT (TURN)
[Figure: client behind NAT (public address 1.2.3.4) asks TURN Server at 9.8.7.6 for a relay address]
Client: Give me an IP address and port please?
TURN Server: 9.8.7.6:2376
INVITE sip:12345@b.com
m=audio 2376 RTP/AVP 0
c=IN IP4 9.8.7.6
RTP to 1.2.3.4
TURN Benefits and Drawbacks

Drawbacks
  Expensive Media Relaying
Benefits
  No change to servers or NATs
  Works with all SIP security mechanisms
  Can support non-VoIP apps (e.g., games)
Interactive Connectivity Establishment (ICE)
Hybrid of STUN and TURN
P2P NAT Traversal
Widely Deployed on Internet
Popular with Application Providers
ICE Step 1: Allocation



Before Making a Call, the Client Gathers Candidates
Each candidate is a potential address for receiving media
Three different types of candidates
  Host Candidates
  Server Reflexive Candidates (STUN)
  Relayed Candidates (TURN)
[Figure: agent behind two NATs, with STUN and TURN servers. Host candidates reside on the agent itself; STUN candidates are addresses residing on a NAT; TURN candidates reside on a TURN server]
ICE Step 2: Create Offer


Each candidate is placed into an a=candidate attribute of the offer
Each candidate line has IP address and port plus other info needed for ICE
c=IN IP4 192.0.2.3
t=0 0
m=audio 45664 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=candidate:1 1 UDP 2130706178 10.0.1.1 8998 typ host
a=candidate:2 1 UDP 1694498562 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998
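Reading the candidate lines back is mostly a matter of splitting on whitespace. A sketch, with the dictionary keys chosen for this example rather than taken from any library:

```python
def parse_candidate(line):
    """Parse one a=candidate line into a dict (names are illustrative)."""
    fields = line.split(":", 1)[1].split()
    foundation, component, transport, priority, ip, port, _typ, ctype = fields[:8]
    candidate = {
        "foundation": foundation,
        "component": int(component),
        "transport": transport,
        "priority": int(priority),
        "ip": ip,
        "port": int(port),
        "type": ctype,
    }
    # Remaining fields come as name/value pairs, e.g. raddr and rport
    extras = fields[8:]
    candidate.update(zip(extras[0::2], extras[1::2]))
    return candidate


lines = [
    "a=candidate:2 1 UDP 1694498562 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998",
    "a=candidate:1 1 UDP 2130706178 10.0.1.1 8998 typ host",
]
candidates = sorted((parse_candidate(l) for l in lines),
                    key=lambda c: c["priority"], reverse=True)
```

Sorting by priority puts the host candidate ahead of the server reflexive one, matching the numbers in the offer above.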
ICE Step 3: Send INVITE


Caller sends a SIP INVITE as normal
No ICE processing by SIP servers
[Figure: INVITE passes through SIP Server]
ICE Step 4: Allocation


Called party does exactly same processing as caller and obtains its candidates
Recommended to not yet ring the phone!
[Figure: called party behind NATs contacts STUN and TURN servers]
ICE Step 5: Provisional Response



Callee sends a provisional response containing its SDP with candidates
As with INVITE, no processing by proxies
Phone has still not rung yet
[Figure: 1xx response passes through SIP Proxy]
ICE Step 6: Verification



Each agent pairs up its candidates (local) with its peer’s (remote) to form candidate pairs
Each agent sends a STUN-based ping on each pair, starting at highest priority
If a response is received the check has succeeded and we know media can flow on that pair!
[Figure: two agents, each behind NATs and with a TURN Server; connectivity checks numbered 1 to 5]
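Pairing and ordering can be sketched as follows. The pair priority formula is the one from RFC 5245 (G is the controlling agent's candidate priority, D the controlled agent's); the remote candidates, the `reachable` set standing in for real STUN checks, and the function names are all assumptions for the example.

```python
from itertools import product


def pair_priority(g, d):
    """RFC 5245: 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def check_list(local, remote, local_is_controlling=True):
    """Pair every local candidate with every remote one, best first.

    Candidates are (address, priority) tuples.
    """
    pairs = []
    for loc, rem in product(local, remote):
        g, d = (loc[1], rem[1]) if local_is_controlling else (rem[1], loc[1])
        pairs.append((pair_priority(g, d), loc[0], rem[0]))
    return sorted(pairs, reverse=True)


def run_checks(pairs, reachable):
    """Stand-in for STUN pings: return the first pair that gets an answer."""
    for _priority, loc, rem in pairs:
        if (loc, rem) in reachable:
            return loc, rem
    return None


local = [("10.0.1.1:8998", 2130706178), ("192.0.2.3:45664", 1694498562)]
# Hypothetical peer candidates: one host, one server reflexive
remote = [("10.0.2.1:5000", 2130706178), ("198.51.100.7:6000", 1694498562)]

pairs = check_list(local, remote)
selected = run_checks(pairs, {("10.0.1.1:8998", "198.51.100.7:6000")})
```

Host-to-host is tried first; when that check gets no response, the agents fall back to the next pair that does, and relayed candidates (lowest priority) are used only if nothing better works.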
ICE Benefits and Drawbacks

Drawbacks
  Requires client changes
  Requires other side to support it
Benefits
  Always Works
  No change to servers or NATs
  Works with all SIP security mechanisms
  Minimum Media Relaying
  Can support non-VoIP apps (e.g., games)
  Built-In Anti-DOS
  Eliminates Ghost Rings
That’s it!
Questions?
Glossary
AIN     Advanced Intelligent Network
ADPCM   Adaptive Differential PCM
BGP     Border Gateway Protocol
CALEA   Communications Assistance for Law Enforcement Act
CBR     Constant Bit Rate
CELP    Code Excited Linear Prediction
CODEC   Coder/Decoder
COPS    Common Open Policy Service
CRTP    Compressed RTP
CSRC    Contributing Source
CTI     Computer-Telephony Integration
DSCP    Diffserv Code Point
DSL     Digital Subscriber Line
DSP     Digital Signal Processor
DTMF    Dual Tone Multi-Frequency
ERL     Echo Return Loss
ERLE    ERL Enhancement
HFC     Hybrid Fiber/Coax
IN      Intelligent Network
ISDN    Integrated Services Digital Network
ISUP    ISDN User Part
JTAPI   Java Telephony API
LDAP    Lightweight Directory Access Protocol
MCML    Multi-class Multi-link PPP
MGCP    Media Gateway Control Protocol
MOS     Mean Opinion Score
MPLS    Multi-protocol Label Switching
NLP     Non-linear Processing
NTP     Network Time Protocol
PCM     Pulse Code Modulation
PPP     Point-to-point Protocol
PHB     Per-hop Behavior
PQ      Priority Queueing
PSTN    Public Switched Telephone Network
Glossary (2)
QoS     Quality of Service
RED     Random Early Detect (or Drop)
RTCP    Realtime Transport Control Protocol
RTP     Realtime Transport Protocol
SCP     Service Control Point
SIP     Session Initiation Protocol
SS7     Signaling System Number 7
SSRC    Synchronization Source
TAPI    Telephony API
TDM     Time Division Multiplexed
TRIP    Telephony Routing over IP
TSPEC   Traffic Specification
WFQ     Weighted Fair Queueing
Thanks
Enjoy Interop!
to contact me: jdrosen@jdrosen.net