Understanding VoIP
Dr. Jonathan Rosenberg
Chief Technology Strategist, Skype

What is this course about?
- Getting “under the hood” and understanding how VoIP works
- An exploration of the protocols and technologies behind VoIP
- Conveying an understanding of the various problems that need to be solved for VoIP to work

What this course is not about
- A general introduction to telephony
- A detailed cookbook or deployment guide to VoIP
- A product survey of VoIP and IP telephony products
  - In particular, Cisco or Skype products are not discussed except in passing

Ground Rules
- Ask questions ANY TIME!
  - I will be bored if this is a one-way conversation
  - No question is too stupid
  - Laughing at or mocking anyone’s questions is unacceptable
  - Please ask off-the-wall or exploratory questions: there is a lot that is not in here!

Agenda
- Breaking up the problem
- Voice and Video coding
- Voice and Video Transport
- Quality of Service
- Signaling
- Security
- NAT Traversal

Non-Agenda
- Programming APIs
- Emergency Services, Lawful Intercept
- Numbering, Routing, Naming (ENUM, TRIP)
- PSTN Interworking
- Billing, Provisioning, OAM
- Conferencing, IVR, Applications

Breaking Up the Problem
[Diagram. Components: directories, databases, accounting, billing, application server, signaling servers, presence servers, media servers, OAM, endpoints, IP network. Protocols shown: LDAP, ENUM, RADIUS, DIAMETER, SIP, H.323, MGCP, H.248, SIMPLE, XMPP, RTP.]

Voice Coding

Voice Endpoint Model
[Block diagram. Labels: 2-wire interface, hybrid, echo canceller, nonlinear processing, DTMF/tone detection, silence detection, no speech, speech encoding, packetizer, unpacker, loss admin, speech decoding, comfort noise generation, DTMF/tone generation.]

Speech Codecs
- Waveform codecs:
  - Directly encode speech in an efficient way by exploiting temporal and/or spectral characteristics
  - Attempt to reproduce the input signal’s waveform by minimizing the error between input and coded signals
- Source codecs / vocoders:
  - Estimate and efficiently encode a parametric representation of speech
- CELP
  - Minimizes perceptually weighted error similar to
    waveform coders
  - Short-term predictor is the LP (vocal tract) filter
  - Excitation is obtained from a codebook and a long-term pitch predictor
  - Closed-loop search is MIPS intensive

Codec Comparison
Codec   | Sampling                | Bitrate        | Latency | Comments
G.711   | 8 kHz                   | 64 kbps        | 125 us  | PSTN codec
G.729   | 8 kHz                   | 8 kbps         | 10 ms   | CS-ACELP
G.723.1 | 8 kHz                   | 5.3/6.3 kbps   | 37.5 ms |
AMR     | 8 kHz                   | 4.75-12 kbps   | 25 ms   | GSM codec
G.722.1 | 16 kHz                  | 24/32 kbps     | 40 ms   | Polycom SIREN
AMR-WB  | 16 kHz                  | 6.6-23.85 kbps | 25 ms   | GSM wideband, encumbered
SILK    | 8, 12, 16, 24 kHz (SWB) | 6-40 kbps      | 25 ms   | Skype codec
Listen at: http://www.voiceage.com/listeningroom.php

Echo Cancellation
- ERL: Echo Return Loss (dB)
- ERLE: Echo Return Loss Enhancement
- Double-talk
- Convergence time
[Diagram labels: analog side, 2-4-wire hybrid, reflection (ERL), echo canceller with echo path estimation (ERLE), non-linear processor, digital side, packet network.]
This echo canceller cancels ‘local’ echoes from the hybrid reflection.

Echo Canceller Specifics
- The voice echo path is like an electrical circuit
  - If a ‘break’ (cancellation) is made anywhere in the ‘circuit’, you will eliminate the echo
  - The easiest place to make the break is with a canceller ‘looking into’ the local analog/digital telephony network, NOT the packet network (which has much longer and variable delays)
- The echo canceller at the other end of the call eliminates the echoes that YOU hear, and vice versa
- Echo canceller coverage (e.g. 32 ms) is the maximum length of echo impulse response that can be cancelled from the local analog/digital network (the packet network delay does not matter)
- The non-linear processor is used to ‘clean up’ any residual echo left over from the canceller

Voice Activity Detection
[Plot: speech magnitude (dB) versus time for two sentences. Labels: noise floor, signal-to-noise threshold, speech detected, front-end speech clipping at the start of each sentence, hang-over after each sentence (typically fixed at 200 ms).]

Comfort Noise Generation
- Silence isn’t golden… it’s annoying
- When speech stops… what do you play to the listener?
- Simple techniques:
  - Play white/pink noise
  - Replay the last received packet over and over
- Fancier technique:
  - Transmitter measures the local “noise environment”
  - Transmitter sends a special “comfort noise” (CN) packet as the last packet before silence
  - Receiver generates noise based on the CN packet

Voice Quality: Mean Opinion Scores
[Diagram: source speech (“Nowadays, a chicken leg is a rare dish”) passes through codec ‘X’ and a channel simulation with impairments; listeners rate the result on a scale of 1 to 5.]
Rating | Speech Quality | Distortion
5      | Excellent      | Imperceptible
4      | Good           | Just perceptible but not annoying
3      | Fair           | Perceptible and slightly annoying
2      | Poor           | Annoying but not objectionable
1      | Unsatisfactory | Very annoying and objectionable
MOS of 4.0 = Toll Quality

Clear Channel MOS’s
[Bar chart: Mean Opinion Score, scale 1 to 5.]
Codec                             | MOS
G.711 (64 kbit/s PCM)             | 4.1
G.726 (32 kbit/s ADPCM)           | 3.8
G.723.1 (6.3 kbit/s MP-MLQ)       | 3.9
G.729 (8 kbit/s CS-ACELP)         | 3.9
IS-54 (8 kbit/s NA Dig Cellular)  | 3.4

MOS Under Varying Conditions
[Chart: G.729 MOS under varying conditions. Conditions: average speech level (-20 dBm0), low input level (-30 dBm0), 2 tandem codings, 3 tandem codings, 1% frame erasure rate (FER), 5% bit error rate, 5% FER, 10% FER, 20% FER. MOS values shown: 3.85, 3.54, 3.46, 2.68, 3.24, 3.02.]

Video Coding

Key Terms
Term        | Description
Frame       | An individual picture in a sequence that makes up the video
Frame Rate  | The number of frames per second in video. 30 is excellent (TV quality)
Resolution  | The number of horizontal and vertical pixels. VGA = 640x480
Interlacing | A mechanism for transmitting video by splitting a frame into two fields, one field representing the odd lines and one the even lines. This is the “i” in 1080i
Progressive | As opposed to interlaced, a method for transmitting video by sending each frame as a whole
HD          | High-definition resolutions: 720p is 1280x720 at 60 fps; 1080i is 1920x1080 at 30 fps

Key Concept: Macroblocks
- Rectangular block in an image which is a basic unit of compression. Typically 16x16 pixels.

Key Concept: Inter-Frame Prediction
[Diagram label: Encode]
- Predict information in the current frame by looking at previous frames, possibly taking into account motion.
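To make the inter-frame prediction idea concrete, here is a minimal sketch of block-matching motion search. It is not how any particular codec in this course implements prediction: the 4x4 block size, the ±2 pixel search window, the sum-of-absolute-differences cost and all function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale pixels indexed as frame[y][x] (illustrative type)


def sad(cur: Frame, prev: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in the
    current frame and the block displaced by (dx, dy) in the previous frame."""
    return sum(
        abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur: Frame, prev: Frame, bx: int, by: int,
                       n: int = 4, search: int = 2) -> Tuple[int, int]:
    """Exhaustively search a small window in the previous frame for the
    displacement that best predicts the current block."""
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(cur, prev, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the previous frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, prev, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def residual(cur: Frame, prev: Frame, bx: int, by: int,
             mv: Tuple[int, int], n: int = 4) -> List[List[int]]:
    """Difference between the current block and its motion-compensated
    prediction; this is what an encoder would go on to transform and code."""
    dx, dy = mv
    return [
        [cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x] for x in range(n)]
        for y in range(n)
    ]
```

When the scene simply shifts by one pixel, the search finds that shift and the residual is all zeros, so only the motion vector needs to be sent for that block.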
Key Concept: Discrete Cosine Transform (DCT)
[Figure: DCT basis patterns, with horizontal frequencies increasing along one axis and vertical frequencies increasing along the other.]
- A technique for representing a macroblock by its component frequencies.
- Discarding the higher frequencies throws away the finer details without losing the core image.

Video Encoder Block Diagram
[Block diagram]

Key Codec Comparisons
Codec     | Timeline | Applications
H.261     | 1990     | ISDN at multiples of 64 kbps
H.263     | 1996     | Early Flash using Sorenson Spark implementation. Original RealVideo codec. Required in IMS.
H.264 AVC | 2003     | YouTube, iTunes, Blu-ray; most modern video conferencing. The current primary video codec for real-time. Typical VGA 15 fps bitrate = 500 kbps
H.264 SVC | 2007     | “Layered” video that provides improved quality and resilience; ideal for multiparty video conferencing.
VP7       | 2005     | On2 Technologies codec; Skype; successor to H.263 in Flash

Voice and Video Transport: RTP

RTP: What is it?
- Real Time Transport Protocol
- RFC 3550
  - product of the avt working group
  - 1996 proposed standard: RFC 1889
  - 2004 full standard
- What does it do?
  - e2e transport of real time media
  - optimized for multicast
  - provides sequencing, timing, framing, loss detection
  - provides feedback on reception quality
  - provides information on group members
  - provides data to correlate audio and video and other media
- Works with any codec
  - need payload format for each codec
- Flexible

RTP: What isn’t it?
- Doesn’t guarantee quality of service
  - doesn’t reserve network resources
  - doesn’t guarantee no loss or bounded delay
  - can work with QoS protocols (RSVP)
- Doesn’t provide signaling
  - other protocols must be used to set up RTP (like SIP or H.323)
- Not a specific protocol type
  - Does not run directly on top of IP
  - Runs on top of UDP
  - No fixed port number

RTP Stack
[Diagram: RTP and RTCP over UDP over IP.]

Big Picture: RTP, SDP and SIP
[Diagram: two end users connected across an IP network. SIP with SDP travels through two proxies; RTP flows between the end users.]
    c=IN IP4 123.1.2.3
    m=audio 1122 RTP/AVP 0 1
    m=video 1130 RTP/AVP 98
    a=rtpmap:98 h263

RTP Components: Data + Control
- Data
  - aka RTP (very confusing)
  - Usually on an even UDP port (NATs change this; covered later)
  - Provides: sequencing, timing, framing, content labeling, user identification
- Control = RTP Control Protocol (RTCP)
  - Same address as data, but usually one higher port
  - Provides: reception quality, sender statistics, participant information (multicast), synchronization information

Real Time Data Transport
[Diagram label: RTP Source]
- Originator breaks stream into packets (segmentation)
  - application layer framing (ALF)!!!
- Packets sent; network may lose, delay, reorder packets
- Must, at receiver: reorder, recover, resegment, resynchronize
  - clock synchronization!
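The receiver-side “reorder” and “recover” steps can be sketched as follows. This is only an illustration under stated assumptions: packets are modelled as hypothetical `(sequence_number, payload)` tuples, the first expected sequence number is known, and loss is merely reported rather than concealed. The one fact it relies on is that RTP sequence numbers are 16 bits wide and wrap around.

```python
from typing import Dict, Iterable, List, Tuple

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap around


def seq_delta(a: int, b: int) -> int:
    """Signed distance from sequence number b to a, modulo 2**16.
    Positive means a is newer than b, even across the wrap."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def reorder(packets: Iterable[Tuple[int, bytes]],
            first_seq: int) -> Tuple[List[bytes], List[int]]:
    """Put received packets back into sending order.

    Returns the payloads in order plus the sequence numbers that never
    arrived (candidates for loss concealment)."""
    by_offset: Dict[int, bytes] = {}
    for seq, payload in packets:
        offset = seq_delta(seq, first_seq)
        if offset >= 0:
            by_offset.setdefault(offset, payload)  # keep first copy, drop duplicates
    if not by_offset:
        return [], []
    ordered: List[bytes] = []
    missing: List[int] = []
    for offset in range(max(by_offset) + 1):
        if offset in by_offset:
            ordered.append(by_offset[offset])
        else:
            missing.append((first_seq + offset) % SEQ_MOD)
    return ordered, missing
```

For example, packets arriving as 65535, 65534, 1 (with 65534 expected first) come out in the order 65534, 65535, 1, and sequence number 0 is reported missing. A real receiver would do this incrementally inside its jitter buffer rather than on a complete list.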
[Diagram labels: RTP Packets, RTP Sink]

Transport System
- Source
  - Digitize audio from microphone
  - Silence suppression
  - Echo cancellation
  - Compress audio
    - G.711: 64 kbps
    - G.729: 8 kbps
    - G.723.1: 5.3/6.3 kbps
  - Packetize audio in RTP
  - Send
- Sink
  - Receive packets
  - Un-packetize
  - decompress
  - comfort noise generation
  - reorder
  - recover loss
  - jitter buffer
  - D/A conversion to speakers

Jitter Buffer
- Packets delayed differently
- Must play them out periodically
- Packets may arrive after designated playout time -> loss
- Insert extra delay to compensate
- May need to adapt this amount
[Plot: packets (pkts) versus time.]

RTP Packet Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

RTP Header Fields
- Version: 2
- P: indicates padding (for encryption)
- X: extension bit
- CSRC count: for mixers (later)
- M: Marker bit: indicates framing
  - audio codecs: first packet in talkspurt
  - video: last packet in frame
- Payload Type: indicates encoding in RTP packet
  - allows changes per-packet
  - Useful for: adaptation, DTMF codec, silence codecs
- SN: defines ordering of packets
- Timestamp: when packet was generated
- SSRC: identifier
- CSRC: list of mixed users

RTP Timestamp
- Tick units are dependent on codec
  - For speech: 125 microseconds (standard 8 kHz sampling rate)
  - For video: 90 kHz
  - For audio: 44.1 kHz (CD rate)
- Gaps in TS, but not in SN, mean silence
- Initial value random for security
- Video
  - Timestamp represents time at beginning of frame
  - Many packets may have same timestamp
- Speech
  - Time per packet may vary
  - Depends on packetization: 20-100 ms typical

Payload Formats
- Each codec needs a way to be encapsulated in RTP
- RFC 3551 defines mechanisms for many common codecs
  - G.711, G.729, G.723.1, G.722, etc.
  - Some simple video
- More complex codecs have their own payload format documents
  - MPEG
  - H.263 and H.261
- Payload format defines
  - How to break frame into packets
  - extra fields needed below main RTP header

Advanced Topics
- DTMF and Tones
  - RFC 2833
  - Special codecs for encoding touch tones (DTMF) and other signals
  - Can send either the waveform (frequency, amplitude)
  - Or the actual signal (#, 8, 0)
- Compressed RTP
  - RFC 2508
  - For dialup links
  - Don’t send header, just send index
  - Far side uses index to retrieve header, and then increments certain fields

Quality of Service

Quality of Service
“The problem we are trying to solve is to give ‘better’ service to some at the expense of giving worse service to others. QoS fantasies to the contrary, it’s a zero sum game.” - Van Jacobson

Quality of Service
So, what’s the problem?
[Plot: Usability of Voice Circuit as a Function of End-to-End Delay. Vertical axis: utility, 0.0 to 1.0. Horizontal axis: time (msec), 0 to 800. Labels: toll quality; private network VoFR & VoIP technology; early I-Phone technology; satellite zone; CB zone; fax relay, broadcast.]
Improving I-Phone means:
- Lower PC delay
- Lower network latency
- Tighten network jitter

Delay Budget
- Device sample capture
- Encode delay (algorithmic delay + processing delay)
- Packetization/framing
- Move to output queue/queueing delay
- Access (up) link transmission
- Backbone network transmission
- Access (down) link transmission
- Input queue to application
- Jitter buffer
- Decode processing delay
- Device playout delay

Some Techniques to Improve “Network QoS”
- RED: Random Early Drop (or “Detect”)
- WFQ: Weighted Fair Queuing
- Intserv/RSVP: ReSerVation Protocol
- IP Precedence
- DiffServ
- CRTP: Compressed Realtime Transport Protocol
- MCML: Multi-Class Multi-Link PPP

Random Early Detect (RED): this is Basic Hygiene!
- Objectives
  - Keep average queue size low: good for voice
  - Fairness: bigger streams punished more
  - Avoid synchronization
- Only works with loss-responsive transport protocols
- Algorithm: probabilistic dropping of packets
[Plot: drop probability (up to 1) versus queue size, with Min and Max thresholds marked.]

Poll: Will RED Help Voice?
- Yes
- No
  - Voice not loss responsive
  - Mixing voice and data in same queue bad
  - Voice queues usually not congested

Weighted Fair Queueing
- Each flow “sees” a dedicated amount of bandwidth Bj
- A packet arriving at time t is transmitted at time t + size/Bj
[Diagram: a link of bandwidth B shared by three flows, B = B1 + B2 + B3.]

What’s the Problem??
- WFQ is unrealizable because of
  - Variable packet sizes
  - Causality
- Example: link speed 100 Kbps; Flow 1: 10 Kbps; Flow 2: 90 Kbps
[Diagram: a 1500-byte packet and a 100-byte packet. Theory: 8.8 ms. Actual: 128 ms.]
- The figures are consistent with a 100-byte packet of Flow 2: in theory it is done after about 8.8 ms (800 bits at 90 Kbps), but if a 1500-byte packet is already being sent it waits 120 ms for that packet and then needs 8 ms of its own at 100 Kbps, 128 ms in total.

Approximations of WFQ
- Many PhDs written with approximate and implementable algorithms
- Algorithms differ in their delay bound
  - How much worse than perfect WFQ is this?
  - Delay bounds are a function of bandwidth, number of queues, other params
- Algorithms
  - SCFQ: Self-Clocked Fair Queueing
  - WF2Q: Worst-Case Fair Weighted Fair Queueing
  - FBFQ: Frame-Based Fair Queueing
  - PGPS: Packet-by-packet Generalized Processor Sharing
  - DRR: Deficit Round Robin

WFQ Voice Configuration
- How to pick allocated bandwidth?
- Consider G.711, 30 ms framing (74.6 Kbps)
  - If Bi = 74.6 Kbps, delay is at least 30 ms
  - If Bi = 149.2 Kbps, delay is at least 15 ms
- Must set voice queue bandwidth at least 2x actual voice usage to keep delays down!
- Unused bandwidth will go to data
- Need an accurate WFQ implementation

Priority Queueing
- Emulates the familiar “elite airport line” experience
- Voice and data packets in separate queues
- If there are any packets in the voice queue, they are serviced first
[Diagram: a voice queue and a data queue feeding one server.]

Priority Queueing Considerations
- Easy to configure: no bandwidth values required
- Main problem: data starvation
  - Need to police voice queue
- Doesn’t work as well when there is other non-voice high priority traffic (video)
- Head-of-line blocking from data queue

Intserv: Integrated Services
- Guaranteed Service (RFC 2212)
  - Mathematically provable bounds on end-to-end datagram queuing delay/bandwidth
- Controlled Load Service (RFC 2211)
  - Approximate QoS from an unloaded network for delay/bandwidth
- Describe traffic with a “TSPEC”
  - r = token bucket rate
  - b = token bucket depth
  - p = peak transmission rate
  - m = minimum (policed) packet size
  - M = maximum packet size
- Describe endpoints with a “FlowSpec”
  - Source/destination IP addresses, ports, protocol
- RSPEC/FSPEC provides the policy to the queuing/scheduling algorithms

RSVP Design
- Signaling distinct from routing (modularity, deployability, evolvability)
- Soft state (robustness, simplicity)
- Transparent operation across non-RSVP routers (deployability)
- Support shared and distinct reservations
- Applies to unicast & multicast applications
- Simplex & receiver-oriented

RSVP protocol
[Diagram: “path” messages travel from Src toward Dest.; “resv” messages travel back from Dest. toward Src.]
- PATH: Source -> Destination
  - Traffic parameters of source
  - Collects info on network capabilities
  - Detects current route
- RESV: Source <- Destination
  - Receiver-selected Int-Serv service
  - Traffic parameters of receiver-selected reservation
  - Follows route detected by PATH
  - Reservation actually nailed in network
- RSVP messages carried over IP
  - Can also be carried over UDP but few people do that

RSVP: Admission Control
[Block diagram. Labels: flow request, reservation protocol, admission control, routing protocol, routing, routing database, resource utilization database, policy database, packets in, switching, route selection, interface 1 through interface N, each with a packet scheduler and queuing, packets out.]

Intserv/RSVP Acceptance
[Plot: enthusiasm versus time. Labels: “Intserv/RSVP will solve the world’s QoS”; cool thing to say: “RSVP does not scale”; vBNS; RSVP over ATM; transparently transport RSVP; real value; RSVP for VoIP in Enterprise. Time markers: ISP today, Enterprise today.]

IP Precedence & Diffserv
- “Poor man’s” approach to QoS
- Set IP Precedence/DSCP higher on voice packets
  - This puts them in a different queue, resulting in isolation from best effort traffic
  - Can be done by endpoint, proxy, or in routers through heuristics
- Scales better than RSVP
  - Keeps QoS control “local”
  - Pushes work to the edges and boundaries
  - Can provide bulk QoS by customer or network
- No admission control
  - Too much high-precedence traffic can still swamp the network

Diffserv Architectural Model
- Clouds: regions of relative homogeneity:
  - Administrative control
  - Technology
  - Bandwidth
- Within a cloud, QoS managed by local rules
- Hard work confined to boundaries of clouds:
  - Classification
  - Conditioning/Policing
- QoS information exchange limited to boundaries
  - Bi-lateral, not multi-lateral
  - Not necessarily symmetric
[Diagram: clouds labeled “Me”, “Not Me”, “Also Not Me”, “Far Away”.]

Diffserv Scalability
- Fundamental assumptions:
  - Relatively small number of feasible queuing/scheduling algorithms for high link speeds
  - Number of individual flows is large
  - Many different rules, often policy driven
- Group packets explicitly by the “Per-hop behavior (PHB)” they are to get
  - Queue service
  - Shaping/policing
- Nodes in the middle of a cloud only have to deal with traffic aggregates
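As an illustration of an endpoint setting the DSCP on its own voice packets, here is a minimal sketch. It assumes a Linux-like platform where Python's `socket.IP_TOS` option is available and permitted; the function names are hypothetical, and the value 46 is the Expedited Forwarding code point recommended in RFC 3246 (a number the slides do not state).

```python
import socket

DSCP_EF = 46  # Expedited Forwarding code point (assumption: value from RFC 3246)


def dscp_to_tos(dscp: int) -> int:
    """The DSCP occupies the upper six bits of the former IPv4 TOS byte
    (RFC 2474), so the byte handed to the socket API is the DSCP shifted left by two."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp << 2


def open_marked_udp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP.

    Some platforms ignore or restrict this option, and routers along the
    path are free to re-mark or disregard the field."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

Marking at the endpoint is the cheapest of the options listed above, but it only helps if the network is configured to trust and act on the marking; otherwise the classification has to happen at a proxy or router.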
Diffserv Forwarding via PHBs
- PHBs map to DSCPs (Diffserv Code Points)
  - Values chosen for backward compatibility with the IPv4 TOS byte, including IP Precedence (RFC 2474)
- Packets with different DSCPs may be reordered
- Forwarding resources partitioned by PHB/DSCP

Assured Forwarding PHB (AF*)
- Four independent classes
- Within each class, three levels of drop precedence
- A congested AF node discards packets with higher drop precedence first
- Packets with lowest drop precedence must be within the subscribed profile
*RFC 2597

Expedited Forwarding PHB (EF*)
- Targeted at VoIP and “virtual leased lines”
- Roughly equivalent to priority queuing, with a safety measure to prevent starvation
- Implications:
  - No more than 50% of a link can be EF
  - see RFC 3247, 3248 for interesting mathematical analyses
- Worst case jitter at each hop is max of:
  - number of EF microflows in the aggregate, or
  - a single MTU packet of some other aggregate
*RFC 3246

Diffserv Traffic Conditioner
[Diagram: classifier, meter, marker, shaper/dropper; outputs: shaped packets, dropped packets.]
- Classifier: selects a packet in a traffic stream based on the content of some portion of the packet header
- Meter: checks compliance to traffic parameters (e.g. token bucket) and passes result to marker and shaper/dropper to trigger particular action for in/out-of-profile packets
- Marker: writes/rewrites DSCP
- Shaper: delays some packets so that they comply with the profile

Diffserv Acceptance
[Plot: enthusiasm versus time. Labels: “Diffserv will solve the world’s QoS”; Diffserv engineering?; Diffserv SLA?; Internet e2e SLA?]
[Plot labels, continued: real value; Inter-SP Diffserv and end-to-end Internet QoS need further standardisation and commercial arrangements; Diffserv design & deployment intra-domain today. Horizontal axis: time.]

Mixing Intserv & Diffserv: Aggregation
- Host signals with RSVP
- Edge or transit domains
  - Aggregate reservations
  - Mark packets using DSCP
- In transit domains
  - Blindly transfer end-to-end reservations using another IP protocol number; change at edge
- Routers detect egress of reservation (deaggregation) on transfer from an interior or aggregator interface to an exterior (deaggregating) interface
- Aggregate reservation size varies with load
[Diagram labels: Edge, Backbone, Edge.]

RTP Compression
- 20 ms @ 8 kbit/s yields 20 byte payload
- IP header 20; UDP header 8; RTP header 12
  - Twice size of payload!
- Header compression: 40 bytes to 2-4 most of the time
- Hop-by-hop: use only on the slow links

Sample Delay Budget (G.711 - 64 kbps)
Delay Source (G.711)                                 | Budget (ms)
Device Sample Capture                                | 0.1
Encode Delay (Algorithmic Delay + Processing Delay)  | 2.5
Packetization/Framing                                | 10
Move to Output Queue/Queue Delay                     | 0.5
Access (up) Link Transmission                        | 30
Backbone Network Transmission                        | 5
Access (down) Link Transmission                      | 10
Input Queue to Application                           | 0.5
Jitter Buffer                                        | 35
Decode Processing Delay                              | 0.5
Device Playout Delay                                 | 0.5
Total                                                | 94.6

Sample Delay Budget (G.729 - 8 kbps)
Delay Source (G.729)                                 | Budget (ms)
Device Sample Capture                                | 0.1
Encode Delay (Algorithmic Delay + Processing Delay)  | 17.5
Packetization/Framing                                | 20
Move to Output Queue/Queue Delay                     | 0.5
Access (up) Link Transmission                        | 30
Backbone Network Transmission                        | 5
Access (down) Link Transmission                      | 10
Input Queue to Application                           | 0.5
Jitter Buffer                                        | 35
Decode Processing Delay                              | 5
Device Playout Delay                                 | 0.5
Total                                                | 124.1

Signaling: SIP

SIP is one of Many
- ITU H.323
  - Originally for video conferencing
  - The first standard protocol for VoIP
  - Still in wide usage, but negative growth
- MGCP
  - Dumb phones controlled by smart server
  - “Softswitch”: PSTN emulation view
- Megaco/H.248
  - Standard version of MGCP

Core SIP Functions
- Establishment of peer to peer
  sessions
- Management of peer to peer sessions
  - Keepalives
  - Graceful and non-graceful termination
- Rendezvous
  - Forking
  - Search
  - Policy Based Routing
  - Loose Routing
- Mobility
  - Limited terminal mobility
  - Device mobility

Core SIP Functions
- Secure user identification
- Exchange and management of media session data
- User registration
- Capability declaration
- Capability query
- Reliability

SIP Technology Community
[Diagram: SIP (RFC 3261) at the centre with SIP extensions: O/A (RFC 3264), Events (RFC 3265), DNS (RFC 3263), Rel (RFC 3262). Surrounding technologies: RTP, SDP, ROHC, STUN, SIMPLE, MIDCOM, ENUM, SigComp.]

SIP Design Philosophy
- Patterned after other successful Internet standards
  - HTTP
- Don’t reinvent the PSTN
  - General purpose functionality
  - Do not dictate architectures or services
- It needs to work on any IP network
- Leverage the best of existing standards
  - URLs
  - MIME
  - RFC 822
- Scalability
  - Push state to the edge

Basic Design
- Request/response protocol
- SIP is a peer protocol: all entities send requests and receive requests
- Modelled after HTTP
- Each request invokes a method
  - Main purpose of request
- Messages contain bodies
[Diagram: two agents; a request goes one way, a response the other.]

Transactions
- Fundamental unit of messaging exchange
  - Request
  - Zero or more provisional responses
  - Usually one final response
  - Maybe ACK
- All signaling composed of independent transactions
- Identified by CSeq
  - Sequence number
  - Method tag
[Diagram: first transaction (CSeq: 1): INVITE, 100, 200, ACK. Second transaction (CSeq: 2): BYE, 200.]

Session Independence
- Body of SIP message used to establish call describes the session
- Session could be
  - Audio
  - Video
  - Game
- SIP operation is independent of type of session
- SIP bodies are MIME objects
  - MIME = Multipurpose Internet Mail Extensions
  - Mechanisms for describing and carrying opaque content
  - Used with HTTP and email

Protocol Components
- User Agent
  - End systems
  - Hard and soft phones
  - PSTN gateways
  - Phone adaptors
- Proxy
  - SIP server responsible for relaying and processing requests between user agents
  - Main job: where to send request next?
- User Agent (continued)
  - Media servers
  - Anything that originates or terminates SIP calls
- Back-to-Back User Agent (B2BUA)
  - SIP server that terminates and re-originates SIP
  - SBCs, Call Agents, etc.

SIP Addressing
- SIP addresses are URLs
- URL contains several components
  - Scheme (sip)
  - Username
  - Hostname
  - Optional port
  - Parameters
  - Headers and body
- SIP allows any URI type
  - tel URIs
  - http URLs for redirects
  - mailto URLs
  - leverage vast URI infrastructure
Example: sip:jdrosen@cisco.com:5061;user=host?Subject=foo

The SIP Trapezoid
[Diagram: domains a.com and b.com; SIP signaling path and RTP media path.]

SIP Methods
- INVITE
  - Invites a participant to a session
  - idempotent: reINVITEs for session modification
- BYE
  - Ends a client’s participation in a session
- CANCEL
  - Terminates a search
- OPTIONS
  - Queries a participant about their media capabilities, and finds them, but doesn’t invite
- ACK
  - For reliability and call acceptance
- REGISTER
  - Informs a SIP server about the location of a user

SIP Architecture
[Diagram: domains sp.com, a.com and b.com, plus a Corp DB; request, response and media flows numbered 1 to 14; called address 14089023077@b.com.]

SIP Message Syntax
- Many header fields from HTTP
- Payload contains a media description
  - SDP: Session Description Protocol

    INVITE sip:+17327654321@example.com SIP/2.0
    From: J. Rosenberg <sip:+14082321122@example.com>;tag=76ah
    Subject: Conference Call
    To: John Smith <sip:+17327654321@example.com>
    Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
    Call-ID: 1997234505.56.78@1.2.3.4
    Content-type: application/sdp
    CSeq: 4711 INVITE
    Content-Length: 187

    v=0
    o=user1 53655765 2353687637 IN IP4 1.2.3.4
    s=Sales
    c=IN IP4 1.2.3.4
    t=0 0
    m=audio 3456 RTP/AVP 0

SIP Address Fields
- Request-URI
  - Contains address of next hop server
  - Rewritten by proxies based on result of Location Service
- To
  - Address of original called party
  - Contains optional display name
- From
  - Address of calling party
  - Optional display name

    INVITE sip:+17327654321@example.com SIP/2.0
    From: J.
      Rosenberg <sip:+14082321122@example.com>;tag=76ah
    Subject: Conference Call
    To: John Smith <sip:+17327654321@example.com>
    Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
    Call-ID: 1997234505.56.78@1.2.3.4
    Content-type: application/sdp
    CSeq: 4711 INVITE
    Content-Length: 187

    v=0
    o=user1 53655765 2353687637 IN IP4 1.2.3.4
    s=Sales
    c=IN IP4 1.2.3.4
    t=0 0
    m=audio 3456 RTP/AVP 0

SIP Responses
- Look much like requests
  - Headers, bodies
- Differ in top line
- Status Code
  - Numeric, 100 - 699
  - Meant for computer processing
  - Protocol behavior based on 100s digit
  - Other digits give extra info
- Reason Phrase
  - Text phrase for humans
  - Can be anything
- Example: 200 OK, 180 Ringing

Status Code Classes
- 100 - 199 (1XX): Informational
- 200 - 299 (2XX): Success
- 300 - 399 (3XX): Redirection
- 400 - 499 (4XX): Client Error
- 500 - 599 (5XX): Server Error
- 600 - 699 (6XX): Global Failure
- Two groups
  - 100 - 199: Provisional
    - Not reliable
  - 200 - 699: Final, Definitive

Example SIP Response
- Note how the only difference is the top line
- Rules for generating responses
  - Call-ID, To, From, CSeq are mirrored in response
  - Branch parameter used as transaction ID
  - Tag added to To field to identify dialog

    SIP/2.0 200 OK
    From: J.
      Rosenberg <sip:+14082321122@example.com>;tag=76ah
    To: John Smith <sip:+17327654321@example.com>;tag=112
    Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
    Call-ID: 1997234505.56.78@1.2.3.4
    Content-type: application/sdp
    CSeq: 4711 INVITE

SIP Transport
- SIP messages over UDP or TCP/TLS or SCTP
  - Reliability mechanisms defined for UDP
- UDP more widely used
  - Faster
  - No connection state
- TCP preferred these days
  - NAT
  - Larger SIP messages
- Reliability mechanisms depend on SIP request method
  - INVITE
  - anything except INVITE
  - Reason: optimized for phone calls

Registrations
- REGISTER creates mapping in server from one URI to another
  - sip:89023077@example.com to sip:89023077@1.2.3.4
- REGISTER properties
  - UA location in Contact
  - Registrar identified in Request-URI
  - Identifies registered user in To and From fields
  - Expires header indicates desired lifetime
    - Can be different for each Contact
- Registrations are soft-state

    REGISTER sip:example.com SIP/2.0
    To: sip:89023077@example.com;user=phone
    From: sip:89023077@example.com;user=phone
    Call-ID: 1997234505.56.78@1.2.3.4
    CSeq: 123 REGISTER
    Contact: sip:89023077@1.2.3.4
    Expires: 3600

Registration Handling
- Registrar is logical function handling REGISTER
- Registrar steps:
  - Authenticate
  - Authorize
  - Add binding
  - Lower expiration
  - Return all currently registered UAs (can be more than one)

    SIP/2.0 200 OK
    To: sip:89023077@example.com;user=phone
    From: sip:89023077@example.com;user=phone
    Call-ID: 1997234505.56.78@1.2.3.4
    CSeq: 123 REGISTER
    Contact: sip:89023077@1.2.3.4;expires=3600
    Contact: sip:89023077@5.6.7.8;expires=524

Forking
- A proxy may have more than one address for a user
  - Happens when more than one SIP URL is registered for a user
  - Can happen based on static routing configuration
- In this case, proxy may fork
- Forking is when a proxy sends a request to more than one destination at once
- First 200 OK that is received is forwarded upstream
- All other unanswered requests cancelled
[Diagram: an INVITE for 89023077@a.com arrives at the proxy and is forked.]

Routing of Subsequent Requests
- Initial SIP request sent through many proxies
- No need
  per se for subsequent requests to go through proxies
- Each proxy can decide whether it wants to receive subsequent requests
  - Inserts Record-Route header containing its address
- For subsequent requests, users insert Route header
  - Contains sequence of proxies (and final user) that should receive request
[Diagram: INVITE and BYE paths between UA1 and UA2 through proxies.]

Setting up the Session
- INVITE contains the Session Description Protocol (SDP) in the body
- SDP conveys the desired session from the caller’s perspective
- Session consists of a number of media streams
  - Each stream can be audio, video, text, application, etc.
  - Also contains information needed about the session: codecs, addresses and ports
- SDP also conveys other information about session
  - Time it will take place
  - Who originated the session
  - subject of the session
  - URL for more information
- SDP origins are multicast sessions on the Mbone
  - Originator of INVITE is not originator of session

Anatomy of SDP
- SDP contains informational headers
  - version (v)
  - origin (o): unique ID
  - information (i)
  - Time of the session
- Followed by a sequence of media streams
- Each media stream contains an m line defining
  - port
  - transport
  - codecs
- Media stream also contains c line
  - Address information

    v=0
    o=user1 53655765 2353687637 IN IP4 128.3.4.5
    s=Mbone Audio
    i=Discussion of Mbone Engineering Issues
    e=mbone@somewhere.com
    t=0 0
    m=audio 3456 RTP/AVP 0 78
    c=IN IP4 1.2.3.4
    a=rtpmap:78 G723
    m=video 4444 RTP/AVP 86
    c=IN IP4 1.2.3.4
    a=rtpmap:86 H263

Negotiating the Session
- Called party receives SDP offered by caller
- Each stream can be
  - accepted
  - rejected
- Accepting involves generating an SDP listing
  - same stream
  - port number and address of called party
  - subset of codecs from SDP in request
- Rejecting indicated by setting port to zero
- Resulting SDP returned in 200 OK
- Media can now be exchanged

    v=0
    o=user2 16255765 8267374637 IN IP4 4.3.2.1
    t=0 0
    m=audio 3456 RTP/AVP 0
    c=IN IP4 4.3.2.1
    m=video 0 RTP/AVP 86
    c=IN IP4 4.3.2.1

Audio stream accepted, PCMU only.
Video stream rejected.

Changing Session Parameters
- Once call is started, session can be modified
- Possible changes
  - Add a stream
  - Remove a stream
  - Change codecs
  - Change address information
- Call hold is basically a session change
- Accomplished through a re-INVITE
- Same session negotiation as INVITE, except in middle of call
- Rejected re-INVITE: call still active!
[Diagram: INVITE, 200, ACK; then the re-INVITE: INVITE, 200, ACK.]

Hanging Up
- How to hang up depends on when and who
- After call is set up
  - either party sends BYE request
- From caller, before call is accepted
  - send CANCEL
  - BYE is bad since it may not reach the same set of users that got INVITE
  - If call is accepted after CANCEL, then send BYE
- From callee, before accepted
  - Reject with 486 Busy Here
[Diagram: client (C) and server (S). Messages and events: INVITE, 100, hangup, CANCEL, 200 OK, accept, 200 OK, ACK, BYE, 200 OK.]

Call Flow for basic call: UA to proxy to UA
- Call setup
  - 100 Trying: hop by hop
  - 180 Ringing
  - 200 OK: acceptance
- Call parameter modification
  - re-INVITE
  - Same as initial INVITE, updated session description
- Termination
  - BYE method
[Diagram: INVITE, 100 Trying, 180 Ringing and 200 OK on each hop; ACK; RTP; INVITE (re-INVITE); BYE; 200 OK.]

Privacy and Identity
- RFC 3325: A Private Extension for Asserted Identity in Trusted Networks
- RFC 3323: A Privacy Mechanism for SIP
- RFC 4474: SIP Identity

RFC 3325 Asserted Identity
[Diagram: inside the trust domain, a proxy authenticates the caller, verifies identity and adds PAID; the forwarded INVITE carries P-Asserted-Identity: sip:+14089023077@a.com.]

RFC 3323: SIP Privacy
[Diagram: an anonymous caller sends INVITE with Privacy: id and From: anonymous. Inside the trust domain the INVITE carries P-Asserted-Identity: sip:+14089023077@a.com and From: anonymous. Outside the trust domain the INVITE carries only From: anonymous.]

RFC 4474: SIP Identity
[Diagram: the caller sends INVITE with From: sip:joe@example.com. The first server authenticates the caller, verifies identity and signs the request; the forwarded INVITE carries From: sip:joe@example.com and Identity: asd87f7as66sda8z. The receiving side verifies the signature.]
- Only useful for user@domain addresses!
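The trust-domain boundary behaviour drawn in the RFC 3325 / RFC 3323 slides can be sketched as a small header filter. This is a simplification under stated assumptions: headers are modelled as a hypothetical dictionary of name to list of values, the function name is invented for illustration, and only the removal of P-Asserted-Identity when the `id` privacy token is present is shown. Whether the Privacy header itself is also stripped is left out of the sketch.

```python
from typing import Dict, List

Headers = Dict[str, List[str]]  # illustrative model: header name -> list of values


def forward_outside_trust_domain(headers: Headers) -> Headers:
    """Return the headers a proxy would forward to an untrusted next hop.

    If the caller asked for identity privacy ("Privacy: id"), the asserted
    identity must not leave the trust domain, so P-Asserted-Identity is removed."""
    out = {name: list(values) for name, values in headers.items()}  # do not mutate input
    privacy_tokens = {
        token.strip().lower()
        for value in out.get("Privacy", [])
        for token in value.split(";")
    }
    if "id" in privacy_tokens:
        out.pop("P-Asserted-Identity", None)
    return out
```

With the slide's example, an INVITE carrying `From: anonymous`, `Privacy: id` and `P-Asserted-Identity: sip:+14089023077@a.com` leaves the trust domain without the asserted identity, so the far side sees only the anonymous From.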
Transfers and Dialog Movement: REFER (RFC 3515)
[Diagram: participants Alice, Joe and Bob; steps numbered 1 to 4. Messages: REFER with Refer-To: Bob; INVITE to Bob with Referred-By: Joe.]

Third Party Call Control (3pcc): RFC 3725
[Diagram: steps numbered 1 to 6. Messages: INVITE (no SDP); 200 (SDP A); INVITE (SDP A); 200 (SDP B); ACK (SDP B); RTP.]

SIP and Quality of Service
- RFC 3312: Integration of Resource Management with SIP
- Problem
  - How to make sure phone doesn’t ring unless resources are reserved
- Solution
  - SIP does not do resource reservation!
  - SIP INVITE tells far side not to ring
  - Both sides do regular QoS reservations
    - RSVP
    - PDP context activation
  - UPDATE to change state
[Diagram: INVITE w. preconditions; 183 Progress; QoS reservations; UPDATE w. preconditions; 180 Ringing; 200 OK; ACK.]

Security

VoIP Security
“The only totally secure system I know of is a rock” - Tony Lauck, circa 1985

But Even Rocks can be Insecure..
- It had a great user interface
- But it had a serious security vulnerability…

VoIP Attacks
Attack                                                        | Solution
Free calls, aka toll fraud                                    | User Authentication
Impersonation                                                 | User Authentication, Secure Caller ID
Learning private information (calling patterns, PIN codes)    | SIP Encryption, Media Encryption
Steal calls                                                   | SIP Encryption, Media Encryption
DoS                                                           | ICE, Others

SIP User Authentication
[Diagram: two users, each attached to a SIP server, with RTP between the users.]
We want this SIP server to authenticate this user, and this SIP server to authenticate this user.

SIP Digest Authentication
1. Client: REGISTER (“Hi, I’d like to SIP”)
2. Server: 401 (“OK, try again”), Nonce=a7szh1
3. Client computes Digest = Hash(joe, a7szh1, myPassword) = z0v88a6
4. Client: REGISTER, Nonce=a7szh1, Username=joe, Digest=z0v88a6
5. Server computes Digest = Hash(joe, a7szh1, myPassword) and compares
6. Server: OK, done!

Offline Dictionary Attack
[Diagram: an eavesdropper observes REGISTER Nonce=a7szh1 Username=joe Digest=z0v88a6, where Digest = Hash(joe, a7szh1, alligator), and tries dictionary words until the hash matches.]
Word      | Hash(joe, a7szh1, word)
Aardvark  | 9z8v77a
Abacus    | lkf88z7
Abate     | 8z77x
…         | …
Alligator | z0v88a6

Solution: Digest over TLS
[Diagram: the same exchange, Digest = Hash(joe, a7szh1, alligator), carried inside “TLS armor”.]
This is how Web Security works!
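The slides abbreviate the digest as Hash(joe, nonce, password). As a rough sketch, the code below spells out the classic HTTP-style digest (MD5, no `qop`) and shows why a captured exchange allows offline guessing. The realm, method and URI values used in the usage note are hypothetical, and the function names are invented for illustration.

```python
import hashlib
from typing import Iterable, Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    nonce: str, method: str, uri: str) -> str:
    """Digest response without qop: MD5(HA1:nonce:HA2), where
    HA1 = MD5(username:realm:password) and HA2 = MD5(method:uri)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def dictionary_attack(observed: str, words: Iterable[str], username: str,
                      realm: str, nonce: str, method: str, uri: str) -> Optional[str]:
    """Everything except the password is visible on the wire, so an
    eavesdropper can test candidate passwords offline until one matches."""
    for word in words:
        if digest_response(username, realm, word, nonce, method, uri) == observed:
            return word
    return None
```

For instance, with username `joe`, nonce `a7szh1`, an assumed realm `example.com`, method `REGISTER` and URI `sip:example.com`, a word list containing `alligator` recovers the password from a single captured REGISTER. Running the exchange inside TLS hides the nonce and digest from the eavesdropper, which is the point of the slide above.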
Even Stronger: Mutual TLS for Devices
  TLS armor between the phone and a.com
  Phone (MAC 8x7a6) has a certificate which identifies it

SIP Encryption
  We want each SIP hop to be encrypted so only the SIP servers and endpoints see the signaling
  (media between the endpoints flows as RTP)

SIP Encryption: TLS
  Mutual TLS authentication between a.com and b.com
  (media between the endpoints flows as RTP)

Media Encryption
  Countermeasure against:
    Eavesdropping
    Barge-in
    Modification
  Two useful techniques
    IPsec
    SRTP
  Complications
    Key management
    Legal intercept (who has the keys)
    Firewall and NAT issues (covered later)

Alternative: Secure RTP
  Authentication and encryption of RTP and RTCP packets
  Packet layout:
    V, P, X, CC, M, PT, sequence number
    timestamp
    synchronization source (SSRC) identifier
    contributing source (CSRC) identifiers …
    RTP extension (optional)
    RTP payload
    SRTP MKI: 0 bytes for voice
    Authentication tag: 4 bytes for voice
  Encrypted portion: the RTP payload
  Authenticated portion: the RTP header and payload

SRTP
  Advantages
    Provides both privacy via encryption and authentication via message integrity check
    Very little bandwidth overhead
    Uses modern strong crypto suites: AES counter mode for encryption and HMAC for message integrity
    Does not break header compression schemes like cRTP
    For very low-rate channels (e.g. cellular) can sacrifice authentication and have no packet expansion
  Disadvantages
    Needs key management
    End-to-end versus hop-by-hop trust tradeoffs in protecting keys
    Yet another security mechanism to ensure is implemented and deployed correctly

NAT Traversal

What is NAT?
  Network Address Translation (NAT)
    Creates address binding between internal private and external public address
    Modifies IP addresses/ports in packets
  Benefits
    Avoids network renumbering on change of provider
    Allows multiplexing of multiple private addresses into a single public address ($$ savings)
    Maintains privacy of internal addresses
  Example
    IP packet from client:  S: 10.0.1.1:6554  D: 67.22.3.1:80
    IP packet after NAT:    S: 1.2.3.4:8877   D: 67.22.3.1:80
    Binding table:
      Internal          External
      10.0.1.1:6554 ->  1.2.3.4:8877

Problem: Getting SIP Through NATs
  The INVITE passes through the NAT unchanged:
    INVITE sip:12345@b.com
    m=audio 3456 RTP/AVP 0
    c=IN IP4 10.0.1.1
  Result: RTP to 10.0.1.1

Solution Space
  Application Layer Gateways (ALGs)
  Session Border Controllers (SBC)
  Simple Traversal of UDP Through NAT (STUN)
  Traversal Using Relay NAT (TURN)
  Interactive Connectivity Establishment (ICE)

Application Layer Gateway
  INVITE from the client:
    INVITE sip:12345@b.com
    m=audio 3456 RTP/AVP 0
    c=IN IP4 10.0.1.1
  INVITE after the NAT with ALG:
    INVITE sip:12345@b.com
    m=audio 1234 RTP/AVP 0
    c=IN IP4 19.1.3.2
  NAT also modifies SIP messages to fix them up!
  RTP to 10.0.1.1

ALG Benefits and Drawbacks
  Benefits
    No change to clients or servers
  Drawbacks
    Doesn't work when security is turned on
    Hard to diagnose problems
    Requires network upgrade to support new app
    Frequent implementation problems (lack of expertise)
    Incentives mismatched

Session Border Controller
  SBC at 9.8.7.6
  INVITE from the client, through the NAT:
    INVITE sip:12345@b.com
    m=audio 3456 RTP/AVP 0
    c=IN IP4 10.0.1.1
  INVITE after the SBC:
    INVITE sip:12345@b.com
    m=audio 3225 RTP/AVP 0
    c=IN IP4 9.8.7.6
  RTP to 9.8.7.6
  SBC relays RTP back to source

SBC Benefits and Drawbacks
  Benefits
    No change to clients or NATs
    Works with basic SIP security mechanisms
    Easier to diagnose
  Drawbacks
    Expensive media relaying
    Interferes with some SIP extensions
    Breaks more advanced SIP security

Simple Traversal of UDP Through NAT (STUN)
  STUN server at 9.8.7.6
  Client: What is my IP address and port please?
  STUN server: It's 1.2.3.4:3472
  NAT public address: 1.2.3.4
  INVITE from the client:
    INVITE sip:12345@b.com
    m=audio 3472 RTP/AVP 0
    c=IN IP4 1.2.3.4
  RTP to 1.2.3.4

STUN Benefits and Drawbacks
  Benefits
    No change to servers or NATs
    Works with all SIP security mechanisms
    Can support non-VoIP apps (e.g., games)
  Drawbacks
    Doesn't always work

Traversal Using Relay NAT (TURN)
  TURN server at 9.8.7.6
  Client: Give me an IP address and port please?
  TURN server: 9.8.7.6:2376
  NAT public address: 1.2.3.4
  INVITE from the client:
    INVITE sip:12345@b.com
    m=audio 2376 RTP/AVP 0
    c=IN IP4 9.8.7.6
  RTP to 1.2.3.4

TURN Benefits and Drawbacks
  Benefits
    No change to servers or NATs
    Works with all SIP security mechanisms
    Can support non-VoIP apps (e.g., games)
  Drawbacks
    Expensive media relaying

Interactive Connectivity Establishment (ICE)
  Hybrid of STUN and TURN
  P2P NAT traversal
  Widely deployed on the Internet
  Popular with application providers

ICE Step 1: Allocation
  Before making a call, the client gathers candidates
  Each candidate is a potential address for receiving media
  Three different types of candidates
    Host candidates
    Server reflexive candidates (STUN)
    Relayed candidates (TURN)
  Host candidates reside on the agent itself
  STUN candidates are addresses residing on a NAT
  TURN candidates reside on a TURN server

ICE Step 2: Create Offer
  Each candidate is placed into an a=candidate attribute of the offer
  Each candidate line has IP address and port plus other info needed for ICE
    c=IN IP4 192.0.2.3
    t=0 0
    m=audio 45664 RTP/AVP 0
    a=rtpmap:0 PCMU/8000
    a=candidate:1 1 UDP 2130706178 10.0.1.1 8998 typ host
    a=candidate:2 1 UDP 1694498562 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998

ICE Step 3: Send INVITE
  Caller sends a SIP INVITE as normal
  No ICE processing by SIP servers

ICE Step 4: Allocation
  Called party does exactly the same processing as the caller and obtains its candidates
  Recommended to not yet ring the phone!
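The a=candidate lines in the Step 2 offer carry everything the peer needs to start checking. As a rough sketch of how an agent might read them, the code below parses the two lines from the slide and orders them by priority. The `Candidate` type and `parse_candidate` function are hypothetical names for this example, and the field layout is assumed to be exactly the one shown in the offer.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    typ: str                 # "host", "srflx" (STUN) or "relay" (TURN)
    raddr: Optional[str]     # related (base) address, if present
    rport: Optional[int]     # related (base) port, if present


def parse_candidate(line: str) -> Candidate:
    prefix = "a=candidate:"
    if not line.startswith(prefix):
        raise ValueError("not an a=candidate line")
    fields = line[len(prefix):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("malformed candidate line")
    # Anything after the type comes as name/value pairs (raddr, rport, ...).
    extras = dict(zip(fields[8::2], fields[9::2]))
    rport = extras.get("rport")
    return Candidate(
        foundation=fields[0],
        component=int(fields[1]),
        transport=fields[2],
        priority=int(fields[3]),
        address=fields[4],
        port=int(fields[5]),
        typ=fields[7],
        raddr=extras.get("raddr"),
        rport=int(rport) if rport is not None else None,
    )


offer_lines = [
    "a=candidate:2 1 UDP 1694498562 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998",
    "a=candidate:1 1 UDP 2130706178 10.0.1.1 8998 typ host",
]
candidates = sorted((parse_candidate(l) for l in offer_lines),
                    key=lambda c: c.priority, reverse=True)
for c in candidates:
    print(c.typ, f"{c.address}:{c.port}", c.priority)
```

Sorting by the priority field puts the host candidate ahead of the server reflexive one, which matches the idea that checks start with the highest-priority candidates.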
ICE Step 5: Provisional Response
  Callee sends a provisional response (1xx) containing its SDP with candidates
  As with the INVITE, no processing by proxies
  Phone has still not rung yet

ICE Step 6: Verification
  Each agent pairs up its candidates (local) with its peer's (remote) to form candidate pairs
  Each agent sends a STUN-based ping on each pair, starting at highest priority
  If a response is received, the check has succeeded and we know media can flow on that pair!
  [Diagram: candidate pairs 1 to 5 across the NATs and TURN servers]

ICE Benefits and Drawbacks
  Benefits
    Always works
    No change to servers or NATs
    Works with all SIP security mechanisms
    Minimum media relaying
    Can support non-VoIP apps (e.g., games)
    Built-in anti-DoS
    Eliminates ghost rings
  Drawbacks
    Requires client changes
    Requires other side to support it

That's it! Questions?

Glossary
  AIN     Advanced Intelligent Network
  ADPCM   Adaptive Differential PCM
  BGP     Border Gateway Protocol
  CALEA   Communications Assistance for Law Enforcement Act
  CBR     Constant Bit Rate
  CELP    Code Excited Linear Prediction
  CODEC   Coder/Decoder
  COPS    Common Open Policy Service
  CRTP    Compressed RTP
  CSRC    Contributing Source
  CTI     Computer-Telephony Integration
  DSCP    Diffserv Code Point
  DSL     Digital Subscriber Line
  DSP     Digital Signal Processor
  DTMF    Dual Tone Multi-Frequency
  ERL     Echo Return Loss
  ERLE    ERL Enhancement
  HFC     Hybrid Fiber/Coax
  IN      Intelligent Network
  ISDN    Integrated Services Digital Network
  ISUP    ISDN User Part
  JTAPI   Java Telephony API
  LDAP    Lightweight Directory Access Protocol
  MCML    Multi-class Multi-link PPP
  MGCP    Media Gateway Control Protocol
  MOS     Mean Opinion Score
  MPLS    Multi-protocol Label Switching
  NLP     Non-linear Processing
  NTP     Network Time Protocol
  PCM     Pulse Code Modulation
  PPP     Point-to-point Protocol
  PHB     Per-hop Behavior
  PQ      Priority Queueing
  PSTN    Public Switched Telephone Network

Glossary (2)
  QoS     Quality of Service
  RED     Random Early Detect (or Drop)
  RTCP    Realtime Transport Control Protocol
  RTP     Realtime Transport Protocol
  SCP     Service Control Point
  SIP     Session Initiation Protocol
  SS7     Signaling System Number 7
  SSRC    Synchronization Source
  TAPI    Telephony API
  TDM     Time Division Multiplexed
  TRIP    Telephony Routing over IP
  TSPEC   Traffic Specification
  WFQ     Weighted Fair Queueing

Thanks
  Enjoy Interop!
  To contact me: jdrosen@jdrosen.net