Validating the Resilience Mechanisms for the Packet Switched Domain in 3G Networks

Jari Hietanen

Nokia Networks

Helsinki

1 © Jari Hietanen

Thesis Seminar, 7.6.2005.

Supervisor: Professor Raimo Kantola

Instructor: Juhani Helske (M.Sc.)

Agenda

• Background for the Thesis

• The Objective and the Scope

• Packet Switched Domain in 3G Networks

• Resilience Mechanisms in Gn Interface

• Validation of the Existing Resilience Mechanism in Gn Interface

• Validation of New Resilience Mechanisms

• Validation Results and Comparison of Resilience Mechanisms

• Conclusions

Background for the Thesis

• Mobile networks are evolving from 2G towards 3G.

• So far, most of the traffic has been voice traffic.

• Now the amount of data traffic is growing faster than traditional voice traffic.

• New packet-based services:

  • Multimedia messaging

  • Wireless Internet browsing

  • Advertising

  • Entertainment services

Voice and data traffic volume evolution in Western Europe

New Challenges for Telecom Vendors and Operators:

• The amount of IP-based data traffic is increasing in 3G networks.

• People are used to high availability and service quality in circuit-switched PSTN and GSM networks.

• They now expect the same level of availability and quality in packet-switched 3G networks.

• This poses a major challenge for vendors and operators: they must develop and build fault-tolerant, resilient networks that guarantee high availability.

• 3G networks are complex:

  • Several transmission technologies (e.g. IP, ATM, SS7, GTP)

  • Many protocols and network layers

  • Multi-vendor hardware

Terms

• What is the difference between the terms “resilience” and “redundancy”?

RESILIENCE:

“Ability of the system to function seamlessly in the event of the failure of any single item of hardware or failure of the software package”.

REDUNDANCY:

According to the Collins English Dictionary, the term redundancy means “duplication of components in electronic or mechanical equipment so that operations can continue following failure of a part or repetition of information or inclusion of additional information to reduce errors in telecommunication transmissions and computer processing”.

The Objective and the Scope of the Thesis

The objective:

• The main objective was to study and validate existing and new resilience mechanisms for the Gn interface in 3G networks.

The scope:

• The thesis concentrated only on resilience mechanisms at the data link layer (L2) and the network layer (L3). Resilience solutions at higher and lower layers were out of scope.

• The implementation of the Gn interface architecture and its resilience mechanisms is not standardized. The thesis concentrated on validating Nokia’s implementation of a resilient Gn interface between the 3G SGSN and GGSN network elements.

The UMTS Network Architecture

The 3G SGSN

Main functions:

• Subscriber authentication & authorization.

• User data tunneling and routing (acts as a gateway for user data tunneling between RNC and GGSN, separate tunnels towards RNC and GGSN for each connection).

• Mobility management (controls the location, state and security of UE).

• Session management (managed through resource monitoring, admission control and PDP context creation, modification and deletion).

• Traffic management (performs packet classifying, policing, buffering, shaping, marking and scheduling to ensure that all connections receive appropriate Quality of Service).

• Short message delivery.

• Collection of charging data and traffic statistics.

The 3G GGSN

Main functions:

• Signalling towards access networks. This signalling is required for creating, modifying and deleting PDP contexts. The request for PDP context creation always comes from external network equipment.

• Signalling towards data networks. Some of the signalling is required for configuring the PDP context. Most of this signalling happens when the PDP context is created. It is used, for example, to allocate an IP address for the User Equipment (UE).

• Charging. The GGSN analyses user plane traffic and reports the metering results to the charging system via signalling interfaces.

• Subscription management, authentication and session control. The GGSN may need to authenticate mobile subscribers before a PDP context can be created. In addition, the GGSN may need to know which services the mobile subscriber is allowed to use.

• Lawful interception. In many countries, local authorities require the ability to monitor the traffic of certain mobile subscribers.

The GPRS Tunneling Protocol (GTP)

• The GPRS Tunnelling Protocol (GTP) is used on the Gn interface between GPRS Support Nodes (GSNs) in UMTS and also in GPRS backbone networks.

• GTP allows multi-protocol packets to be tunnelled through the UMTS or GPRS backbone between GSNs, and between the SGSN and the UMTS Terrestrial Radio Access Network (UTRAN).

• GTP is divided into GTP Control Plane (GTP-C) and GTP User Plane (GTP-U) procedures.

GTP Path Management Messages and timers

• Echo Request Interval

• Echo Response Interval

• The timer T3-RESPONSE holds the maximum wait time for a response to a request message.

• The counter N3-REQUESTS holds the maximum number of attempts made by GTP to send a request message. The recommended value is 5.
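The interplay of T3-RESPONSE and N3-REQUESTS can be sketched as a simple retry loop. This is only an illustration of the timer and counter semantics described above, not Nokia's implementation: the `send_echo` hook is a hypothetical stand-in for the real transport, and the 10-second timeout in the usage lines is an arbitrary example value.

```python
from typing import Callable


def path_is_alive(send_echo: Callable[[float], bool],
                  t3_response: float,
                  n3_requests: int = 5) -> bool:
    """Probe a GTP path with Echo Requests.

    send_echo(timeout) is a hypothetical hook: it sends one Echo Request
    and returns True if an Echo Response arrives within `timeout` seconds.
    n3_requests defaults to 5, the recommended value quoted above.
    """
    for _attempt in range(n3_requests):
        if send_echo(t3_response):
            return True   # one answered request is enough
    return False          # all N3-REQUESTS attempts timed out: path is down


# Simulated peer that answers only the third request.
answers = iter([False, False, True])
print(path_is_alive(lambda timeout: next(answers), t3_response=10.0))  # True
```

A peer that never answers makes the function return False after exactly `n3_requests` attempts, which is the point at which the path would be declared down.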

Default parameter values used in Nokia 3G SGSN and GGSN:

Parameter                          Min value   Max value   Default value   Unit
GTP Echo Request Interval          10          360         60              seconds (s)
Echo Reply Waiting Time (T3)       1           60          10              seconds (s)
Echo Request Retransmission (N3)   1           10          5               times

This means that GTP declares a path between GSNs to be down after 1 to 600 seconds, depending on how the GTP parameters are configured.

For example, if the Echo reply waiting time (T3) is set to 5 seconds and the Echo request retransmission count (N3) is set to 3, a GTP path is declared down after T3 x N3 = 5 s x 3 = 15 seconds.
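The same arithmetic can be written as a small helper. This is a minimal sketch: the function name is made up for illustration, and the range checks simply mirror the minimum and maximum values given for T3 and N3 in the parameter table.

```python
def gtp_path_down_time(t3_seconds: int, n3_attempts: int) -> int:
    """Seconds until GTP declares a path down: T3 x N3."""
    if not 1 <= t3_seconds <= 60:
        raise ValueError("T3 must be between 1 and 60 seconds")
    if not 1 <= n3_attempts <= 10:
        raise ValueError("N3 must be between 1 and 10 attempts")
    return t3_seconds * n3_attempts


print(gtp_path_down_time(5, 3))    # 15, the example above
print(gtp_path_down_time(1, 1))    # 1, fastest possible setting
print(gtp_path_down_time(60, 10))  # 600, slowest possible setting
```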

Existing Resilience Mechanisms in the Gn Interface

• The existing solution is to use the dynamic routing protocol OSPF in 3G networks.

• The 3G SGSN is built on an IPSO router platform.

Gn Interface Resilience in 2G networks

• Gn interface topology for host based elements in 2G networks.

• The 2G SGSN does not have any routing functionality.

New Resilience Mechanisms for the Gn Interface

• The problem with the existing OSPF-based resilience mechanism for the Gn backbone is that its convergence time is not necessarily fast enough.

• In the worst case, convergence can take as long as 40 seconds if the OSPF Hello protocol is the only mechanism for detecting a network failure.

• Also, some operators are not willing to start using a dynamic routing protocol in their Gn backbone network.

• A solution would be to find a suitable data link layer (L2) protocol to serve as the Gn interface resilience mechanism.

• Advantages of L2 mechanisms:

  • Simple and flexible network architecture.

  • Similarity with the 2G solution.

  • No need for two separate Gn interface Virtual LANs (VLANs).

  • Fast convergence from error situations compared to OSPF.

Link Layer Resilience Mechanisms Validated:

• Link aggregation (IEEE 802.3ad)

• Proxy ARP

• Virtual Router Redundancy Protocol (VRRP)

• Hot Standby Router Protocol (HSRP)

• Virtual MAC address based method (Nokia’s own solution)

• Bidirectional Forwarding Detection

Validating Existing Resilience Mechanism

• First, the existing OSPF-based (L3) resilience mechanism was validated by building a Gn test network that included 3G SGSN and GGSN nodes.

• The conclusions from the OSPF validation test results:

  • With default parameter values, OSPF network convergence is too slow. Convergence can take as long as 40 seconds if the network includes hubs or switches.

  • OSPF convergence can be improved by using shorter Hello and Router Dead intervals.

  • The minimum value for the Hello interval is 1 second and for the Router Dead interval 4 seconds.
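To see why the Hello and Router Dead intervals dominate failure detection, the window in which a silent neighbour is declared dead can be estimated as below. This is a rough sketch under stated assumptions: it considers only Hello-based detection (no link-down signal from the hardware), ignores the SPF calculation that follows detection, and assumes the commonly used OSPF defaults of a 10 s Hello interval and a 40 s Router Dead interval, which match the 40-second figure above. The function name is illustrative.

```python
from typing import Tuple


def ospf_detection_window(hello_interval: float,
                          dead_interval: float) -> Tuple[float, float]:
    """Return (best_case, worst_case) seconds until a neighbour is declared dead.

    The dead timer restarts whenever a Hello is received. If the failure
    happens just after a Hello, the full dead interval must elapse; if it
    happens just before the next Hello was due, one Hello interval of the
    timer has already run.
    """
    if dead_interval <= hello_interval:
        raise ValueError("Router Dead interval must exceed the Hello interval")
    return (dead_interval - hello_interval, dead_interval)


print(ospf_detection_window(10, 40))  # (30, 40) with the assumed defaults
print(ospf_detection_window(1, 4))    # (3, 4) with the minimum values above
```

Even with the minimum settings, Hello-based detection alone stays in the range of seconds, which is why a faster detection mechanism underneath OSPF is attractive.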

Validating New Resilience Mechanisms

• Some of the link layer techniques were validated by building a test network and measuring convergence times after link failures.

• Some techniques were analyzed with other methods:

  • using literature sources

  • interviewing

Link Aggregation Test Results

• Test topology:

• Failover time after a link failure was about 500 ms.

VRRP Test Results

• Test topology:

• Failover time after a link failure was about 2.8 seconds.
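For context, the time a VRRP backup router waits before taking over can be computed from the protocol timers. The sketch below uses the Master_Down_Interval formula from the VRRPv2 specification (RFC 3768). The advertisement interval of 1 s and priority of 100 are the protocol defaults and are assumptions here: the slides do not state which timer values the test network used.

```python
def vrrp_master_down_interval(advertisement_interval: float = 1.0,
                              priority: int = 100) -> float:
    """Seconds a backup waits without advertisements before becoming master.

    VRRPv2 (RFC 3768):
        Skew_Time            = (256 - Priority) / 256
        Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
    """
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew_time = (256 - priority) / 256
    return 3 * advertisement_interval + skew_time


print(vrrp_master_down_interval())  # 3.609375 with the assumed defaults
```

The timer runs from the last advertisement received, so the failover time observed from the moment of failure is somewhat shorter than this interval.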

Test results of Virtual MAC based Mechanism

• Test topology:

• Failover time after a link failure was about 600 ms.

Comparison of Resilience Mechanisms

Existing OSPF based solution:

+ A dynamic routing protocol is easy for a network operator to configure and administer.

+ OSPF is a common protocol that is also supported by other vendors' products.

- OSPF is not suitable for all network topologies, e.g. if the network includes switches.

- Convergence after a failure is rather slow.

- The Hello protocol based mechanism is not fast enough if the default parameter values are used.

Link aggregation:

+ Does not require new hardware

+ Does not waste extra interface or line card capacity in the router.

+ Fast recovery from failure situations (about 500 ms).

+ Standardized solution

+ Economic and flexible method to increase network capacity

Proxy ARP:

+ It can be added to a single router on a network without disturbing the routing tables of the other routers on the network.

+ IP hosts can be used without configuring a default gateway.

+ The IP network does not need to have any routing intelligence.

+ Fast recovery time from failure situations.

+ Standardized solution.

+ It is easy to implement (only the gateway router has to be updated to support Proxy ARP).

- Hosts need larger ARP tables to handle IP-to-MAC mappings.

- The amount of ARP traffic increases.

- This does not work with all network topologies (e.g. more than one router connecting two physical networks).

VRRP:

+ It offers higher availability of the default path without requiring a dynamic routing protocol or router discovery protocol to be configured on every end host.

+ Fast recovery from failure situations (average about 2.8 s).

+ Simple and flexible protocol.

+ Standardized solution.

- Not feasible to use with all network node architectures.

HSRP:

- It is a patented solution that cannot be used free of charge.

- It does not offer anything superior compared to VRRP.

Bidirectional Forwarding Detection:

+ OSPF alone offers a minimum convergence time of 1-2 seconds. With BFD, OSPF can provide sub-second failure detection. According to measurements made by Marko Luoma (Lic.Tech.) in the HUT Networking Laboratory, the convergence time can be as low as 75 ms.

+ Because BFD is not tied to any particular routing protocol, it can be used as a generic and consistent failure detection mechanism for e.g. the OSPF, IS-IS, EIGRP and BGP routing protocols.

+ CPU usage on the route processor is minimal.

- BFD can potentially generate false alarms, signalling a link failure when one does not exist. Because the timers used for BFD are so tight, a brief interval of data corruption or queue congestion could cause BFD to miss enough control packets for the detect timer to expire.
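The trade-off between fast detection and false alarms follows from how the BFD detection time is derived. The sketch below uses the asynchronous-mode rule as later standardized in RFC 5880: the remote detect multiplier times the agreed packet interval. The 50 ms intervals and the multiplier of 3 are hypothetical example values, not the settings behind the 75 ms measurement quoted above.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int) -> int:
    """Milliseconds without a BFD control packet before the session goes down.

    Asynchronous mode: the remote system sends no faster than the local
    system is willing to receive, so the effective interval is the larger
    of the two values; the session fails after `remote_detect_mult`
    consecutive intervals without a packet.
    """
    if remote_detect_mult < 1:
        raise ValueError("detect multiplier must be at least 1")
    interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * interval_ms


# Hypothetical settings: 50 ms intervals, multiplier 3.
print(bfd_detection_time_ms(3, 50, 50))  # 150
```

With these example values, losing just three consecutive control packets to congestion or corruption is enough to bring the session down, which illustrates the false-alarm risk listed above.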

Conclusions

• The OSPF-based layer 3 resilience mechanism is suitable for the Gn network on its own, but in the future it will be too slow at detecting network failures.

• For a greenfield operator that does not have an existing 2G network, it might be reasonable to implement resilience using OSPF. A solution based on a dynamic routing protocol offers advantages over a link layer solution; for example, network configuration is simpler.

• However, the performance of the Gn network can be improved by using an L2 mechanism under OSPF.

• Bidirectional Forwarding Detection seems to be the most suitable L2 resilience mechanism to be used with OSPF.

Questions?
