January 2015
Henry Starzynski
Network Operations Support
Global Network Mgmt Centre
Bell Canada

Henry Starzynski – Manager, Global Network Management Centre
• Graduated from the University of Waterloo in 1982 with a Bachelor of Mathematics (Computer Science)
• After graduation, worked for a computer time-sharing company called Datacrown, which became Canada Systems Group, then SHL-Systemhouse, now CGI
• I’ve been with Bell 30.5 years! (Yes, there have been LOTS of changes since I started!)
• Started out working on network design tools for services called Datapac and Megastream
• Moved to our network management centre, taking care of Datapac and managing the 7/24 console, then Frame Relay (Hyperstream) support
• Today, I continue with legacy network support, PLUS bring in new business for our centre, support our computers (PCs) and handle international escalations
• I have a life outside of Bell too! I’m involved in the local community with Scouts Canada – so, when you are free of University life, don’t forget to be involved in your community as well. You have lots of energy and knowledge that can help make local communities, wherever you end up, much better!
• Don’t forget, when you leave Carleton, learning never ever stops!
Keep your brains active; technology is continually changing.

Bell Canada’s GNMC
• GNMC = Global Network Management Centre
• One of the world’s first Data Network Management Centres
• Operating locally in Ottawa, serving Bell Canada customers globally

Bell Canada GNMC
A bit about who we are …
• Involved in managing data networks in Canada since 1974, globally since 1992
• Originally the National Data Network Control (NDNC) for domestic (Canada only) core data networks: Dataroute, Datapac (packet switching), Megastream (Pt-Pt T1), Hyperstream (frame relay), Canadian ATM Gateway networks
• Expanded to include private networks (Loto-Québec) and VPN clouds
• Started internationally with the Financial Networks Associates (FNA – consortium of 8 countries) network in 1991 (Alcatel based network)
• Evolved into Global Network Management (GNMC) at the individual customer circuit level
• Today, we serve as International Help Desk/SPOC (single point of contact) for international data circuit troubles going OUT of Canada (with the exception of Canadian government circuits, which are handled by a separate group)

Bell Canada GNMC
Main Focus Areas:
• Single Point of Contact (SPOC) for international customer data circuits
• VPN Managed Services (MPLS) and support of private or virtual private network clouds and routers (LAN)
• Core Network Management (WAN) of legacy data networks (Datapac = Packet Switching, Frame Relay)
• Technical Support on existing legacy networks
• Surveillance of 2 major customers’ networks internationally

GNMC is involved in the major processes of Network Management:
• Fault Management
• Configuration Management (Provisioning)
• Performance Management & Change Management
• Security Management

Network Management
• Like any industry, we toss around lots of BUZZ WORDS
• What do all those terms mean??
  • WANs
  • Clouds
  • OSSs
  • Network Management
  • SPOC
• Why do we do network management & customer management?
• Why is it important?

WELL let’s start … WHAT IS A NETWORK?
What is a Network ??
A Network means something different to everyone.
For example, a ‘network’ can be ..
• LAN (Local), WAN (Wide), MAN (Metropolitan), CAN (Campus) Area networks
• Point to Point network - connecting two sites regardless of distance
• The ‘CLOUD’ - the service provider’s network – the infrastructure, sometimes termed the Public Network
• The ’Net - the ubiquitous network
• The PSTN – Public Switched Telephone Network
• Wireless network
• Home Network
• A VPN – a Virtual Private Network
• A ‘social’ network!
A NETWORK MEANS DIFFERENT THINGS TO DIFFERENT PEOPLE
BUT whatever your definition, all networks do the same thing!

What is a Network ?
• A standard definition of a ‘network’ we will use is the following:
  • A set of elements or NODES linked together to provide paths to transmit information (data, voice, video) from one location to another.
• A critical tool which allows businesses & people to operate and communicate
• When it is all boiled down, all information is ‘data’, and it travels over a network.
• Successful networks are managed

Examples of Data Networks
• Transport Networks (SONET, DS3, DS1, Fibre, IP core) – the BIG infrastructures
• Circuit Switched (Public Switched Telephone Network)
• Dedicated (Point to point)
• Packet/Frame/Cell (legacy services)
• IP (Internet/Intranet)
• Local Area Networks, in the home, office, or around the campus.
• Private (TV, Radio, Financial, Lottery) or Virtual Private Networks (VPNs)
• Wireless

Network Characteristics
• The common characteristic of all networks is the transmission of DATA (information, etc.)
• Some type of information (i.e. data) is being transmitted from one person/computer/location to another, for business, pleasure, research, etc.
• In today’s world, we take data communications over networks for granted - it is there, reliable, fault tolerant, and it NEVER fails.
• We use it every day, it is part of our daily routines, part of our ‘life’! We expect connectivity!
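
The standard definition above (NODES linked together to provide paths between locations) can be illustrated with a very small graph model. This is a minimal sketch only: the four city nodes, the `LINKS` table and the `find_path` helper are hypothetical names chosen for illustration, not anything taken from a Bell network.

```python
from collections import deque

# Hypothetical four-node network; the node names and links are illustrative only.
LINKS = {
    "Ottawa": {"Toronto", "Montreal"},
    "Toronto": {"Ottawa", "Montreal", "Vancouver"},
    "Montreal": {"Ottawa", "Toronto"},
    "Vancouver": {"Toronto"},
}


def find_path(links, source, destination, failed=frozenset()):
    """Breadth-first search for a fewest-hop path, skipping any failed nodes."""
    if source in failed or destination in failed:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            return path
        # sorted() only makes the search order deterministic
        for neighbour in sorted(links[node]):
            if neighbour not in visited and neighbour not in failed:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no path: the destination is unreachable


if __name__ == "__main__":
    print(find_path(LINKS, "Ottawa", "Vancouver"))
    # With Toronto marked as failed there is no remaining path to Vancouver.
    print(find_path(LINKS, "Ottawa", "Vancouver", failed={"Toronto"}))
```

In this toy model Vancouver hangs off a single node, so one failure isolates it; that is the kind of single point of failure that network diversity is meant to avoid.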
What, then, is Network Management and why is it important?
• Network management has 5 main processes:
  • Fault Management
  • Configuration Management (Provisioning)
  • Accounting Management
  • Performance Management (including Change Management)
  • Security Management

Bruce Deachman, The Ottawa Citizen, Sunday, March 20, 2005
In 1994, Nicholas Negroponte, founder of MIT's Media Lab, predicted one billion people would be using the Internet by the year 2000. What he failed to point out, was that most of them would be trying to get U2 tickets. At least that's how it must have felt for countless fans who were unable to snag tickets to the Bono-led, Irish rock band's Nov. 25 Corel Centre show yesterday morning, as technology failed to keep pace with overwhelming demand, leaving old-fashioned overnight campers the happiest of all.

Question! What is the latest estimate of the number of internet users in the world?
http://www.internetworldstats.com/stats.htm

Blasts from the Past!!

ROOT CAUSES OF BLACKOUTS AND THEIR REMEDY
The electric power transmission system of the United States is seriously deficient. Experts generally agree that fixing this system to an adequate level would take many years and cost of tens of billions of dollars. But the root causes of the recent “Blackout of 2003” can be solved in a relatively short time and at a much more reasonable cost. The root causes of the present problems are:
• A totally outdated reliability philosophy; and
• Inadequate real time monitoring of the transmission grid.

Isn’t the power grid a network too? Of course! Electricity is just a form of ‘data’!

http://www.speedmatters.org/blog/archive/fcc-verizon-at-fault-for-network-failures-of2012-derecho/#.UPgdWh1lGQG
In June 2012, large parts of the Midwest and Middle Atlantic were, without warning, hit by a destructive rain and windstorm called a derecho. It left in its wake 22 dead, hundreds of injuries and millions of people without power or communications.
Today, the FCC released a lengthy report prepared by its Public Safety and Homeland Security Bureau that looks at the communications outages that followed from the derecho, and made recommendations to avoid or reduce future failures. FCC Commissioner issued a statement reinforcing the findings and recommendations, and commenting on the service breakdowns: "Tragically, many of these were avoidable interruptions involving a lack of back-up power to central offices or failures of the service providers' monitoring systems... Carriers should test their networks and ensure that plans are in place in case of an emergency. It is time for an honest accounting of the resiliency of our nation's network infrastructure in the wireless and digital age."

In computer networking: “Resiliency is the ability to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation.”[1] Network resilience touches a very wide range of topics. In order to increase the resilience of a given communication network, the probable challenges and risks have to be identified and appropriate resilience metrics have to be defined for the service to be protected.

Why ‘Network Management’?
From a network provider’s viewpoint …
• Manage network resources equitably to ensure users can establish communications quickly & reliably
• Ensure information is transferred securely, with its original quality and integrity
• Operate a high performance, reliable, cost effective network that meets customer/business/organizational needs and requirements
• Plan and implement measures to prevent or mitigate service interruptions or degradation
• Make $$$$$ for the network provider and its shareholders
• Gain market share for the network provider
• At Bell Canada, networks are the building blocks of our own business – they are why we exist!

Why ‘Network Management’?
From the customer’s viewpoint …
• Ensure information is transferred securely, with its original quality and integrity
• Obtain service at the best cost/service/value combination
• Ensure a customer’s business operates with minimum downtime, in order to meet the requirements of its customers
• Meet regulatory, legal, safety requirements
• For a customer, networks are critical
  • For businesses, for their operations.
  • For the general public, so we can communicate, get money, do our assignments, talk .. BE CONNECTED

Network Management Poses Endless Challenges
by Willie Schatz
If network managers are in accord about anything, it’s that they have a lot more tasks to do than resources to handle them. The fundamental roles of a network administrator are to provide network connections for computer equipment and to ensure availability and performance of network communications. But that’s only the beginning. The administrator must set up and manage hardware and software solutions, enabling servers, clients, printers and other peripherals to communicate. He or she also is responsible for providing users the highest quality server functionality, which means uninterrupted, optimum network availability and performance. This same individual also must plan so any changes required in the network conform with changes in the larger enterprise system. “People really think network management is easier than it really is”.

Network Management Processes
There are five processes involved in network management

Configuration Management == Provisioning
• Programming network elements to communicate with each other and with user equipment
• User datafill to make their service functional
• Copying critical (non default) network provisioning parameters to offline storage in databases
• Ensuring billable parameters/features are updated in related billing systems
• Providing ‘dumps’, downloads, or application program interfaces (APIs) to other downstream systems
Why is Configuration/Provisioning management important?
• Users want their service when it is ordered (on the due date)
• Users want to get the options they pay for
• The network provider needs to ensure their service is billed

Network Management Processes
Fault Management == Service Assurance
• Surveillance - proactive - alarms/traps from the network that indicate major problems
• Isolating problems - reactive - when users have troubles
• Having clearly defined escalation procedures - how to prioritize troubles
• Providing customers with timely and honest status on problems - when will it be fixed?
• Performing analysis on failures for trends, root cause
Service Assurance is .. REAL-TIME surveillance, control, and analysis of a network, with the objective of ensuring maximum use of network resources, particularly when it is under stress due to traffic overload or failure conditions.

Network Management Processes
Performance Management
• Performance measures can be internal (for the provider), regulated (CRTC), or to assist the customer (how is my network performing?)
• Network performance measures (mean time to repair, network availability) are standard metrics used in the industry, and are often the basis for ‘service level agreements’
• Customers may require information on their traffic patterns - are they paying for bandwidth they don’t require, or is their network overloaded?
• Many customers want guarantees of performance – a Service Level Agreement (SLA) – in order to ensure they are getting the performance they pay for.
• An SLA may include the following
  • Network Availability
  • Frame/Cell/Packet delivery
  • Mean time to Repair
  • Penalty clauses for non-performance
  • Delay metrics

Network Management Processes
Change Management
• Scheduling downtime / maintenance activities (new software, network upgrades) with users (notification, release or emergency)
• Ensuring software levels are compatible with all network components
• Keeping the customer informed of planned service interruptions is critical
Networks are in need of periodic maintenance for software or hardware upgrades, etc. In a 7x24 world, unscheduled downtime can mean
• loss of revenue
• legal liability
• threats to public safety.

FROM: CHANGE MANAGEMENT PLANNED OUTAGE
Foreign-Tel COMMUNICATIONS
Dept.: GNMC
Phone: 1-555-868-7883
Fax: 1-555-868-7822
Please respond to the following Email: tcsccip@foreigntelcommunications.com
ForeignTel Communications would like to inform you that the Change Management activity will be performed as indicated below:
_____________________________________________________________________
Outage #: POM041793 / POT356369
Your ref. #:
Description: DISREGARD OUTAGE NOTICE//THIS IS NOT SERVICE AFFECTING//WE ARE ADDING BACKBONE CAPACITY: PORTLAND-SANTA CLARA
DURING THIS PERIOD, NETWORK WILL BE IN HAZARDOUS CONDITION. WALL NOC WILL CLOSELY MONITOR THE NETWORK AND ANY ALARMS ON IT
Scheduled Planned Start Date (UTC): February 16, 2014 15:00:00
Scheduled Planned End Date (UTC): February 24, 2014 03:00:00

Related Network Management Activities
• Co-ordination with other Carriers and Agencies. No one carrier can route traffic everywhere on the planet. Strategic alliances and co-operation amongst carriers are essential.
• Dynamic Controls. Can traffic be rerouted around failures or congestion? Is this automatic or manual?
• Disaster recovery planning. Could it happen to you? What would you do in the event of a ‘disaster’?
• Security. Who has access to the network infrastructure? Can it be ‘hacked’?
Ensuring one customer’s data does not go to another customer.

Security Management
• The goal of security management is to control access to network resources according to local guidelines so that the network cannot be sabotaged (intentionally or unintentionally) and sensitive information cannot be accessed by those without appropriate authorization.
• Security management subsystems work by partitioning network resources into authorized and unauthorized areas.
  – They identify sensitive network resources (including systems, files, and other entities) and determine mappings between sensitive network resources and user sets.
  – They also monitor access points to sensitive network resources and log inappropriate access to sensitive network resources.

AT&T Customer Info Hacked
By TSC Staff, 8/29/2006 9:05 PM EDT
AT&T late Tuesday said that hackers broke into a computer system and accessed personal data, including credit card information, from thousands of customers who had purchased DSL equipment from the company's Web store.

Kaspersky says Web hack 'should not have happened'
02/09/2009
It's the worst thing that can happen to a computer security vendor: This weekend, Moscow's Kaspersky Lab was hacked. A hacker, who identified himself only as Unu, said that he was able to break into a section of the company's brand-new U.S. support Web site by taking advantage of a flaw in the site's programming.

http://www.csoonline.com/article/706400/10-hacks-that-made-headlines

Network Management Centre Functions
• 7 x 24 operation - it’s more than a buzzword.
• Operations Support Systems for provisioning, change management, surveillance, trouble tracking, customer records
• Subject experts/access to engineering support personnel or labs
• Multiple & diverse communications channels
• Situation (War) room
• Secure and Independent Power Supply
• Access to Information Databases
• Contact information for support resources (level 1, 2, 3 support, vendor support)
• Secure location
• Fully redundant backup location

When Disaster strikes!
• If something will go wrong .. it will ..
• Ice Storms (1998 & 2013)/Hurricane Katrina/Sandy & other natural disasters
• Toronto Simcoe Central Office fire, July 1999
• Power plant failures
• Hackers and viruses (SQL Worm)
• September 11/terrorist attacks
• All of these test the plans of a network provider.
• Are contingency plans in place? Have they been tested, or have they gathered dust for 5 years?
• Is there an escalation chain of command?
• Are there agreements with other suppliers/vendors/competitors?
• What contingencies are in place to get critical services restored as quickly as possible?
• When service is lost, the prime objective, after immediate human safety, is the restoration of service

From July 1999 …
TORONTO - Phones stopped ringing in several major cities in Canada on Friday after an explosion caused a major system failure at a Bell Canada building in Toronto. The failure knocked out phone lines, most cell phones, internet services and bank machines in downtown Toronto. Cantel and digital cell phones appear to be working. Police report 911 emergency systems are working, but the police are urging people to use these systems only for real emergencies. The failure was caused by an explosion on the fourth floor at the downtown Bell centre at around 8:00 am. One person was reportedly injured. Immediately after the explosion, battery powered backup systems kicked in. But they ran out of power a few hours later.
The Toronto Stock Exchange is back up and running after it suspended trading briefly but brokerages are having trouble communicating. Phone systems in Ottawa and Montreal and as far away as Halifax and Vancouver have also been affected as calls that normally routed through Toronto are rerouted through other cities. Bell Canada says it hopes to have services restored by midafternoon.

The Globe and Mail, published Thursday, Oct. 10 2013, 11:18 AM EDT
Rogers Communications Inc. said a software glitch created a big spike in “signalling traffic” that caused one of the worst wireless network outages in the company’s history. Canada’s largest wireless carrier determined that root cause on Thursday roughly 18 hours after implementing a fix that restored voice and text services for customers across the country.

DISASTERS CAN HAPPEN? How will your network provider handle the trouble?
• Another aspect of Network Management is Planning
• A carrier will have a plan for a disaster situation, as well as anticipating potential issues
• Examples of planning for potential issues include
  • Y2K
  • more recently, the change in dates for Daylight Saving Time
  • Other various clock rollover issues
• A carrier may also do periodic disaster simulations to test the response of various groups as well as procedures

SPOC Function
What is a SPOC?
In Bell Canada, the GNMC is the Single Point of Contact (SPOC) for all Fault Management and Change Management between Canadian Help Desks and Test Centres and all the global carriers that Bell uses to provide international reach for our customer circuits
• SPOC for all other carriers to get their issues fixed within Canada
• One door for all trouble management into or out of Canada
• Avoids having many different groups learn the processes for dealing with each of the carriers, or the carriers having to learn about all the various ops centers within Canada
• Provides flexibility to move quickly and customize for customer reasons, with centralized expertise
• As a SPOC, we get to compare service levels provided by different global carriers and use this info to get better performance

Operational Support Systems
• Successful network management uses standardized protocols or vendor-specific mechanisms to transmit alarms and commands (e.g. Simple Network Management Protocol)
• Operational control data can be transmitted over conventional data networks, over the same network (inband), or over another network (out of band).
• The systems which receive alarms and allow for network configuration, troubleshooting, and control are commonly called Operational Support Systems (OSS).
• OSS may be more than 10 times the cost of the network infrastructure!
• OSSs may consist of workstations, databases, network elements, scripts, provisioning systems, security systems, offline databases and billing systems.
• Without a good OSS structure, a great network infrastructure will fail. The network objectives cannot be met without this.

Operational Support Systems
• No one OSS does it all - in fact, many OSSs are required, and these must interact with each other. This is typically via Application Program Interfaces (API) or some standard format for information exchange.
• The interaction can be simple - or complex. Often, simple format changes in one OSS will impact many other ‘downstream’ OSSs.
• Remember where the money is spent - not on the network infrastructure, but on the systems that make the network run.
• The following diagram shows a SAMPLE interaction between various systems.

Sample Operational Support Systems
[Diagram: sample interaction between OSSs. Systems shown: Order system (customer orders service), Order entry/Assignment system, Network Provisioning system (customer gets service), Telco local assignment system, Billing OSS and Billing system (customer receives bill for service/usage), Call detail/usage OSS, Stats Data Collection system, Fault Mgmt OSS, customer Trouble Ticket system, Change Mgmt, Network Elements, Surveillance Centres with alert display, and Fault Mgmt/troubleshooting OSSs in the Test Centres and NDNC. Flows shown: order info, provisioning records, billing records and billing files, SNMP traps, alerts, and customer and assignment dumps that feed other OSSs.]

Metrics – Key Performance Indicators
• Each network needs some means of measuring its success, and of seeing where improvement can be made. Public networks may be regulated. Metrics may be stipulated in Service Level Agreements (SLAs) between provider and customer
• To the end user/customer, the most critical metrics are the following:
  • Mean time to repair (MTTR)
  • Network Availability = (total available time - total downtime) / (total available time)
  • Quality of Service (QOS)
    • round trip delay
    • network congestion/blocking
    • frame/packet/cell loss
    • repeat failures
• To the network provider, the following are important metrics:
  • Network Availability
  • EBITDA (Earnings Before Interest, Taxes, Depreciation & Amortization)
  • Cost / Revenue (return on investment)
  • Market Share
  • Network capacity

Metrics
• To the shareholder, the following are important:
  • Dividend
  • Share price
  • Return on Investment

Summary
• Networks can be simple, or extremely complex and mission critical
• Network quality, reliability, diversity, and low cost are essential
• The operation of a high quality, reliable, cost effective network requires effective Network Management Centre(s), along with skilled people and good support tools (operational support systems)
• As networks continue to evolve, customers will manage more and more of their own networks.
• Challenges for the future include global coverage, scaling for growth, new technologies, telco mergers, acquisitions, failures - an industry always in flux.
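
The Network Availability formula and the MTTR metric from the Metrics slide can be worked through with a few lines of code. This is a minimal sketch under stated assumptions: outage durations are in hours, the measurement period is a 30-day month (720 hours), and the function names, the sample outages and the 99.9% target are all hypothetical, not values from any Bell SLA.

```python
def network_availability(total_available_hours, total_downtime_hours):
    """(total available time - total downtime) / (total available time)."""
    if total_available_hours <= 0:
        raise ValueError("total available time must be positive")
    if not 0 <= total_downtime_hours <= total_available_hours:
        raise ValueError("downtime must be between 0 and the total available time")
    return (total_available_hours - total_downtime_hours) / total_available_hours


def mean_time_to_repair(repair_hours):
    """Average repair time over a list of individual outage durations."""
    if not repair_hours:
        raise ValueError("at least one outage is needed to compute MTTR")
    return sum(repair_hours) / len(repair_hours)


def meets_sla(availability, target):
    """True when measured availability reaches the agreed target."""
    return availability >= target


if __name__ == "__main__":
    # Hypothetical month: 30 days x 24 hours, with three outages.
    period_hours = 30 * 24
    outages = [1.5, 0.5, 4.0]

    availability = network_availability(period_hours, sum(outages))
    print(f"Availability: {availability:.4%}")
    print(f"MTTR: {mean_time_to_repair(outages)} hours")
    # 0.999 is an assumed SLA target, used only for illustration.
    print(f"Meets 99.9% target: {meets_sla(availability, 0.999)}")
```

Six hours of downtime in a 720-hour month already puts availability below 99.2%, which is why penalty clauses in an SLA are usually tied to both the availability figure and the time to repair.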