2001 Systems Engineering Capstone Conference • University of Virginia

THE DEVELOPMENT OF METRICS FOR THE INFORMATION TECHNOLOGY DEPARTMENT OF AN INTERNET SERVICE PROVIDER

Student Team: Carolyn Bleck, Eugene Choi, Shantanu Rudra, Rohit Saroop, Mousmi Sharma

Faculty Advisors: Stephen D. Patek
Department of Systems Engineering

Client Advisors: Dorian Deane, Greg Gum, Adam Joseph, Frank Ludlow
UUNET, Inc. Information Technology Team
Ashburn, VA

KEYWORDS: bad bugs, capacity planning, defects, metrics, network planning, security incident

ABSTRACT

Internet Service Providers (ISPs) play a major role in providing access to the Internet for commercial and residential customers all over the world. The Information Technology (IT) Departments of ISPs are often tasked with (1) maintaining and developing enterprise software, (2) designing and building a network of servers to support business operations, and (3) managing security of the enterprise. This Capstone project worked with the IT Department of a leading ISP to assist in the development of metrics to improve performance. Metrics assist managers in justifying budget demands, addressing staffing issues, and modifying current processes. The three areas of greatest concern to the IT Department were network capacity and planning, software evaluation metrics, and costing of security incidents. Through extensive research and design, defects in enterprise software were identified and statistically characterized in consecutive version releases, a process for conducting network and capacity planning was developed, and a cost-calculator was developed to assist in the costing of security incidents.

INTRODUCTION

The IT Department of a leading ISP sought the assistance of this UVA Systems Engineering Capstone team in evaluating its current operations and in helping to ensure that optimal service is consistently provided to its customers (other business units within the ISP).
The IT Department has concerns over its current operations in the areas of internally developed software products, network and capacity planning, and security.

With regard to software development, the IT Department is concerned with the occurrence and mitigation of defects in consecutive releases of its code. Defect correction requires additional resources, time, and costs. As a part of this Capstone project, we developed metrics that allow the IT Department to (1) evaluate defect occurrence patterns and (2) make recommendations for minimizing the impact of defects. Metrics for characterizing defects have been chosen and implemented for measuring the department’s software development process.

With regard to enterprise software infrastructure, the IT Department wants to ensure that decisions to purchase servers and networking equipment are beneficial to the business. The network and capacity planning group within the IT Department is responsible for allocating servers and network capacity to ensure sufficient processing resources for user applications. As a part of this Capstone project, we developed a generic process for making informed decisions about network and capacity planning. We developed metrics to measure the effectiveness of this process.

With regard to security, the IT Department is concerned with the availability and integrity of its software and network infrastructure. In general, ISPs currently face a multitude of attacks from insider and outsider threats, natural disasters, and hackers. Unfortunately, there is no current method in place for security personnel to estimate the damages caused by each potential incident. As a part of this Capstone project, we created a costing tool to be used by security personnel to determine the cost and risks of potential incidents.
In the remainder of this paper we describe (1) conclusions from the software defect analysis, (2) the process for network and capacity planning, and (3) the results from the security incidents study.

SOFTWARE ANALYSIS

In studying the IT Department’s software development process, defect data was collected from the development of a specific, internal product. The IT Department feels this product is an accurate representation of its typical development process, and the analysis of this product will help the ISP better understand the impact of defects (software problems that arise during development) and the steps to mitigate the effects of defects. The product we studied has been in development over the past few years and has been re-released in several versions with various improvements and changes.

Several significant trends were noticed in the collected data. One is the high frequency of Bad Bugs throughout the development cycle. Bad Bugs are defects found in the code written by the department’s own software developers, and are therefore caused by internal development. Another trend is that defects occurred most frequently when the product had a major release, such as a requirement change or addition. Because major releases involve the most coding work, there is a greater probability of defects arising in those releases than in releases for minor changes. Another pattern is the lengthy average correction time for Database, Latent Code, and Bad Bug defects. A final observation relates to the number of Priority 2 defects and their correction times. Priority level is simply the level of urgency for a defect to be corrected, with Priority 1 being most urgent and Priority 4 being least. Priority 2 defects occur more frequently than defects of other priorities and take the most time to correct on average (Ludlow 2001).
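The kind of summary behind these observations (defect counts and average correction times per priority or per defect reason) can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout, the `summarize` helper, and every number in `defects` are assumptions made up for the example, not the ISP's data or the team's actual analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical defect records: (priority, defect reason, days to correct).
# These values are invented for illustration; they are not the ISP's data.
defects = [
    (1, "Bad Bug", 4.0),
    (2, "Bad Bug", 12.0),
    (2, "Database", 20.0),
    (2, "Latent Code", 16.0),
    (3, "Missing Requirement", 6.0),
    (4, "New Function Code", 3.0),
]


def summarize(records, key_index):
    """Group records by one field; return {key: (count, mean correction days)}."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return {key: (len(days), mean(days)) for key, days in groups.items()}


by_priority = summarize(defects, 0)  # occurrence and correction time per priority
by_reason = summarize(defects, 1)    # occurrence and correction time per defect reason

for priority in sorted(by_priority):
    count, avg_days = by_priority[priority]
    print(f"Priority {priority}: {count} defects, {avg_days:.1f} days on average")
```

Grouping the same records by a different field (for example, version number) only requires adding that field to each record and passing its index to `summarize`.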
In trying to understand the cause and effect of defects, we conducted a statistical analysis to determine which measurable attributes correlate best with indices of performance for software development. We found that Defect Reason and Version Number correlate best with defect occurrence (the number of problems arising during product development). As one might expect, the largest number of Priority 1 defects occurred in early versions of the product rather than in later versions.

With regard to the time required to correct defects, especially the occurrence of long correction times, we found that Version Number, Defect Reason, Ownership Changes, and Reporting Department are the most significant predictors. The number of ownership changes per defect turns out to have a significant impact. Each defect, at any point in time, is assigned to a specific owner (a software developer) who is responsible for correcting it. The data shows that, for a majority of defects, correction time rises as the number of ownership changes increases. There are various possible reasons for this, such as employee turnover (Ludlow 2001).

As a final aspect of our work in this area, we analyzed the costs associated with correcting defects. For proprietary reasons, we will not give specific quantitative results here; we will focus instead on high-level insights derived from the analysis. Defect costs are split between the IT Department and its customers (other business units within the ISP). Missing Requirement and New Function Code defects translate into customer costs since they are changes requested by the customer. For these defect types, the department can charge development time to the customer. The other defects are direct faults of the department, and so are charged to the department. From the available data, it appears that the net cost in developing the software product of this study is positive.
In other words, the cost of IT-related defects outweighs the revenue associated with customer-related Missing Requirement and New Function Code defects. The most expensive defects for the department are Database, Latent Code, and Bad Bug defects.

Based on the patterns and trends, the IT Department should be primarily concerned with the large occurrence of Bad Bugs and Priority 2 defects. It should also be alerted by the long defect correction times for Bad Bugs, Latent Code, and Database defects, which together account for 97% of the department's total defect correction costs. Again, ownership changes show a strong correlation with this problem.

NETWORK AND CAPACITY PLANNING

In the past, corporations spent billions of dollars on network and server infrastructure to support software applications (Grover, Teng, and Fiedler 1998). Because a large amount of money was devoted to hardware, the IT Department had the opportunity to purchase the best equipment available on the market. Servers cost between $70,000 and $100,000 each and were much more powerful than what users really needed. But today, as corporations decrease the number of IT projects and the funding for those projects, engineers need to ensure they are buying the correct hardware for a particular application (Gum 2001). In order to correctly size hardware, the IT Department must perform network and capacity planning.

In this paper, we present a generic process for network and capacity planning in the IT Department of an ISP and metrics to measure the effectiveness of this process. Engineers use this process to guarantee sufficient processing resources before a new application is deployed and to allot existing processing resources effectively (UUNET 2000). The network and capacity planning group (NCPG) of this ISP is responsible for allocating servers and network capacity within the corporation.
The metrics are relevant to any network and capacity planning process. Senior managers and executives need these metrics to make informed business decisions and to determine the effectiveness of their network and capacity planning group. As the NCPG more accurately predicts the future needs of systems, it will save the corporation significant amounts of money and make more efficient use of new and existing technology.

NETWORK AND CAPACITY PLANNING PROCESS

The main role of the NCPG is to work with IT project managers to determine hardware and network needs for a particular project. For example, if the sales department wants a new sales application to track sales leads, the NCPG needs to determine what server would best suit the needs of the application. The group must also look at how the addition of that server will affect the existing infrastructure and how users will be affected by its deployment (SUN 2000). The main elements of this process are shown in Figure 1 below.

[Figure 1. Elements of the Network & Capacity Planning Process. Diagram elements: Executive Management; Information Technology Department; Network and Capacity Planning Group; Evaluate Project Requirements and Service Level Agreements (SLA); Project Development; Maintenance; Set Standards and Do Benchmarking on New Technology; Performance Testing and Benchmarking of Systems; Vendor Management; Deploy Projects; Start Systems Measurements; Procurement]

The generic network and capacity planning process begins with the determination of project requirements and service level agreements. The requirements document defines how a system should support an application and its users and may contain information on how the system should be built (Lamming and Newman 2001). During project development, application developers are finishing the application. The NCPG determines what type of hardware and network architecture is needed to support the project. The application is then tested on the designed architecture to determine what loads will cause the system to fail and whether the requirements defined earlier are met by the purchased hardware (Menasce and Almeida 1998). The application is then deployed and the NCPG begins taking systems measurements of the servers and network. These include measurements of the central processing unit (CPU), memory, and disk space (Gum 2001). During maintenance the NCPG follows up with users on problems they have reported.

Another area of focus for the NCPG is researching new technologies. For a company relying heavily on technology to differentiate itself in the marketplace, this is an important area (Mieritz 2000). New hardware is benchmarked and tested to determine whether it will be suitable for deployment. If IT projects are outsourced to vendors, then the NCPG must also manage those vendors. Responsibilities of procurement include purchasing hardware, licensing, trade-ins, and upgrades of the hardware. The NCPG within the IT Department also interfaces with executive management when major decisions need approval. These include hardware purchases, upgrades, and decommissions (taking servers offline).

NETWORK AND CAPACITY PLANNING METRICS

Rather than list the complete set of metrics for network and capacity planning developed in the Capstone project, we list only the metrics that relate to the purchase and maintenance of servers and the network, as shown in Table 1. The NCPG does not want to purchase new servers for every project. Therefore the group works to allocate existing servers efficiently. To avoid an increase in the number of hardware purchases, IT management will try to increase the number of recycled servers and servers available after consolidations. The number of decommissions, number of upgrades, and number of reconfigurations should remain constant over time. If the number of reconfigurations, the number of upgrades, and the number of hardware purchases increase over time, the process for network and capacity planning is not working effectively. Having to reconfigure or upgrade hardware indicates that the requirements were not defined correctly. After several months, a manager wants to see a reduction in the number of hardware purchases and an increase in the dollar amount saved from hardware purchases avoided. In order to avoid hardware purchases, underutilized servers can be consolidated onto one machine, or decommissioned servers can be recycled.

Most literature in the past has focused on how to perform network and capacity planning, while little research is available on how to assess the process from a managerial point of view. The current plan for these metrics is to collect the relevant data at the ISP and then examine how well they correlate with the success of the department. Future work on the metrics will include reviewing them after several months for possible improvements.

Table 1. Hardware Metrics
# of Hardware Purchases
# of Hardware Decommissions
# of Recycled Servers
# of Servers Available after Consolidations
# of Hardware Purchases Avoided
# of Reconfigurations
# of Upgrades
Dollar Amount Saved from Hardware Purchases Avoided
Dollar Amount of Hardware Purchases

IT SECURITY

Most IT Departments within ISPs are concerned with the availability and integrity of their software and network infrastructure. ISPs face a multitude of attacks from both insider and outsider threats, natural disasters, and hackers. In the 2000 Computer Crime and Security Survey, ninety percent of the respondents detected security breaches in the past 12 months (CSI/FBI Survey 2000).
Our goal in this project was to assess the costs associated with various types of attacks faced by an ISP. A significant constraint we faced in completing this task is the almost complete lack of publicly available, historical data relating to the cost of incidents. In response, our main contribution is the development of a “cost-calculator” which (1) queries security personnel for their own estimates of cost data and (2) presents estimates of the yearly cost to address security using various security alternatives. The goal is to provide security professionals with justification for the security funding they request of senior management.

We designed the calculator to provide an easy-to-understand format in which a security professional can gauge the potential damage of an incident occurring at his/her ISP. The calculator allows for individual customization through user-entered parameter values.

The first step in developing the calculator was to gain an understanding of the dangers faced by ISPs. For this project, we compiled a list of potential security risks from the available literature. We focused on incidents in three major groups: financial fraud, system penetration/unauthorized access, and denial of service, each of which is further divided into sub-incidents. Financial fraud is defined as a financial loss incurred from the unauthorized use of a credit card or account number. System penetration is defined as an incident in which authorized or unauthorized users view or modify confidential information. A denial of service attack is the intentional shutting-down of a network server through a means of attack.

Each major incident type can be the result of either insider abuse or external abuse. Insider attacks are defined as abuse by employees internal to the company who modify account information, use unauthorized access, steal data, attack other systems while hiding behind their company’s system, etc.
External abuse corresponds to individuals outside of the company attacking the system (Deane 2001).

The next step in developing the calculator was to determine the quantitative attributes that relate to the cost of each incident type. For example, to assess the cost of a stolen password, one must take into consideration the following:

Level of access given to users
Preventative labor hours
Software costs
Hardware costs
Corrective labor charges
Hours lost to non-productivity
Value of proprietary/confidential information
Etc.

The value of proprietary/confidential information is an ambiguous number that will most likely be estimated by the user of the calculator. The division into attributes allows the user to break down an incident into potential impacts and assess damage on a more case-by-case basis. For example, the attributes associated with a denial-of-service incident will be significantly different from the attributes associated with a stolen password.

The equation relating the attributes to cost for a stolen password incident is as follows:

Cost of stolen password incident =
    cost of prevention (number of laborers * # hours * hourly rate + software costs + security training of employees)
  + cost of corrective action (number of laborers * # hours * hourly rate + software used and updated)
  + legal fees
  + cost of information lost or stolen

The equation for denial-of-service is similar, with cost of prevention and cost of corrective action included, but the focus shifts away from the value of the data lost to the loss of productivity and potential business lost during the downtime.

The calculator takes the form of a Microsoft Visual Basic program with a graphical user interface. The calculator gives three options to the user: estimate the cost for one incident, estimate annual costs associated with an incident-type, or estimate the probability of security incidents at the company.
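As a rough illustration of the stolen-password cost equation, the following Python sketch sums the same terms from user-entered estimates. It is not the team's Visual Basic calculator: the `LaborEstimate` class, the `stolen_password_cost` function, the parameter names, and all of the dollar figures are hypothetical and chosen only to exercise the formula.

```python
from dataclasses import dataclass


@dataclass
class LaborEstimate:
    """User-entered labor estimate: number of laborers * hours * hourly rate."""
    laborers: int
    hours: float
    hourly_rate: float

    def cost(self) -> float:
        return self.laborers * self.hours * self.hourly_rate


def stolen_password_cost(prevention_labor, prevention_software, security_training,
                         corrective_labor, corrective_software,
                         legal_fees, information_value):
    """Sum the terms of the stolen-password cost equation given in the text."""
    # Cost of prevention: labor + software costs + security training of employees.
    prevention = prevention_labor.cost() + prevention_software + security_training
    # Cost of corrective action: labor + software used and updated.
    correction = corrective_labor.cost() + corrective_software
    return prevention + correction + legal_fees + information_value


# Hypothetical inputs; a real user would enter estimates for their own company.
single_incident = stolen_password_cost(
    prevention_labor=LaborEstimate(laborers=2, hours=10, hourly_rate=50.0),
    prevention_software=1000.0,
    security_training=500.0,
    corrective_labor=LaborEstimate(laborers=3, hours=8, hourly_rate=50.0),
    corrective_software=200.0,
    legal_fees=2000.0,
    information_value=10000.0,
)
print(single_incident)  # 15900.0
```

Keeping each term as a separate parameter mirrors the idea of breaking an incident into attributes, so a denial-of-service variant could swap the information-value term for estimates of lost productivity and lost business.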
All of the equations that relate incidents to cost are hard-coded into the calculator, requiring the user only to input estimates for the parameters. The calculator approximates costs based on the user's estimated values for assets, legal costs, number of people necessary for repair, etc.

CONCLUSION

The goal of this project was to evaluate current operations of an ISP’s Information Technology Department. The evaluation was conducted in the areas of greatest concern: software development, network capacity, and security. Our results help the ISP better understand its current operations and suggest steps for improving them.

The statistical analysis of software defects can be applied to other internally developed software products for this ISP’s IT Department. Though the analysis was conducted for one specific software product, the ISP’s IT Department feels the issues revealed apply more generally to other software products developed in-house. From the results, it is apparent that the ISP needs to find methods for minimizing the occurrence of Bad Bugs, Latent Code, and Database defects. These are most detrimental to the department in terms of the costs and time necessary for correcting them. As mentioned, there is a strong correlation between increasing ownership changes and longer defect correction times. This could have to do with employee turnover within the department, which this project was unable to research due to data unavailability.

The network and capacity planning process described in this paper is a generic process that could be implemented by any IT Department. The process covers all stages in a project’s lifecycle and also other responsibilities of the NCPG such as new technology evaluation. The metrics corresponding to the process allow IT managers to make informed business decisions. Once the metrics were determined, we developed a front-end system to record and track the metrics over time. In the future the system should be implemented with a database backend to store the data and graph certain metrics over time.

The creation of the calculator fills a void in the field of security engineering. Security engineers are inundated with information regarding the threats that their company's network and systems face and the methods available to prepare for and combat them. Security spending is expected to surge by an estimated 55% by the year 2002 (CSI/FBI Survey 2000). This calculator helps security professionals focus their security efforts better. Unfortunately, although aware of the dangers they face, they are unable to justify to upper management the need to implement these technological innovations. If the security engineer can show that the proposed firewall is necessary to avoid incurring $300,000 in damage associated with stolen passwords, management may be more willing to allocate funding. In addition to providing justification for budget proposals, the calculator allows the user to determine which risks could cause the greatest damage to the company. The calculator provides a useful method of calculating incident costs as opposed to the current method of back-of-the-envelope calculations.

REFERENCES

Deane, Dorian. (2001). Manager of Network Security, Information Technology Department, MCI Worldcom, Ashburn, VA.
Grover, V., J.T.C. Teng, and K.D. Fiedler. (1998). “IS Investment Priorities in Contemporary Organizations.” Communications of the ACM 41, no. 2 (Feb.): 40-48.
Gum, Greg. (April 6, 2000). Senior Engineer, Information Technology Department, MCI Worldcom, Ashburn, VA. Personal Interview.
Huneke, William. (22 Sept. 2000). Director of Planning & Infrastructure, Information Technology Department, MCI Worldcom, Ashburn, VA. Personal Interview.
Joseph, Adam. (2001). Manager, Capacity, Network, & Security Planning, Information Technology Department, MCI Worldcom, Ashburn, VA.
Lamming, M.G. and W.M. Newman. (2001). “Interactive Systems Design.” (March 19): 1-2. <http://cgi.student.nada.kth.se/cgi-bin/d95aeh/get/intsyst4eng>.
Ludlow, Frank. (2001). Senior Manager, Information Technology Department, MCI Worldcom, Ashburn, VA.
Menasce, D.A. and V.A.F. Almeida. (1998). Capacity Planning for Web Performance: Metrics, Models, and Methods. Prentice-Hall: Englewood Cliffs, NJ.
Mieritz, L. (2000). “Track and Communicate the IS Organization’s Contribution.” (Oct. 11): 1-4. <http://www3.gartner.com>.
Sun Microsystems. (2000). “Enterprise Operations: Performance Monitoring and Capacity Planning.” (Dec. 7). <http://sun.com/service/sunps/enterprise/performance.html>.
2000 CSI/FBI Computer Crime and Security Survey. (March 2000). Computer Security Institute in collaboration with the San Francisco Federal Bureau of Investigation.
UUNET. (2000). UUNET IT – P&I Capacity Planning Charter.

BIOGRAPHIES

Carolyn Bleck is a fourth-year Systems Engineering student from Chelmsford, MA. Her focus is in management information systems. Her primary role in the project was the development of the network and capacity planning process, metrics, and the application to track those metrics. Following graduation, Ms. Bleck will be working as a consultant at PricewaterhouseCoopers in Boston, MA.

Eugene Choi is a fourth-year Systems Engineering student from Osan Air Base, South Korea, concentrating in Information Systems. His principal contribution to the group was in the development and design of the costing calculator using Microsoft Visual Basic. Upon graduation, Mr. Choi will be commissioned as a 2nd Lieutenant of the United States Air Force and will head out to Kadena Air Base, Okinawa, Japan as an aircraft maintenance officer.

Shantanu Rudra is a first-year graduate student in the Systems and Information Engineering department. His focus is in management information systems. His primary role in the project was the development of the network and capacity planning process, metrics, and the application. He plans to continue his studies as a graduate student in the Systems and Information Engineering department.

Rohit Saroop is a fourth-year Systems Engineering major from Fairfax Station, VA, concentrating in management information systems. His primary focus on this project was developing software metrics for determining the impact of defects from internally developed software for this ISP. Mr. Saroop has accepted a position with Cap Gemini Ernst & Young in McLean, VA, and will begin work following May graduation.

Mousmi Sharma is a fourth-year Systems Engineering major from Fairfax, VA, concentrating in management systems. Her principal contribution to the group was in the research of security incidents, their attributes, and the design of the calculator. Ms. Sharma will be working as an investment banking analyst with the firm JP Morgan H&Q in New York City upon graduation.