2001 Systems Engineering Capstone Conference • University of Virginia
THE DEVELOPMENT OF METRICS FOR THE INFORMATION TECHNOLOGY
DEPARTMENT OF AN INTERNET SERVICE PROVIDER
Student Team: Carolyn Bleck, Eugene Choi, Shantanu Rudra, Rohit Saroop, Mousmi Sharma
Faculty Advisor: Stephen D. Patek
Department of Systems Engineering
Client Advisors: Dorian Deane, Greg Gum, Adam Joseph, Frank Ludlow
UUNET, Inc.
Information Technology Team
Ashburn, VA
KEYWORDS: bad bugs, capacity planning, defects,
metrics, network planning, security incident
ABSTRACT
Internet Service Providers (ISPs) play a major role
in providing access to the Internet for commercial and
residential customers all over the world. The
Information Technology (IT) Departments of ISPs are
often tasked with (1) maintaining and developing
enterprise software, (2) designing and building a
network of servers to support business operations, and
(3) managing security of the enterprise. This Capstone
project worked with the IT Department of a leading ISP
to assist in the development of metrics to improve
performance. Metrics help managers justify budget
demands, address staffing issues, and modify current
processes. The three areas of greatest concern to the IT
Department were network capacity and planning,
software evaluation metrics, and costing of security
incidents. Through extensive research and design,
defects in enterprise software were identified and
statistically characterized in consecutive version
releases, a process for conducting network and capacity
planning was developed, and a cost-calculator was
developed to assist in the costing of security incidents.
INTRODUCTION
The IT Department of a leading ISP sought the
assistance of this UVA Systems Engineering Capstone
team in evaluating its current operations and in helping
to ensure that optimal service is consistently provided
to its customers (other business units within the ISP).
The IT Department has concerns about its current
operations in the areas of internally developed software
products, network and capacity planning, and security.
With regard to software development, the IT
Department is concerned with the occurrence and
mitigation of defects in consecutive releases of its code.
Defect correction requires additional resources, time,
and costs. As a part of this Capstone project, we
developed metrics that allow the IT Department to (1)
evaluate defect occurrence patterns and (2) make
recommendations for minimizing the impact of defects.
Metrics for characterizing defects have been chosen and
implemented for measuring the department’s software
development process.
With regard to enterprise software infrastructure, the
IT Department wants to ensure that decisions to
purchase servers and networking equipment are
beneficial to the business. The network and capacity
planning group within the IT Department is responsible
for allocating servers and network capacity to ensure
sufficient processing resources for user applications.
As a part of this Capstone project, we developed a
generic process for making informed decisions about
network and capacity planning. We developed metrics
to measure the effectiveness of this process.
With regard to security, the IT Department is
concerned with availability and integrity of its software
and network infrastructure. In general, ISPs currently
face a multitude of threats: attacks by insiders and
outsiders, natural disasters, and hackers. Unfortunately,
there is no current method in place for security
personnel to estimate the damages caused by each
potential incident. As a part of this Capstone project,
we created a costing tool to be used by security
personnel to determine the cost and risks of potential
incidents.
In the remainder of this paper we describe (1)
conclusions from the software defect analysis, (2) the
process for the network and capacity planning, and (3)
the results from the security incidents study.
SOFTWARE ANALYSIS
In studying the IT Department’s software
development process, defect data was collected from
the development of a specific, internal product. The IT
Department feels this product is an accurate
representation of its typical development process, and
the analysis of this product will help the ISP better
understand the impact of defects (software problems
that arise during development) and the steps to mitigate
the effects of defects. The product we studied has been
in development over the past few years and has been
re-released in various versions to deliver improvements
and changes. Several significant trends were noticed in
the collected data. One is the high frequency of Bad Bugs
throughout the development cycle. Bad Bugs are defects
found in the code written by the department’s software
developers, and are therefore caused by internal
development. Another trend is that defects occurred most
frequently when the product had a major release, such as
a requirement change or addition. Because major version
releases involve the greatest amount of coding work,
defects are more likely to arise in them than in releases
for minor changes. Another
pattern is the lengthy average correction time for
addressing Database, Latent Code, and Bad Bug
defects. A final observation relates to the number of
Priority 2 defects and their correction times. Priority
level is simply the level of urgency for a defect to be
corrected, with Priority 1 being most urgent and
Priority 4 being least. Priority 2 defects occur more
frequently than defects of other priorities and, on
average, take the most time to correct (Ludlow 2001).
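The kind of tabulation behind these observations can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout, the function name summarize_by_priority, and all sample values are hypothetical and are not taken from the ISP's defect data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical defect records: (priority, defect_reason, days_to_correct).
# The values are made up for illustration; they are not data from the study.
defects = [
    (1, "Bad Bug", 3.0),
    (2, "Bad Bug", 12.0),
    (2, "Database", 20.0),
    (2, "Latent Code", 16.0),
    (3, "New Function Code", 5.0),
    (4, "Missing Requirement", 2.0),
]


def summarize_by_priority(records):
    """Return {priority: (defect count, mean days to correct)}."""
    days_by_priority = defaultdict(list)
    for priority, _reason, days in records:
        days_by_priority[priority].append(days)
    return {p: (len(d), mean(d)) for p, d in sorted(days_by_priority.items())}


summary = summarize_by_priority(defects)
for priority, (count, avg_days) in summary.items():
    print(f"Priority {priority}: {count} defects, {avg_days:.1f} days on average")
```

Grouping by Defect Reason instead of priority would only require keying the dictionary on the second field of each record.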
In trying to understand the cause and effect of
defects, we conducted a statistical analysis to determine
which measurable attributes correlate best with indices
of performance for software development. We found
that Defect Reason and Version Number correlate best
with defect occurrence (number of problems arising
during product development). As one might expect, the
largest number of Priority 1 defects occurred in early
versions of the product’s development life, rather than
later versions. With regard to the time required to
correct defects, especially the occurrence of long
correction times, we found that Version Number,
Defect Reason, Ownership Changes, and Reporting
Department are the most significant predictors. The
number of ownership changes per defect turns out to
have a significant impact. Each defect, at any point in
time, is assigned a specific owner (software developer)
who is responsible for correcting the defect. The data
show that, for a majority of defects, as the number of
ownership changes increases, defect correction time
rises. There are various reasons why this could be
occurring, such as employee turnover (Ludlow 2001).
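A relationship of this kind can be checked with a simple correlation coefficient. The sketch below computes a Pearson correlation between ownership changes and correction time. It assumes hypothetical sample values and a hand-written helper named pearson; the paper does not specify which statistical procedure was actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical per-defect observations (not data from the study).
ownership_changes = [0, 1, 1, 2, 3, 4]
correction_days = [2.0, 4.0, 5.0, 9.0, 12.0, 18.0]

r = pearson(ownership_changes, correction_days)
print(f"Correlation between ownership changes and correction time: {r:.2f}")
```

A value of r close to 1 would be consistent with correction time rising as ownership changes increase, although correlation alone does not establish the cause.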
As a final aspect of our work in this area, we
analyzed the costs associated with correcting defects.
For proprietary reasons, we will not give specific
quantitative results here; we will focus instead on
high-level insights derived from the analysis. Defect costs
are split between the IT Department and its customers
(other business units within the ISP). Missing
Requirement and New Function Code defects translate
into customer costs since they are requested changes
made by the customer. For these defect types, the
department can charge development time to the
customer. The other defects are direct faults of the
department, and so are charged to the department.
From the available data, it appears that the net cost of
developing the software product of this study is
positive. In other words, the cost of IT-related defects
outweighs the customer-related revenues associated
with Missing Requirement and New Function Code
defects. The most costly defects for the department are
Database, Latent Code, and Bad Bug defects.
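The split between department and customer costs can be illustrated as follows. The defect categories come from the text above, while the function name split_costs and the dollar amounts are hypothetical placeholders, since the actual figures are proprietary.

```python
# Defect types whose correction time can be charged to the customer,
# as described in the text.
CUSTOMER_CHARGED = {"Missing Requirement", "New Function Code"}


def split_costs(defect_costs):
    """Return (department cost, customer revenue, net cost to the department)."""
    customer = sum(cost for reason, cost in defect_costs if reason in CUSTOMER_CHARGED)
    department = sum(cost for reason, cost in defect_costs if reason not in CUSTOMER_CHARGED)
    return department, customer, department - customer


# Hypothetical correction costs per defect type (illustrative values only).
example_costs = [
    ("Bad Bug", 500),
    ("Database", 300),
    ("Missing Requirement", 200),
    ("New Function Code", 100),
]

department_cost, customer_revenue, net_cost = split_costs(example_costs)
print(f"Department: {department_cost}, customer: {customer_revenue}, net: {net_cost}")
```

A positive net value corresponds to the situation described above, in which department-borne defect costs outweigh what can be charged to customers.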
Based on the patterns and trends, the IT Department
should be primarily concerned with the large
occurrence of Bad Bugs and Priority 2 defects. It
should also be alert to the long defect correction times
for Bad Bugs, Latent Code, and Database defects,
which together account for 97% of the department’s
total costs for defect correction. Again, ownership
changes show a strong correlation with this problem.
NETWORK AND CAPACITY PLANNING
In the past, corporations spent billions of dollars on
network and server infrastructure to support software
applications (Grover and Teng 1998). Because there
was a large amount of money devoted to hardware, the
IT Department had the opportunity to purchase the best
equipment available on the market. Servers ran
between $70,000 and $100,000 per server and were
much more powerful than what users really needed.
But today as corporations decrease the number of IT
projects and the funding towards those projects,
engineers need to ensure they are buying the correct
hardware for a particular application (Gum 2001). In
order to correctly size hardware, the IT Department
must perform network and capacity planning.
In this paper, we present a generic process for
network and capacity planning in the IT Department of
an ISP and metrics to measure the effectiveness of this
process. Engineers use this process to guarantee
sufficient processing resources before a new application
is deployed and the effective allotment of existing
processing resources (UUNET 2000). The network and
capacity planning group (NCPG) of this ISP is
responsible for allocating servers and network capacity
in a corporation. The metrics are relevant to any
network and capacity planning process. Senior
management and executives need these metrics because
they allow them to make informed business decisions
and determine the effectiveness of their network and
capacity planning group. As the NCPG more
accurately predicts future needs of systems, they will
save their corporations significant amounts of money
and more efficiently use new and existing technology.
NETWORK AND CAPACITY PLANNING
PROCESS
The main role of the NCPG is to work with IT
project managers to determine hardware and network
needs for a particular project. For example, if the sales
department wants a new sales application to track sales
leads, the NCPG needs to determine what server would
best suit the needs of the application. They also must
look at how the addition of that server will impact the
existing infrastructure and how users will be impacted
by its deployment (SUN 2000). The main elements of
this process are shown in Figure 1 below.
[Figure 1 (diagram). Labels shown: Executive Management; Information Technology Department; Network and Capacity Planning Group; Evaluate Project Requirements and Service Level Agreements (SLA); Project Development; Maintenance; Set Standards and Do Benchmarking on New Technology; Performance Testing and Benchmarking of Systems; Vendor Management; Deploy Projects; Start Systems Measurements; Procurement]
The generic network and capacity planning process
begins with the determination of project requirements
and service level agreements. The requirements
document defines how a system should support an
application and its users and may contain information
on how the system should be built (Lamming and
Newman 2001). During project development,
application developers are finishing the application.
The NCPG determines what type of hardware and
network architecture is needed to support the project.
The application is then tested on the designed
architecture to determine what loads placed on the
system will cause it to fail and if the requirements
defined earlier were met with the purchased hardware
(Menasce and Almeida 1998). The application is then
deployed and the NCPG begins taking systems
measurements of the servers and network. These
include measurements of the central processing unit
(CPU), memory, and disk space (Gum 2001). During
maintenance the NCPG follows up with users on
problems they have reported.
Another area of focus for the NCPG is researching
new technologies. For a company relying heavily on
technology to differentiate itself in the marketplace, this
is an important area (Mieritz 2000). The hardware is
benchmarked and tested to determine if it will be
suitable for deployment. If IT projects are outsourced
to vendors, then the NCPG must also manage those
vendors. Responsibilities of procurement include
purchasing hardware, licensing, trade-ins, and upgrades
of the hardware.
The NCPG within the IT Department also interfaces
with executive management when major decisions need
approval. This would include hardware purchases,
upgrades, and decommissions (taking servers offline).
NETWORK AND CAPACITY PLANNING
METRICS
Rather than list the complete set of metrics for
network and capacity planning developed in the
Capstone project, we list only the metrics that relate to
the purchase and maintenance of servers and the
network as shown in Table 1. The NCPG does not
want to purchase new servers for every project.
Therefore they work to allocate existing servers
efficiently. To avoid an increase in the number of
hardware purchases, IT management will try to increase
the number of recycled servers and servers available
after consolidations. The number of decommissions,
number of upgrades and number of reconfigurations
should remain constant over time.
Figure 1. Elements of the Network & Capacity Planning Process
If, over time, the number of reconfigurations, the
number of upgrades, and the number of hardware
purchases increase, the process for network and
capacity planning is not working effectively. Having to
reconfigure or upgrade hardware indicates that the
requirements were not defined correctly. After several
months, a manager wants to see a reduction in the
number of hardware purchases and an increase in the
dollar amount saved from hardware purchases avoided.
In order to avoid hardware purchases, underutilized
servers can be consolidated to one machine, or
decommissioned servers can be recycled.
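A minimal sketch of how these trends might be monitored is given below. It assumes monthly counts are available for each metric and uses a deliberately simple rule (latest count greater than the first count) to decide that a metric is rising. The metric keys, the function name rising_metrics, and the sample numbers are hypothetical.

```python
# Metrics whose growth over time suggests the planning process is not
# working effectively, per the discussion above.
WATCHED = ("hardware_purchases", "reconfigurations", "upgrades")


def rising_metrics(monthly_counts):
    """Return the watched metrics whose latest monthly count exceeds the first."""
    return [
        name
        for name in WATCHED
        if monthly_counts[name][-1] > monthly_counts[name][0]
    ]


# Hypothetical monthly counts (illustrative values only).
monthly_counts = {
    "hardware_purchases": [5, 4, 4, 3],
    "reconfigurations": [2, 2, 3, 4],
    "upgrades": [1, 1, 1, 1],
}

flagged = rising_metrics(monthly_counts)
print("Metrics trending upward:", flagged)
```

Comparing only the first and last months is a simplification; a fitted trend line or a moving average would be less sensitive to a single unusual month.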
Most literature in the past has focused on how to
perform network and capacity planning while little
research is available on how to assess the process from
a managerial point of view. The current plan for these
metrics is to collect the relevant data at the ISP and then
look to see how well they correlate to the success of the
department. Future work for the metrics development
will include reviewing the metrics after several months
for possible improvements.
# of Hardware Purchases
# of Hardware Decommissions
# of Recycled Servers
# of Servers Available after Consolidations
# of Hardware Purchases Avoided
# of Reconfigurations
# of Upgrades
Dollar Amount Saved from Hardware Purchases Avoided
Dollar Amount of Hardware Purchases
Table 1. Hardware Metrics
IT SECURITY
Most IT Departments within ISPs are concerned
with the availability and integrity of their software and
network infrastructure. ISPs face a multitude of threats:
attacks by insiders and outsiders, natural disasters,
and hackers. In the 2000 Computer Crime and Security
Survey, ninety percent of the respondents detected
security breaches in the past 12 months (CSI/FBI
Survey 2000). Our goal in this project was to assess the
costs associated with various types of attacks faced by
an ISP. A significant constraint we faced in completing
this task is the almost complete lack of publicly
available, historical data relating to the cost of
incidents. In response, our main contribution is the
development of a “cost-calculator” which (1) queries
security personnel for their own estimates of cost data
and (2) presents estimates of the yearly cost to address
security using various security alternatives. The goal is
to provide security professionals with justification for
the security funding they request of senior management.
We designed the calculator to provide an easy-to-understand format in which a security professional can
gauge the potential damage of an incident occurring at
his/her ISP. The calculator allows for individual
customization through user-entered parameter values.
The first step in developing the calculator was to
gain an understanding of the dangers faced by ISPs.
For this project, we compiled a list of potential security
risks from the available literature. We focused on
incidents in three major groups: financial fraud, system
penetration/unauthorized access, and denial of service,
each of which is further divided into sub-incidents.
Financial fraud is defined as a financial loss incurred
from the unauthorized use of a credit card or account
number. System penetration is defined as an incident in
which authorized or unauthorized users view or modify
confidential information. A denial of service attack is
the intentional shutting-down of a network server
through a means of attack. Each major incident type
can be the result of either insider abuse or external
abuse. Insider attacks are defined as abuse by
employees internal to the company who modify account
information, use unauthorized access, steal data, attack
other systems hiding behind their company’s system,
etc. External abuse corresponds to individuals outside
of the company attacking the system (Deane 2001).
The next step in developing the calculator was to
determine the quantitative attributes that relate to the
cost of each incident type. For example, to assess the
cost of a stolen password, one must take into
consideration the following:
• Level of access given to users
• Preventative labor hours
• Software costs
• Hardware costs
• Corrective labor charges
• Hours lost to non-productivity
• Value of proprietary/confidential information
• Etc.
The last named attribute, the value of proprietary or
confidential information, is an ambiguous number that will
most likely be estimated by the user of the calculator.
The division into attributes allows the user to break
down an incident into potential impacts and assess
damage on a more case-by-case basis. For example, the
attributes associated with a denial-of-service incident
will be significantly different than the attributes
associated with a stolen password. The equation
relating the attributes to cost for a stolen password
incident is as follows:
Cost of stolen password incident =
    cost of prevention (number of laborers * # hours *
        hourly rate + software costs + security training
        of employees)
  + cost of corrective action (number of laborers *
        # hours * hourly rate + software used and updated)
  + legal fees
  + cost of information lost or stolen
The equation for denial-of-service is similar with cost
of prevention and cost of corrective action included, but
this time the focus shifts away from the value of the
data lost, to the loss of productivity and potential
business lost during the downtime.
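The stolen password equation can be written out as a short Python sketch. The original calculator was a Microsoft Visual Basic program, so this is a re-expression of the stated formula rather than the team's code. The function names, dictionary keys, and sample figures are hypothetical.

```python
def labor_cost(laborers, hours, hourly_rate):
    """Labor term used in both prevention and corrective action."""
    return laborers * hours * hourly_rate


def stolen_password_cost(prevention, corrective, legal_fees, information_value):
    """Total cost of a stolen password incident, following the equation above."""
    prevention_cost = (
        labor_cost(prevention["laborers"], prevention["hours"], prevention["hourly_rate"])
        + prevention["software_costs"]
        + prevention["security_training"]
    )
    corrective_cost = (
        labor_cost(corrective["laborers"], corrective["hours"], corrective["hourly_rate"])
        + corrective["software_costs"]  # software used and updated
    )
    return prevention_cost + corrective_cost + legal_fees + information_value


# Hypothetical user-entered estimates (illustrative values only).
prevention = {"laborers": 2, "hours": 10, "hourly_rate": 50.0,
              "software_costs": 500.0, "security_training": 300.0}
corrective = {"laborers": 3, "hours": 8, "hourly_rate": 50.0,
              "software_costs": 200.0}

total = stolen_password_cost(prevention, corrective,
                             legal_fees=1000.0, information_value=5000.0)
print(f"Estimated cost of the incident: ${total:,.2f}")
```

A denial-of-service version would keep the prevention and corrective terms and replace the value of information lost with terms for lost productivity and lost business during downtime, as described above.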
The calculator takes the form of a Microsoft Visual
Basic program with a graphical user interface. The
calculator gives three options to the user: estimate the
cost for one incident, estimate annual costs associated
with an incident-type, or estimate the probability of
security incidents at the company. All of the equations
that relate incidents to cost are hard-coded into the
calculator, requiring the user only to input their
estimates for the parameters. The calculator will
approximate costs based on the user’s estimated values
for assets, legal costs, number of people necessary for
repair, etc.
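One common way to turn a per-incident estimate into a yearly figure is the annualized loss expectancy used in standard risk analysis, which multiplies the cost of a single incident by the expected number of incidents per year. The paper does not state which formula the calculator uses for its annual estimate, so the sketch below is an assumption offered for illustration. The function name and input values are hypothetical.

```python
def annualized_loss_expectancy(single_incident_cost, incidents_per_year):
    """Expected yearly cost = cost of one incident * expected incidents per year."""
    if single_incident_cost < 0 or incidents_per_year < 0:
        raise ValueError("inputs must be non-negative")
    return single_incident_cost * incidents_per_year


# Hypothetical inputs: a per-incident cost estimate and an expected
# frequency supplied by security personnel (illustrative values only).
annual_cost = annualized_loss_expectancy(9200.0, 2.5)
print(f"Estimated annual cost: ${annual_cost:,.2f}")
```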
CONCLUSION
The goal of this project was to evaluate current
operations of an ISP’s Information Technology
department. The evaluation was conducted in the areas of
greatest concern: software development, network
capacity, and security. Our results help the ISP better
understand its current operations, and suggest steps for
improving its operations.
The statistical analysis of software defects can be
applied to other internally developed software products
for this ISP’s IT Department. Though the analysis was
conducted for one specific software product, the ISP’s
IT Department feels the issues revealed apply more
generally to other software products developed
in-house. From the results, it is apparent that the ISP
needs to find methods for minimizing the occurrence of
Bad Bugs, Latent Code, and Database defects. These
are most detrimental to the department in terms of the
costs and time necessary for correcting them. As mentioned,
there is a strong correlation between increasing
ownership changes and longer defect correction times.
This could have to do with employee turnover within
the department, which this project was unable to
research due to data unavailability.
The network and capacity planning process
described in this paper is a generic process that could be
implemented by any IT Department. The process covers
all stages in a project’s lifecycle and also other
responsibilities of the NCPG such as new technology
evaluation. The metrics corresponding to the process
allow IT managers to make informed business
decisions. Once the metrics were determined, we
developed a front-end system to record and track the
metrics over time. In the future the system should be
implemented using a database for the backend to store
the data and graph certain metrics over time.
The creation of the calculator fills a void
in the field of security engineering. Security engineers
are inundated with information regarding the threats
that the network/system of their company faces and the
methods available to them to effectively prepare for and
combat these threats. Security spending is expected to
surge by an estimated 55% by the year
2002 (CSI/FBI Survey 2000). This calculator helps
security professionals focus their security efforts better.
Unfortunately, although aware of the dangers they face,
they are unable to justify to upper management the need
to implement these technological innovations. If the
security engineer can show that the proposed firewall is
necessary to avoid incurring $300,000 in damage
associated with stolen passwords, management may be
more willing to allocate funding. In addition to
providing justification for budget proposals, the
calculator allows the user to determine what risks could
be of greatest damage to the company. The calculator
provides a useful method of calculating incident costs
as opposed to the current method of
back-of-the-envelope calculations.
REFERENCES
Deane, Dorian. (2001). Manager of Network Security,
Information Technology Department, MCI Worldcom,
Ashburn, VA.
Grover, V., J.T.C. Teng, and K.D. Fiedler. (1998). “IS
Investment Priorities in Contemporary Organizations.”
Communications of the ACM 41, no. 2 (Feb.): 40-48.
Gum, Greg. (April 6, 2000). Senior Engineer,
Information Technology Department, MCI Worldcom,
Ashburn, VA. Personal Interview.
Huneke, William. (22 Sept. 2000). Director of Planning
& Infrastructure, Information Technology Department,
MCI Worldcom, Ashburn, VA. Personal Interview.
Joseph, Adam. (2001). Manager, Capacity, Network, &
Security Planning, Information Technology
Department, MCI Worldcom, Ashburn, VA.
Lamming, M.G. and W.M. Newman. (2001).
“Interactive Systems Design.” (March 19): 1-2.
<http://cgi.student.nada.kth.se/cgi-bin/d95aeh/get/intsyst4eng>.
Ludlow, Frank. (2001). Senior Manager, Information
Technology Department, MCI Worldcom,
Ashburn, VA.
Menasce, D.A. and V.A.F. Almeida. (1998). Capacity
Planning for Web Performance: Metrics, Models, and
Methods. Prentice-Hall: Englewood Cliffs, NJ.
Mieritz, L. (2000). “Track and Communicate the IS
Organization’s Contribution.” (Oct. 11): 1-4.
<http://www3.gartner.com>.
Sun Microsystems. (2000). “Enterprise Operations:
Performance Monitoring and Capacity Planning.”
(Dec. 7).
<http://sun.com/service/sunps/enterprise/performance.html>.
2000 CSI/FBI Computer Crime and Security Survey.
(March 2000). Computer Security Institute in
collaboration with the San Francisco Federal Bureau of
Investigations.
UUNET. (2000). UUNET IT – P&I Capacity Planning
Charter.
BIOGRAPHIES
Carolyn Bleck is a fourth-year Systems Engineering
student from Chelmsford, MA. Her focus is in
management information systems. Her primary role in
the project was the development of the network and
capacity planning process, metrics, and the application
to track those metrics. Following graduation, Ms.
Bleck will be working as a consultant at
PriceWaterhouse Coopers in Boston, MA.
Eugene Choi is a fourth-year Systems Engineering
student from Osan Air Base, South Korea concentrating
in Information Systems. His principal contribution to
the group was in the development and design of the
costing calculator using Microsoft Visual Basic. Upon
graduation, Mr. Choi will be commissioned as a 2nd
Lieutenant of the United States Air Force and will head
out to Kadena Air Base, Okinawa, Japan as an aircraft
maintenance officer.
Shantanu Rudra is a first-year graduate student in the
Systems and Information Engineering department. His
focus is in management information systems. His
primary role in the project was the development of the
network and capacity planning process, metrics, and the
application. He plans to continue his studies as a
graduate student in the Systems and Information
engineering department.
Rohit Saroop is a fourth-year Systems Engineering
major from Fairfax Station, VA, concentrating in
management information systems. His primary focus
on this project was developing software metrics for
determining the impact of defects from internally
developed software for this ISP. Mr. Saroop has
accepted a position with Cap Gemini Ernst & Young in
McLean, VA, and will begin work following May
graduation.
Mousmi Sharma is a fourth-year Systems Engineering
major from Fairfax, VA and is concentrating in
management systems. Her principal contribution to the
group was in the research of security incidents, their
attributes and the design of the calculator. Ms. Sharma
will be working as an investment banking analyst with
the firm JP Morgan H&Q in New York City upon
graduation.