Analysis of Anomalies and Failures in Dynamic Web Applications

Nasser Alaeddine and Jeff Tian
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas, USA
Email: nalaeddi@mail.smu.edu
Email: tian@engr.smu.edu
Abstract
Companies today enable digital relationships with
customers, suppliers, and employees by delivering
compelling functionality that saves time and money.
As web applications are growing in the number of
dynamic pages relative to the static pages, there is a
need for effective ways to improve web quality and
reliability. This paper presents an effective method to
analyze, classify, and prioritize anomalies in dynamic
web applications. The analysis technique addressed in
this paper introduces a formalized procedure that uses
the server access log and defect data detected in a
development or system maintenance cycle to identify
errors with high frequency. Fixing the top faults
identified provides a cost-effective and efficient
mechanism to improve the reliability and quality of
web applications. The results of applying our
approach to a case study from the telecommunications
industry are included to show its applicability and
effectiveness.
1. Introduction
Industries such as manufacturing, travel, banking,
education, and government are web-enabled to
improve and enhance their operations. E-commerce
has expanded quickly, cutting across national
boundaries [8]. However, the shortened development
cycles and constant evolution make it more difficult to
assure the quality and reliability of web applications
[2]. Opportunities exist for new techniques to analyze
defects and anomalies automatically or semiautomatically, guiding the user toward cost-effective
methods for fixing defects.
Web pages can be static or dynamic. While the
content of static web pages is fixed, the content of
dynamic pages is computed at run time by the server
[5], and may depend on information provided by the
user. Most studies have focused on the web-error
analysis of static web pages. An analysis
technique based on orthogonal defect classification for
static web pages was proposed to extract information
from existing web server logs and classify them based
on their response code [1].
However, the nature of dynamic web sites is
fundamentally different from static web sites. In this
paper, we propose a new technique to analyze failures
in dynamic web applications.
The failures are
collected from web server logs and defect data. The
server access log contains a trace of the HTTP
processed requests and responses, and is hosted on the
web server side. The defect data are the software bugs
that are detected during software development or a
system maintenance cycle, and are stored in a
centralized repository for tracking. Each defect record
corresponds to a single software failure. The term
“defect” generally refers to some problem with the
software, either with its external behavior or with its
internal characteristics [6].
The proposed analysis technique prioritizes
web failures by their usage frequency.
Fixing the errors guided by the web fault priority list
will result in cost-effective reliability improvement.
The paper also applies the analysis
approach to a case study from the telecommunications
industry and lists the findings of the error
classification and web reliability improvement.
2. Components of the dynamic web
application
Dynamic sites are highly intertwined with the
environment (browsers, operating systems, database
engines, web servers, and interfaces to onsite or offsite
applications). Dynamic web applications are complex,
and integrate a wide range of technologies:
• Scripting languages that run within HTML on
  the client side (JavaScript, Visual Basic
  Script)
• Interpretive languages that run on the server
  (Perl)
• Compiled module languages (Servlets,
  ActiveX, applets)
• Scripted page modules that run on the server
  (JSPs, ASP.Net, PHP)
• General purpose programming languages
  (Java, C#)
• Programming language extensions
  (JavaBeans, EJBs)
• Data manipulation languages (XML)
• Databases
In a dynamic web application, the HTML
document’s content and form are determined not just
by input, but also in part by the state on the server,
such as the date or time, the user, location, or session
information. Dynamic web applications can be broken
down into the following components:
1) Presentation layer: includes the following parts:
   a. Static links: A static link is the same to all
      users and has no dependency on user input,
      time, server or location. The testing usually
      focuses on link validation.
   b. Dynamic links: A dynamic link is generated
      by software components for specific input,
      time or location. This is more difficult to test
      due to the coupling between input, software,
      and generated links.
   c. Static pages: A static web page is unvarying
      and is the same to all users. It is usually
      stored as an HTML file on the server. Once
      tested and passed, then all is well.
   d. Dynamic pages / context aware content: A
      dynamic web page is created by a program on
      demand, and its contents may be determined
      by previous inputs from the user, the state on
      the web server, and other inputs, such as the
      location of the user, the user's browser or
      operating system, and the time of day.
   e. Frame / Page Layout: Instructs the browser
      on how to lay out and display the data.
2) Backend connectors / interfaces / database
   connections: Web interfaces with legacy systems
   and backend applications within the same domain.
   The interfaces may be onsite or offsite (i.e., web
   services). This also includes the data layer, which
   is responsible only for receiving, storing, and
   manipulating the data.
3) Business logic: The set of rules that details access
   to the data and particulars on the presentation
   side.
4) Databases: These store user data, such as the
   items being ordered or data that the user is
   requesting, such as a product catalog.
5) Session management: This includes the
   implementation of the session management, such
   as cookies, session IDs and hidden forms.
6) Cache: This includes the caching mechanism for
   the browser, server, and database.
7) Environment / configuration / deployment: This
   includes the configuration management of the
   web application. Today’s technology allows
   deploying web components dynamically during
   execution, and these new components can be
   detected and used.
3. Anomalies and failures in dynamic web
applications
The quality of a web application is a complex,
multidimensional property that involves several
attributes: correctness, reliability, usability,
accessibility, security, performance, and conformity
with standards [10]. In software, an error is usually
due to a programmer’s action or omission that results
in a fault. A fault is a software defect that causes a
failure, and a failure is the unacceptable departure of a
program’s operation from program requirements [7].
The term “software reliability” can be defined as the
probability of failure-free operation for a specific
duration under a specific environment [7].
Following the web community conventions, we are
going to refer to faults as ‘errors’ in the sections
below.
3.1 HTTP response code standards
The HTTP server logs record all processed
requests and the corresponding responses. The
response status codes are returned to the client making
the request and also recorded in the server's log file.
Based on HTTP error standards, responses with
response codes between 400 and 599 are classified as
failures. The HTTP response code standards are listed
below:
Response code range   Description of response code
100 to 199            Informational status codes, rarely used
200 to 299            Successful; only 200 is frequently used
300 to 399            Redirection; the request may still succeed
400 to 499            Client error; the request was invalid in some way
500 to 599            Server error; the server could not fulfill the (valid) request
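The classification rule above can be sketched in a few lines of Python. This is only an illustration of the response code ranges and the 400 to 599 failure rule; the function names `classify_status` and `is_failure` are hypothetical and do not come from the scripts used in the case study.

```python
# Minimal sketch of the response code ranges described in Section 3.1.
# The function names are illustrative assumptions.

STATUS_CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def classify_status(code: int) -> str:
    """Map an HTTP response code to its range (100-199, 200-299, ...)."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP response code: {code}")
    return STATUS_CLASSES[code // 100]


def is_failure(code: int) -> bool:
    """Responses with codes between 400 and 599 are classified as failures."""
    return 400 <= code <= 599


for code in (200, 302, 404, 500):
    print(code, classify_status(code), is_failure(code))
```

Note that a response in the 200 or 300 range is not necessarily correct; it only means the failure, if any, is not visible in the response code.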
3.2 Web fault classification
Although some HTTP responses will carry
successful response codes, these responses may not
meet the software requirement specifications, and will
be considered a fault. This fault will not appear in
server access logs and may only be detected during
development or system maintenance cycles. On the
other hand, HTTP responses with a failed response
code will be recorded in the server access log.
However, these HTTP faults may not be detected by
testing, or may not be reported by customers during
operation.
We analyzed the defect data and produced a list of
the expected failures in the dynamic web applications.
The list below includes a brief description of these
errors:
1. Cache errors: These include browser, server
   and database caching. For example, errors
   may occur when new data is loaded and
   cache is not refreshed.
2. Application Interface errors: These include
   the proxies’ calls to backend applications,
   legacy, and databases. The interfaced
   applications may be onsite or offsite. For
   example, errors may be due to interface
   definition mismatches, or missing output
   fields.
3. Session / cookie errors: These include
   session management and the handling of the
   session states in a load-balanced network.
   Errors may be due to the loss of session
   information during the transaction.
4. Concurrence / multiple user errors: These
   are due to the multi-threaded access of
   resources.
5. Environment / configuration / deployment
   errors: These occur when errors are due to
   incorrect or missing configuration entities.
   This also includes the compatibility issues
   due to a wide variety of platforms and
   versions.
6. Missing files: when the error is due to
   a missing file.
7. Broken or malformed links: when the error is
   due to a broken or malformed link.
8. User interface code error: This includes the
   code that runs on the client side, such as
   scripts, applets, and browser plug-ins. For
   example, this occurs when errors are due to
   compilation, runtime, accessibility, frame,
   and page layout.
9. Logic, computation, and algorithm errors:
   This includes code that runs on the server
   side. This occurs when errors are due to the
   missing or wrong implementation of business
   logic and programming errors.
10. Wrong output state: when the obtained page
    is different from the expected one.
11. Input constraint / validation error: This is
    usually a function of the client side. This
    occurs when errors are due to input
    nonconformity to software specifications.
12. Missing verbiage: when verbiage within a
    page is missing or incorrect.
13. Missing Input fields: when the input fields are
    missing or incorrect, such as filters or
    selections that do not include all the required
    options, and input text boxes not being shown
    on the page.
14. Data error: when the retrieved data do not
    conform to the interface definition due to data
    corruption or the absence of mandatory
    fields.
15. Initialization error: when errors are due to
    the initialization failure of an object or
    variables.
16. Runtime exception error
17. User operational behavior error: Users may
    perform unexpected actions, and applications
    may not be coded to handle the novice users’
    actions. The operational behavior includes
    the use of the back button, the forward
    button, URL rewriting, and clicking ‘submit’
    multiple times.
It is very effective to obtain a collective view of
errors from the server access log and the collection of
defect data.
4. Analysis technique strategy
The analysis technique considers faults from
the server access log and the defect data, and prioritizes
faults based on their usage frequency. Priority in
the error fixing process is given to the faults with
high frequency, which results in the effective
improvement of the web application reliability.
The proposed analysis technique works as
follows:
[Figure: flow of the analysis technique. Boxes: server
access log; classification of HTTP responses; faults with
usage frequency; hits with response code in 200 & 300
categories; defect data; classification of defect
information; application operational profile; defect
impact schema; faults with usage frequency; priority list
of top faults.]

1. Classify the defect data and find the top classes of
   errors.
2. Response codes within the server access logs are
   utilized to classify the HTTP responses. Failed
   responses will include a response code in one of
   the 400 or 500 HTTP failure error categories. The
   HTTP error usage frequency will be calculated
   from the server access log.
3. From step 2, select the HTTP error response code
   with the highest error rate, and then find the usage
   frequency of the individual faults. For example, if
   404 errors have the highest rate, then find the top
   missing files that account for the high percentage of
   failures.
4. Define the operational profile of the web
   application. An operational profile is a list of
   disjoint operations and their associated
   probabilities of occurrence [3].
5. Based on the data from step 2, find the number of
   hits that have a corresponding response code in
   the 200 and 300 categories.
6. Use the defined operational profile and the
   number of hits calculated in step 5 to determine
   the number of transactions processed every day
   for each operation.
7. Define the defect impact schema. The web faults
   will be divided into the following categories and
   will be given the indicated weights:

   Impact        Description                                                       Weight
   Showstopper   Prevents the completion of a transaction                          100%
   High          Affects a central requirement                                     70%
   Medium        Affects non-central requirement and there is no workaround        50%
   Low           Affects non-central requirement for which there is a workaround   20%
   Exception     Affects non-conformity to a standard                              5%

8. Use the data from steps 6 and 7 to calculate the
   usage frequency of faults for each operation and
   each impact category of the defect impact schema.
9. Combine the classification lists from steps 3 and 8
   to identify the priority list of faults with the
   highest usage frequency.
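Steps 2 and 3 can be sketched as a small log-processing script. The sketch below assumes the server access log is in the Common Log Format, which the paper does not state; the regular expression, the function names, and the sample log lines are all hypothetical and serve only to illustrate the idea of selecting the failure code with the highest count and then ranking the individual faults behind it.

```python
import re
from collections import Counter

# Assumption: Common Log Format request/status fields, e.g.
#   ... "GET /path HTTP/1.1" 404 209
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def parse(lines):
    """Yield (path, status) pairs for every line that matches the format."""
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            yield match.group("path"), int(match.group("status"))


def top_failures(lines, n=5):
    """Step 2: count failed responses (400-599) per response code.
    Step 3: for the most frequent failure code, rank the requested paths."""
    hits = list(parse(lines))
    by_status = Counter(status for _, status in hits if 400 <= status <= 599)
    if not by_status:
        return None, []
    worst = by_status.most_common(1)[0][0]
    by_path = Counter(path for path, status in hits if status == worst)
    return worst, by_path.most_common(n)


# Hypothetical log lines, invented for this illustration.
sample = [
    '10.0.0.1 - - [01/Jan/2007:10:00:00 -0600] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [01/Jan/2007:10:00:01 -0600] "GET /images/a.gif HTTP/1.1" 404 209',
    '10.0.0.2 - - [01/Jan/2007:10:00:02 -0600] "GET /images/a.gif HTTP/1.1" 404 209',
    '10.0.0.2 - - [01/Jan/2007:10:00:03 -0600] "GET /images/b.gif HTTP/1.1" 404 209',
    '10.0.0.3 - - [01/Jan/2007:10:00:04 -0600] "GET /order HTTP/1.1" 500 512',
]

worst, ranked = top_failures(sample)
print(worst, ranked)
```

A real log would need the pattern adjusted to the server's configured log format before the counts can be trusted.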
5. Case Study
To classify and evaluate the priorities of the web
defects, we applied the analysis technique to the
captured server access log file and collected defect
data of a deployed web application product “A”.
Product “A” is a web application from
telecommunication industry that consists of hundreds
of thousands of lines of code.
5.1 Classification of defect data
The first step in our analysis method is to classify
the defect data of product “A” and find the top classes
of errors. The table below shows the collective view of
the classes of errors in product “A” and the
corresponding expected HTTP error categories:

#    Classes of errors                          HTTP error categories   % of total
1    Cache                                      200/300                 1.20%
2    Interfaces                                 200/300                 27.11%
3    Session/cookies                            200/300                 0.00%
4    Concurrence/multiple users                 200/300/400/500         0.00%
5    Environment/configuration/deployment       200/300/400/500         0.00%
6    Missing files                              400                     6.63%
7    Broken or missing or malformed links       400                     6.63%
8    User interface code                        200/300                 18.67%
9    Code logic, computation and algorithm      200/300                 20.48%
10   Wrong output state                         200/300                 3.01%
11   Input constraint/validation errors         200/300                 1.81%
12   Missing verbiage                           200/300                 9.04%
13   Missing Input fields                       200/300                 2.41%
14   Data issues                                200/300                 3.01%
15   Initialization                             200/300                 0.00%
16   Runtime Failures                           200/300                 0.00%
17   Operational behavior                       200/300                 0.00%
     Total                                                              100.00%

A Pareto chart is also shown below for the defect
classification of product “A”. The top three defect
types are: interface issues, logic code, and user
interface code. From the distribution in the figure,
we can derive that the top three categories are:
• Interface defects, which consume about 27.11%
  of the effort
• Code logic, computation, and algorithm defects,
  which consume about 20.48% of the effort
• User interface code defects, which consume about
  18.67% of the effort

[Figure: Pareto chart of the defect data classes of
product “A”. X-axis: error class; y-axis: error
percentage (0% to 30%).]

The top three categories represent 66.26% of the
total defects. We are going to focus on these top three
categories from the perspective of defect data.
From the defect data perspective, we found that
only 13.26% of the defect data will be found in the
server access logs. The majority of faults are beyond
the HTTP access logs. Moreover, the top three problem
areas are not related to faults covered by the HTTP
failure response codes.
5.2 Classification of HTTP response codes
The second step is to classify HTTP responses
based on the distribution of the HTTP response code.
The HTTP error usage frequency will be calculated
from the server access log. We conclude that missing
files, which are “404” errors, account for the majority
of the HTTP errors. The other HTTP error groups are
negligible compared with the “404” errors. We will
focus on the missing file or “404” error from the
perspective of HTTP. The table below shows the
classification of the HTTP responses of product “A”.

Response code   Description              % of total
200             OK                       48.58%
206             Partial Content          0.03%
302             Page moved temporarily   24.75%
304             Resource not modified    19.43%
400             Syntax error             0.02%
403             Access is forbidden      0.05%
404             File does not exist      7.06%
500             Server Internal Error    0.07%
503             Service Unavailable      0.00%
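The claim that “404” errors dominate the HTTP errors can be checked directly from the percentages in the table above. The short sketch below is our own illustration, not part of the original scripts; it sums the shares of the 400 and 500 categories and computes the portion that is due to 404 responses.

```python
# Percentages copied from the table of HTTP responses for product "A".
response_share = {
    200: 48.58, 206: 0.03, 302: 24.75, 304: 19.43,
    400: 0.02, 403: 0.05, 404: 7.06, 500: 0.07, 503: 0.00,
}

# Failed responses are those in the 400 and 500 categories.
failure_share = {code: pct for code, pct in response_share.items()
                 if 400 <= code <= 599}
total_failure = sum(failure_share.values())

# Share of all failed responses that are 404 errors.
share_404 = failure_share[404] / total_failure * 100

print(f"failed responses: {total_failure:.2f}% of all hits")
print(f"404 share of failed responses: {share_404:.1f}%")
```

Under these figures, failed responses make up about 7.2% of all hits, and roughly 98% of them are 404 errors.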
5.3 Defects Analysis and Prioritization
The top areas from HTTP and defect data
perspectives are: interfaces, logic, user interface, and
“404” errors. After we determine the top problem
areas, we will perform deep analysis of the top error
groups to find the faults with the greatest usage
frequency.
We ran scripts against the server access log to
determine the top “404” faults. We found that the top five
missing-file faults account for 91.39% of the total
404 failures, as indicated below:
HTTP error                              Frequency   % of total
/images/dottedsep.gif                   5805        32.46%
/images/gnav_redbar_s_r.gif             3687        20.62%
/images/gnav_redbar_s_l.gif             3537        19.78%
/includes/css/images/background.gif     2593        14.50%
/includes/css/nc2004style.css           721         4.03%
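The cumulative contribution of these five faults can be reproduced from the table with a running sum, in the style of a Pareto analysis. The sketch below is an illustration only; the variable names are ours.

```python
from itertools import accumulate

# (path, frequency, % of total 404 failures), copied from the table above.
top_missing_files = [
    ("/images/dottedsep.gif", 5805, 32.46),
    ("/images/gnav_redbar_s_r.gif", 3687, 20.62),
    ("/images/gnav_redbar_s_l.gif", 3537, 19.78),
    ("/includes/css/images/background.gif", 2593, 14.50),
    ("/includes/css/nc2004style.css", 721, 4.03),
]

# Running total of the percentage column, rounded to two decimals.
cumulative = [round(c, 2)
              for c in accumulate(share for _, _, share in top_missing_files)]

# Total number of 404 failures caused by the five files.
top_five_hits = sum(freq for _, freq, _ in top_missing_files)

for (path, _, _), cum in zip(top_missing_files, cumulative):
    print(f"{cum:6.2f}%  {path}")
```

The last cumulative value is the 91.39% quoted in the text, and the first three files alone already account for more than 70% of the 404 failures.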
This makes “404” errors the largest class in the
collective fault view. The table below shows the top
four classes of errors from the fault view:

Class of errors                     % of total
HTTP 404 errors                     33%
Interfaces                          21%
Logic, computation, and algorithm   16%
User interface code                 14%
On the other hand, to determine the faults with a
high frequency from the top areas of the defect data,
we first need to determine the number of HTTP
hits returned with response codes in the 200 and 300
categories per day per server. The table below shows
the number of hits for product “A”:

Number of hits with response   Average number of hits   Number of
code 200 and 300               per transaction          transactions
235142                         40                       5880
The next step is to define the operational profile
for product “A”. The operational profile is a
quantitative characterization of the way a software
system is or will be used. The table below shows the
defined operational profile for product “A” with
the corresponding number of transactions per day per
server:

Operation      Operation probability   Number of transactions
New order      0.1                     588
Change order   0.35                    2058
Move order     0.1                     588
Order status   0.45                    2646
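The per-operation transaction counts follow from the two tables above by simple arithmetic. The sketch below reproduces them; it assumes that the reported total of 5880 transactions is a rounded value of 235142 / 40, which is our reading and is not stated in the paper.

```python
# Values taken from the tables for product "A".
hits_200_300 = 235142          # hits with response code 200 and 300
hits_per_transaction = 40      # average number of hits per transaction

# Exact quotient; the paper reports 5880 (assumed to be a rounded figure).
transactions_exact = hits_200_300 / hits_per_transaction
transactions_reported = 5880

operational_profile = {
    "New order": 0.10,
    "Change order": 0.35,
    "Move order": 0.10,
    "Order status": 0.45,
}

# Transactions per day per server for each operation.
transactions_per_operation = {
    operation: round(transactions_reported * probability)
    for operation, probability in operational_profile.items()
}

print(transactions_exact)
print(transactions_per_operation)
```

Because the operations in an operational profile are disjoint, their probabilities sum to 1 and the per-operation counts add up to the total number of transactions.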
Using the number of transactions calculated from
the operational profile and the defined fault impact
schema, we calculate the fault usage frequency.
The table below shows the fault usage frequency for
the order status operation of product “A”:

Application aspect   Impact        Number of transactions   Frequency
Order status         Showstopper   2646                     2646
Order status         High          2646                     1852
Order status         Medium        2646                     1323
Order status         Low           2646                     529
Order status         Exception     2646                     132
This leads to the conclusion that any order status
fault that is classified as a showstopper will produce
2646 failures per day per server for product “A”. The
same calculation applies to the other operations.
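The frequencies in the table are the number of transactions multiplied by the weight of the impact category. A minimal sketch of this calculation is given below; rounding to the nearest integer is our assumption (it matches the reported values), and the names `IMPACT_WEIGHTS` and `usage_frequency` are illustrative.

```python
# Weights from the defect impact schema defined in Section 4.
IMPACT_WEIGHTS = {
    "Showstopper": 1.00,
    "High": 0.70,
    "Medium": 0.50,
    "Low": 0.20,
    "Exception": 0.05,
}


def usage_frequency(transactions: int, impact: str) -> int:
    """Expected failures per day per server for a fault of the given impact.
    Rounding to the nearest integer is an assumption of this sketch."""
    return round(transactions * IMPACT_WEIGHTS[impact])


# Order status: 2646 transactions per day per server.
order_status = {impact: usage_frequency(2646, impact) for impact in IMPACT_WEIGHTS}

# Change order: 2058 transactions per day per server.
change_order = {impact: usage_frequency(2058, impact) for impact in IMPACT_WEIGHTS}

print(order_status)
print(change_order)
```

The same function applies to any operation in the operational profile once its number of transactions is known.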
Since we have the usage frequency of the top
faults from HTTP and defect data perspectives, we can
define the priority list of the top failures in product
“A”. The table below shows the top individual faults
for product “A”, along with the usage frequency:
Response code   Faults                                  Failure frequency
404             /images/dottedsep.gif                   5805
404             /images/gnav_redbar_s_r.gif             3687
404             /images/gnav_redbar_s_l.gif             3537
200/300         Order status – showstopper              2646
404             /includes/css/images/background.gif     2593
200/300         Change order – showstopper              2058
200/300         Order status – high                     1852
200/300         Change order – high                     1441
200/300         Order status – medium                   1323
200/300         Change order – medium                   1029
404             /includes/css/nc2004style.css           721
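The priority list is obtained by merging the faults from the HTTP perspective with those from the defect data perspective and sorting them by failure frequency. The sketch below illustrates this merge with the values from the tables above; the tuple layout and variable names are our own choices.

```python
# (response code, fault, failure frequency) from the HTTP perspective.
http_faults = [
    ("404", "/images/dottedsep.gif", 5805),
    ("404", "/images/gnav_redbar_s_r.gif", 3687),
    ("404", "/images/gnav_redbar_s_l.gif", 3537),
    ("404", "/includes/css/images/background.gif", 2593),
    ("404", "/includes/css/nc2004style.css", 721),
]

# Faults from the defect data perspective (operation and impact category).
defect_faults = [
    ("200/300", "Order status - showstopper", 2646),
    ("200/300", "Order status - high", 1852),
    ("200/300", "Order status - medium", 1323),
    ("200/300", "Change order - showstopper", 2058),
    ("200/300", "Change order - high", 1441),
    ("200/300", "Change order - medium", 1029),
]

# Merge both views and rank by failure frequency, highest first.
priority_list = sorted(http_faults + defect_faults,
                       key=lambda fault: fault[2], reverse=True)

for code, fault, frequency in priority_list:
    print(f"{code:8} {frequency:5}  {fault}")
```

Sorting on a single frequency scale is what allows faults that never appear in the access log to be compared with the 404 faults that do.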
5.4 Results analysis
We found that some HTTP missing-file errors
were recorded in the server access log but were not
detected during testing, because some
missing-file errors have very low usage frequencies.
We also noticed that a large number of failures were
caused by a small number of errors with high usage
frequencies.
Since we prioritized the testing by focusing on
problem areas, we first fixed those errors with a high
usage frequency and a high error rate, so that we could
achieve better cost-efficiency in reliability
improvement. By fixing the top faults, which account
for 6.8% of the total defects, the total failures were
reduced by about 57%.
The commonly cited 80:20 rule (80% of problems
are caused by 20% of the components) seems to hold
here, where a few faults dominate the overall failure
distribution.
6. Conclusion
In this paper, we developed a web-error
classification and analysis method for maintaining
web applications. The web errors were classified, and
high-risk areas were identified and analyzed for
effective reliability improvement. We successfully
applied our web-error classification and analysis
method to a web application in the
telecommunications domain. The top classes of errors
were: missing files, interface, logic, and user interface.
We found that the server access log is very effective in
identifying errors that may not be detected during
testing or operation. The top individual faults were
prioritized based on the failure frequency, which led to
cost-effectiveness for reliability improvement upon
fixing these top faults.
This analysis goes beyond what has been done for
static web sites in that it identifies a collective view of
faults. The wide availability of web server logs and
defect data makes this approach widely applicable for
web applications in operation mode and our
formalized analysis procedure makes it easy to
implement.
This analysis and its application in other domains
may lead to the definition of an error profile for
dynamic web applications that can be generalized to
ensure improved reliability and customer satisfaction
for web applications. Error profiles derived from the
analysis may help researchers to identify areas where
new methods of error prevention and detection are
needed most.
References
[1] L. Ma and J. Tian, “Web Error Classification and
Analysis for Reliability Improvement”, Journal of Systems
and Software, Vol. 80, No. 6, pp. 795-804, June 2007
[2] J. Offutt, “Quality Attributes of Web Applications”,
IEEE Software, Vol. 19, No. 2, pp. 25-32, Mar. 2002
[3] J.D. Musa, “Operational Profiles in Software Reliability
Engineering”, IEEE Software, Vol. 10, No. 2, pp. 14-32, 1993
[4] Ricca and Tonella, “Tools for Anomaly and Failure
Detection in Web Applications”, IEEE Multimedia, 2006
[5] J. Conallen, Building Web Applications with UML,
Addison-Wesley, Reading, MA, 2000
[6] S.H. Kan, Metrics and Models in Software Quality
Engineering, 2/e, Addison-Wesley, Reading, MA, 2002
[7] M.R. Lyu (Ed.), Handbook of Software Reliability
Engineering, McGraw-Hill, New York, 1995
[8] Athula Ginige and San Murugesan, “Web Engineering:
An Introduction”, IEEE Multimedia, 2001