Scalable and
Highly Available Infrastructure
for J2EE Applications
A Case Study:
ETA - Education and Training Administration
System
Embry-Riddle Aeronautical University
written by
John Vaughan, DataRoad, Inc.
Marty Smith, Embry-Riddle Aeronautical University
Introduction
In this white paper we will discuss the development of a highly available, scalable and secure
infrastructure designed to support the operation of a web-based J2EE application. The project
involved implementation of a flight training management application for Embry-Riddle
Aeronautical University. This application provides management for flight training operations to a
variety of organizations and is the main focus of efforts at Embry-Riddle to standardize, automate
and secure control over the activities of flight training globally.
Applications that provide these services must be able to combine existing information with new
business functions that deliver services to a broad range of users. These services need to be:

Highly available, to meet the demands of an extended business environment

Secure, to protect privacy and integrity of data

Reliable and scalable, to guarantee that business transactions are accurately and
promptly processed
In reality, Java technology is only as scalable, available, and manageable as the infrastructure on
which it runs. When the platform can’t keep up with growth in the number of users, transactions
per user, or transaction bandwidth, applications perform poorly and websites slow to a crawl.
Like any other enterprise application, a server-side Java application can be brought down by a
hardware fault, a software fault, a network fault, or an environment fault. Whatever the reason,
there is no room for downtime in an e-business environment. Properly designed data centers
address the network and environment fault issues by providing redundant power and internet
connectivity.
This paper presents a case study of implementing a highly available and scalable solution that
combines Oracle9iAS, Oracle9i RAC, SSL accelerators, and hardware load balancers. This
solution was designed and implemented for Embry-Riddle Aeronautical University, the largest
flight-training school in the world. ERAU's J2EE application supports every aspect of the school's
flight training program, so scalability and 24/7 availability are critical.
Embry-Riddle Approaches the Future
Embry-Riddle Aeronautical University is the largest independent aeronautical university in the
world. The not-for-profit institution educates more than 24,000 students annually through thirty
degree programs. Its ROTC detachments train more Air Force pilots and commissioned officers
than any other institution except the Air Force Academy.
The flagship program at Embry-Riddle, however, is the education of commercial pilots. This
training is a demanding process that requires a comprehensive and continuously reviewed
program of advanced learning. Consequently, the university is launching a new curriculum to
revolutionize pilot training by bringing information technology to bear on the issues involved.
Embry-Riddle oversees all the usual elements of an academic program such as student records,
human resources, financials and facilities. But when it comes to sending a student airborne, the
school must also track a host of factors that can change by the minute, such as weather, air
traffic, condition of the plane or simulator, and the health and certifications of both the student and
instructor/pilot. For every sortie, the instructor, student and craft must meet specific qualifications.
Relying on manual methods, Embry-Riddle staff scramble to match craft, student and instructor.
Compounding the challenges is the scale of Embry-Riddle’s operation, which includes campuses
in Daytona and Prescott, Arizona, 130 education centers and a distance-learning network. At the
Daytona campus alone, 80 instructor/pilots supervise 550 student flights daily using a fleet of 139
instructional aircraft.
Until recently the tracking of training programs worldwide has largely been a paper and pencil
procedure. This practice is inexpensive and requires little training to maintain, but it is
error-prone, and those errors lead to inaccurate information being provided to the
training institution, trainers and students alike.
Embry-Riddle’s approach to dealing with these increasingly dynamic requirements was
to develop an automated, Internet-capable information management system for tracking
flight-training data for its students. This allowed Embry-Riddle to re-engineer the training of
commercial pilots to provide them with a better education at less cost. This new curriculum blends
all of the required skills into one seamless course.
This innovative approach to curriculum management is Embry-Riddle’s pioneering Education and
Training Administration (ETA) system, which applies “just-in-time” methods to orchestrate the
costly human and capital assets required for flight training. Embry-Riddle expects the ETA system
to enhance instructional quality while reducing student expenses and institutional overhead.
As mentioned earlier, until ETA, flight training management was largely organized through
pencil-and-paper operations. These operations are antiquated, time consuming and inaccurate.
Difficulty keeping data current and human error compound the problem. This method
was also very fragmented and lacked comprehensive communication across disciplines and
organizations.
The priority then was to develop a system that would keep the current operations running
smoothly while improving on the integrity, availability and accuracy of the data being managed.
This was absolutely essential to the overall perception of the solution as being beyond criticism
and doubt. To accomplish this, the ETA system would need to exceed expectations for all Service
Levels, Academic/classroom support, Student Services and Daily campus support.
It would need to be available 24 x 7, anytime, anywhere and the network, infrastructure, and
applications would need to operate flawlessly.
The answer to these demands was to develop a real-time information management system
for tracking flight-training data that is highly available, highly scalable and offers fast
web access any time, anywhere. The system also needed to address usability through
user-friendly interfaces and portal-based individualization. The data must be continually updated
and current. The system must also be secure, with user authentication and data protected against
intrusion.
Education and Training Administration – Aviation Learning Management System
To meet the challenge of providing such a system, Embry-Riddle
Aeronautical University, DataRoad, Inc. and Talon Systems collaborated to produce an Aviation
Learning Management System called ETA: Education and Training Administration.
This system is the most comprehensive Flight Training Management Program ever and serves as
an enterprise model for those using it. ETA is a Flight Training Management Tool; a 100%
Internet-based J2EE application accessed through a standard web browser. It is completely
electronic and supports concurrent operations at Daytona, Prescott, Affiliate Operations and now
the US Air Force Academy. The number of locations it supports is continuing to grow.
Integrating data from Embry-Riddle’s maintenance, HR, payroll, accounting and student record
systems, the ETA system provides students, instructor/pilots and managers with secure,
Web-based access to all of the information and tools they require to participate in the new curriculum.
The ETA system translates Embry-Riddle’s new course for commercial pilots into a continuum of
stages, lessons and units, each structured by line-item objectives. Students can extract
customized training plans that guide and track their progress through the curriculum. An
electronic grade sheet automatically posts any incomplete line items until their satisfactory
completion.
Whether a lesson takes place in a cockpit or a classroom, the ETA system identifies and
schedules all human and capital resources required to fulfill the session’s line-item objectives and
confirms the readiness of these resources.
Flagging any issues, the system automatically checks all relevant details, including the student’s
prerequisite courses, flight hours and registration, financial and health records; the instructor’s
pilot ratings and certifications for the prescribed craft and sortie; and the maintenance status of
the vehicle.
While translating documents into real-time data, the ETA system also streamlines execution of
paper-laden processes, from FAA-mandated safety and security documentation to Embry-Riddle’s
own internally generated paperwork.
Far more than scheduling software, the ETA system is a repository of real-time information and
tools. These tools reinforce best practices in managing human and capital assets,
enabling not-for-profit Embry-Riddle to more efficiently deploy its educational resources.
Embry-Riddle worked with Talon Systems LLC to develop the entirely Web-based (J2EE) system.
Oracle partner DataRoad, Inc. designed and implemented the infrastructure for ETA at one of its
secure data centers in Jacksonville, Florida. This infrastructure uses Oracle9i Application
Server (9iAS) and Oracle9i RDBMS software, HP servers, and Alteon load balancers and SSL
accelerators.
Preparing for Growth
Because the user base of the ETA system is expected to grow, scalability and high availability
were essential. As more users come on to the system it must scale up appropriately and be
available immediately, 24x7. Why? Embry-Riddle provides multi-national flight training day
and night across multiple time zones.
For the ETA project, the system runs on a real-time, 24x7 platform that utilizes Oracle 9i Real
Application Clusters (RAC) database, HP TruCluster Server software, and DataRoad’s technical
experience to provide a highly available, highly scalable solution to meet Embry-Riddle’s needs.
Unique to the solution is the single-system manageability of the software, which makes operating
multiple servers as simple and economical as managing one.
DataRoad’s end-to-end solution exploits all of these advantages to efficiently meet Embry-Riddle’s
requirements for high availability, security and data integrity. DataRoad provides a
dedicated platform for the ETA system that comprises servers, software, and networking.
DataRoad hosts and administers both the system and the application, which users access
through a secure VPN.
Definitions
Prior to discussing specific configurations it is important to discuss general architecture terms and
definitions appropriate for the deployment of highly available and scalable infrastructure.
Firewalls
Firewalls are devices that restrict access between different LAN segments for security purposes.
Firewalls perform this function by analyzing traffic and can make restrictions based on IP
address, port, protocol used, protocol transitions and message content. For example, Check
Point Firewall-1 products provide a software solution that includes a feature called "stateful
inspection" that can restrict access based on illegal Internet protocol transitions. Cisco's PIX is an
example of an integrated hardware-software firewall solution.
Some devices that are called firewalls are software-only products that are loaded into client or
server machines. These may be useful but are inadequate as corporate firewalls, which should
always be deployed on machines separate from those running application or infrastructure
software. Firewalls are a main defense for sites providing Internet access. Different firewall
products vary considerably in features and performance. Appropriate use of firewalls can protect
against many common vulnerabilities by prohibiting Internet access to services such as FTP or
rsh (especially if such services were inadvertently left running on Internet servers).
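As a rough illustration of restricting access by protocol and port, the following Java sketch applies an ordered rule list with a default-deny policy. The class names PortRule and PacketFilter are hypothetical and do not correspond to the interface of any firewall product. The sketch is a stateless filter only; it omits the stateful inspection and content analysis described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical rule: allow or deny one protocol/port pair. */
final class PortRule {
    final String protocol;
    final int port;
    final boolean allow;

    PortRule(String protocol, int port, boolean allow) {
        this.protocol = protocol;
        this.port = port;
        this.allow = allow;
    }
}

/** Stateless filter: the first matching rule wins, anything unmatched is denied. */
final class PacketFilter {
    private final List<PortRule> rules = new ArrayList<>();

    PacketFilter add(String protocol, int port, boolean allow) {
        rules.add(new PortRule(protocol, port, allow));
        return this;
    }

    boolean permits(String protocol, int port) {
        for (PortRule rule : rules) {
            if (rule.protocol.equalsIgnoreCase(protocol) && rule.port == port) {
                return rule.allow;
            }
        }
        return false; // default deny: no rule means no access
    }
}
```

With only HTTP (TCP 80) and HTTPS (TCP 443) allowed, an FTP (TCP 21) or rsh (TCP 514) service that was inadvertently left running stays unreachable from the Internet.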
Load Balancers
Load balancers have two essential functions. The first is to load balance traffic across multiple
servers thus resulting in better scalability. In high traffic situations this can be very important. The
second essential function is to provide fault tolerance for servers. In this case the load balancer
ensures that a single failing server does not result in loss of a critical resource. The load balancer
accomplishes this by routing new requests to alternate servers if one server fails. In short,
load-balancing hardware provides scalability by spreading load across multiple
processors and fault tolerance in case of processor failures.
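The two functions can be sketched together in a few lines of Java. This is a minimal illustration, not the algorithm used by any particular load balancer: it assumes simple round-robin distribution, and the names BackendServer and RoundRobinBalancer, as well as the health flag that something else is assumed to set, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical back-end server entry; a health check is assumed to set the flag. */
final class BackendServer {
    final String name;
    volatile boolean healthy = true;

    BackendServer(String name) {
        this.name = name;
    }
}

/** Round-robin selection that skips servers currently marked unhealthy. */
final class RoundRobinBalancer {
    private final List<BackendServer> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<BackendServer> servers) {
        this.servers = new ArrayList<>(servers);
    }

    /** Returns the next healthy server, or null if every server is down. */
    BackendServer route() {
        int n = servers.size();
        for (int attempts = 0; attempts < n; attempts++) {
            int index = Math.floorMod(next.getAndIncrement(), n);
            BackendServer candidate = servers.get(index);
            if (candidate.healthy) {
                return candidate; // scalability: requests rotate across servers
            }
            // fault tolerance: a failed server is skipped and the next one is tried
        }
        return null;
    }
}
```

New requests rotate across all servers while they are healthy. Once a server is marked unhealthy, it simply stops receiving requests.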
Load balancers can typically route traffic both in situations where the infrastructure keeps
application state and in situations where it does not. In the case of stateless
communication the load balancer can route to any of its managed servers since there is no state
in any particular server that is needed to correctly process the message. This is generally more
efficient since requests can always go to the least busy server but stateless operation often puts
an unacceptable burden on application writers. Many Oracle products require that the
infrastructure maintain application state.
For transactions where the infrastructure keeps state, load balancers switch incoming messages
to the server containing the state. Switching criteria are determined by analyzing cookies,
headers or other attributes. Sometimes only a single server contains the state. In that case
processor failures result in the failure of all transactions that have state in the failed processor
and such transactions must be restarted. In some situations there are preferred processors but all
processors can obtain the state. When failures occur in these situations, a redirect due to
failure will result in successful processing although there may be added overhead for transactions
that had state in failed processors.
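State-aware routing of this kind can be sketched as follows. The example assumes a session identifier has already been extracted from a cookie or header, and it models the second case above, where any surviving server can obtain the state. The class name StickyRouter and the server names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Routes each session to the server that holds its state, re-homing it on failure. */
final class StickyRouter {
    private final List<String> servers;
    private final Set<String> failed = new HashSet<>();
    private final Map<String, String> sessionOwner = new HashMap<>();
    private int cursor = 0;

    StickyRouter(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    void markFailed(String server) {
        failed.add(server);
    }

    /** sessionId is assumed to come from a cookie or header on the request. */
    String route(String sessionId) {
        String owner = sessionOwner.get(sessionId);
        if (owner != null && !failed.contains(owner)) {
            return owner; // preferred server still holds the state
        }
        // New session, or the owner failed: pick the next live server.
        for (int attempts = 0; attempts < servers.size(); attempts++) {
            String candidate = servers.get(cursor);
            cursor = (cursor + 1) % servers.size();
            if (!failed.contains(candidate)) {
                sessionOwner.put(sessionId, candidate);
                return candidate;
            }
        }
        throw new IllegalStateException("no live servers");
    }
}
```

In the first case, where only one server contains the state, the re-homing branch would instead have to report that the transaction must be restarted.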
SSL Accelerators
In many sites, SSL key exchange operations can dominate CPU usage. For such sites HTTPS
accelerator appliances can result in significant cost reductions and improved performance.
Expanding HTTPS use improves security. Where HTTPS use is limited by performance
considerations, HTTPS accelerators should be considered. The term "sticky" or "persistent"
transaction is often used to denote transactions that should be routed to particular, load balancer
managed hardware containing intermediate application transaction state.
There are different types of SSL Accelerators. One type is basically a math coprocessor
that offloads expensive cryptographic operations from general purpose CPUs. A second type is
a stand-alone device that terminates incoming HTTPS connections and forwards the requests as
HTTP. Since the SSL processing of the HTTPS protocol
can consume a large percentage or even most of a CPU's time, offloading SSL processing may
result in a significant reduction in the number of CPUs required to support a workload. Such
reduction can result in both cost savings as well as improved scalability.
A current problem with HTTPS to HTTP appliances occurs when client side X.509 certificates are
used. This is because these appliances terminate the SSL session and there is no standard way
to provide the client side X.509 certificate information with the forwarded message. If client side
certificates are only used to allow/deny access to a site or virtual host this may be acceptable.
However if the application or other infrastructure items need certificate information, custom
solutions are currently required. Since client side certificates are infrequently used at this time,
this consideration is not important for most sites. Customers interested in use of X.509 client side
certificates with such devices should contact Oracle or appliance providers as progress toward
standard, supported solutions is being made.
Clustering
Clustering, while complex in practice, is fairly simple in definition. Clustering is the grouping
together of hardware and software into nodes that work together as a single system to ensure
that an application remains online for users during excessive loads, or if one of the nodes fails.
Clustering enables you to construct a multi-node system that makes several independent servers
appear like one. Multiple servers are connected together to form a single integrated system. If
any part of the system goes down – either intentionally or unintentionally – failover masks the
failure to the end users, thereby making the system more available. The down member of the
cluster is then reactivated, if possible, through a restart. This reduces the need for administrator
intervention. The system can also be scaled more effectively to support more users through load
balancing. Advanced tools for managing the cluster also assist in monitoring the activities of the
system and alerting administrators to potential issues.
Availability
High Availability requires a variety of approaches to deliver. Each goes hand in hand to contribute
to a highly available service to the end user.
As mentioned earlier, in a clustered environment multiple servers act in concert with each other to
present a single system. For a member within the clustered environment to take the place of
another that is experiencing trouble, the state of requests must be shared across all members of
the cluster. Because that state is shared, a new cluster member can take over from a failing
member smoothly.
In the event of a failure, transparent failover enables a member of a clustered system to take the
place of another member without the end user being aware that a change is taking place. This
gives users a sense of continuity. The downtime of an individual
member of the cluster does not affect operation on the user side
at all.
Once failover has completed and the system is stable again, which happens rapidly,
the down member is identified and restarted automatically. If
it cannot be restarted an error is generated and administrators are notified of the situation for
further attention. This process reduces the need for direct intervention on the administrative level,
thereby minimizing downtime and increasing availability.
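The restart-then-notify sequence can be sketched as a single monitoring pass. This is a simplified illustration under stated assumptions, not how any particular cluster manager works. ClusterMember, SimulatedMember and ClusterMonitor are hypothetical names, and a real monitor would probe members over the network rather than read a flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of one cluster member as seen by the monitor. */
interface ClusterMember {
    String name();
    boolean isAlive();
    boolean restart(); // returns true if the member came back up
}

/** Stand-in member used only to exercise the monitor in this sketch. */
final class SimulatedMember implements ClusterMember {
    private final String name;
    private final boolean restartable;
    private boolean alive;

    SimulatedMember(String name, boolean alive, boolean restartable) {
        this.name = name;
        this.alive = alive;
        this.restartable = restartable;
    }

    public String name() { return name; }
    public boolean isAlive() { return alive; }
    public boolean restart() { alive = restartable; return alive; }
}

/** Restarts down members automatically and records an alert when that fails. */
final class ClusterMonitor {
    private final List<ClusterMember> members;
    private final List<String> alerts = new ArrayList<>();

    ClusterMonitor(List<ClusterMember> members) {
        this.members = new ArrayList<>(members);
    }

    /** One monitoring pass; returns the members able to serve requests. */
    List<ClusterMember> checkOnce() {
        List<ClusterMember> serving = new ArrayList<>();
        for (ClusterMember member : members) {
            if (member.isAlive() || member.restart()) {
                serving.add(member);
            } else {
                alerts.add("Restart failed, administrator attention needed: " + member.name());
            }
        }
        return serving;
    }

    List<String> alerts() {
        return new ArrayList<>(alerts);
    }
}
```

Only the member that cannot be restarted produces an alert, so administrators are involved only when automation has run out of options.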
In the event that the system has a serious failure that requires significant downtime, the cluster
can gracefully degrade the service provided to the end user. This provides a limited level of
service, rather than presenting a total failure. Single points of failure are also reduced or
eliminated thereby limiting the risks of significant failure and unnecessary downtime.
High availability is also improved through the use of load balancers. Load balancing is necessary
because multiple servers servicing one application can quickly be overwhelmed and crash if the
workload is not split up. Load balancing divides work between two or more computers. The work
gets done in the same amount of time without any one computer getting overloaded. Cluster
resources are dynamically re-balanced for optimal cluster utilization.
Scalability
Scalability is also essential to maintaining acceptable levels of service while keeping costs under
control. System growth must be progressive and easily expandable to meet increasing demand
from the user base. A clustered environment provides the most appropriate solution. Nodes
added to the cluster are automatically utilized; no manual re-allocation of resources is needed.
This enables low cost incremental scaling, allowing DataRoad to reduce the hosting expenses to
Embry-Riddle by using only the server power it needs at any time. Due to the flexible nature of
the environment, more equipment can be brought online at a moment’s notice to address any
scaling requirements the system may demand. This provides ERAU with a “Scale as you grow”
option that minimizes the initial capital outlay for equipment, thereby significantly reducing hosting
fees. This approach provides for more effective costing of hosting fees based upon real and not
just anticipated growth.
In a rapidly changing environment, opportunities for growth appear and disappear rapidly. It is
difficult to accurately predict the demand for a database or application server two years out, yet
having too little computing horsepower at any given time is unacceptable. Even if growth is
initially underestimated, the scalability of the system will allow for cost effective sizing of
infrastructure. Real Application Clusters give scalability on demand because it is no longer
necessary to predict scalability needs.
Application Server (9iAS)
This paper focuses on the ‘core’ components of Oracle9iAS Release 2. Hence, a reference to
Oracle9iAS in this paper in general is a reference to Oracle9iAS Release 2 J2EE and Web Cache
Install.
The components that fall in the core category are:

Web Cache: This is typically the first component of Oracle9iAS to receive the request.
For both static and dynamic requests, it can cache the result and then replay the results,
thus reducing the workload of the machines behind it. In addition, these Web Cache
instances can themselves be clustered.

Oracle HTTP Server (OHS): This is the next in line after Web Cache to receive a request.
This sub-system comprises a web server (based on Apache), a Perl execution
environment, and a PL/SQL and OC4J routing system.

Oracle Containers for J2EE (OC4J): This is the J2EE compliant container in
Oracle9iAS. It provides clustering capabilities for the J2EE components – Servlets, JSP,
and EJB. It also contains other mechanisms, such as Java Object Cache, which provides
distributed caching capabilities.
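The cache-and-replay behavior described for Web Cache can be illustrated with a very small Java sketch. It is not the Oracle9iAS Web Cache implementation or its API. ResponseCache is a hypothetical class, the origin function stands in for the servers behind the cache, and expiry and invalidation are deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Serves repeated requests from memory and only contacts the origin on a miss. */
final class ResponseCache {
    private final Map<String, String> stored = new HashMap<>();
    private final Function<String, String> origin;
    private int originRequests = 0;

    ResponseCache(Function<String, String> origin) {
        this.origin = origin;
    }

    String fetch(String url) {
        String body = stored.get(url);
        if (body == null) {
            originRequests++;          // miss: the back-end server does the work
            body = origin.apply(url);
            stored.put(url, body);
        }
        return body;                   // hit: the stored result is replayed
    }

    int originRequests() {
        return originRequests;
    }
}
```

Two requests for the same URL reach the origin only once, which is the workload reduction described above.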
Real Application Clusters (9iRAC)
Real Application Clusters is an option for an Oracle9i database. Oracle9i Real Application
Clusters provides both scalability and availability as a single, easy to manage database product.
With Oracle9i Real Application Clusters, your enterprise database delivers scale out economics
with the ease of use and power of a scale up approach. For any database application, a Real
Application Cluster database looks just like an Oracle9i database on a single server. Real
Application Clusters supports all types of applications, from update-intensive online transaction
processing to read-intensive data warehousing.
Oracle9i Real Application Clusters database not only appears like a standard Oracle9i database
to users, but the same maintenance tools and practices used for a single Oracle9i database can
be used on the entire cluster. All of the standard backup and recovery operations, including the
use of Recovery Manager, work transparently with Real Application Clusters. All SQL operations,
including data definition language and integrity constraints, are also identical for both
configurations.
Real Application Clusters provides rapid, automatic failover for users if their servers go down.
This automatic failover capability can prevent having to go through a complex series of
operations to restore access to a database, actions which, if not performed promptly or correctly,
can increase the duration of downtime or even jeopardize the integrity of your data.
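From the application side, one common way to reach a multi-node database is a connect descriptor that lists every node. The sketch below builds such a JDBC URL using the Oracle Net ADDRESS_LIST, LOAD_BALANCE and FAILOVER parameters. The host names, port 1521 and service name are placeholders, and this paper does not state how the ETA application itself connects, so the example is an assumption throughout. It covers failover at connect time only; failing over sessions already in progress needs additional configuration not shown here.

```java
import java.util.List;

/** Builds a JDBC URL whose descriptor lists several cluster nodes (all values hypothetical). */
final class RacConnectString {
    static String jdbcUrl(List<String> hosts, int port, String serviceName) {
        StringBuilder url = new StringBuilder(
            "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=on)(FAILOVER=on)");
        for (String host : hosts) {
            // One ADDRESS entry per node; the client tries another entry if one is unreachable.
            url.append("(ADDRESS=(PROTOCOL=TCP)(HOST=").append(host)
               .append(")(PORT=").append(port).append("))");
        }
        url.append(")(CONNECT_DATA=(SERVICE_NAME=").append(serviceName).append(")))");
        return url.toString();
    }
}
```

For example, RacConnectString.jdbcUrl(java.util.Arrays.asList("rac-node1", "rac-node2"), 1521, "eta") yields a single URL that names both nodes, so the application configuration does not change when a node is unavailable.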
The Solution