Site Visit Report
Edinburgh University SOA Review
Doc Ref: CONS/300399457/SVR/001
06-Sep-2013
SITE VISIT REPORT
Client: Edinburgh University
Project: SOA Review
Oracle Consultants: Alan Maxwell
Date(s) of work covered by this report: 28-Aug-2013
Oracle Project Code: 300399457
Total Billable Time: 1 day
Description of work performed, including any advice given:
The purpose of the visit was to carry out an initial high level review of the SOA 11g work currently being
undertaken within the University’s Information Services Team. This constitutes, at least initially, a migration of
existing SOA Suite 10g logic to release 11g, but will in future include further services developed to meet current
or future business needs.
The review was based on a series of discussions with various team members, and covered 3 main aspects :
• SOA infrastructure (ie installation and configuration)
• Development and deployment
• Run time monitoring
All of these areas were addressed during the day. The limited time meant that the review was necessarily high
level, but the main conclusions are discussed below for each area.
SOA infrastructure
The SOA infrastructure comprises 3 environments – development, test and production. Test and production
are replicas of each other.
Each environment contains one WebLogic domain with 3 servers : one Admin Server, one Managed Server
for SOA, and one Managed Server for BAM (which is not used at present). In each case, the underlying
database is a single-instance (non-RAC) Oracle database.
The databases run on physical servers, and the WebLogic servers run on virtualized servers implemented on
VMware.
Although each environment is single server throughout, and therefore does not have high availability features,
there is a plan for a disaster recovery capability. On the database level, this will involve use of Active Data
Guard, and at the application server level, this will involve the use of a product called VEEAM to replicate the
VM data to the standby site.
In addition to the components already described, the production installation (at least) includes a load balancing
router, which acts as an SSL termination point. There is also an Oracle HTTP Server (OHS) installation located
between the load balancer and the SOA Suite domain, acting purely as a passthrough proxy at present.
There are several observations and recommendations to be made from this brief description :

Use of VMware. Oracle Support has a published policy on support for products running on VMware,
which is described in Support Note 249212.1. One of the first paragraphs in the note states :
‘Oracle has not certified any of its products on VMware virtualized environments. Oracle Support will assist customers
running Oracle products on VMware in the following manner: Oracle will only provide support for issues that either are
known to occur on the native OS, or can be demonstrated not to be as a result of running on VMware.’

Given this situation, use of VEEAM as a disaster recovery solution for the middle
tier is not supported either. The supported solution, as described in the current Fusion Middleware
Disaster Recovery Guide at :
http://docs.oracle.com/cd/E28280_01/doc.1111/e15250/intro.htm#sthref4
involves the use of Data Guard to replicate the database contents, and disk replication technology to
support the middle tier data movement.

Versions. The release of Fusion Middleware in use is 11.1.1.6, whereas the currently shipping
release is 11.1.1.7.
Oracle Support has a stated policy and timeline for different levels of support, contained in Note
944866.1. To quote one of the relevant sections :
‘Grace Period: …..
You have up to one year from the initial release of the patch set to install the new patch set, and can receive new bug fixes
for the previous patch set during that time. The patch set grace period became effective with the release of FMW 10g …..
This continues through the 11g patch sets (e.g. 11.1.1.3 is supported for one year from the initial release of 11.1.1.4, etc.).’
In effect, this means that once 11.1.1.7 has been available for a year (which is likely to be March-April
2014), new bug fixes will no longer be available for release 11.1.1.6.
As a result, the recommendation is to move to release 11.1.1.7 at a suitable point, preferably
before the support date mentioned above. This could be done now (by running the upgrade procedure
for SOA Suite and the infrastructure database contents), or at a suitable point in the system’s lifecycle
(eg creation of a new production environment – see below).
Recommendation (added after site visit) : Consider developing a plan for migration to the current 11.1.1.7
release of SOA Suite.
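The grace-period arithmetic above can be sketched as follows. This is a minimal Python illustration, assuming the one-year rule quoted from Note 944866.1 and treating the grace period as ending at the start of the anniversary date; the function names are hypothetical, and the 11.1.1.7 release date in the usage lines is a placeholder, not a confirmed date.

```python
from datetime import date


def grace_period_end(new_patchset_release: date) -> date:
    """Date on which the previous patch set's one-year grace period ends.

    The period is counted from the initial release of the *new* patch set,
    as in the policy quoted from Note 944866.1.
    """
    try:
        return new_patchset_release.replace(year=new_patchset_release.year + 1)
    except ValueError:
        # A 29 February release has no anniversary in a non-leap year; use 28 February.
        return new_patchset_release.replace(year=new_patchset_release.year + 1, day=28)


def new_fixes_available(on: date, new_patchset_release: date) -> bool:
    """True while new bug fixes can still be obtained for the previous patch set.

    Assumption: the anniversary date itself is already outside the grace period.
    """
    return on < grace_period_end(new_patchset_release)


if __name__ == "__main__":
    # Placeholder only: substitute the actual initial release date of 11.1.1.7.
    assumed_11117_release = date(2013, 4, 1)
    print(grace_period_end(assumed_11117_release))  # 2014-04-01
    print(new_fixes_available(date(2013, 9, 6), assumed_11117_release))  # True
```

With a real release date substituted, the same check shows how much of the grace period remains on any given planning date.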

High Availability/Scalability. The current production environment contains only one managed server
running SOA Suite (and also uses a single instance, non-RAC database). As the team are well aware,
this restricts both the environment’s availability (since the managed server and the database are
each a single point of failure) and its scalability (since there is no ability to add more managed
servers to increase capacity if this proves necessary).
The reasons for the decision to go into production in this configuration are well understood, and will
not be rehearsed here. In addition, the nature of the initial workload, where most SOA composites will
be triggered by a database adapter poll, and so initiated by the SOA Suite itself, means that outages
will not have such a dramatic impact on the external applications – the changed records will simply
accumulate in the database until SOA Suite restarts.
Nevertheless, there are also some SOAP based Web Services being offered by the SOA environment
and used by external applications, which means that a service outage will already be visible to other
systems. In addition, this type of traffic may increase over time.
Furthermore, at the time of writing, no performance testing has been carried out, so it is possible
that the new environment will already be required to process traffic volumes which exceed the
capacity of a single server. (Note : According to the team, there are moves afoot to reduce the
throughput in SOA, by reducing the number of source database updates which trigger SOA processes.
This may reduce the initial capacity related risks.)
As a result, the recommendation would be to move to a configuration with multiple managed servers as soon as
possible, for both availability and scalability reasons. Depending on how this is carried out, this could
entail a simple copy of the existing managed server configuration, or it could involve creation of a new
environment, especially if Oracle’s recommended approach of ‘Whole Server Migration’ is adopted.
The pros and cons of each approach were briefly discussed on site. This point should be explored in
more detail during a planned visit by another Oracle consultant in the near future (time permitting).
Recommendation : Explore options, and develop a plan, to migrate the production environment to a
configuration with superior high availability and scalability characteristics.

Disaster Recovery Environment. While it is good that a disaster recovery environment is being set up,
there are well known issues which need to be considered as part of the decision to fail over to a DR site.
For example :
o Assuming that the database and middleware components both replicate asynchronously and
  independently, there may be timing and therefore data discrepancies between the two tiers
  after a failover. This can lead to issues in the immediate post-failover period.
o In addition, should other applications which interact with SOA Suite fail over as well, then
  there can be similar timing issues related to the (presumably asynchronous) failover
  mechanisms used by these other applications. This can lead to ‘impossible’ situations, such as
  SOA Suite receiving the ‘same’ update twice, because the source application has recovered to a
  different point in time than SOA Suite.
These considerations are not specific to Edinburgh University, nor are they specific to SOA Suite.
They are instead related to creation of a distributed disaster recovery environment by asynchronous
data replication. However, they do mean that the decision to fail over from the primary to the
standby site is not something which should be taken lightly. In particular, it would not be wise to
consider the DR site as representing a suitable alternative to including High Availability features in
the primary site.
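The ‘impossible’ situations described above can be illustrated with a toy model. This Python sketch is not a description of any Oracle mechanism; it assumes that updates are numbered in order, that the source application marks an update as picked up once SOA Suite has polled it, and that each tier's state after failover can be summarized by a single high-water mark (the last update it still has a record of). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FailoverOutcome:
    # Updates SOA Suite already processed, which the source will now offer again.
    redelivered: List[int]
    # Updates the source regards as handed over, but of which SOA Suite has no record.
    missing: List[int]


def failover_outcome(source_recovery_point: int, soa_recovery_point: int) -> FailoverOutcome:
    """Compare the recovery points of two independently, asynchronously replicated tiers.

    Each argument is the number of the last update that tier still records after failover.
    """
    if source_recovery_point < soa_recovery_point:
        # The source rolled back further: its 'picked up' markers for these updates
        # were lost, so the next poll hands them to SOA Suite a second time.
        return FailoverOutcome(
            redelivered=list(range(source_recovery_point + 1, soa_recovery_point + 1)),
            missing=[],
        )
    # SOA Suite rolled back further (or both match): the source believes these updates
    # were handed over, but the standby SOA Suite has no instances for them.
    return FailoverOutcome(
        redelivered=[],
        missing=list(range(soa_recovery_point + 1, source_recovery_point + 1)),
    )


if __name__ == "__main__":
    print(failover_outcome(source_recovery_point=97, soa_recovery_point=100))
    print(failover_outcome(source_recovery_point=100, soa_recovery_point=97))
```

Only when both recovery points coincide are both lists empty; in every other case some manual reconciliation is needed, which is why the failover decision deserves care.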
The observations above are based on a brief discussion with the team members looking after the creation of the
environments. However, for reasons of time, the environment setup was not reviewed in detail. It has been
agreed between Oracle and Edinburgh University that another consultant, who specializes in this area, should
review at least the production environment setup.
Development and deployment
Again, this area was briefly reviewed, mainly on the basis of a discussion with the team leader and one of the
team members. The outline summary from this discussion is :

The development aspects which the team are considering, such as service versioning, use of a generic
error handling component, automated unit testing, and automation of deployment as far as possible,
are representative of good practices which the author has either seen or has recommended to other
customers.

Based on this high level discussion, the decision was taken not to devote more of the available time to
reviewing development and deployment practices in more detail. This could certainly be done, but a full
review (covering areas such as development and coding standards, design and development approach,
procedures and approaches to testing of all types, and deployment practices) would take a significant
amount of effort – well in excess of the budget available at present.

Instead, the remaining time with the development team was spent answering some specific
questions which the team had sent prior to the visit. The answers were given verbally on the day. In
some cases, the author agreed to follow up by sending further information after the visit. This
will be done under separate cover.

In addition to these existing questions, some recommendations were given to the team to consider
other areas, such as use of a continuous integration server, and use of the Metadata Services (MDS)
repository to hold common artefacts such as WSDLs and XSDs.
Run time monitoring
Again, this area was discussed in the context of current SOA 10g monitoring practices. The key points are :

The current monitoring approach, from the description given, appears to work for the current scenario.
However, as a general observation, the current monitoring approach is not necessarily something
which could be ‘scaled up’ to deal with a larger service portfolio.
o There is a significant degree of manual monitoring required, including inspection of database
  states in the applications integrated with SOA.
o In addition, there is a close working relationship with the business users for the key
  applications involved – something which could not necessarily be guaranteed if the application
  landscape, in terms of SOA integration, were to be expanded.

In addition to the monitoring of external applications, the current approach involves using the ESB and
BPEL consoles in 10g to track the behaviour of the environment as a whole, as well as specific process
instances.

In SOA Suite 11g, the same type of console based monitoring approach is available. There are some
enhancements which are available immediately, such as the ability to track processes end to end across
multiple composite instances.

There are also some enhancements which can be leveraged with code changes. One which is very
straightforward, and which would be recommended, is the use of so-called ‘composite sensors’
to allow key data item values to be recorded for each instance. The composite instances can then be
searched on these data item values, greatly assisting in locating specific process instances (the ‘where is
order number XXX’ type of question). Composite sensors are described in the SOA Suite Developer’s
Guide at :
http://docs.oracle.com/cd/E28280_01/dev.1111/e10224/sca_compsensors.htm#CIHGIDDE
Recommendation : Include composite sensors in all 11g composites, or at least those which represent ‘entry
points’ for external applications.
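The idea behind composite sensors (recording a few key values per instance so that instances can later be found by value) can be illustrated with a small in-memory model. This Python sketch is purely conceptual and is not the SOA Suite API: in 11g the sensors are defined on the composite itself and the search is carried out from the management console. The class names, the `OrderNumber` sensor and the instance data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Dict, List, Tuple


@dataclass
class CompositeInstance:
    instance_id: int
    composite: str
    # Sensor name -> value captured for this instance (e.g. an order number).
    sensor_values: Dict[str, str] = field(default_factory=dict)


class SensorIndex:
    """Maps (sensor name, value) pairs to the ids of the instances that recorded them."""

    def __init__(self) -> None:
        self._index: DefaultDict[Tuple[str, str], List[int]] = defaultdict(list)

    def record(self, instance: CompositeInstance) -> None:
        for name, value in instance.sensor_values.items():
            self._index[(name, value)].append(instance.instance_id)

    def find(self, sensor_name: str, value: str) -> List[int]:
        # Answers the 'where is order number XXX' question without scanning payloads.
        return list(self._index.get((sensor_name, value), []))


if __name__ == "__main__":
    index = SensorIndex()
    index.record(CompositeInstance(101, "OrderIntake", {"OrderNumber": "A-1001"}))
    index.record(CompositeInstance(102, "OrderIntake", {"OrderNumber": "A-1002"}))
    index.record(CompositeInstance(201, "OrderFulfilment", {"OrderNumber": "A-1001"}))
    print(index.find("OrderNumber", "A-1001"))  # [101, 201]
```

Because two different composites record the same order number, a single lookup returns instances from both, which is why sensors on ‘entry point’ composites are particularly useful.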

In addition to these ‘like for like’ monitoring features, there are also some other areas which could be
considered. Examples include :
o Integration with Oracle Business Activity Monitoring (BAM). If the composite applications are
  instrumented to send data to BAM, then this can be used to display near real time information
  on business process execution behaviour (data can be sent from SOA to a BAM report in a few
  seconds). Although BAM is presented as a tool for business level monitoring, business issues
  (such as non-completion of processes in time) frequently have underlying technology causes.
  As a result, BAM information is potentially valuable to IT support staff.
o Oracle Enterprise Manager Grid Control can be used to monitor the behaviour of the entire
  Oracle environment, both database and middle tier, from the infrastructure through to the
  different service engines (BPEL, Mediator etc) within SOA Suite. In addition, with the use of
  the SOA Management Pack, extra facilities such as end to end transaction tracing and
  monitoring are available.
o Both BAM and Enterprise Manager include the ability to carry out automated pro-active
  monitoring, albeit in different contexts (business oriented and technical infrastructure
  oriented). This kind of capability is likely to be required if the SOA environment starts to grow
  in scale.
In terms of costs, BAM is part of SOA Suite, and so is already available, although it may require some
code additions to integrate BAM and SOA in the optimal fashion.
Enterprise Manager is a distinct product from SOA Suite, and has its own licensing model. In addition,
installation and configuration of a product as sophisticated and feature-rich as Enterprise Manager to
monitor a relatively small environment may be felt to be a disproportionate action.
However, it is possible that the University may already have Enterprise Manager licences and/or
expertise as part of its monitoring solution for other parts of its Oracle estate. If this is the case, then it
may be possible to leverage existing installation(s) to monitor the new environment as well.
As can be seen, there is scope for monitoring the new 11g environment in a similar way to the current
10g one, and that may be sufficient for the initial stages. However, once some expertise has been gained
in 11g, some more effort should be devoted to exploring potential extensions to the current monitoring
approach.
Recommendation : As 11g expertise grows, review default monitoring options available for suitability.
Recommendation : Explore use of BAM for monitoring key activities in environment.
Recommendation : Review Enterprise Manager as a potential external monitoring solution.

On a more specific topic, there are some issues being experienced with 10g components (ESB services)
ceasing to operate, but without giving any external signs of failure. It is not clear if these are product
issues, or expected behaviour.
ESB in its current form does not exist in SOA 11g, with other components taking its place. As a result, it
is not possible to say whether these issues would persist in an 11g implementation.
It would be possible for Oracle Consulting to review the issues being encountered with ESB in 10g, but
the feeling was that this was probably not worthwhile. The issues are known and can be worked
around for the limited remaining lifetime of ESB 10g.
Security
From a brief discussion on security within the SOA environment, the following points were noted :

The University has a policy of using SSL for internal traffic, with the exception of the ‘last mile’ within
a trusted network. In the case of the SOA environment, the SSL offload takes place at a load balancing
router, and plain text traffic is used between the router, HTTP server and SOA Suite.
While perfectly feasible, and by no means unique, the author found this approach slightly surprising.
The University is effectively treating most of its own intranet as an untrusted and insecure network,
which is not necessarily what would have been expected. No further comment will be made.

In terms of SOA service security, every SOAP-accessible web service has at least an authentication
policy defined on it. The credentials being used correspond to a small number of ‘service’ accounts – in
effect, there is currently one service account per consuming application.
There is no perceived requirement, now or in the future, to propagate the ‘true’ user identity from the
calling application to the called service. This is not an unusual situation in this kind of environment,
although the approach of not propagating user identities should be considered and documented as a
conscious decision.

In addition, a form of role based access control is in operation. Each SOAP service has an authorization
policy attached to it, restricting access to the service to users in a particular group. A different group is
used for each service. When an application wishes to use a particular SOA service, that application’s
‘service’ account is added to the relevant group, permitting access to the service.
This approach can be made to work, but there are potential concerns with scalability and
manageability as the number of services and applications grows.
It is currently believed that most of the access control decisions will be related to a particular ‘School’
within the University, and that each consuming application will be associated with a particular school.
However, there is also a desire to monitor usage by individual applications.
One possible solution would be to define a series of ‘roles’ (in practice, LDAP groups) for each School,
and control service access on the basis of this more limited set of roles. The monitoring of individual
applications could be done by (eg) including a common, University specific, ‘header’ element in every
message, and including the application name as an element in this header – allowing usage by
application to be monitored, while still retaining the school level access controls.
Recommendation : Review current security policy for manageability/scalability, and map against defined
business requirements.
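The School-level alternative described above can be sketched as follows. This Python example is illustrative only and rests on several assumptions: the header namespace and element names, the School role names, and the application-to-School and service-to-School mappings are all hypothetical, and in practice authorization would be enforced by policies attached to the services rather than by application code. It shows the two halves of the idea: access is decided by the consuming application's School, while the application name carried in a common header remains available for usage monitoring.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Optional, Set

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
HDR_NS = "urn:example:university:common-header"  # hypothetical University namespace
ET.register_namespace("soapenv", SOAP_NS)
ET.register_namespace("hdr", HDR_NS)

# Hypothetical mappings: one School role per consuming application's service account,
# and the set of School roles permitted to call each service.
APP_SCHOOL: Dict[str, str] = {"app-a": "school-1", "app-b": "school-1", "app-c": "school-2"}
SERVICE_SCHOOLS: Dict[str, Set[str]] = {"StudentLookup": {"school-1"}}


def build_request(app_name: str, payload: ET.Element) -> bytes:
    """Wrap a payload in a SOAP 1.1 envelope carrying the common header."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    common = ET.SubElement(header, f"{{{HDR_NS}}}CommonHeader")
    ET.SubElement(common, f"{{{HDR_NS}}}ApplicationName").text = app_name
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Body").append(payload)
    return ET.tostring(envelope)


def application_name(message: bytes) -> Optional[str]:
    """Read the application name from the common header, if present."""
    path = f"{{{SOAP_NS}}}Header/{{{HDR_NS}}}CommonHeader/{{{HDR_NS}}}ApplicationName"
    element = ET.fromstring(message).find(path)
    return element.text if element is not None else None


def is_authorized(service_account: str, service: str) -> bool:
    """Access is decided by the caller's School role, not by a per-service group."""
    school = APP_SCHOOL.get(service_account)
    return school is not None and school in SERVICE_SCHOOLS.get(service, set())


if __name__ == "__main__":
    usage: Counter = Counter()
    for account in ("app-a", "app-b", "app-a", "app-c"):
        if is_authorized(account, "StudentLookup"):
            request = build_request(account, ET.Element("GetStudent"))
            usage[application_name(request)] += 1
    print(usage)  # app-c is refused; app-a and app-b are counted separately
```

In this model, onboarding a new application means one School membership for its service account rather than one group membership per service it consumes, while per-application usage is still visible from the header.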
Any Problems or Issues raised (business or technical) and actions taken during this visit:
See above
Conclusions and any client response:
See above
Future plans (e.g. next visit or follow-up actions):
It was agreed on the day that another visit, by a second Oracle consultant, should be scheduled to review the
production setup in more detail. While it is perfectly possible that such a review might not find anything of
significance, both Oracle and the University consider it a prudent step to take.
The author undertook to send some relevant links to blog entries etc to some of the people involved in the
discussions during the day. This will be done in a separate email.
In addition, once the monitoring requirements in 11g become clearer, it would potentially be useful to have an
Oracle consultant visit and review the requirements in the light of BAM and Enterprise Manager’s capabilities.
Details of any deliverables given to the client or any documentation/software left on the client’s machine
(including location and filenames):
No specific deliverables.
Knowledge Transfer (eg. Explain what actions you have taken to share knowledge of the work you have
undertaken):
The day was based around a series of discussions with University team members, so knowledge transfer took
place during these discussions.
This report approved by:
Ian Fiddes (deemed approved, since no comments received) 6 September 2013
Document1 (Issue 1)