High-availability and Scalability

Architectural Alternatives and Configurations for

NCAR NNEW WCSRI 3.0

Version 1.0

May, 2011

National Center for Atmospheric Research


Document Revision History:

Version  Date          Changes           Contributors
1.0      May 18, 2011  Initial revision  Rob Weingruber, Marcel Casado

Please direct comments or questions to:

Rob Weingruber or Aaron Braeckel

National Center for Atmospheric Research

Research Applications Laboratory

3450 Mitchell Lane
Boulder, CO 80301

weingrub@ucar.edu or braeckel@ucar.edu

Terms of Use – NNEW Documentation

The following Terms of Use applies to the NNEW Documentation.

1. Use. The User may use NNEW Documentation for any lawful purpose without any fee or cost. Any modification, reproduction and redistribution may take place without further permission so long as proper copyright notices and acknowledgements are retained.

2. Acknowledgement. The NNEW Documentation was developed through the sponsorship of the National Oceanic and Atmospheric Administration and the Federal Aviation Administration.

3. Copyright. Any copyright notice contained in this Terms of Use, the NNEW Documentation, any software code, or any part of the website shall remain intact and unaltered and shall be affixed to any use, distribution or copy. Except as specifically permitted herein, the user is not granted any express or implied right under any patents, copyrights, trademarks, or other intellectual property rights with respect to the NNEW Documentation.

4. No Endorsements. The names UCAR, NCAR, MIT, and Lincoln Labs may not be used in any advertising or publicity to endorse or promote any program, project, product or commercial entity.

5. Limitation of Liability. The NNEW Documentation, including all content and materials, is provided "as is." There are no warranties of use, fitness for any purpose, service or goods, implied or direct, associated with the NNEW Documentation, and UCAR and MIT expressly disclaim any warranties. In no event shall UCAR or MIT be liable for any damages of any nature suffered by any user, or any third party, resulting in whole or in part from use of the NNEW Documentation.

Copyright © 2007-2011 University Corporation for Atmospheric Research and Massachusetts Institute of Technology Lincoln Laboratory


1. Overview

This document describes some of the architectural and configuration options for WCSRI Version 3. As of version 3.0, the WCSRI provides an approach for high-availability/failover and load-balancing/scalability, desirable features for a production system. These features can be achieved in a number of different ways, depending on the deployment's requirements and hardware availability. This document describes some of the different approaches, as well as some of the advantages and disadvantages of each.

2. WCSRI 3.0 Overview

The WCSRI consists of a number of different functional components internal to the WCSRI server:

• Data Ingest
• Data Retriever
• Subscription Management
• Pub/Sub Notification Producer

2.1. Data Ingest

The Data Ingest functionality of the server watches for new weather data files that are ingested into the system. The Data Ingest uses wcs.rootDataDir (as found in the wcsri.cfg file) as the root directory for the server, and frequently polls the staging subdirectories of each of the coverage or data source directories found below wcs.rootDataDir. The Data Ingest uses a relational database to store information about the weather data files it has found, and uses the ActiveMQ message broker to post messages regarding new data availability to a queue.
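For illustration, the on-disk layout implied above might look like the sketch below. Only the staging subdirectory convention and the root path come from this document; the coverage directory names are hypothetical:

# Hypothetical layout under wcs.rootDataDir (coverage names are illustrative)
/d1/wx/testData/                  # wcs.rootDataDir
    RUC_CONUS_13km/               # a coverage / data source directory
        staging/                  # polled by Data Ingest for newly arrived files
    GTG_FCST/
        staging/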

2.2. Data Retriever

The Data Retriever functionality of the server is responsible for receiving and processing external client requests. This includes requests for server metadata, coverage descriptions, and data subsetting operations. The Data Retriever component accesses the weather data files found in wcs.rootDataDir, and can process requests in either a synchronous or an asynchronous mode. When asynchronous, the Data Retriever leverages ActiveMQ queues to process the weather data requests.

2.3. Subscription Management

The Subscription Management component does exactly that – it manages subscriptions. It receives requests for subscriptions, and creates subscription entries in the relational database. It also takes care of subscription expirations, renewals, etc.


2.4. Pub/Sub Notification Producer

The Pub/Sub Notification Producer is the core of the real-time notification system. It uses ActiveMQ to receive messages about new weather data availability from a queue (posted by the Data Ingest component), creates notifications for interested subscribers, and persists the notifications to the relational database. The Producer then uses ActiveMQ again to deliver the notification messages to the subscribers, if they are connected at the time (otherwise, the messages are delivered when they connect).


Figure 1. Different internal and external components of the system as they may be installed on a single machine.


3. The WCSRI and Other Components

To achieve the functionality described in the previous section, the following additional components are used:

• Servicemix – the ESB/web container for the WCSRI server/services. This is what we call the “WCSRI”.
• ActiveMQ – the message broker used by the WCSRI. ActiveMQ uses its own database called KahaDB.
• Pub/Sub and Idempotency Database – the RDBMS used by the WCSRI.
• Weather Data File Storage – the backing store for the weather data (ie wcs.rootDataDir) and storage for temporary weather files (ie wcs.tempDir).

These are also depicted in Figure 1 in a single-node configuration. As you may imagine, running all of these components on a single node may pose problems depending on the demand and/or service-level requirements of the system. Although it is possible (and the WCSRI developers frequently do so during development), running all components on a single machine is not recommended, especially if the deployed WCSRI will be used in any sort of production or operational environment. Clearly, a single node will not provide failover or load-balancing.

The first necessity for failover/load-balancing is to have more than one machine available for use.

The following subsections describe some advantages, disadvantages, considerations and configurations for providing failover and load-balancing for the components listed above.


Figure 2. A scenario with 4 nodes used for the WCSRI cluster.


3.1. WCSRI / Servicemix

3.1.1 Request/Reply Mode of Operation

The WCSRI can process requests (ie: GetCapabilities, DescribeCoverage, GetCoverage, and GetMetadata requests/replies) in a synchronous or an asynchronous mode. The synchronous mode is appropriate when running a single WCSRI node, whereas the asynchronous mode is required for a clustered WCSRI configuration. In asynchronous mode, requests are processed as follows:

1. Requests are posted to an AMQ queue by the initiating WCSRI node (the one that received the initial client request on an exposed endpoint).
2. Requests are asynchronously processed by any WCSRI node participating in the cluster.
3. Results are either written to a disk shared by the cluster, or streamed from the processing node back to the initiating node.
4. From the initiating node, results are returned to the client.

Clearly, in this case, adding WCSRI nodes to the cluster improves scalability. In order to leverage the asynchronous nature of the WCSRI, the following properties (found in wcsri.cfg – see Appendix A) should be modified:

# Flags for enabling/disabling asynchronous mode for request/reply wcs
# operations. These must be true for a clustered WCSRI, and improve performance
# even for a single node.
wcs.async.getCvg=true
wcs.async.descvg=true
wcs.async.getCaps=true
wcs.async.getMeta=true

# Streaming is required when using a WCSRI cluster where each WCSRI node in the cluster
# has its own local output directory/disk for temporary weather files (see “Weather Data
# Files Storage” in this document below)
gds.datasource.streaming=true
datastreamer.publishAddress=127.0.0.1 [change to the IP address of the node]
datastreamer.port=10099

In asynchronous mode, each node in a WCSRI cluster can process a configurable number of requests at a time, specified by the gds.numberConcurrentConsumers property as shown below. Since this property specifies the number of simultaneous weather subsetting operations permitted, care should be taken to make sure the value is neither too high nor too low. Each weather subsetting operation uses a significant amount of disk I/O (depending on the weather data), and hence, the property should be tuned accordingly. The suggested value is somewhere between 5 and 10 for the best scalability. In addition, if a node receives a number of simultaneous requests that exceeds the number of concurrent consumers, requests will be enqueued to be processed as consumers become available. This approach “guarantees” that the WCSRI node will not be overloaded with requests, thereby killing performance. However, note that requests can be “starved” for resources if the number of requests is unreasonably high for the architectural scenario.

# WCSRI competing consumers for async req/reply. The number should be around 5 – 10.
gds.numberConcurrentConsumers=5

As mentioned before, any WCSRI node in a WCSRI cluster will process request/reply operations regardless of the initiating node. Each WCSRI node is mostly independent of the others. The node that receives the initial request (on its wcs.frontend.publish.url endpoint) will enqueue the request to ActiveMQ, to be processed by any consumer on any node (including the initial node). Hence, adding WCSRI nodes to a cluster can dramatically improve request throughput (ie scalability).

3.1.2 Reverse Proxy

Since it does not matter which node receives (or processes) the initial request from a client, we recommend that a reverse proxy be used to balance client requests (eg round robin) across the available WCSRI nodes. This will improve request/reply performance in that much of the request/reply processing (everything other than the actual weather data subsetting) will be balanced across nodes. In addition, a properly configured reverse proxy will provide high-availability/failover for the WCSRI, by only forwarding requests to nodes or endpoints that are alive and responsive.
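As a minimal sketch, a round-robin reverse proxy in front of two WCSRI nodes might look like the nginx excerpt below. The choice of nginx, the host names, and the ports are assumptions, not part of the WCSRI distribution:

# nginx reverse proxy sketch (illustrative only; adjust hosts/ports to your deployment)
upstream wcsri_cluster {
    # round-robin is the default balancing policy
    server wcsri-node1.example.com:8280;
    server wcsri-node2.example.com:8280;
}

server {
    listen 80;
    location /wcs {
        # forward client requests to whichever node is alive and responsive
        proxy_pass http://wcsri_cluster;
    }
}

The proxied port should match each node's wcs.frontend.publish.url, and the proxy's public name would then be advertised via wcs.frontend.external.url (see Appendix A).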

3.2. Weather Data File Systems (Input and Output)

The WCSRI reads native weather data files from disk, and in addition, writes temporary weather data subsets to disk. The root directories for the input and output weather data are specified by the following properties, and do not have to be on the same partition/disk:

# This is the root data directory of the WCSRI. All wx data accessible through the WCSRI
# must be under this directory.
wcs.rootDataDir=/d1/wx/testData

# This is the temporary directory where dynamic weather data will be written.
# Responses to data requests and other temporary information is stored here.
wcs.tempDir=/tmp

In a WCSRI cluster scenario, the input and output disks can either be shared across the nodes, or local to each node. No special WCSRI configuration is required for either scenario, other than specifying the wcs.rootDataDir in the wcsri.cfg file on each node. If the input data is local to each node, then replicated data disks must be used; otherwise, there may be instances where a request to one node cannot be satisfied by a processing node. Replication should be achieved by a commercial-grade replication solution, rather than a custom software solution, in order to prevent discrepancies. The alternative to a replicated input disk solution is to use a single disk shared by each WCSRI node within the cluster. Since each WCSRI node will only read the weather data from this disk (writes are discussed below), we have found that the performance of this solution is satisfactory under high load and guarantees that each node in a cluster has access to the same data at any time. Hence, this is the recommended approach, especially if the cost of your architecture is a consideration. However, care must be taken to ensure that you have a properly working shared file system, one in which exclusive file locks work properly (see Appendix B). In addition, high-availability/failover requirements may mandate the use of a fault-tolerant disk such as a SAN (beyond the scope of this document).

The output disk, where each WCSRI node writes its temporary weather data subsets, can be shared across the nodes in the cluster, or local to each node. The output directory is specified by the wcs.tempDir property in the wcsri.cfg file on each node. If the output disk is shared, performance may be hindered under high request/reply traffic, since each node will be competing for resources when writing weather data subsets to disk. Alternatively, each node in the cluster can use its own local output disk. This scenario requires that weather data subsets be streamed from the processing node (from the processor's local disk) back to the initiating node, which is achieved with the properties listed below. Though this solution involves the additional step of streaming, we have found, through load-testing and experimentation, that it is far superior to using a shared output disk.

# Streaming is required when using a WCSRI cluster where each WCSRI node in the cluster
# has its own local output directory/disk for temporary weather files
gds.datasource.streaming=true
datastreamer.publishAddress=127.0.0.1 [change to the IP address of the node]
datastreamer.port=10099

Note that the values specified by the datastreamer.publishAddress and datastreamer.port properties should be changed to match the IP address/port of the node, and must be accessible by each node in the cluster.
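For example, two nodes in the same cluster might carry node-specific values like the following (the addresses are placeholders):

# On node 1 (192.0.2.11 is an illustrative address)
datastreamer.publishAddress=192.0.2.11
datastreamer.port=10099

# On node 2 (192.0.2.12 is an illustrative address)
datastreamer.publishAddress=192.0.2.12
datastreamer.port=10099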

In summary, the recommended configuration (depending on requirements!) is to share the input disk across all WCSRI nodes (with a suitable shared file system), and use local output disks with streaming turned on.

3.3. Pub/Sub and Idempotency Database (ie: RDBMS)

The WCSRI uses a relational database to store information about idempotent entities. “Idempotent” is a term adopted from ActiveMQ and Camel that simply means things are processed only once. For example, in a WCSRI cluster using a shared input disk, only one of the WCSRI instances should process the retrieval of a specific weather data file from an upstream origin server. To keep it simple, the WCSRI stores information about ingested files, subscriptions, connected subscribers, and notification messages in the relational database.

The WCSRI supports Oracle, Postgres, MySQL, or Derby as the backend database. The database-related properties specified in wcsri.cfg are listed below. Since each database has its own (more or less mature) solution for failover and scalability, the WCSRI implementation does not account for these. In other words, high-availability and load-balancing of the database are left to the adopted database implementation. However, the WCSRI will use any JDBC URL specified in the jdbc.url property, and hence will support failover and load-balancing if they are supported by the specified database drivers.

jdbc.driverClassName=org.hsqldb.jdbcDriver
jdbc.url=jdbc:hsqldb:hsql://127.0.0.1/wcsri_pubsub
jdbc.username=sa
jdbc.password=
jdbc.initialSize=5
jdbc.maxActive=10
jdbc.maxIdle=10
jdbc.defaultAutoCommit=false
transactionType=RESOURCE_LOCAL

# flag to disable the creation of the embedded hsql server/database. It’s wise to set this to
# false if other databases are being used.
useProvidedHsqlDBServer=true

Note that the default database for the WCSRI is HSQLDB, which should not be used in production.
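As an illustration of the driver-level failover mentioned above, a MySQL deployment could point jdbc.url at Connector/J's load-balancing URL scheme. The host names below are placeholders, and the exact syntax and supported options depend on your driver version, so consult the driver documentation:

# Illustrative only – MySQL Connector/J load-balanced/failover URL with placeholder hosts
# jdbc.driverClassName=com.mysql.jdbc.Driver
# jdbc.url=jdbc:mysql:loadbalance://db-host1:3306,db-host2:3306/wcsri_pubsub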

Clearly, the RDBMS is a heavily used component of the WCSRI. Therefore, the database should optimally run on its own machine in order to decrease the CPU and disk I/O load on a cluster's WCSRI nodes.

3.4. ActiveMQ Message Broker

The ActiveMQ message broker provides the infrastructure for the asynchronous processing and notification mechanism of a clustered WCSRI. ActiveMQ can be run internally to the WCSRI (within servicemix) for a single-node configuration, but in a clustered WCSRI, ActiveMQ must be run externally as a standalone application, since each node in the cluster will need to access it. Since ActiveMQ uses significant resources (both CPU and disk I/O), it ideally should be run on its own machine, accessible from each node of the WCSRI cluster. The wcsri.cfg properties below show those that are applicable to ActiveMQ.

High-availability/failover of ActiveMQ can be achieved through a couple of different approaches documented in the ActiveMQ documentation. Master/Slave is a typical approach to failover (not scalability), as depicted in Figure 2. In this configuration, one node running ActiveMQ is the “master”, while one or more other nodes running ActiveMQ are the “slaves”. While the master is healthy, it locks the use of ActiveMQ by other slave instances via a shared file system lock, and it handles all of the ActiveMQ workload. The slaves, meanwhile, remain inactive until the master goes down and the file system lock is released. At that point, one of the slaves acquires the lock and becomes the master, beginning the cycle all over again. Note that only a single ActiveMQ instance is “working” at a time, and hence, scalability is not improved. In addition, the master/slave scenario requires that clients of the ActiveMQ instance(s) use failover URLs to connect, similar to the commented-out examples shown below (eg: see amq.connection.url, amq.client.connection.url, and amq.http.connector.url).


ActiveMQ uses an internal database called KahaDB, whose location is specified in the $AMQ_HOME/conf/activemq.xml file. In order to use the Master/Slave failover configuration, each instance of ActiveMQ (masters and slaves) must have access to the same, shared KahaDB (that is indeed what is exclusively locked, as described above). Hence, the KahaDB must exist on a shared file system, one in which exclusive file locks work reliably (see Appendix B).
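For reference, pointing every master/slave broker at the same KahaDB directory is done in each broker's activemq.xml; a minimal sketch, assuming /shared/activemq/kahadb is a path on the shared file system:

<!-- $AMQ_HOME/conf/activemq.xml (excerpt) – the directory must be on the shared file system -->
<persistenceAdapter>
    <kahaDB directory="/shared/activemq/kahadb"/>
</persistenceAdapter>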

# The following parameters allow you to set the AMQ connection pool parameters. This is used internally by
# the WCSRI and is not advertised to clients. The port used (eg 61616) must match the port used by the
# openwire transportConnector used by your AMQ instance. If you are using the AMQ from within servicemix,
# then verify the openwire transportConnector in $FUSE_ESB_HOME/etc/activemq-broker.xml. For better
# performance, run AMQ external to servicemix on a different machine, and verify the openwire
# transportConnector in $AMQ_HOME/conf/activemq.xml.
amq.connection.url=tcp://localhost:61616?wireFormat.cacheEnabled=false&wireFormat.tightEncodingEnabled=false&wireFormat.maxInactivityDurationInitalDelay=30000
amq.connection.maxConnections=8
amq.connection.maximumActive=500

# For AMQ failover in a master/slave configuration, the amq.connection.url will likely look similar to (with
# slashes "\" removed, machine names and ports replaced, etc):
# amq.connection.url=failover:(tcp://machine1:61916?wireFormat.cacheEnabled=false& \
#   wireFormat.tightEncodingEnabled=false&wireFormat.maxInactivityDurationInitalDelay=30000, \
#   tcp://machine2.rap.ucar.edu:61916?wireFormat.cacheEnabled=false& \
#   wireFormat.tightEncodingEnabled=false&wireFormat.maxInactivityDurationInitalDelay=30000)
# Note that master/slave AMQ instances must share the Kahadb location (ie shared file system), which is
# specified in the $AMQ_HOME/conf/activemq.xml file in the persistenceAdapter's kahaDB property.

# Same as above, but used for client connections with low prefetch values (jms.prefetchPolicy.all=1).
amq.client.connection.url=tcp://localhost:61616?wireFormat.cacheEnabled=false&wireFormat.tightEncodingEnabled=false&wireFormat.maxInactivityDurationInitalDelay=30000&jms.prefetchPolicy.all=1
amq.client.session.cache.size=10

# This is the http amq transport connector for firewall access. The domain name (don't use localhost or
# 0.0.0.0 in production since this has to be accessed from other machines) or IP address where AMQ runs.
# This is the AMQ host name that will be used by *clients* connecting for temporary queues and
# notifications. If you use a proxy machine, then this will likely be the domain name or IP of the proxy
# machine. Also, specify the port used for this AMQ transport connector. If you are using the AMQ from within
# servicemix, then verify that the port number matches that used by the "http_firewall" transportConnector
# in $FUSE_ESB_HOME/etc/activemq-broker.xml. For better performance, run AMQ external to servicemix on
# a different machine, and verify the port number matches that used by the "http_firewall"
# transportConnector in $AMQ_HOME/conf/activemq.xml.
amq.http.connector.url=http://granite.rap.ucar.edu:16200/tempQueue
#amq.http.connector.url=failover:(http://master:16300/tempQueue,http://slave:16300/tempQueue)/pubsub.handshake

Figure 3 shows a more advanced WCSRI cluster configuration. An ActiveMQ master/slave configuration accounts for failover, but does not account for scalability. The bulk of the WCSRI CPU load and I/O processing is due to weather data processing, not to ActiveMQ; in other words, ActiveMQ is not heavily burdened by the load placed on it by the WCSRI's asynchronous processing or notification messaging. However, if your service-level requirements make it necessary to improve the load-balancing of the message broker, ActiveMQ supports a “Network of Brokers” configuration. In this scenario, different ActiveMQ instances are placed on different nodes, each with its own KahaDB, and the instances communicate with each other and share the ActiveMQ workload. Each instance can also be set up for a master/slave failover scenario. For details on this configuration, see the ActiveMQ documentation.
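For orientation only, a network-of-brokers link is declared in each broker's activemq.xml; a minimal sketch (broker host name and URI are assumptions) connecting one broker to another:

<!-- activemq.xml excerpt on broker A (illustrative) – forwards messages to and from broker B -->
<networkConnectors>
    <networkConnector uri="static:(tcp://brokerB.example.com:61616)" duplex="true"/>
</networkConnectors>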

4. Minimum Cluster Configuration

Figure 2 shows a 4-node WCSRI cluster. However, a minimum WCSRI cluster configuration can be achieved with 2 nodes. A good architecture for a 2-node cluster would look as follows:

Cluster Node 1
• WCSRI/Servicemix
• ActiveMQ – Master

Cluster Node 2
• WCSRI/Servicemix
• ActiveMQ – Slave
• Pub/Sub and Idempotency Database

In addition, the WCSRI nodes would use a wcs.rootDataDir that points to a shared input disk. The shared input disk would also contain the KahaDB used by the ActiveMQ master and slave. Each node would use its own local disk for temporary weather data files, as specified by wcs.tempDir, with streaming enabled in the WCSRI instances; the relevant properties are sketched below. This scenario does not take into account database failover or load-balancing, which could be achieved by putting another instance of the database on Cluster Node 1.
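A minimal wcsri.cfg sketch for this two-node example follows. The paths, host names, and the simplified failover URL are assumptions; the full connection URLs with wireFormat options are shown in Appendix A:

# Both nodes: shared input disk and the master/slave broker pair (illustrative values)
wcs.rootDataDir=/shared/wx/data
amq.connection.url=failover:(tcp://node1:61616,tcp://node2:61616)

# Each node: local output disk plus streaming back to the initiating node
wcs.tempDir=/local/wcsri/tmp
gds.datasource.streaming=true
datastreamer.publishAddress=[this node's IP address]
datastreamer.port=10099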


Figure 3. Advanced WCSRI Cluster Configuration (Network of Brokers)


Appendix A: Sample wcsri.cfg

# WCSRI v 3.0-beta-2 configuration file. Put this file in [FUSE_ESB_HOME]/etc/config

#----------------------------------------------------------------------------

#------------- WCSRI bundles configuration --------------

#----------------------------------------------------------------------------

# This is the root data directory of the WCSRI. All data accessible through the WCSRI must be

# under this directory

wcs.rootDataDir=/d1/torp/myWcsriValidation/testData

# This is the temporary directory where dynamic data will be written. Responses to data requests

# and other temporary information is stored here

wcs.tempDir=/tmp

# WCS Frontend - this is the single endpoint for ALL wcsri services. 0.0.0.0 (all interfaces) is a likely

# value for the "hostname". eg: If the domain name for the exposed machine is

# myMachine.rap.ucar.edu, and you use http://0.0.0.0:8280/wcs here, then the services will be

# available at http://myMachine.rap.ucar.edu:8280/wcs

wcs.frontend.publish.url=http://0.0.0.0:8280/wcs

# This is the official advertised domain name/IP of the WCSRI services. If you use a proxy machine

# (eg myProxy.rap.ucar.edu) then this url should contain the name/port of the proxy (eg:

# myProxy.rap.ucar.edu:proxyPort/wcs). If no proxy, then use the domain name/IP/port of the

# exposed machine (NOT 0.0.0.0 or 127.0.0.1!).

wcs.frontend.external.url=http://0.0.0.0:8280/wcs

# True if you want SSL enabled

wcs.frontend.enableSSL=true

# Use the value of wcs.frontend.publish.url but with https and a different port

wcs.frontend.ssl.publish.url=https://0.0.0.0:8281/wcs

# Use the value of wcs.frontend.external.url but with https and a different port

wcs.frontend.ssl.external.url=https://0.0.0.0:8281/wcs

# The following parameters allow you to set the AMQ connection pool parameters. This is used

# internally by the WCSRI and is not advertised to clients. The port used (eg 61616) must match

# the port used by the openwire transportConnector used by your AMQ instance. If you are using

# the AMQ from within servicemix, then verify the openwire transportConnector in

# $FUSE_ESB_HOME/etc/activemq-broker.xml. For better performance, run AMQ external to

# servicemix on a different machine, and verify the openwire transportConnector in

# AMQ_HOME/conf/activemq.xml.


amq.connection.url=tcp://localhost:61616?wireFormat.cacheEnabled=false&wireFormat.tightEncodingEnabled=false&wireFormat.maxInactivityDurationInitalDelay=30000

amq.connection.maxConnections=8

amq.connection.maximumActive=500

# For AMQ failover in a master/slave configuration, the amq.connection.url will likely look similar

# to (with slashes "\" removed, machine names and ports replaced, etc):

# amq.connection.url=failover:(tcp://machine1:61916?wireFormat.cacheEnabled=false& \

#wireFormat.tightEncodingEnabled=false&wireFormat.maxInactivityDurationInitalDelay=30000,

# tcp://machine2.rap.ucar.edu:61916?wireFormat.cacheEnabled=false& \

#wireFormat.tightEncodingEnabled=false&wireFormat.maxInactivityDurationInitalDelay=30000)

# Note that master/slave AMQ instances must share the Kahadb location (ie shared file system),

# which is specified in the $AMQ_HOME/conf/activemq.xml file in the persistenceAdapter's

# kahaDB property.

# Same as above, but used for client connections with low prefetch values (jms.prefetchPolicy.all=1).
amq.client.connection.url=tcp://localhost:61616?wireFormat.cacheEnabled=false&wireFormat.tightEncodingEnabled=false&wireFormat.maxInactivityDurationInitalDelay=30000&jms.prefetchPolicy.all=1

amq.client.session.cache.size=10

# Example for the amq.client.connection.url with failover. This connection is provided to a
# org.springframework.jms.connection.CachingConnectionFactory which, for some yet unknown
# reason, does not like extra amq parameters in the url when using 'failover'; only
# jms.prefetchPolicy.all=1 is kept:
#amq.client.connection.url=failover:(tcp://machine1:61916,tcp://machine2:61916)?jms.prefetchPolicy.all=1

# This is the http amq transport connector for firewall access. The domain name (don't use

# localhost or 0.0.0.0 in production since this has to be accessed from other machines) or IP

# address where AMQ runs. This is the AMQ host name that will be used by *clients* connecting

# for temporary queues and notifications. If you use a proxy machine, then this will likely be the

# domain name or IP of the proxy machine.

#

# Also, specify the port used for this AMQ transport connector. If you are using the AMQ from

# within servicemix, then verify that the port number matches that used by the "http_firewall"

# transportConnector in $FUSE_ESB_HOME/etc/activemq-broker.xml. For better performance,

# run AMQ external to servicemix on a different machine, and verify the port number matches that

# used by the "http_firewall" transportConnector in $AMQ_HOME/conf/activemq.xml.

amq.http.connector.url=http://granite.rap.ucar.edu:16200/tempQueue

#amq.http.connector.url=failover:(http://master:16300/tempQueue,http://slave:16300/tempQueue)/pubsub.handshake

# This is the url to connect and manage the jmx server where amq broker mbeans can be accessed.
# This default is if an amq broker is embedded in WCSRI and using the default fuse management
# parameters. If the management parameters are changed in etc/org.apache.karaf.management.cfg
# then change appropriately.
amq.jmx.connection.url=service:jmx:rmi:///jndi/rmi://localhost:1099/karaf-root

# Example for a standalone amq. Change the host and port according to your system.
#amq.jmx.connection.url=service:jmx:rmi:///jndi/rmi://changeMeToMyRealCompleteDomainNameOrIPAddress:1098/jmxrmi

# Example for a master/slave amq setup. Change the hosts (master/slave) and port according to your
# system.
#amq.jmx.connection.url=service:jmx:rmi:///jndi/rmi://master:1098/jmxrmi,service:jmx:rmi:///jndi/rmi://slave:1098/jmxrmi

# credentials for remote access if necessary. If not, leave the "value" empty but do not comment

# it out

amq.jmx.connection.username=smx

amq.jmx.connection.password=smx

#-------------------------------------------------------------------------------

#-------------- Persistence -------------

#-------------------------------------------------------------------------------

# Persistence - HSQLDB is required for the WCSRI validation tests to work, but do not use HSQLDB
# in production (set useProvidedHsqlDBServer=false below). In a cluster environment (more than
# one WCSRI sharing the db) replace 127.0.0.1 with the IP address where the db server runs

jdbc.driverClassName=org.hsqldb.jdbcDriver

jdbc.url=jdbc:hsqldb:hsql://127.0.0.1/wcsri_pubsub

jdbc.username=sa

jdbc.password=

jdbc.initialSize=5

jdbc.maxActive=10

jdbc.maxIdle=10

jdbc.defaultAutoCommit=false

transactionType=RESOURCE_LOCAL

# Here are some examples of the properties for an external database – don’t forget to disable

# HSQLDB when using your own database. Also make sure you use the appropriate DB driver for

# your DB, and put the driver jar into [FUSE_ESB_HOME]/lib.

# mysql

# jdbc.driverClassName=com.mysql.jdbc.Driver

# jdbc.url=jdbc:mysql://128.replaceMe.xxx.xx/wcsri_pubsub

# postgresql

# jdbc.driverClassName=org.postgresql.Driver

# jdbc.url=jdbc:postgresql://127.0.0.1:5432/wcsri_pubsub


# Derby

# jdbc.driverClassName=org.apache.derby.jdbc.ClientDriver

# jdbc.url=jdbc:derby://127.0.0.1:1527/wcsri_pubsub

#-------------------------------------------------------------------------------

#------------- HSQLDB --------------

#-------------------------------------------------------------------------------

# HSQL internal memory DB

# type of hsql tables when not specified in the sql script (memory/cached)

dbTableType=cached

# flag to disable the creation of the embedded hsql server/database

# it's wise to set this to false if other databases are being used.

useProvidedHsqlDBServer=true

# Directory where to store HSQL DB logs

hsqlLogFileDir=${karaf.base}/etc/hsql/logs

###################################################################

###################################################################

#############

############## The remainder of the properties in this file #########

############## (below) do NOT typically require modification #########

###################################################################

#############

###################################################################

# OGC WCS supported specification

wcs.version=1.1.2

#-------------------------------------------------------------------------------

#-------------- Services Metadata --------------

#-------------------------------------------------------------------------------

# This is the name of the services meta data xml file, that must reside in the wcs.rootDataDir (as

# defined above)

wcs.servicesMetaFile=servicesMeta.xml

# This is the optional name of the ISO services metadata xml file. If it's defined, it must reside
# relative to the wcs.rootDataDir as defined above.
# wcs.isoServicesMetaFile=iso-metadata-service-NCAR-WCS-01.xml

#-------------------------------------------------------------------------------

#-------------- netcdf and Janitor -------------

#-------------------------------------------------------------------------------

# Configuration for type of netcdf output file writing. nujan-java is the recommended pure java


# netcdf writer that does not require JNI.

wcs.outputFileWrapperType=nujan-java

#wcs.outputFileWrapperType=mit-jni

# janitor configuration for tempDir

# Cron expression that instructs the janitor to check every 15 minutes

wcs.tempCleanupFreq=0 0/15 * * * ?

# Duration expression that instructs the janitor to remove files that have not been modified in the

# last 15 minutes

wcs.tempMaxAge=-PT15M

#-------------------------------------------------------------------------------

#-------------- WCSRI Clustering -------------

#-------------------------------------------------------------------------------

# Flags for enabling/disabling asynchronous (under-the-hood) req/reply for wcs operations. These
# must be true for a clustered WCSRI.

wcs.async.getCvg=true

wcs.async.descvg=true

wcs.async.getCaps=true

wcs.async.getMeta=true

# Timeouts in ms for asynchronous req/reply for wcs operations

wcs.async.getCvg.timeout=60000

wcs.async.descvg.timeout=60000

wcs.async.getCaps.timeout=60000

wcs.async.getMeta.timeout=60000

# GDS competing consumers & async req/reply with jms. gds.numberConcurrentConsumers

# should be around 5

gds.numberConcurrentConsumers=5

# name of the gds competing consumer processing queue

gds.processorQueue=gds.processor

#-------------------------------------------------------------------------------

#-------------- SWA vs MTOM Endpoints --------------

#-------------------------------------------------------------------------------

# These are the internal soap http endpoints for the WCSRI OGC Request/Reply services. Use "http"

# as the protocol and a port number. 127.0.0.1 is a likely value for the "hostname". For now there

# are two endpoints for SOAP - one for MTOM and one for Soap With Attachments (SWA).

# MTOM is deprecated and soon only SWA will be supported. However, for now, specify an MTOM

# endpoint with wcs.httpEndpoint, but only expose the SWA endpoint with

# wcs.soap.httpEndpoint.

wcs.httpEndpoint=http://127.0.0.1:8887/nnew/fy10/wcs/soap


# This is the SWA endpoint

wcs.swa.httpEndpoint=http://127.0.0.1:8877/nnew/fy10/soap

# Only ONE of wcs.httpEndpoint OR wcs.swa.httpEndpoint will be exposed.

# Use the wcs.swa.httpEndpoint since MTOM is deprecated. ie: do not modify the value below.

wcs.soap.httpEndpoint=${wcs.swa.httpEndpoint}

# This is if you want MTOM enabled for SOAP responses. true or false. MTOM is deprecated.

wcs.mtomEnabled=false

#-------------------------------------------------------------------------------

#------------- Internal KVP Endpoint --------------

#-------------------------------------------------------------------------------

# This is the internal http kvp (key/value pairs) endpoint for the WCSRI OGC Request/Reply

# services. Use "http" as the protocol and a port number. 127.0.0.1 (loopback) is a likely value for

# the "hostname".

wcs.kvp.httpEndpoint=http://127.0.0.1:8883/nnew/fy10/wcs/kvpService

# To enable or disable kvp endpoint. KVP is not required, hence you may disable it

wcs.kvp.disabled=false

#-------------------------------------------------------------------------------

#-------------- Pub/Sub -------------

#-------------------------------------------------------------------------------

# These are endpoints for pubsub services. 127.0.0.1 (loopback) is a likely value for the

# "hostname".

wcs.subscribe.httpEndpoint=http://127.0.0.1:8888/NotificationProducer

wcs.mngSubscription.httpEndpoint=http://127.0.0.1:8889/SubscriptionManager

# Subscription Management

submgt.abandoned.days=7

submgt.deleted.hold.days=7

submgt.transformer.list=DESCRIPTION COVERAGE|jms:pubsub.xformer.DescribeCoverage,GET COVERAGE|jms:pubsub.xformer.GetCoverage

#-------------------------------------------------------------------------------

#-------------- Security -------------

#-------------------------------------------------------------------------------

# Set this to true if you want to enable authentication

wcs.securityEnabled=false

# Specify the keystoreType, such as JKS. You should likely ALWAYS generate a JKS certificate, since

# our CXF implementation currently only works with JKS.

wcs.keystoreType=JKS


# Enter the password for your keystore

wcs.keystorePassword=password

# Password for decrypting the private key

wcs.keyPassword=password

# Generate your server certificate, and set the path here.

wcs.keystoreFile=${karaf.base}/etc/server.jks

# Specify the name of the realm as specified in the security bundle's nnew_security.cfg file

wcs.securityRealm=nnew

#-------------------------------------------------------------------------------

#-------------- Remote Management -------------

#-------------------------------------------------------------------------------

# remote management

wcs.remote_mng.context=remote

wcs.remote_mng.httpEndpoint=http://0.0.0.0:8194

#-------------------------------------------------------------------------------

#-------------- Other -------------

#-------------------------------------------------------------------------------

# datasource-poller

datasource-poller.reprocess=false

datasource-poller.overwrite=false

datasource-poller.idempotentDir=duplicatedFiles

# WCS Frontend - this is the single endpoint for ALL wcsri services.

wcs.frontend.headerBuffer=101024

wcs.frontend.requestBuffer=101024

#-------------------------------------------------------------------------------

#-------------- Repeaters -------------

#-------------------------------------------------------------------------------

# A file containing username/password pairs to be used to authenticate upstream.

# If not provided the repeater must not require authentication upstream

coverage-repeater.security.credentials=

# flag indicating that repeaters whose upstream server is an https/ssl endpoint will download

# and save the server certificate automatically.

coverage-repeater.installCert=true

# flag indicating the https URL hostname does not need to match the Common Name (CN) on
# the server certificate. Not recommended in production. Use disableCNCheck=false.

coverage-repeater.disableCNCheck=true


#-------------------------------------------------------------------------------

#-------------- Delegators -------------

#-------------------------------------------------------------------------------

# A file containing username/password pairs to be used to authenticate upstream. If not provided

# the delegator must not require authentication upstream

datasource-delegator.security.credentials=

# flag indicating that delegators whose upstream server is an https/ssl endpoint will download

# and save the server certificate automatically.

datasource-delegator.installCert=true

# flag indicating the https URL hostname does not need to match the Common Name (CN) on the
# server certificate. Not recommended in production. Use disableCNCheck=false.

datasource-delegator.disableCNCheck=true

#-------------------------------------------------------------------------------

#-------------- Streamer -------------

#-------------------------------------------------------------------------------

gds.datasource.streaming=false

datastreamer.publishAddress=127.0.0.1

datastreamer.port=10099

#-------------------------------------------------------------------------------

#-------------- Other -------------

#-------------------------------------------------------------------------------

# WCSRI configuration file containing all external properties

wcsri.cfg.file=${karaf.base}/etc/config/wcsri.cfg

# Bundles properties metadata file

wcsri.bundles.props.file=${karaf.base}/etc/config/wcsri-bundle.properties

# Location where the generated bundle cfg files should be placed

wcsri.bundles.cfg.dir=${karaf.base}/etc

#

wcsri.cfg.watchPeriod=30000

wcsri.cfg.track=true

wcsri.bundles.track=false


Appendix B: Shared File Systems

Below are some excerpts found on the ActiveMQ website regarding shared file systems.

B.1 OCFS2

OCFS2 was tested and both brokers thought they had the master lock - this is because "OCFS2 only supports locking with 'fcntl' and not 'lockf and flock', therefore mutex file locking from Java isn't supported."

From http://sources.redhat.com/cluster/faq.html#gfs_vs_ocfs2 :

OCFS2: No cluster-aware flock or POSIX locks

B.2 NFSv3

In the event of an abnormal NFSv3 client termination (i.e., the ActiveMQ master broker), the NFSv3 server will not time out the lock that is held by that client. This effectively renders the ActiveMQ data directory inaccessible, because the ActiveMQ slave broker can't acquire the lock and therefore cannot start up. The only solution to this predicament with NFSv3 is to reboot all ActiveMQ instances to reset everything.

B.3 NFSv4

Use of NFSv4 is another solution because its design includes timeouts for locks. When using NFSv4 and the client holding the lock experiences an abnormal termination, by design, the lock is released after 30 seconds, allowing another client to grab the lock.
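Whichever shared file system is chosen, a quick sanity check for exclusive locking is a small Java probe like the sketch below (not part of the WCSRI; the file path and hold time are arbitrary). Run it simultaneously from two cluster nodes against the same file on the shared file system: only one process should report holding the lock, and if both do, the file system exhibits the OCFS2 symptom described above and is not suitable for the KahaDB.

import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Minimal shared-file-system lock probe (illustrative only).
public class SharedFsLockCheck {
    public static void main(String[] args) throws Exception {
        RandomAccessFile file = new RandomAccessFile(args[0], "rw");
        FileLock lock = file.getChannel().tryLock();   // exclusive, non-blocking
        if (lock != null) {
            System.out.println("acquired exclusive lock; holding for 60s");
            Thread.sleep(60000);                        // give the other node time to try
            lock.release();
        } else {
            System.out.println("lock is held by another node");
        }
        file.close();
    }
}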
