IBM Software Group

How WLM routing and HA Manager work together in WebSphere Application Server ND

Krishna Jaladhi (krishnaj@us.ibm.com)
Kumaran Nathan (kumaran@us.ibm.com)
WebSphere Application Server Level 2 support
May 14 2015

WebSphere® Support Technical Exchange

Agenda
• WLM Overview
• Clusters
• HA Basics
• WLM routing logic
• How the nodeagent is involved in WLM
• How the ORB is involved in WLM
• Why bridging core groups
• Request Flow
• Important Custom Properties
• Common issues

WLM Overview

WLM
• Workload Management (WLM) is a WebSphere Application Server (WAS) facility that provides load balancing, failover, and affinity between application servers in a WebSphere clustered environment.
• Types of Workload Management (WLM) in WebSphere Application Server:
   • HTTP Server Plug-In WLM
   • Enterprise Java™ Bean (EJB™) WLM
• EJB WLM balances WLM-enabled RMI/IIOP requests between clients and clusters:
   • JNDI™ lookups, EJB creates, EJB business methods, and EJB removes

WLM - HTTP Server Plug-In
• Detects failure
• Marks container as unavailable
• Tries next server in ServerCluster
• Round robin or random distribution for web server servlet requests
[Diagram: HTTP Server with Plug-in sends HTTP(S) protocol traffic to the Web Container in each App Server]

WLM - EJB
• The client ORB plug-in detects failure
• Marks container as unavailable
• Tries next server in ServerCluster
• Weighted round robin request distribution

Clusters
• Clusters are sets of servers that are managed together and participate in workload management.
• Clusters also enable enterprise applications to be highly available, because requests are automatically routed to the running servers in the event of a failure.
• The servers that are members of a cluster can be on different host machines.
• A cell can include no clusters, one cluster, or multiple clusters.
• Servers that belong to a cluster are members of that cluster set and must all have identical application components deployed on them.

Types of clusters
• Vertical cluster
   • A vertical cluster has cluster members on the same node, or physical machine.
• Horizontal cluster
   • A horizontal cluster has cluster members on multiple nodes across many machines in a cell.
• You can configure either type of cluster, or have a combination of vertical and horizontal clusters.

WLM Routing Policy
• Routing is based on weights associated with cluster members.
   • A round robin algorithm is used when the weights are equal.
   • Weights can be modified to send more requests to a particular cluster member or members.
• More information about WLM routing from the V8.5 Knowledge Center:
  http://www14.software.ibm.com/webapp/wsbroker/redirect?version=phil&product=was-nd-dist&topic=crun_srvgrp
• If the EJB client is on the same physical machine as a cluster member, the “Prefer Local” setting ensures that all requests from the client go to the local cluster member.

WLM Routing Policy (cont.)
• Change the configuration and runtime weights by using the “Update” button.
• Weights are only meaningful when they are compared to the other member weights.
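The weight-based routing described above can be illustrated with a small sketch. This is a simplified weighted round-robin selector written for illustration only; it is not WebSphere's actual selection code, and the class name, method names, and member names are hypothetical. Each member receives as many requests per cycle as its weight, so a member with weight 2 gets twice the traffic of a member with weight 1, and a member with weight 0 gets none.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified weighted round-robin selector (illustrative, not the WLM implementation). */
final class WeightedRoundRobin {
    private final List<String> members = new ArrayList<>();
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private final Map<String, Integer> credits = new LinkedHashMap<>();
    private int cursor = 0;

    /** Adds a member or updates its weight; the new weight takes effect immediately. */
    void setWeight(String member, int weight) {
        if (weight < 0) {
            throw new IllegalArgumentException("weight must be >= 0");
        }
        if (!weights.containsKey(member)) {
            members.add(member);
        }
        weights.put(member, weight);
        credits.put(member, weight);
    }

    /** Returns the next target, or null when no member has a positive weight. */
    String next() {
        for (int pass = 0; pass < 2; pass++) {
            for (int i = 0; i < members.size(); i++) {
                int index = (cursor + i) % members.size();
                String member = members.get(index);
                int left = credits.get(member);
                if (left > 0) {
                    credits.put(member, left - 1);
                    cursor = (index + 1) % members.size();
                    return member;
                }
            }
            // Every member has used its share: start a new cycle from the configured weights.
            credits.putAll(weights);
            cursor = 0;
        }
        return null;
    }

    public static void main(String[] args) {
        WeightedRoundRobin selector = new WeightedRoundRobin();
        selector.setWeight("member1", 2);
        selector.setWeight("member2", 1);
        for (int i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> " + selector.next());
        }
    }
}
```

With weights 2 and 1, six requests are split four to two. Only the ratio between weights matters in this sketch: weights 20 and 10 would produce the same long-run split.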
• The “Make Idle” button sets the selected member's configuration and runtime weight to 0.

HA Basics

HA Basics
• The HAManager Framework (HAM)
   • An integral part of WAS* starting with 6.0 (and WAS XD), designed to provide an infrastructure for making selected WAS services highly available
   • Present in all JVM™s, including the Deployment Manager and Node Agents
   • Can be used by other internal WebSphere components to provide automatic failover support
• Significant only in ND** environments
   • HAM provides no extra value for Base or any configuration consisting of a single server.
• Provides various asynchronous callbacks for interested internal WAS components
• The configuration of high availability systems is simplified
   • Works out of the box in most cases; no additional administration is required for the most commonly used topologies
*WAS: WebSphere Application Server   **ND: Network Deployment

HA Basics: four basic HAManager services
• Bulletin Board (BB)
   • A server state data exchange mechanism
   • Typically used for carrying routing information (WLM and ODC)
• High Availability Groups (HAGroups)
   • Provides the HAManager policy-based failover mechanism
   • Users: ME, Transaction Manager, JCA Adapter, and more
   • The most visible service
• Agent Framework
   • Provides hot backup service from primary to N backup members
   • High-throughput service used by DRS for data replication
• Partitioned Managed Group (PMG)
   • A distributed state manager (conceptually a distributed cache)
   • Used by the core group bridge service to forward bulletin board communication between core groups

CoreGroup
• A core group is a collection of processes (JVMs).
• A core group is statically defined in the cell-scoped coregroup.xml configuration file.
• A cluster cannot span core groups.
   • All of the cluster members must belong to the same core group.
   • A core group can have more than one cluster.
• A core group cannot span cells.
   • All the members of a core group must belong to the same cell.
   • A cell can contain more than one core group.
• The recommended number of active members in a core group is 50, which can be extended to 100 when the IBM_CS_WIRE_FORMAT_VERSION custom property is enabled.

Multiple Core Groups - Bridging
• HA Manager services are normally restricted to a single core group (HA domain).
• There is one exception: the HAM state data exchange (bulletin board) service can be configured to communicate across core groups.
• The Core Group Bridge Service (CGBS) is the component that allows HA Manager “bulletin board” communication to span core groups.
• Core groups within a cell can be bridged (intra-cell).
• Core groups across cells can be bridged (inter-cell).

Role of Nodeagent and ORB in WLM

How the Nodeagent is involved in WLM
• The node agent process provides a service called the Location Service Daemon (LSD).
• Each application server registers with the node agent during server startup.
• The LSD provides known direct Interoperable Object References (IORs) to the caller.
• WLM uses the LSD cluster information for routing EJB requests.
• Don'ts
   • The node agent should not be used for bootstrap in distributed OS environments. Bootstrapping through the node agent is allowed only in z/OS® environments.
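One way to follow the bootstrap guidance above is to point the JNDI provider URL at the cluster members' bootstrap addresses rather than at the node agent. The sketch below only assembles the JNDI environment; it assumes the multi-address corbaloc URL form and the com.ibm.websphere.naming.WsnInitialContextFactory class from the WebSphere client libraries. The class name ClusterBootstrap, the host names, and the ports are hypothetical and must be replaced with the BOOTSTRAP_ADDRESS values of the actual cluster members.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;

/** Builds a JNDI environment that bootstraps against cluster members (illustrative sketch). */
final class ClusterBootstrap {

    /** Joins host:port pairs into a multi-address corbaloc URL. */
    static String corbalocUrl(List<String> hostPorts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one bootstrap address is required");
        }
        StringBuilder url = new StringBuilder("corbaloc:");
        for (int i = 0; i < hostPorts.size(); i++) {
            if (i > 0) {
                url.append(',');
            }
            url.append(':').append(hostPorts.get(i));
        }
        return url.toString();
    }

    /** Returns the properties to pass to new InitialContext(env) on a WebSphere client. */
    static Hashtable<String, String> jndiEnvironment(List<String> hostPorts) {
        Hashtable<String, String> env = new Hashtable<>();
        // Factory class shipped with the WebSphere client runtime; it is not on a plain JDK classpath.
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.ibm.websphere.naming.WsnInitialContextFactory");
        env.put(Context.PROVIDER_URL, corbalocUrl(hostPorts));
        return env;
    }

    public static void main(String[] args) {
        // Hypothetical cluster member hosts and bootstrap ports.
        List<String> clusterMembers = Arrays.asList("hostA.example.com:2829", "hostB.example.com:2829");
        System.out.println(jndiEnvironment(clusterMembers).get(Context.PROVIDER_URL));
    }
}
```

Listing more than one cluster member lets the client obtain its initial context even when one of the members is down.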

How the Object Request Broker (ORB) is involved
• WebSphere Application Server uses the ORB to facilitate client/server communication.
• The EJB container depends on the ORB for interactions between client and server.
• The ORB manages inbound and outbound requests for remote Java objects.
• The ORB provides a framework for clients to locate EJBs on the server.
• Inter-ORB communication takes place over IIOP/GIOP, using Interoperable Object References (IORs) that ORBs can understand and act on.

Why bridging core groups

Why is bridging core groups important?
• Large WebSphere Application Server topologies can be configured with multiple HA core groups to reduce usage of system resources such as memory and CPU.
• Each HA core group has its own bulletin board service, managed independently, and bulletin board information is shared within that core group.
• WLM depends on the HAManager component's bulletin board service to aggregate and propagate the runtime cluster description information.
• Due to practical limitations, it may not be possible to place a node agent in the same core group as each cluster when the clusters belong to different core groups.
• The HA Manager core group bridge service allows communication between multiple core groups.
• Core groups can be bridged within a cell or across multiple cells.

Request Flow

Request Flow
Server configuration used in the example:

    serverName="nodeagent"      BOOTSTRAP_ADDRESS="2810"  ORB_LISTENER_ADDRESS="9101"
    serverName="ClientServer1"  BOOTSTRAP_ADDRESS="2827"  ORB_LISTENER_ADDRESS="0"
    serverName="EJBServer1"     BOOTSTRAP_ADDRESS="2829"  ORB_LISTENER_ADDRESS="9104"

Client code:

    ic = new InitialContext();
    Object ejbObject = ic.lookup("java:comp/env/BeenThereBean");
    BeenThereHome beenThereHome = (BeenThereHome) javax.rmi.PortableRemoteObject.narrow(ejbObject, BeenThereHome.class);
    beenThere = beenThereHome.create();

Request Flow (cont.)
After obtaining the InitialContext, the code calls ic.lookup("java:comp/env/BeenThereBean");

Lookup request sent from ClientServer1:

    OUT GOING: Locate Request Message
    Date:         April 30, 2015 1:23:58 PM EDT
    Thread Info:  WebContainer : 0
    Local Port:   61248 (0xEF40)
    Local IP:     169.254.218.10
    Remote Port:  2829 (0xB0D)
    Remote IP:    169.254.218.10
    -
    Request ID:   68 (0x44)
    Object Key:   length = 21 (0x15)

Lookup response from EJBServer1:

    OUT GOING: Locate Reply Message
    Date:         April 30, 2015 1:23:58 PM EDT
    Thread Info:  ORB.thread.pool : 0
    Local Port:   2829 (0xB0D)
    Local IP:     169.254.218.10
    Remote Port:  61248 (0xEF40)
    Remote IP:    169.254.218.10
    -
    Request ID:   68 (0x44)
    Reply Status: OBJECT_FORWARD

The IOR forwarded by EJBServer1 is an indirect IOR pointing to the LSD:

    [4/28/15 15:06:31:209 EDT] 00000099 WLMIOR > getWLMIOR Entry
    [4/28/15 15:06:31:209 EDT] 00000099 WLMIOR 3 getWLMIOR - typeid= IDL:com.ibm/WsnOptimizedNaming/NamingContext:1.0
    [4/28/15 15:06:31:209 EDT] 00000099 WLMIOR 3 getWLMIOR - host= ADMINIB-NDLUO2T
    [4/28/15 15:06:31:209 EDT] 00000099 WLMIOR 3 getWLMIOR - port= 9101
    [4/28/15 15:06:31:210 EDT] 00000099 WLMIOR 3 getWLMIOR - objectKey= 0x4a4d4249000000124710c01238613730343731306330313265346238000000240

Request Flow (cont.)
Request sent from ClientServer1 (lookup, continued):

    [4/30/15 13:23:58:491 EDT] 00000094 ORBRas 3 com.ibm.rmi.ras.Trace dump:84 WebContainer : 0
    OUT GOING: Request Message
    Date:         April 30, 2015 1:23:58 PM EDT
    Thread Info:  WebContainer : 0
    Local Port:   61130 (0xEECA)
    Local IP:     169.254.218.10
    Remote Port:  9101 (0x238D)
    Remote IP:    169.254.218.10
    -
    Message header fragmented
    Request ID:   70

Response sent from the node agent:

    [4/30/15 13:23:58:542 EDT] 000000c6 ORBRas 3 com.ibm.rmi.ras.Trace dump:84 ORB.thread.pool : 0
    OUT GOING: Reply Message
    Date:         April 30, 2015 1:23:58 PM EDT
    Thread Info:  ORB.thread.pool : 0
    Local Port:   9101 (0x238D)
    Local IP:     169.254.218.10
    Remote Port:  61130 (0xEECA)
    Remote IP:    169.254.218.10
    Fragment to follow: Yes
    Message size: 1012 (0x3F4)
    -
    Message header fragmented
    Request ID:   70

The IOR forwarded by the node agent is a direct IOR pointing to EJBServer1:

    [4/30/15 13:23:58:545 EDT] 00000094 WLMIOR > getWLMIOR Entry
    [4/30/15 13:23:58:545 EDT] 00000094 WLMIOR 3 getWLMIOR - typeid=
    [4/30/15 13:23:58:545 EDT] 00000094 WLMIOR 3 getWLMIOR - host= ADMINIB-NDLUO2T
    [4/30/15 13:23:58:545 EDT] 00000094 WLMIOR 3 getWLMIOR - port= 9104
    [4/30/15 13:23:58:545 EDT] 00000094 WLMIOR 3 getWLMIOR - objectKey=

Request Flow (cont.)
Request sent from ClientServer1 (lookup, continued):

    [4/28/15 15:06:31:524 EDT] 00000099 ORBRas 3 com.ibm.rmi.ras.Trace dump:84 WebContainer : 0
    OUT GOING: Request Message
    Date:         April 28, 2015 3:06:31 PM EDT
    Thread Info:  WebContainer : 0
    Local Port:   61250 (0xEF42)
    Local IP:     169.254.218.10
    Remote Port:  9104 (0x2390)
    Remote IP:    169.254.218.10
    -
    Message header fragmented
    Request ID:   76

Response sent from EJBServer1 (lookup, continued):

    [4/30/15 13:23:58:959 EDT] 0000008f ORBRas 3 com.ibm.rmi.ras.Trace dump:84 ORB.thread.pool : 0
    OUT GOING: Reply Message
    Date:         April 30, 2015 1:23:58 PM EDT
    Thread Info:  ORB.thread.pool : 0
    Local Port:   9104 (0x2390)
    Local IP:     169.254.218.10
    Remote Port:  61250 (0xEF42)
    Remote IP:    169.254.218.10
    -
    Request ID:   76
    Reply Status: NO_EXCEPTION
    0050: 636F6D2E 69626D2E 77656273 70686572   com.ibm.webspher
    0060: 652E7361 6D706C65 732E6265 656E7468   e.samples.beenth
    0070: 6572652E 4265656E 54686572 65486F6D   ere.BeenThereHom
    0080: 653A3030 30303030 30303030 30303030   e:00000000000000
    0090: 303000BD 00000001 00000000 000001EC   00..............

Request Flow (cont.)
Request sent from ClientServer1 (create):

    [4/30/15 13:23:59:002 EDT] 00000094 ORBRas 3 com.ibm.rmi.ras.Trace dump:84 WebContainer : 0
    OUT GOING: Request Message
    Date:         April 30, 2015 1:23:59 PM EDT
    Thread Info:  WebContainer : 0
    Local Port:   61250 (0xEF42)
    Local IP:     169.254.218.10
    Remote Port:  9104 (0x2390)
    Remote IP:    169.254.218.10
    -
    Message header fragmented
    Request ID:   78
    0060: AC000200 01002900 00005F5F 686F6D65   ......)...__home
    0070: 4F66486F 6D657323 5F5F686F 6D654F66   OfHomes#__homeOf
    0080: 486F6D65 73235F5F 686F6D65 4F66486F   Homes#__homeOfHo
    0090: 6D657308 4265656E 54686572 65234265   mes.BeenThere#Be
    00A0: 656E5468 6572652E 6A617223 4265656E   enThere.jar#Been
    00B0: 54686572 65426561 6EBDBDBD 00000007   ThereBean.......
    00C0: 63726561 746500BD 00000003 49424D25   create......IBM%

Response received from EJBServer1 (create):

    [4/30/15 13:23:59:030 EDT] 0000008f ORBRas 3 com.ibm.rmi.ras.Trace dump:84 ORB.thread.pool : 0
    OUT GOING: Reply Message
    Date:         April 30, 2015 1:23:59 PM EDT
    Thread Info:  ORB.thread.pool : 0
    Local Port:   9104 (0x2390)
    Local IP:     169.254.218.10
    Remote Port:  61250 (0xEF42)
    Remote IP:    169.254.218.10
    -
    Request ID:   78
    Service Context: length = 2 (0x2)
    Context ID:   1229081874 (0x49424D12)
    Reply Status: NO_EXCEPTION
    0050: 636F6D2E 69626D2E 77656273 70686572   com.ibm.webspher
    0060: 652E7361 6D706C65 732E6265 656E7468   e.samples.beenth
    0070: 6572652E 4265656E 54686572 653A3030   ere.BeenThere:00
    0080: 30303030 30303030 30303030 303000BD   00000000000000..

Important Custom Properties

IBM_CLUSTER_FEEDBACK_MECHANISM
• By default, the WLM plug-in routes using a combination of the cluster member weights and the number of outstanding requests for each cluster member.
• The feedback mechanism can be changed by setting the cell custom property IBM_CLUSTER_FEEDBACK_MECHANISM on the target cell. Allowed values are:
   • 0: Use only the configured weights to determine routing.
   • 1: Use a blend of weights and outstanding requests (default behavior).
   • 2: Use only the outstanding requests to determine routing.
   • 3: No extra feedback mechanism; neither configured weights nor outstanding requests are taken into account. This is functionally equivalent to routing with all servers having equal weights; any changes to the configured weights are ignored.
• When trace is enabled, the following trace point shows whether the default value has been changed:
    [4/28/15 15:00:36:394 EDT] 00000001 WLMCustomProp 3 forceFeedbackString is currently set to false

com.ibm.CORBA.ConnectTimeout
• The com.ibm.CORBA.ConnectTimeout property specifies the maximum time, in seconds, that the client ORB waits before timing out when attempting to establish an IIOP connection with a remote server ORB.
• Typically, client applications use this property. You can specify the property for each individual application server through the administrative console.
• Note: The default for the com.ibm.CORBA.ConnectTimeout property is 10 in Version 8 and later. In versions earlier than Version 8, the default is 0.

com.ibm.CORBA.RequestTimeout
• Specifies the number of seconds to wait before timing out on a request message.
• The default value is 180 seconds.
• To set it in the administrative console, click Servers > Server Types > WebSphere application servers > server_name > Container services > ORB service.
• If you use a standalone client or thin client, set com.ibm.CORBA.RequestTimeout as a system property.

Common Issues

Issue 1: Forward limit reached

    Caused by: org.omg.CORBA.NO_IMPLEMENT: Forward limit reached vmcid: 0x49421000 minor code: 40 completed: No
        at com.ibm.ws.cluster.router.selection.SelectionManager.targetForwarded(SelectionManager.java:366)
        at com.ibm.ws.wlm.client.WLMClientRequestInterceptor.receive_other(WLMClientRequestInterceptor.java:363)
        at com.ibm.rmi.pi.InterceptorManager.invokeInterceptor(InterceptorManager.java:599)
        ...
        at com.ibm.CORBA.iiop.ClientDelegate.invoke(ClientDelegate.java:1320)

• A NO_IMPLEMENT exception means that a requested object could not be located. For example, a NO_IMPLEMENT error is raised when a server does not exist or is not running when a client initiates a request.
• Creating multiple instances of the ORB can cause this issue.
• Multiple ORB instances can be identified easily from a javacore or thread dump.
• This is what an ORB reader thread looks like in a javacore:

    "RT=383:P=570960:O=123:WSSSLTransportConnection[addr=XXX.XXX.XXX,port=47765,local=46289]" (TID:0x57FD8938,

• This is what an ORB listener thread looks like:

    "LT=496:P=570960:O=123:port=46288" (TID:0x57FD8990, sys_thread_t:0x9AB1EF10, state:R, native ID:0x22E7FB) prio=5

• The O= value is the ORB instance ID, so more than one distinct O= value among these threads indicates that more than one ORB instance exists in the process.

Issue 2: No Cluster Data Available

    org.omg.CORBA.NO_IMPLEMENT: No Cluster Data Available vmcid: 0x49421000 minor code: 42 completed: No
        at com.ibm.ws.cluster.router.selection.WLMLSDRouter.select(WLMLSDRouter.java:295)
        at com.ibm.ws.cluster.propagation.ServerClusterContextListenerImpl.forwardRequest(ServerClusterContextListenerImpl.java:625)
        at com.ibm.ws.cluster.propagation.ServerClusterContextListenerImpl.validateRequest(ServerClusterContextListenerImpl.java:669)
        at com.ibm.ws.wlm.server.WLMServerRequestInterceptor.notifyValidationListeners(WLMServerRequestInterceptor.java:317)
        at com.ibm.ws.wlm.server.WLMServerRequestInterceptor.receive_request_service_contexts(WLMServerRequestInterceptor.java:206)
        at com.ibm.rmi.pi.InterceptorManager.invokeInterceptor(InterceptorManager.java:621)

• This message is seen in the node agent.
• When a client makes its first request to a cluster, the WLM plug-in does not yet have the information about the target cluster members that it needs for routing. To route the request, it sends it to the node agents in the target cell.
• The node agents are expected to have data about the clusters, which they use to forward the request to a cluster member.
• If there are multiple core groups, make sure they are bridged.
• Verify in coregroupbridge.xml that the core groups are bridged:
   • Inside every <coregroupAccessPoint> there must be a <bridgeInterface> defined.
   • If any coregroupAccessPoint lacks a bridgeInterface, the core groups are not bridged.
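The bridge check described above can be automated along the following lines. This is a minimal sketch using only the JDK's DOM parser; the class name BridgeCheck and the sample XML are hypothetical. The element names are passed in as parameters because the exact tag names and attributes in a real coregroupbridge.xml should be confirmed against the file itself; the names used in the sample are the ones quoted on the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Lists access points that contain no bridge interface element (illustrative sketch). */
final class BridgeCheck {

    /** Returns the "name" attribute of every access point element without a bridge child. */
    static List<String> unbridged(String xml, String accessPointTag, String bridgeTag) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse the bridge configuration", e);
        }
        List<String> missing = new ArrayList<>();
        NodeList accessPoints = doc.getElementsByTagName(accessPointTag);
        for (int i = 0; i < accessPoints.getLength(); i++) {
            Element accessPoint = (Element) accessPoints.item(i);
            if (accessPoint.getElementsByTagName(bridgeTag).getLength() == 0) {
                missing.add(accessPoint.getAttribute("name"));
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical, heavily simplified stand-in for coregroupbridge.xml.
        String sample =
              "<settings>"
            + "<coregroupAccessPoint name=\"CGAP_1\"><bridgeInterface node=\"node1\" server=\"server1\"/></coregroupAccessPoint>"
            + "<coregroupAccessPoint name=\"CGAP_2\"/>"
            + "</settings>";
        System.out.println("Access points without a bridge interface: "
                + unbridged(sample, "coregroupAccessPoint", "bridgeInterface"));
    }
}
```

An empty result means every access point found has at least one bridge interface; any name in the list points to a core group that is not bridged.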

Issue 3: NoAvailableTargetExceptionImpl

    com.ibm.ws.cluster.selection.NoAvailableTargetExceptionImpl: Removal () Applicable Targets [] Removal ()
    com.ibm.ws.cluster.selection.SelectionCriteriaImpl@7666a71[{CELLNAME=kumaranCell01, CLUSTERNAME=cluster1}:{rules.restriction=[Lcom.ibm.wsspi.cluster.selection.SelectionRule;@28334}]]
        at com.ibm.ws.cluster.selection.SelectionCriteriaImpl.select(SelectionCriteriaImpl.java:261)

• The message above indicates that WLM does not know of any endpoints it can select in order to route the request to a target member.
• Check that HA Manager is enabled on all core group members.
• Bridge the core groups on the server side (if the node agent is not part of the cluster members' core group).

Issue 4: HA Manager view instability
• For WLM to work properly, the HA Manager view must be stable.
• The HA Manager view may become unstable if there are OutOfMemory (OOM) errors, a firewall blocking DCS ports, port conflicts, or network issues.
   • Refer to https://ibm.biz/BdXjWX and https://ibm.biz/BdXjWi for more information.
• You can confirm HA view stability by using message DCSV8050I in the SystemOut.log file:

    DCSV8050I: DCS Stack DefaultCoreGroup at Member dmgr\dmgr\dmgr: New view installed, identifier (291:0.dmgr\dmgr\dmgr), view size is 10 (AV=10, CD=10, CN=10, DF=40)

• “Split Brain” (Split View) can occur due to a connection or network problem. In this condition, multiple views can form in a single core group.
• A split view should be avoided, and the underlying problem must be fixed before WLM can distribute requests properly. Refer to https://ibm.biz/BdXjWX

Reference
• Object Request Broker (ORB) Problem Determination and Best Practices
  http://www.ibm.com/support/docview.wss?uid=swg27012101
• Understanding how EJB calls operate in WebSphere Application Server V6.1
  http://www.ibm.com/developerworks/websphere/techjournal/0807_pape/0807_pape.html
• Ensuring enterprise availability when deploying Enterprise JavaBeans in WebSphere Application Server
  http://www.ibm.com/developerworks/websphere/techjournal/1109_col_vanrun/1109_col_vanrun.html
• Best Practices for Large WebSphere Topologies
  http://www.ibm.com/developerworks/websphere/library/techarticles/0710_largetopologies/0710_largetopologies.html

Summary
• What WLM and clusters are
• Why and how HA is important for WLM to work properly
• Importance of the nodeagent and the ORB
• Common WLM issues