Connection Cluster/Redundancy TOI

Unity Connection 7.0
Cluster/Redundancy
TOI
EDCS – 623130
Ramesh Achuthan
Radha Radhakrishnan
© 2006 Cisco Systems, Inc. All rights reserved.
Agenda

Overview

Deployment

Cluster Behavior

Troubleshooting

Upgrading

Future Enhancements
Overview – Active/Active
[Diagram: two Unity Connection servers, UC-0 (Publisher, normally in the primary role, active) and UC-1..N (Subscriber, normally in the secondary role, active), running on top of the CCM Platform Cluster. Each server is drawn with SRM, ServM, CsMgr, Singletons, DB, and other components. The servers are linked by a heartbeat, DB/file replication, and remote writes for MbxDb. CCM delivers voice calls with automatic load balancing and failover (SCCP/SIP); web clients connect over HTTP/IMAP with dynamic load balancing and failover through DNS.]
User Interfaces:
• Voice calls
• Web Admin/CPCA
• IMAP
Load balancing & Failover:
• Involve external entities (DNS, CCM etc.)
• PIMG for legacy integration
Roles:
• Primary & secondary
• SRM manages roles
Some Terminology
CCM Platform Cluster: Publisher and Subscriber
• Pub is the first node; this is fixed at install.
• Other nodes are Subs.
UC Roles: Primary and Secondary
• The singleton processes run only on the primary server.
– Notifier, MTA, SysAgent tasks, and more.
• UnityMbxDb writes are done only through the primary server.
• Certain master files (encryption key, certificates) are managed on the primary and replicated to the secondary.
• In normal operation, the primary will be the Pub in the cluster.
User access
• TUI/VUI will access the servers transparently.
• IMAP/CPCA clients will access the servers transparently.
• Admin:
– Transparent server access: administration can be done at either server.
– Voice ports etc. will need node selection.
• Serviceability:
– Trace/Alarm settings will be common to both servers.
– Service start/stop information will need node selection.
– All singleton processes run on the primary server.
• Licensing:
– Voice ports are server specific and need a server-specific license.
– User licenses are not server specific and can be installed on any server.
• RTMT will have to access each server explicitly.
• Log files will not be replicated.
Deployment
Load balancing & Failover
Installing UC in Cluster
1. Install the first node. Answer yes to the question “Is it the first node in cluster?”
2. Administer the first node and get it running.
3. Add the second node to the cluster:
   a. Using the Admin GUI, add the secondary node under “System Settings → Cluster”.
   b. Install the second node. Answer no to the question “Is it the first node in the cluster?”
   c. Provide the IP/Hostname of the first node.
Once the second node comes up, it will be in the cluster with the first one.
CCM setup - SCCP
Dynamic load balancing and failover with Hunt-pilot, Hunt-list, & Line-Group (CCM 4 and above)
[Diagram: Hunt Pilot → Hunt List → Line Group → DNs on UC-1 and UC-2. The Line Group applies a distribution algorithm (circular, most-idle, etc.).]
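As a rough illustration of how a line group's distribution algorithm can spread calls across the two Connection servers, the Python sketch below contrasts a circular policy with a most-idle policy. The class and method names are hypothetical and do not correspond to any CCM API; the sketch only models the selection logic, and passing a reduced `available` set stands in for a server that cannot take calls.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class LineGroup:
    """Toy model of a line group whose members are UC servers (hypothetical)."""
    members: list
    _next: int = 0                                   # start index for the circular policy
    _last_used: dict = field(default_factory=dict)   # member -> tick of its last call
    _clock: count = field(default_factory=count)     # monotonically increasing tick source

    def pick_circular(self, available):
        """Return the next available member in round-robin order, or None."""
        n = len(self.members)
        for offset in range(n):
            candidate = self.members[(self._next + offset) % n]
            if candidate in available:
                self._next = (self._next + offset + 1) % n
                self._mark(candidate)
                return candidate
        return None

    def pick_most_idle(self, available):
        """Return the available member that was used least recently, or None."""
        candidates = [m for m in self.members if m in available]
        if not candidates:
            return None
        chosen = min(candidates, key=lambda m: self._last_used.get(m, -1))
        self._mark(chosen)
        return chosen

    def _mark(self, member):
        # Remember when this member last received a call.
        self._last_used[member] = next(self._clock)
```

With both members available, either policy alternates between UC-1 and UC-2; when one member is removed from `available`, every call goes to the other, which is the failover half of the behavior.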
CCM setup - SIP
Approaches:
1. With DNS-SRV
• Route-Pattern → Sip-Trunk → DNS-SRV FQDN
2. With Route-List (simpler)
• Route-Pattern → Route-List → Route-Group → Sip-Trunk
• Uses the distribution algorithm in the Route-Group
3. With Sip Gateway DNS-SRV
• Route-Pattern → Sip-Trunk → Sip-GW → DNS-SRV FQDN
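To make the DNS-SRV approaches more concrete, here is a simplified Python sketch of how a SIP client might order the targets returned by an SRV lookup: lowest priority value first, with a weighted random choice among records of equal priority. It is loosely modeled on the selection rules in RFC 2782, not on how CCM or any SIP gateway actually implements them. The record values and hostnames in the usage note are made up, and zero weights are treated as 1 for simplicity.

```python
import random
from collections import namedtuple

SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def order_srv_targets(records, rng=random):
    """Order SRV records: lowest priority first, weighted-random within a priority."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        pool = [r for r in records if r.priority == priority]
        while pool:
            # Simplification: a weight of 0 is treated as 1 so every record can be chosen.
            weights = [max(r.weight, 1) for r in pool]
            pick = rng.choices(pool, weights=weights, k=1)[0]
            ordered.append(pick)
            pool.remove(pick)
    return ordered
```

Giving both servers the same priority and weight approximates load balancing, for example `SrvRecord(10, 50, 5060, "uc-a.example.com")` and `SrvRecord(10, 50, 5060, "uc-b.example.com")`. Giving one server a higher priority value makes it a pure failover target.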
Other Integrations
PIMG
• PIMG pings a primary UC and can redirect calls to the secondary when the primary fails.
• Load balancing is done at the PBX.
• PIMG failures are handled at the PBX.
[Diagram: a PBX, two PIMG units, and servers UC-A and UC-B, with primary and secondary links from the PIMG units to the UC servers.]
IMAP & CPCA Clients
Load balancing and failover are transparent to users.
• DNS – name lookup
• Add A-records in DNS
• Users need to re-login after failover.
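The DNS-based failover above can be pictured from the client side with the following Python sketch: one name resolves to several A-records, and the client tries each address until one accepts the connection. This is an assumption about client behavior for illustration only, not a description of any particular IMAP or web client. The `resolve` and `connect` parameters exist only so the logic can be exercised without a network.

```python
import socket


def connect_first_available(hostname, port, timeout=5.0,
                            resolve=socket.getaddrinfo,
                            connect=socket.create_connection):
    """Try each address the name resolves to and return the first open connection."""
    last_error = None
    infos = resolve(hostname, port, type=socket.SOCK_STREAM)
    for _family, _type, _proto, _canonname, sockaddr in infos:
        try:
            # sockaddr[:2] is (address, port) for both IPv4 and IPv6 results.
            return connect(sockaddr[:2], timeout=timeout)
        except OSError as exc:
            last_error = exc  # this server is unreachable; try the next record
    raise ConnectionError(f"no server reachable for {hostname}") from last_error
```

A call such as `connect_first_available("uc.example.com", 143)` (hypothetical hostname, standard IMAP port) returns a fresh connection to whichever server answers. Because the connection is new, the session has to be authenticated again, which matches the re-login note above.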
Cluster Behavior
Cluster State displayed:
• None – only one node in the cluster
• Normal – more than one node, not failed over
• Failedover – the Publisher is not the primary at that time
• Admin changes made to one server should be visible on the other within a few seconds.
• Messages left on one server should be available from the other server within a few seconds.
• TUI/VUI, CPCA, IMAP, & Admin users shall not notice any login issues when one of the servers is down.
• MWI and other notifications shall continue to work when one of the servers is down.
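The three displayed states follow from two facts: how many nodes are in the cluster, and whether the Publisher currently holds the primary role. A minimal Python sketch of that mapping, with a hypothetical function name and parameters:

```python
def cluster_state(node_count, publisher_is_primary):
    """Map cluster facts to the displayed state name (None / Normal / Failedover)."""
    if node_count <= 1:
        return "None"          # a single node: there is nothing to fail over to
    if publisher_is_primary:
        return "Normal"        # more than one node, roles as installed
    return "Failedover"        # a Subscriber is currently acting as primary
```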
Manual failover
[Diagram: the Admin client issues make_primary( B ). Node A (primary) fails over to Node B (secondary), and the singletons are started on Node B.]
Cluster management
Manual failback
[Diagram: the Admin client issues make_primary( A ). Node B (acting primary) fails back to Node A (acting secondary), and the singletons are started on Node A.]
Manual Deactivate
• Deactivating a server stops all critical services and base services on it.
• Database replication continues in the deactivated state.
• Only secondary servers can be deactivated.
• The Administration and Serviceability GUIs are available in the deactivated state.
• This state is used for maintenance, during which all calls and web user interactions are directed to the other server.
• A deactivated server can be activated back to service (as shown).
Manual activate
Auto failover
[Diagram: a critical service failure reported by ServM on Node A (primary) triggers failover to Node B (secondary).]
Acting-Primary failure
[Diagram: Node B (acting primary) crashes; Node A (acting secondary) tries to become primary.]
CPCA servlet failure and redirection
[Diagram: a CPCA web client, CPCA on Node A, and CPCA on Node B; when the CPCA servlet on one node fails, the client is redirected to CPCA on the other node.]
Tomcat failure and DNS resolution
[Diagram: the web client performs a DNS name resolution lookup to reach Tomcat on Node A or Node B; when Tomcat on one node fails, the lookup directs the client to the other node.]
Reasons for failover
• Failover can be caused by a 30-second heartbeat failure.
• Failover can also be initiated manually.
• Conditions for auto-failover:
– A critical process cannot be started or fails (SRM, ServM, DB, DbEventPublisher, CuCsMgr, CuMixer, Notifier etc.)
– Too many restarts in some interval (CuCsMgr: a single restart is allowed, but perhaps 3 deaths in 5 or 10 min exceeds the threshold)
• Non-critical processes will not cause failover. ServM will restart them on the same box.
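The "too many restarts in some interval" condition can be sketched as a sliding-window counter. The Python below is illustrative only: the class name is hypothetical, and the defaults (3 deaths within 300 seconds) simply take one reading of the tentative figures above; it is not the actual ServM or SRM logic.

```python
from collections import deque


class RestartTracker:
    """Count deaths of one critical process inside a sliding time window."""

    def __init__(self, max_deaths=3, window_seconds=300):
        self.max_deaths = max_deaths
        self.window_seconds = window_seconds
        self._deaths = deque()  # timestamps (seconds) of recent deaths

    def record_death(self, now):
        """Record a death at time `now`; return True if the threshold is reached."""
        self._deaths.append(now)
        # Forget deaths that have fallen out of the window.
        while self._deaths and now - self._deaths[0] > self.window_seconds:
            self._deaths.popleft()
        return len(self._deaths) >= self.max_deaths
```

A caller would restart the process while `record_death` returns False and request failover once it returns True.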
Failover
Upon Failover (when primary fails)
• Any existing calls or IP traffic to the primary will likely be lost.
• SRM on the secondary will detect the failure and update the status in the DB.
• SRM on the secondary will instruct ServM to start the singleton processes.
• Switch/PIMG/DNS will determine the failover condition and route incoming call traffic to the secondary box.
• If using DNS, CPCA/IMAP traffic will be sent to the secondary.
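The secondary's side of this sequence (detect the missed heartbeat, record the new status, start the singletons) can be sketched as follows. This is a toy model under stated assumptions: the class name, the role strings, and the two callbacks are hypothetical stand-ins for the DB status update and the request to ServM, and the 30-second default is the heartbeat failure interval mentioned in this deck.

```python
class HeartbeatMonitor:
    """Toy model of the secondary watching the primary's heartbeat."""

    def __init__(self, timeout_seconds=30.0):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = None
        self.role = "secondary"

    def on_heartbeat(self, now):
        """Record that a heartbeat arrived from the primary at time `now`."""
        self.last_heartbeat = now

    def check(self, now, update_status, start_singletons):
        """Take over as acting primary if the heartbeat has been silent too long."""
        if self.role != "secondary" or self.last_heartbeat is None:
            return False
        if now - self.last_heartbeat < self.timeout_seconds:
            return False
        self.role = "acting-primary"
        update_status(self.role)   # stand-in for updating the status in the DB
        start_singletons()         # stand-in for asking ServM to start singletons
        return True
```

Note that a silent heartbeat alone cannot distinguish a dead primary from a broken link between the servers, which is why a takeover of this kind can lead to a split-brain situation.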
Two Generals’ Problem
(split-brain)
Cause:
• Unreliable communication link between primary and secondary
• Byzantine failure of SRM
• Secondary thinks primary is dead and assumes the “acting-primary” role, while primary continues its operation
Issues:
• DB updates will continue on both primary and secondary after failover.
• Solution – Split Brain Resolution (SBR) (done automatically)
Troubleshooting – tip 1
CLI: show cuc cluster status – shows the current status of the cluster.
• Member ID 0 means publisher (i.e., first node).
• Exactly one server must be in the primary role.
• If both servers are primary, then they are not talking to each other. Check that the server hostnames are correct and that the servers can communicate.
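The "exactly one primary" rule is easy to express as a small check. The Python sketch below takes a hand-built mapping of hostname to role, as read off the CLI output by the operator; it does not parse the CLI output, whose exact format is not shown here, and the function name and result labels are hypothetical.

```python
def check_primary_count(roles):
    """Classify a {hostname: role} mapping by how many servers claim to be primary."""
    primaries = sorted(
        host for host, role in roles.items()
        if role.strip().lower() == "primary"
    )
    if len(primaries) == 1:
        return "ok", primaries
    if not primaries:
        return "no-primary", primaries
    # More than one primary: the servers are probably not talking to each other.
    return "multiple-primaries", primaries
```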
Tip 2 – Check certain required services
• Check that these services are running on both servers:
– Server Role Manager
– Conversation Manager and Mixer
– File Sync
– DB Event Publisher
• Check that these services are running on the primary server:
– Notifier
– Message Transfer Agent
Tip 3: Log files
• Check Server Role Manager (SRM) logs for cluster issues.
– /var/opt/cisco/connection/log/diag_CuSrm_*.uc
• From RTMT, select the component “Connection Server Role Manager” to download the SRM log.
• Logs are not replicated, so they must be checked on both servers.
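Once the SRM logs have been downloaded from both servers, a short script can pick out the most recent files to read first. The Python sketch below assumes the files sit in a directory where Python is available (for example, a workstation copy of the logs downloaded through RTMT); the default pattern is the path given above, and the function name is hypothetical.

```python
import glob
import os

SRM_LOG_PATTERN = "/var/opt/cisco/connection/log/diag_CuSrm_*.uc"


def newest_srm_logs(pattern=SRM_LOG_PATTERN, limit=5):
    """Return up to `limit` matching log paths, most recently modified first."""
    paths = glob.glob(pattern)
    paths.sort(key=os.path.getmtime, reverse=True)
    return paths[:limit]
```

When working from downloaded copies, pass a pattern that points at the local directory for each server in turn, since the two servers keep separate logs.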
Upgrading a cluster
• The upgrade process is very similar to UC 2.x.
• First upgrade the first node (primary); do not switch over.
• Then upgrade the second node (secondary); do not switch over.
• At a convenient time, switch over the first node.
• Then switch over the second node after the first node switchover is successful.
Future Enhancements
Site Redundancy – Active/Passive
Differences from A/A
• Deployment model:
– No load balancing
[Diagram: UC-A (active, primary) and UC-B (passive, secondary) are connected over a WAN by a heartbeat and DB/file replication. Each server is drawn with SRM, ServM, CsMgr, Singletons, DB, and other components. CCM provides automatic failover (SCCP) with no load balancing; SIP and web clients (HTTP/IMAP) use DNS (SRV) for automatic failover with no load balancing. A call or request is delivered to the secondary only if the primary fails.]
Multi-server Cluster (N > 2)
• The current approach implies a single primary to manage singletons and UnityMbxDb updates.
• This means 1 primary + N secondaries in an N + 1 scenario.
• When failover happens, one of the N secondary servers assumes the acting-primary role based on some pre-defined criteria.
Q&A