132221-CUCM db problem

Contents
Publisher restart
Subscriber restart
Publisher CLI
Subscriber CLI
System Reports
Publisher restart:
admin:utils system restart
Do you really want to restart ?
Enter (yes/no)? yes
Appliance is being Restarted ...
Warning: Restart could take up to 5 minutes.
Shutting down Service Manager. Please wait...
/2012-07-25 15:47:09,869 ERROR [Thread-9] ncs.NcsClient$ReceiveThread - java.net.SocketException: Connection reset
2012-07-25 15:47:09,870 ERROR [Thread-9] ncs.NcsClient NcsClient:com.cisco.ccm.serviceability.conf.TableChangeSubscriberImpl$MyNcsClient@102799c IOExc :
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:189)
at com.cisco.ccm.util.ncs.NcsClient.connect(NcsClient.java:342)
at com.cisco.ccm.util.ncs.NcsClient$ReceiveThread.run(NcsClient.java:447)
\ Service Manager shutting down services... Please Wait
Broadcast message from root (Wed Jul 25 15:47:13 2012):
The system is going down for reboot NOW!
Waiting .
Operation succeeded
restart now.
admin:
Connection timed out
Subscriber restart:
admin:utils system restart
Do you really want to restart ?
Enter (yes/no)? yes
Appliance is being Restarted ...
Warning: Restart could take up to 5 minutes.
Shutting down Service Manager. Please wait...
-2012-07-25 15:50:29,286 ERROR [Thread-9] ncs.NcsClient NcsClient:com.cisco.ccm.serviceability.conf.TableChangeSubscriberImpl$MyNcsClient@c44b88 IOExc :
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:189)
at com.cisco.ccm.util.ncs.NcsClient.connect(NcsClient.java:342)
at com.cisco.ccm.util.ncs.NcsClient$ReceiveThread.run(NcsClient.java:447)
\
2012-07-25 15:50:59,292 ERROR [Thread-9] ncs.NcsClient NcsClient:com.cisco.ccm.serviceability.conf.TableChangeSubscriberImpl$MyNcsClient@c44b88 IOExc :
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:189)
at com.cisco.ccm.util.ncs.NcsClient.connect(NcsClient.java:342)
at com.cisco.ccm.util.ncs.NcsClient$ReceiveThread.run(NcsClient.java:447)
\ Service Manager shutting down services... Please Wait
Broadcast message from root (Wed Jul 25 15:51:13 2012):
The system is going down for reboot NOW!
Waiting .
Operation succeeded
restart now.
admin:2012-07-25 15:51:29,298 ERROR [Thread-9] ncs.NcsClient NcsClient:com.cisco.ccm.serviceability.conf.TableChangeSubscriberImpl$MyNcsClient@c44b88 IOExc :
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:189)
at com.cisco.ccm.util.ncs.NcsClient.connect(NcsClient.java:342)
at com.cisco.ccm.util.ncs.NcsClient$ReceiveThread.run(NcsClient.java:447)
Publisher CLI:
admin:utils service list
Requesting service status, please wait...
System SSH [STARTED]
Cluster Manager [STARTED]
Service Manager is running
Getting list of all services
>> Return code = 0
A Cisco DB[STARTED]
A Cisco DB Replicator[STARTED]
Cisco AMC Service[STARTED]
Cisco AXL Web Service[STARTED]
Cisco Audit Event Service[STARTED]
Cisco Bulk Provisioning Service[STARTED]
Cisco CAR DB[STARTED]
Cisco CAR Scheduler[STARTED]
Cisco CAR Web Service[STARTED]
Cisco CDP[STARTED]
Cisco CDP Agent[STARTED]
Cisco CDR Agent[STARTED]
Cisco CDR Repository Manager[STARTED]
Cisco CTIManager[STARTED]
Cisco CTL Provider[STARTED]
Cisco CallManager[STARTED]
Cisco CallManager Admin[STARTED]
Cisco CallManager Cisco IP Phone Services[STARTED]
Cisco CallManager Personal Directory[STARTED]
Cisco CallManager SNMP Service[STARTED]
Cisco CallManager Serviceability[STARTED]
Cisco CallManager Serviceability RTMT[STARTED]
Cisco Certificate Authority Proxy Function[STARTED]
Cisco Certificate Change Notification[STARTED]
Cisco Certificate Expiry Monitor[STARTED]
Cisco Change Credential Application[STARTED]
Cisco DRF Local[STARTED]
Cisco DRF Master[STARTED]
Cisco Database Layer Monitor[STARTED]
Cisco Dialed Number Analyzer[STARTED]
Cisco Dialed Number Analyzer Server[STARTED]
Cisco DirSync[STARTED]
Cisco Extended Functions[STARTED]
Cisco Extension Mobility[STARTED]
Cisco Extension Mobility Application[STARTED]
Cisco IP Manager Assistant[STARTED]
Cisco IP Voice Media Streaming App[STARTED]
Cisco License Manager[STARTED]
Cisco Log Partition Monitoring Tool[STARTED]
Cisco RIS Data Collector[STARTED]
Cisco RTMT Reporter Servlet[STARTED]
Cisco SOAP - CDRonDemand Service[STARTED]
Cisco SOAP - CallRecord Service[STARTED]
Cisco Serviceability Reporter[STARTED]
Cisco Syslog Agent[STARTED]
Cisco TAPS Service[STARTED]
Cisco Tftp[STARTED]
Cisco Tomcat[STARTED]
Cisco Tomcat Stats Servlet[STARTED]
Cisco Trace Collection Service[STARTED]
Cisco Trace Collection Servlet[STARTED]
Cisco Trust Verification Service[STARTED]
Cisco UXL Web Service[STARTED]
Cisco Unified Mobile Voice Access Service[STARTED]
Cisco WebDialer Web Service[STARTED]
Host Resources Agent[STARTED]
MIB2 Agent[STARTED]
Native Agent Adapter[STARTED]
Platform SOAP Services[STARTED]
SNMP Master Agent[STARTED]
SOAP - Diagnostic Portal Database Service[STARTED]
SOAP -Log Collection APIs[STARTED]
SOAP -Performance Monitoring APIs[STARTED]
SOAP -Real-Time Service APIs[STARTED]
System Application Agent[STARTED]
Cisco DHCP Monitor Service[STOPPED] Service Not Activated
Cisco Messaging Interface[STOPPED] Service Not Activated
Cisco User Data Services[STOPPED] Service Not Activated
Primary Node =true
admin:show tech network hosts
-------------------- show platform network --------------------
/etc/hosts File:
#This file was generated by the /etc/hosts cluster manager.
#It is automatically updated as nodes are added, changed, removed from the cluster.
127.0.0.1 localhost
::1 localhost
10.1.157.6 CMS
10.1.55.5 CMP
admin:utils diagnose module validate_network
Log file: platform/log/diag5.log
Starting diagnostic test(s)
===========================
test - validate_network : Passed
Diagnostics Completed
admin:utils diagnose module validate_network
Log file: platform/log/diag1.log
Starting diagnostic test(s)
===========================
test - validate_network : Passed
Diagnostics Completed
admin:show network cluster
10.1.157.6 cms Subscriber not authenticated - INITIATOR since Wed Jul 25 15:51:16 2012
10.1.55.5 cmp Publisher authenticated
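The "show network cluster" output above can be scanned for nodes reported as "not authenticated". The following is a minimal Python sketch, not a Cisco tool. The function name unauthenticated_nodes and the assumed field order (IP, hostname, role, then status text) are illustrative assumptions based only on the two lines shown above.

```python
def unauthenticated_nodes(output: str) -> list[tuple[str, str, str]]:
    """Return (ip, hostname, role) for each node whose status starts with 'not authenticated'."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Assumed layout: IP, hostname, role, then free-form status text.
        if len(fields) < 4:
            continue
        ip, host, role = fields[:3]
        status = " ".join(fields[3:])
        if status.startswith("not authenticated"):
            nodes.append((ip, host, role))
    return nodes


# Sample copied from the publisher's output above.
sample = (
    "10.1.157.6 cms Subscriber not authenticated - INITIATOR since Wed Jul 25 15:51:16 2012\n"
    "10.1.55.5 cmp Publisher authenticated"
)
print(unauthenticated_nodes(sample))
```

Running the same function on the subscriber's output would flag the publisher instead, since each node reports its peer as not authenticated.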
admin:utils dbreplication runtimestate
DB and Replication Services: ALL RUNNING
Cluster Replication State: REPLICATION RESET cms Started at 2012-07-20-13-17
DB Version: ccm8_6_1_20000_1
Number of replicated tables: 541
Cluster Detailed View from PUB (2 Servers):
                           PING          REPLICATION  REPL.  DBver&  REPL.  REPLICATION SETUP
SERVER-NAME  IP ADDRESS    (msec)  RPC?  STATUS       QUEUE  TABLES  LOOP?  (RTMT) & details
-----------  ------------  ------  ----  -----------  -----  ------  -----  -----------------
CMP          10.1.55.5     0.030   Yes   Connected    0      match   N/A    (0) PUB
CMS          10.1.157.6    1.32    No    Off-Line     N/A    ?       No     (?) N/A
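The per-node values in the runtimestate view can be checked mechanically. The sketch below is a hedged Python illustration, assuming each node's fields have been joined into one whitespace-separated row in the order SERVER-NAME, IP ADDRESS, PING, RPC?, STATUS. The NodeState type and parse_row helper are hypothetical names, not part of any Cisco API.

```python
from typing import NamedTuple


class NodeState(NamedTuple):
    name: str
    ip: str
    ping_ms: float
    rpc_ok: bool
    status: str


def parse_row(row: str) -> NodeState:
    """Parse the first five whitespace-separated fields of one node row."""
    name, ip, ping, rpc, status = row.split()[:5]
    return NodeState(name, ip, float(ping), rpc == "Yes", status)


# Rows taken from the publisher's view above, one node per line.
rows = [
    "CMP 10.1.55.5 0.030 Yes Connected 0 match N/A (0) PUB",
    "CMS 10.1.157.6 1.32 No Off-Line N/A ? No (?) N/A",
]
states = [parse_row(r) for r in rows]

# Nodes that answer ping but not RPC are the ones to investigate first.
unreachable = [s.name for s in states if not s.rpc_ok]
print(unreachable)
```

Here the subscriber CMS responds to ping (1.32 msec) but is not RPC reachable, which matches the Off-Line status in the table.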
admin:utils dbreplication stop
********************************************************************************************
This command will delete the marker file(s) so that automatic replication setup is stopped
It will also stop any replication setup currently executing
********************************************************************************************
Deleted the marker file, auto replication setup is stopped
Service Manager is running
Commanded Out of Service
A Cisco DB Replicator[NOTRUNNING]
Service Manager is running
A Cisco DB Replicator[STARTED]
Completed replication process cleanup
Please run the command 'utils dbreplication runtimestate' and make sure all nodes are RPC reachable before a replication reset is executed
admin:utils dbreplication runtimestate
DB and Replication Services: ALL RUNNING
Cluster Replication State: REPLICATION RESET cms Started at 2012-07-20-13-17
DB Version: ccm8_6_1_20000_1
Number of replicated tables: 541
Cluster Detailed View from PUB (2 Servers):
                           PING          REPLICATION  REPL.  DBver&  REPL.  REPLICATION SETUP
SERVER-NAME  IP ADDRESS    (msec)  RPC?  STATUS       QUEUE  TABLES  LOOP?  (RTMT) & details
-----------  ------------  ------  ----  -----------  -----  ------  -----  -----------------
CMP          10.1.55.5     0.030   Yes   Connected    0      match   N/A    (0) PUB
CMS          10.1.157.6    1.29    No    Off-Line     N/A    ?       No     (?) N/A
admin:utils dbreplication clusterreset
********************************************************************************************
This command will repair replication on all nodes in the cluster
Before running, execute dbreplication stop <server> on all subscribers
then execute dbreplication stop on the publisher
After clusterreset, "utils dbreplication reset all" should be executed
followed by the reboot of all subscribers
********************************************************************************************
This command can take considerable amount of time, and will tear down replication and build it back again.
Are you sure you want to continue? (y/n):y
Repairing of replication is in progress.
You may tail /cm/trace/dbl/sdi/clusterReset_20120725163734.out to observe progress
admin:file view activelog /cm/trace/dbl/sdi/clusterReset_20120725163734.out
CLI execution started at 16:37:37
publisher is
To run this CLI...
You must run a utils dbreplication stop
on all the servers in the cluster before continuing.
You have 10 seconds to hit Ctrl-C to abort.
Output is in the following files:
/tmp/deftemp.out
/tmp/cdrcheckrepair.out
/tmp/sub_cdrlist
options: q=quit, n=next, p=prev, b=begin, e=end (lines 1 - 20 of 73) :
/tmp/pub_cdrlist
16:37:47 Setting the publisher
16:37:47 Publisher is set to cmp_ccm8_6_1_20000_1
16:37:47 Sublist: cms_ccm8_6_1_20000_1
16:37:47 Setting the database
16:37:47 Database is set to ccm8_6_1_20000_1
***********************
Wed Jul 25 16:37:47 EEST 2012
16:37:47 Starting Listing Servers
16:37:47 Checking status of sub cms_ccm8_6_1_20000_1
16:37:54 Done Deleting Subscribers
***********************
Wed Jul 25 16:37:54 EEST 2012
16:37:54 Starting Listing Servers
options: q=quit, n=next, p=prev, b=begin, e=end (lines 21 - 40 of 73) :
16:37:54 Listing the publisher g_cmp_ccm8_6_1_20000_1
16:37:54 Deleting server g_cmp_ccm8_6_1_20000_1 from the pub.
16:38:12 Done Deleting Publisher
16:38:12 ***********************
Wed Jul 25 16:38:12 EEST 2012
16:38:12 Defining the Publisher
16:38:12 Defining the pub g_cmp_ccm8_6_1_20000_1.
16:38:15 Done Defining the Publisher
***********************
Wed Jul 25 16:38:15 EEST 2012
16:38:15 Defining Subscribers
16:38:15 Linecount is set to 1
16:38:15 Defining Subscriber cms_ccm8_6_1_20000_1
connect to cms_ccm8_6_1_20000_1 failed
Attempt to connect to database server (cms_ccm8_6_1_20000_1) failed.
(-908)
command failed -- unable to connect to server specified (5)
options: q=quit, n=next, p=prev, b=begin, e=end (lines 41 - 60 of 73) :
16:38:22 Deleting server g_cms_ccm8_6_1_20000_1 from the sub.
connect to g_cms_ccm8_6_1_20000_1 failed
Attempt to connect to database server (g_cms_ccm8_6_1_20000_1) failed.
(-908)
command failed -- unable to connect to server specified (5)
16:38:29 Deleting server g_cms_ccm8_6_1_20000_1 from the pub.
command failed -- undefined server (37)
16:38:29 The Sub g_cms_ccm8_6_1_20000_1 can't be defined.
16:38:29 Please analyze the logs in cm/trace/dbl and fix accordingly.
16:38:29 This may be due to a corrupt admin database. If so, execute utils dbreplication dropadmindb
16:38:29 on the node which has indicated failure.
16:38:29 Exiting with errors.
end of the file reached
options: q=quit, n=next, p=prev, b=begin, e=end (lines 61 - 73 of 73) :
admin:
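The clusterReset log above is long, and the failures are spread across pager screens. As a rough aid, the Python sketch below pulls out only the failure lines. The patterns are assumptions taken from the wording seen in this one log and are not an exhaustive list of messages the tool can emit. The failure_lines name is hypothetical.

```python
import re

# Patterns assumed from the messages visible in this log only.
FAILURE_PATTERNS = [
    re.compile(r"connect to (\S+) failed"),
    re.compile(r"command failed -- (.+)"),
    re.compile(r"Exiting with errors"),
]


def failure_lines(log_text: str) -> list[str]:
    """Return the log lines that match any of the assumed failure patterns."""
    hits = []
    for line in log_text.splitlines():
        if any(p.search(line) for p in FAILURE_PATTERNS):
            hits.append(line.strip())
    return hits


# Excerpt copied from the log above.
excerpt = """16:38:15 Defining Subscriber cms_ccm8_6_1_20000_1
connect to cms_ccm8_6_1_20000_1 failed
Attempt to connect to database server (cms_ccm8_6_1_20000_1) failed.
(-908)
command failed -- unable to connect to server specified (5)
16:38:29 Exiting with errors."""

for line in failure_lines(excerpt):
    print(line)
```

In this excerpt the sketch reports three lines: the failed connection to the subscriber's database server, the "command failed" message, and the final "Exiting with errors." line.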
Subscriber CLI:
admin:utils service list
Requesting service status, please wait...
System SSH [STARTED]
Cluster Manager [STARTED]
Service Manager is running
Getting list of all services
>> Return code = 0
A Cisco DB[STARTED]
A Cisco DB Replicator[STARTED]
Cisco AMC Service[STARTED]
Cisco AXL Web Service[STARTED]
Cisco Audit Event Service[STARTED]
Cisco CAR DB[STOPPED] Commanded Out of Service
Cisco CAR Scheduler[STOPPED] Commanded Out of Service
Cisco CDP[STARTED]
Cisco CDP Agent[STARTED]
Cisco CDR Agent[STARTED]
Cisco CDR Repository Manager[STOPPED] Commanded Out of Service
Cisco CTIManager[STARTED]
Cisco CTL Provider[STARTED]
Cisco CallManager[STARTED]
Cisco CallManager Admin[STARTED]
Cisco CallManager Cisco IP Phone Services[STARTED]
Cisco CallManager Personal Directory[STARTED]
Cisco CallManager SNMP Service[STARTED]
Cisco CallManager Serviceability[STARTED]
Cisco CallManager Serviceability RTMT[STARTED]
Cisco Certificate Change Notification[STARTED]
Cisco Certificate Expiry Monitor[STARTED]
Cisco Change Credential Application[STARTED]
Cisco DRF Local[STARTED]
Cisco DRF Master[STOPPED] Commanded Out of Service
Cisco Database Layer Monitor[STARTED]
Cisco Dialed Number Analyzer[STARTED]
Cisco Extended Functions[STARTED]
Cisco Extension Mobility[STARTED]
Cisco Extension Mobility Application[STARTED]
Cisco IP Manager Assistant[STARTED]
Cisco IP Voice Media Streaming App[STARTED]
Cisco License Manager[STARTED]
Cisco Log Partition Monitoring Tool[STARTED]
Cisco RIS Data Collector[STARTED]
Cisco RTMT Reporter Servlet[STARTED]
Cisco SOAP - CallRecord Service[STOPPED] Commanded Out of Service
Cisco Serviceability Reporter[STARTED]
Cisco Syslog Agent[STARTED]
Cisco Tftp[STARTED]
Cisco Tomcat[STARTED]
Cisco Tomcat Stats Servlet[STARTED]
Cisco Trace Collection Service[STARTED]
Cisco Trace Collection Servlet[STARTED]
Cisco Trust Verification Service[STARTED]
Cisco UXL Web Service[STARTED]
Cisco WebDialer Web Service[STARTED]
Host Resources Agent[STARTED]
MIB2 Agent[STARTED]
Native Agent Adapter[STARTED]
Platform SOAP Services[STARTED]
SNMP Master Agent[STARTED]
SOAP - Diagnostic Portal Database Service[STARTED]
SOAP -Log Collection APIs[STARTED]
SOAP -Performance Monitoring APIs[STARTED]
SOAP -Real-Time Service APIs[STARTED]
System Application Agent[STARTED]
Cisco Bulk Provisioning Service[STOPPED] Service Not Activated
Cisco CAR Web Service[STOPPED] Service Not Activated
Cisco Certificate Authority Proxy Function[STOPPED] Service Not Activated
Cisco DHCP Monitor Service[STOPPED] Service Not Activated
Cisco Dialed Number Analyzer Server[STOPPED] Service Not Activated
Cisco DirSync[STOPPED] Service Not Activated
Cisco Messaging Interface[STOPPED] Service Not Activated
Cisco SOAP - CDRonDemand Service[STOPPED] Service Not Activated
Cisco TAPS Service[STOPPED] Service Not Activated
Cisco Unified Mobile Voice Access Service[STOPPED] Service Not Activated
Cisco User Data Services[STOPPED] Service Not Activated
Primary Node =false
admin:
admin:show tech network hosts
-------------------- show platform network --------------------
/etc/hosts File:
#This file was generated by the /etc/hosts cluster manager.
#It is automatically updated as nodes are added, changed, removed from the cluster.
127.0.0.1 localhost
::1 localhost
10.1.55.5 CMP
10.1.157.6 CMS
admin:utils diagnose module validate_network
Log file: platform/log/diag2.log
Starting diagnostic test(s)
===========================
test - validate_network : Passed
Diagnostics Completed
admin:show network cluster
10.1.55.5 cmp Publisher not authenticated - INITIATOR since Wed Jul 25 15:47:18 2012
10.1.157.6 cms Subscriber authenticated
admin:utils dbreplication runtimestate
DB and Replication Services: ALL RUNNING
Cluster Replication State: Only available on the PUB
DB Version: ccm8_6_1_20000_1
Number of replicated tables: 0
Cluster Detailed View from SUB (2 Servers):
                           PING          REPLICATION  REPL.  DBver&  REPL.  REPLICATION SETUP
SERVER-NAME  IP ADDRESS    (msec)  RPC?  STATUS       QUEUE  TABLES  LOOP?  (RTMT)
-----------  ------------  ------  ----  -----------  -----  ------  -----  -----------------
CMP          10.1.55.5     1.35    No    Off-Line     N/A    ?       No     (?)
CMS          10.1.157.6    0.048   Yes   Off-Line     N/A    match   Yes    (0)
admin:utils dbreplication stop
********************************************************************************************
This command will delete the marker file(s) so that automatic replication setup is stopped
It will also stop any replication setup currently executing
********************************************************************************************
Deleted the marker file, auto replication setup is stopped
Service Manager is running
Commanded Out of Service
A Cisco DB Replicator[NOTRUNNING]
Service Manager is running
A Cisco DB Replicator[STARTED]
Completed replication process cleanup
Please run the command 'utils dbreplication runtimestate' and make sure all nodes are RPC reachable before a replication reset is executed
admin:utils dbreplication runtimestate
System Reports:
Unified CM Database Access
Local and publisher databases accessible.
Server        Publisher DB Reachable   Local DB Reachable
10.1.55.5     true                     true
10.1.157.6    true                     true
Unified CM Database Status
RTMT Counter Information
Connection to RTMT on 10.1.157.6 could not be established
All servers have a replication count of 541.
Not all servers have a good replication status. See the details.
Server        Number of Replicates Created   Replicate_State
10.1.55.5     541                            3 - bad
10.1.157.6    N/A                            N/A
See also Database Summary Screen in RTMT.
Run CLI command (show tech dbstateinfo) for more detail.
Replication Server List (cdr list serv) from every server for debugging purposes only.
Server: 10.1.55.5
SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_cmp_ccm8_6_1_20000_1  2 Active   Local      0
Server: 10.1.157.6
(empty in the report)
Replication Server Template (cdr list template) from every server for debugging purposes only.
Database Prefs File
Unified CM Hosts
All servers have equivalent host files
Server: 10.1.55.5
#This file was generated by the /etc/hosts cluster manager.
#It is automatically updated as nodes are added, changed, removed from the cluster.
127.0.0.1 localhost
::1 localhost
10.1.55.5 CMP
10.1.157.6 CMS
Server: 10.1.157.6
#This file was generated by the /etc/hosts cluster manager.
#It is automatically updated as nodes are added, changed, removed from the cluster.
127.0.0.1 localhost
::1 localhost
10.1.55.5 CMP
10.1.157.6 CMS
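The "equivalent host files" check can be reproduced by hand from the CLI output of each node. The Python sketch below is one possible way to do it, under the assumption that two hosts files are equivalent when they contain the same (IP, hostname) pairs regardless of line order, comments, or hostname case. The host_entries helper is a hypothetical name, and this is not how the report itself computes the result.

```python
def host_entries(hosts_text: str) -> set[tuple[str, str]]:
    """Collect (ip, lowercase hostname) pairs, ignoring comments and blank lines."""
    entries = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            entries.add((ip, name.lower()))
    return entries


# Hosts files as shown by 'show tech network hosts' on each node.
publisher_hosts = """#This file was generated by the /etc/hosts cluster manager.
127.0.0.1 localhost
::1 localhost
10.1.157.6 CMS
10.1.55.5 CMP"""

subscriber_hosts = """#This file was generated by the /etc/hosts cluster manager.
127.0.0.1 localhost
::1 localhost
10.1.55.5 CMP
10.1.157.6 CMS"""

print(host_entries(publisher_hosts) == host_entries(subscriber_hosts))
```

The two CLI listings differ only in the order of the CMP and CMS lines, so a set comparison treats them as equivalent.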
Unified CM Rhosts
All servers have equivalent rhosts files.
Server: 10.1.55.5
localhost
CMS
CMP
Server: 10.1.157.6
localhost
CMP
CMS
Unified CM Sqlhosts
All servers have equivalent sqlhosts files.
Server: 10.1.55.5
g_hdr                   group                                         i=1
g_cmp_ccm8_6_1_20000_1  group                                         i=2
cmp_ccm8_6_1_20000_1    onsoctcp  10.1.55.5   cmp_ccm8_6_1_20000_1    g=g_cmp_ccm8_6_1_20000_1 b=32767,rto=300
g_cms_ccm8_6_1_20000_1  group                                         i=3
cms_ccm8_6_1_20000_1    onsoctcp  10.1.157.6  cms_ccm8_6_1_20000_1    g=g_cms_ccm8_6_1_20000_1 b=32767,rto=300
###NOTE: Need to use ipv4 address in host column of sqlhosts file and not hostname
cmp_car8_6_1_20000_1    onsoctcp  10.1.55.5   cmp_car8_6_1_20000_1    b=32767
Server: 10.1.157.6
g_hdr                   group                                         i=1
g_cmp_ccm8_6_1_20000_1  group                                         i=2
cmp_ccm8_6_1_20000_1    onsoctcp  10.1.55.5   cmp_ccm8_6_1_20000_1    g=g_cmp_ccm8_6_1_20000_1 b=32767,rto=300
g_cms_ccm8_6_1_20000_1  group                                         i=3
cms_ccm8_6_1_20000_1    onsoctcp  10.1.157.6  cms_ccm8_6_1_20000_1    g=g_cms_ccm8_6_1_20000_1 b=32767,rto=300