03/05/2023, 12:07 Document 2772938.1
https://support.oracle.com/epmos/faces/DocumentDisplay?_adf.ctrl-state=toiftfxm2_4&id=2772938.1

Copyright (c) 2023, Oracle. All rights reserved. Oracle Confidential.

OLVM: DataCenter Non-Responsive with Error "Cannot find master domain" (Doc ID 2772938.1)

In this Document
  Symptoms
  Cause
  Solution
  References

APPLIES TO:

Linux OS - Version Oracle Linux 7.9 with Unbreakable Enterprise Kernel [5.4.17] and later
Linux x86-64

SYMPTOMS

All hosts in the Data Center (DC) become non-operational:

    2021-04-27 22:19:47,441-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE
    VDSM xxx1 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=7c7903b6-c199-4ef1-97fb
    2021-04-27 22:22:39,857-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE
    VDSM xxx2 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=7c7903b6-c199-4ef1-97fb
    2021-04-27 22:22:40,406-04 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (
    ConnectStoragePoolVDSCommandParameters:{hostId='e3768de0-0baa-4576-8ff7-afbcee94f605', vdsId='e3768de0-0baa
    execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master do
    2021-04-27 22:22:41,057-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE
    VDSM xxx3 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=7c7903b6-c199-4ef1-97fb

The Data Center becomes non-responsive:

    2021-04-27 16:17:35,146-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-M
    SYSTEM_CHANGE_STORAGE_POOL_STATUS_PROBLEMATIC_WITH_ERROR(987), Invalid status on Data Center xxx. Setting Da

The system logs on all hosts report offline paths and I/O errors on the master domain:

    Apr 27 15:20:04 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
    Apr 27 15:20:09 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
    Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 1124478346 op 0x1:(WRITE) flag
    Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 104762199904 op 0x1:(WRITE) fl
    Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 12758732825 op 0x1:(WRITE) fla
    Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 26579493016 op 0x1:(WRITE) fla
    Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 11248957761 op 0x1:(WRITE) fla
    Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 9314783624 op 0x0:(READ) flag
    Apr 27 15:20:09 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: Disable queueing
    ...

    Apr 27 22:58:48 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phy
    Apr 27 22:59:05 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
    Apr 27 22:59:07 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 264192 op 0x0:(READ) flags 0x0
    Apr 27 22:59:08 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phy
    Apr 27 22:59:10 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
    Apr 27 22:59:15 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
    Apr 27 22:59:17 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 264192 op 0x0:(READ) flags 0x0
    Apr 27 22:59:20 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline

    Apr 27 23:02:48 host3 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phy
    Apr 27 23:02:48 host3 vdsm[5068]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=464
    2Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315,
    91, in __call__#012 self._callable()#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line
    riodic.py", line 357, in _execute#012 self._vm.updateDriveVolume(drive)#012 File "/usr/lib/python2.7/site-pa
    /lib/python2.7/site-packages/vdsm/virt/vm.py", line 6101, in _getVolumeSize#012 (domainID, volumeID))#012Sto
    lume 0281c278-7834-48a0-90b7-110a384b1561
    ....
    Apr 27 23:02:48 host3 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phy

CAUSE

The I/O errors indicate a problem on the storage side. VDSM is affected by the storage failure rather than being the cause of it.

SOLUTION

Engage the storage team to check the storage logs for any of the following issues on the storage side:

- Faulty disks or other disk issues
- Network connection problems (switch issues, bad cables, lost iSCSI target, etc.)

REFERENCES

NOTE:2727849.1 - OLVM: Frequent VM Paused With Error "unknown storage error"
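
The host log check described under SYMPTOMS can be partly automated before handing the case to the storage team. The following is a minimal sketch, not part of the original note: the function name `summarize` and both regular expressions are assumptions modeled only on the log excerpts shown above, and other kernel or multipathd versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns modeled on the excerpts in this note (assumption): they are not an
# official log format and may need adjusting for other software versions.
PATH_OFFLINE = re.compile(
    r"multipathd: (?P<wwid>[0-9a-f]+): (?P<path>\w+) - path offline"
)
IO_ERROR = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>[\w-]+), sector (?P<sector>\d+)"
)


def summarize(lines):
    """Count path-offline events per (WWID, path) and I/O errors per device."""
    offline = Counter()
    io_errors = Counter()
    for line in lines:
        match = PATH_OFFLINE.search(line)
        if match:
            offline[(match.group("wwid"), match.group("path"))] += 1
            continue
        match = IO_ERROR.search(line)
        if match:
            io_errors[match.group("dev")] += 1
    return offline, io_errors


# Three lines taken from the host1 excerpt above.
sample = [
    "Apr 27 15:20:04 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline",
    "Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 1124478346 op 0x1:(WRITE) flag",
    "Apr 27 15:20:09 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: Disable queueing",
]
offline, io_errors = summarize(sample)
for (wwid, path), count in offline.items():
    print(f"{wwid} {path}: {count} path offline event(s)")
for dev, count in io_errors.items():
    print(f"{dev}: {count} I/O error(s)")
```

In practice the same function can be fed an open file handle for each host's system log. If the same WWID and device show errors on every host within the same time window, that points to the shared storage rather than to an individual host.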