
VMware Corrupted Snapshot

Top Support issues and how to solve them – Part II
Darren Burnett
Senior Technical Support Engineer
Unable to connect to the Service Console
Rebuild Networking
Network Connection Problem. Common causes:
• Deleting the vSwitch that vswif0 is connected to
• Connecting the wrong NICs to vSwitch0
• Upgrade issues
• Incorrect IP Address
• External Network Changes
At this stage you can no longer connect to your ESX server using VI Client or SSH.
• You can connect to the Service Console remotely if you have iLO, DRAC, an IP KVM or something similar.
• Otherwise, it's time to use some shoe leather and walk to the server room.
The following procedure will work. However, it is a quick and inelegant way to get your VI Client connected.
Other options include:
• Connecting a crossover cable to a laptop
• Adding NICs to, or removing NICs from, a vSwitch
Use the esxcfg-vswitch -l command to list all of your vSwitches.
[root@newross root]# esxcfg-vswitch -l
Switch Name    Num Ports   Used Ports  Configured Ports  Uplinks
vSwitch0       32          4           32                vmnic0

  PortGroup Name      Internal ID   VLAN ID  Used Ports  Uplinks
  VM Network          portgroup1    0        0           vmnic0
  Service Console     portgroup0    0        1           vmnic0
  VMkernel            portgroup7    0        1           vmnic0

Switch Name    Num Ports   Used Ports  Configured Ports  Uplinks
vSwitch1       64          2           64                vmnic1

  PortGroup Name      Internal ID   VLAN ID  Used Ports  Uplinks
  vlan100             portgroup10   100      0           vmnic1
  install VLAN 310    portgroup6    0        0           vmnic1

Switch Name    Num Ports   Used Ports  Configured Ports  Uplinks
vSwitch2       64          2           64                vmnic6

  PortGroup Name      Internal ID   VLAN ID  Used Ports  Uplinks
  crossovercable      portgroup9    0        0           vmnic6
Delete all vSwitches
• esxcfg-vswitch -d vSwitch0
• esxcfg-vswitch -d vSwitch1
• esxcfg-vswitch -d vSwitch2
Create a vSwitch
esxcfg-vswitch -a vSwitch0
Create the Service Console portgroup
esxcfg-vswitch -A "Service Console" vSwitch0
Add a NIC to the vSwitch
esxcfg-vswitch -L vmnic0 vSwitch0
Add a vswif interface and configure it
esxcfg-vswif -a vswif0 -p "Service Console" -i 10.10.10.3 -n 255.0.0.0
Check if you can connect.
• Use PING both to and from the ESX server
• Try SSH
• Try VI Client
If you still can't connect, use "esxcfg-nics -l" to list the available NICs.
[root@newross root]# esxcfg-nics -l
Name    PCI       Driver  Link  Speed     Duplex  Description
vmnic0  03:0c.00  e1000   Up    1000Mbps  Full    Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic1  03:0c.01  e1000   Down  0Mbps     Half    Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic8  07:00.00  tg3     Up    1000Mbps  Full    Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet
vmnic6  08:00.00  tg3     Down  0Mbps     Half    Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet
Remove and add NICs to vSwitch0
• Remove
  esxcfg-vswitch -U vmnic0 vSwitch0
• Add
  esxcfg-vswitch -L vmnic8 vSwitch0
It might also need a VLAN ID
esxcfg-vswitch -v 101 -p "Service Console" vSwitch0
To avoid this issue, be careful when changing the Service Console virtual NIC or any property of its parent virtual switch that can affect Service Console connectivity, for example the uplink.
If possible, before updating the Service Console virtual NIC, create a second, independent, working Service Console NIC. If the change brings the first console NIC down, the second one is still available to repair the configuration.
As stated, depending on your environment it may be easier not to delete all your configurations.
It may just require reassigning a NIC or changing a VLAN ID.
Check the following guide.
http://www.rtfm-ed.eu/docs/vmwdocs/esx3.xvc2.x-serviceconsole-guide.pdf
Network Bond
NICs in a Bond not on the same Broadcast Domain
NICs in a Bond should be in the same broadcast domain.
Determine which NICs are in each Bond.
[root@cork root]# esxcfg-info | grep -i -A 2 VirtualSwitchImpl
\==+VirtualSwitchImpl :
|----Name.........................................vSwitch0
|----Uplinks......................................vmnic0
-\==+VirtualSwitchImpl :
|----Name.........................................vSwitch1
|----Uplinks......................................vmnic4
-\==+VirtualSwitchImpl :
|----Name.........................................vSwitch2
|----Uplinks......................................vmnic1,vmnic3
--
Next we need to determine which network each NIC is connected to.
[root@cork root]# esxcfg-info |grep -i -B 5 hint
\==+PnicImpl :
|----_name.............................................. vmnic3
|----_bus...............................................6
|----_slot..............................................3
|----_function..........................................1
|----Network Hint.......................................0 10.16.157.00/255.255.255.192
-\==+PnicImpl :
|----_name..............................................vmnic0
|----_bus...............................................11
|----_slot..............................................7
|----_function..........................................0
|----Network Hint.......................................0 10.16.156.00/255.255.255.00
-\==+PnicImpl :
|----_name.............................................. vmnic1
|----_bus...............................................12
|----_slot..............................................8
|----_function..........................................0
|----Network Hint.......................................0 10.16.156.00/255.255.255.00
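The two listings can be cross-checked by hand, but a small script makes a mismatch easier to spot. The sketch below is a set of hypothetical helpers, not a VMware tool. It assumes the esxcfg-info text is shaped like the sample output shown here (a dot leader before each value) and that differing network hints on the uplinks of one bond are worth investigating. The function names nic_hints and bond_hint_count are made up for illustration.

```shell
# Hypothetical helpers, assuming esxcfg-info text shaped like the sample above.

# Print "<nic> <network hint>" pairs from PnicImpl text on stdin.
nic_hints() {
    awk '
        /_name/ {
            name = $0
            sub(/^.*\.\.+ */, "", name)   # drop everything up to the dot leader
            gsub(/ +$/, "", name)         # drop trailing blanks
        }
        /Network Hint/ {
            hint = $0
            sub(/^.*\.\.+ */, "", hint)   # leaves e.g. "0 10.16.157.00/255.255.255.192"
            split(hint, part, " ")
            print name, part[2]
        }
    '
}

# Count the distinct hints seen on the uplinks of one bond.
# $1 = comma-separated uplinks, stdin = output of nic_hints
bond_hint_count() {
    awk -v uplinks="$1" '
        BEGIN { n = split(uplinks, u, ","); for (i = 1; i <= n; i++) want[u[i]] = 1 }
        ($1 in want) && !seen[$2]++ { count++ }
        END { print count + 0 }
    '
}

sample='|----_name.............................................. vmnic3
|----Network Hint.......................................0 10.16.157.00/255.255.255.192
|----_name..............................................vmnic1
|----Network Hint.......................................0 10.16.156.00/255.255.255.00'

# vSwitch2 bonds vmnic1 and vmnic3; more than one hint suggests a mismatch.
printf '%s\n' "$sample" | nic_hints | bond_hint_count "vmnic1,vmnic3"
```

For this sample the count is 2, which matches what the listings show: vmnic1 and vmnic3 sit in the same bond on vSwitch2 but report different network hints. On a live host the input would come from esxcfg-info instead of the sample variable.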
HA and STP
Spanning Tree Protocol
When there is a network change, STP can cause a temporary outage on your network.
HA
An ESX Server will determine that it is isolated after 15 seconds.
Depending on the "Isolation Response" that you have set, all your VMs may power down.
It is therefore worth checking your network to determine whether STP can be configured to reduce the temporary network outage.
Expanding the size of a VMDK with an existing Snapshot
Expanding VM with a Snapshot
You can NOT expand a VM's VMDK file while it still has snapshots.
For example, do not run the following:
#ls *
important.vmdk important-000001-delta.vmdk
#vmkfstools -X 20G important.vmdk
If you do, you will now have a VM that won't boot.
The fix is to trick ESX into seeing the expanded VMDK as its original size.
In this example we have a test.vmdk that we expand from 5GB to 6GB
#vmkfstools -X 6G test.vmdk
If we check test.vmdk we see
# Disk DescriptorFile
version=1
CID=3f24a1b3
parentCID=ffffffff
createType="vmfs"
# Extent description
RW 12582912 VMFS "test-flat.vmdk"
# The Disk Data Base
#DDB
ddb.virtualHWVersion = "4"
ddb.geometry.cylinders = "783"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "buslogic"
• Original - RW 10485760 VMFS "test-flat.vmdk"
• New - RW 12582912 VMFS "test-flat.vmdk"
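The RW value is the extent size in 512-byte sectors, which is why 5GB appears as 10485760 and 6GB as 12582912. A minimal sketch of the arithmetic follows; the function names are hypothetical and the division truncates, so it only suits sizes that are whole gigabytes.

```shell
# The RW extent size is counted in 512-byte sectors, so 1 GiB = 2097152 sectors.
sectors_to_gib() {
    echo $(( $1 / 2097152 ))
}

gib_to_sectors() {
    echo $(( $1 * 2097152 ))
}

sectors_to_gib 10485760   # original extent
sectors_to_gib 12582912   # extent after vmkfstools -X 6G
gib_to_sectors 5          # sector count to put back in the descriptor
```

The three calls print 5, 6 and 10485760, matching the original and expanded values on the slide.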
If we have no "BACKUPS", how do we get the original value?
The snapshot descriptor still records the original size:
#grep -i rw test-000001.vmdk
RW 10485760 VMFSSPARSE "test-000001-delta.vmdk"
We change the RW value in test.vmdk back to the original.
# Disk DescriptorFile
version=1
CID=3f24a1b3
parentCID=ffffffff
createType="vmfs"
# Extent description
RW 10485760 VMFS "test-flat.vmdk"
# The Disk Data Base
#DDB
ddb.virtualHWVersion = "4"
ddb.geometry.cylinders = "783"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "buslogic"
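The manual edit can also be scripted. The sketch below is a hypothetical illustration that works on throwaway copies only: it reads the sector count from the snapshot descriptor and writes a corrected copy of the base descriptor. The helper names and the .fixed suffix are assumptions. On a real VM you would back up the descriptor first and review the result before replacing anything.

```shell
# Hypothetical helpers; they work on copies and never touch a live descriptor.

# Print the sector count from the first RW extent line of a descriptor.
extent_sectors() {
    awk '$1 == "RW" { print $2; exit }' "$1"
}

# Write a copy of descriptor $1 to $3 with the RW sector count set to $2.
set_extent_sectors() {
    awk -v sectors="$2" '$1 == "RW" { $2 = sectors } { print }' "$1" > "$3"
}

# Demo on throwaway files that mimic test.vmdk and test-000001.vmdk.
workdir=$(mktemp -d)
printf '%s\n' 'createType="vmfs"' 'RW 12582912 VMFS "test-flat.vmdk"' > "$workdir/test.vmdk"
printf '%s\n' 'createType="vmfsSparse"' 'RW 10485760 VMFSSPARSE "test-000001-delta.vmdk"' > "$workdir/test-000001.vmdk"

original=$(extent_sectors "$workdir/test-000001.vmdk")
set_extent_sectors "$workdir/test.vmdk" "$original" "$workdir/test.vmdk.fixed"
cat "$workdir/test.vmdk.fixed"
```

Writing to a separate file rather than editing in place keeps the expanded descriptor available if the change needs to be undone.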
Commit the snapshot(s)
#vmware-cmd /pathtovmx/test.vmx removesnapshots
Grow the VMDK file
#vmkfstools -X 6G test.vmdk
If needed, add a snapshot
#vmware-cmd /pathtovmx/test.vmx createsnapshot <name> <description>
Corrupted .VMSD file
Corrupted Snapshots
In this example we will deal with a corrupt .VMSD file.
Let's first look at a working .VMSD file, separated into 3 slides for the people at the back
snapshot.lastUID = "4"
snapshot.numSnapshots = "3"
snapshot.current = "4"
snapshot0.uid = "2"
snapshot0.filename = "VC1.3to201_standard-Snapshot2.vmsn"
snapshot0.displayName = "myfirst"
snapshot0.description = "My first test snapshot"
snapshot0.createTimeHigh = "273684"
snapshot0.createTimeLow = "942632403"
snapshot0.numDisks = "1"
snapshot0.disk0.fileName = "VC1.3to201_standard.vmdk"
snapshot0.disk0.node = "scsi0:0"
snapshot.needConsolidate = "FALSE"
snapshot1.uid = "3"
snapshot1.filename = "VC1.3to201_standard-Snapshot3.vmsn"
snapshot1.parent = "2"
snapshot1.displayName = "second"
snapshot1.description = "My second test snapshot"
snapshot1.createTimeHigh = "273684"
snapshot1.createTimeLow = "980947483"
snapshot1.numDisks = "1"
snapshot1.disk0.fileName = "VC1.3to201_standard-000001.vmdk"
snapshot1.disk0.node = "scsi0:0"
snapshot2.uid = "4"
snapshot2.filename = "VC1.3to201_standard-Snapshot4.vmsn"
snapshot2.parent = "3"
snapshot2.displayName = "third"
snapshot2.description = "My third test snapshot"
snapshot2.createTimeHigh = "273684"
snapshot2.createTimeLow = "1088942286"
snapshot2.numDisks = "1"
snapshot2.disk0.fileName = "VC1.3to201_standard-000002.vmdk"
snapshot2.disk0.node = "scsi0:0"
Corrupted Snapshots
After corruption of the .VMSD file the file now
looks like this.
Corrupted Snapshots
Corrupted Snapshots
However, we see that the snapshots still exist
[root@newross VC1.3to201_standard]# ls
VC1.3to201_standard-000001-delta.vmdk  VC1.3to201_standard-Snapshot4.vmsn
VC1.3to201_standard-000001.vmdk        VC1.3to201_standard.vmdk
VC1.3to201_standard-000002-delta.vmdk  VC1.3to201_standard.vmsd
VC1.3to201_standard-000002.vmdk        VC1.3to201_standard.vmx
VC1.3to201_standard-000003-delta.vmdk  VC1.3to201_standard.vmxf
VC1.3to201_standard-000003.vmdk        vmware-1.log
VC1.3to201_standard-flat.vmdk          vmware-2.log
VC1.3to201_standard.nvram              vmware-3.log
VC1.3to201_standard-Snapshot2.vmsn     vmware.log
VC1.3to201_standard-Snapshot3.vmsn
At this stage rename the .VMSD file to .VMSD.OLD
We are going to create a new .VMSD file.
What kind of magic is required to build a new .VMSD file?
Create another snapshot to automatically recreate a .VMSD file
#vmware-cmd VC1.3to201_standard.vmx createsnapshot addedforrecovery "Hope it works"
You won't be able to selectively roll back to a particular snapshot.
You will have to commit them all.
Commit the Snapshots
#vmware-cmd VC1.3to201_standard.vmx removesnapshots
All a bit too easy :-)
Corrupted Snapshot
What happens if the last snapshot is corrupt?
This can be caused by the VMFS volume being full.
Now there is data loss.
We can try to limit this to losing only the changes made since the last snapshot.
Move the last delta file to a temp area (or delete it).
Edit the .VMX file and point it to the second-to-last 0000xx.vmdk file.
[root@newross VC1.3to201_standard]# ls
VC1.3to201_standard-000001-delta.vmdk  VC1.3to201_standard-Snapshot4.vmsn
VC1.3to201_standard-000001.vmdk        VC1.3to201_standard.vmdk
VC1.3to201_standard-000002-delta.vmdk  VC1.3to201_standard.vmsd
VC1.3to201_standard-000002.vmdk        VC1.3to201_standard.vmx
VC1.3to201_standard-000003-delta.vmdk  VC1.3to201_standard.vmxf
VC1.3to201_standard-000003.vmdk        vmware-1.log
VC1.3to201_standard-flat.vmdk          vmware-2.log
VC1.3to201_standard.nvram              vmware-3.log
VC1.3to201_standard-Snapshot2.vmsn     vmware.log
VC1.3to201_standard-Snapshot3.vmsn
Before:
scsi0:0.present = "TRUE"
scsi0:0.fileName = "VC1.3to201_standard-000003.vmdk"
After:
scsi0:0.present = "TRUE"
scsi0:0.fileName = "VC1.3to201_standard-000002.vmdk"
When the .VMX has been updated to point to the second-to-last snapshot, commit the snapshots.
#vmware-cmd VC1.3to201_standard.vmx removesnapshots
Examining the Snapshots.
The descriptor of the original (base) disk will contain something similar to this.
[root@newross VC1.3to201_standard]# more VC1.3to201_standard.vmdk
# Disk DescriptorFile
version=1
CID=9e6bfa08
parentCID=ffffffff
createType="vmfs"
# Extent description
RW 16777216 VMFS "VC1.3to201_standard-flat.vmdk"
# The Disk Data Base
#DDB
ddb.virtualHWVersion = "4"
ddb.geometry.cylinders = "1044"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "lsilogic"
ddb.toolsVersion = "7201"
A snapshot disk can look similar to this
[root@newross VC1.3to201_standard]# more VC1.3to201_standard-000001.vmdk
# Disk DescriptorFile
version=1
CID=9e6bfa08
parentCID=9e6bfa08
createType="vmfsSparse"
parentFileNameHint="VC1.3to201_standard.vmdk"
# Extent description
RW 16777216 VMFSSPARSE "VC1.3to201_standard-000001-delta.vmdk"
# The Disk Data Base
#DDB
[root@newross VC1.3to201_standard]# more VC1.3to201_standard-000007.vmdk
# Disk DescriptorFile
version=1
CID=678cf29b
parentCID=9e6bfa08
createType="vmfsSparse"
parentFileNameHint="VC1.3to201_standard-000006.vmdk"
# Extent description
RW 16777216 VMFSSPARSE "VC1.3to201_standard-000007-delta.vmdk"
# The Disk Data Base
#DDB
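In the listings, the first snapshot's parentCID matches the CID of the base disk, and parentFileNameHint names the parent descriptor. Comparing those fields is one way to confirm that a snapshot chain is intact. The slides do not spell this check out, so treat the sketch below as an assumption-laden illustration: the helper names are hypothetical and the demo runs on throwaway files rather than real descriptors.

```shell
# Hypothetical helpers for reading descriptor fields.

# Print the value of key $2 (for example CID or parentCID) from descriptor $1.
descriptor_field() {
    awk -F= -v key="$2" '$1 == key { gsub(/"/, "", $2); print $2; exit }' "$1"
}

# Succeed when the parentCID of child $1 equals the CID of parent $2.
cid_chain_ok() {
    [ "$(descriptor_field "$1" parentCID)" = "$(descriptor_field "$2" CID)" ]
}

# Demo on throwaway files that mimic the base disk and its first snapshot.
cid_dir=$(mktemp -d)
printf '%s\n' 'CID=9e6bfa08' 'parentCID=ffffffff' > "$cid_dir/base.vmdk"
printf '%s\n' 'CID=9e6bfa08' 'parentCID=9e6bfa08' 'parentFileNameHint="base.vmdk"' > "$cid_dir/snap1.vmdk"

if cid_chain_ok "$cid_dir/snap1.vmdk" "$cid_dir/base.vmdk"; then
    echo "chain ok"
else
    echo "chain broken"
fi
```

To walk a longer chain, the same comparison could be repeated for each descriptor against the file named in its parentFileNameHint.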
VMFS Volumes and Extents
Avoiding Issues with Extents
When you add an extent to a VMFS volume, only the ESX server that made the change is aware of it.
It is best practice to rescan VMFS volumes from all hosts afterwards.
Otherwise it is possible to add another extent from a different ESX server and cause issues with the VMFS volume.
#esxcfg-rescan vmhba1
#esxcfg-rescan vmhba2
#service mgmt-vmware restart
Recover VMFS
How to recover after deleting
a VMFS partition
Why would somebody delete their partition?
• fdisk
• dd over the beginning of the disk
• Unattended install of Linux with clearpart --all
• LUN corruption of partition information
What options are available?
• If the VMFS volume itself is corrupted or has been formatted over, the recovery procedure is less likely to work.
• If only the partition information is lost, recreating the partition will most likely bring the volume back.
Here we will use fdisk to recreate the partition information.
First we need to identify the correct device
• ESX 2.X
  vmkpcidivy -q vmhba_devs
• ESX 3.X
  esxcfg-vmhbadevs -m
For this example we will assume it is /dev/sdf
Check that there are no partitions by using the command
• fdisk -l /dev/sdf
Checking the Volume Header of a VMFS3 Volume.
What does it look like?
f15e2fab000400002c15accef245465e293b00032304a5c50
26c00006300616c69726f695f6e756c316e00000000000
000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000002
00001000000000002c00acce014500002400accec64507
f7449a00002304a5c5016c000000000000000000000000
000000000100040000000000010000000000000000000
000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000
000
Extracting Volume Header
longstring=$(dd if=/dev/sdf bs=1k count=1 skip=19456 2>/dev/null | od -x -v | awk '{print $2, $3, $4, $5, $6, $7, $8, $9}' | tr -d " " | tr -d '\n')
magicnumber=${longstring:4:4}${longstring:0:4}
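The substring expression swaps the first two 16-bit words because od -x prints each word in host byte order, so on a little-endian host the two halves of the 32-bit value come out reversed. The sketch below shows the same swap in portable sh, using the first digits of the sample header shown earlier as input. The function name vmfs_magic is hypothetical, and the slides do not state which value to expect, so treat the printed result as something to compare against a known-good volume.

```shell
# Hypothetical helper: rebuild the 32-bit magic from the od -x word dump.
vmfs_magic() {
    first=$(printf '%s' "$1" | cut -c1-4)
    second=$(printf '%s' "$1" | cut -c5-8)
    printf '%s%s\n' "$second" "$first"
}

# First 16 hex digits of the sample header shown earlier.
vmfs_magic "f15e2fab00040000"
```

For the sample header this prints 2fabf15e. If a header read from the disk yields a different value, the read offset or the device is probably not the one you want.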
Recreate the partition using the following commands
• fdisk /dev/sdf
  n (to create a new partition)
  p (to create a primary partition)
  1 (to create the 1st partition)
  [enter] to keep the default value
  [enter] to keep the default value
  t (to change the type of partition)
  fb (to set the partition as VMFS)
  w (to save)
• vmkfstools -V (to discover the VMFS)
If the VMFS still isn't present, this can be because the partition needs to be realigned.
To do this use fdisk again
• fdisk /dev/sdf
  x (to move to expert mode)
  b (to change the beginning of the partition)
  128 (to move the beginning of the partition to block 128)
  w (to save)
• vmkfstools -V (to rediscover the VMFS)
What happens if the LUN was formatted?
Do you have a BACKUP?
At this stage any data that you can get back is a bonus.
If there are still ESX servers with running VMs on the VMFS volume, there are three options:
• Run VMware Converter and do a V2V conversion of the VM.
• Run backup software in the VM.
• Copy the files from the VM.
If the VMs are powered down and an ESX server can still see the LUN:
• Copy all files (VMDK, VMX, etc.) to another LUN
Best Practices
• Backups
• Run vm-support before and after changes. Also run it periodically and move the tar.gz to another location.
• Change control for your full environment
• If issues are intermittent, record the time and date when they happen.
Any Questions?