Recovering a lost partition table with a VMFS datastore

Source: http://vinfrastructure.it/2013/01/recovering-a-lost-partition-table-with-a-vmfs-datastore/

There are several ways to lose or corrupt the partition table of a VMFS volume: a resignature from another system (for example a backup server connected in SAN mode), a human mistake (for example, choosing Delete on the datastore), or a storage issue.

In these cases the VMFS partition and its data are usually still there: you “only” have to rebuild the correct partition table. This is quite simple if you followed the recommended practice of having a single VMFS partition on each disk. Then you only have to define one partition (of type VMFS) that starts at the right offset (VMFS partitions are aligned) and ends on the last block.

The tool you need depends on the type of partition table:

    • For a legacy MBR disk, use the fdisk command, which is quite simple (especially if you know it from Linux).
    • For a new GPT disk, use partedUtil, which is a little more tricky.

Starting with ESXi 5.0, the GPT format is used by default on all new disks. An existing VMFS3 volume, however, is not necessarily converted to GPT, even if you upgrade it to VMFS5 (for more information see this post). It is converted to GPT only when you extend it beyond 2 TB (and it must of course be VMFS5 before the extension).

So on ESX/ESXi up to version 4.1 you can be sure the disk uses MBR; on ESXi 5.0 and later you have to check the disk type (you can simply use the fdisk -l command).
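The 2 TB threshold follows from how MBR stores sector addresses. The short sketch below works through the arithmetic, assuming 512-byte logical sectors; the helper name needs_gpt is hypothetical and only illustrates the size check.

```python
# Largest disk an MBR partition table can describe, assuming 512-byte sectors.
SECTOR_SIZE = 512     # bytes per logical sector (assumption)
MBR_LBA_BITS = 32     # MBR stores partition start and length as 32-bit sector counts

mbr_max_sectors = 2 ** MBR_LBA_BITS
mbr_max_bytes = mbr_max_sectors * SECTOR_SIZE


def needs_gpt(disk_sectors: int) -> bool:
    """Return True when the disk is too large for MBR to address completely."""
    return disk_sectors > mbr_max_sectors


print(mbr_max_bytes // 2 ** 40, "TiB")   # 2 TiB
print(needs_gpt(20971520))               # False: the 10 GiB example disk fits in MBR
```

A disk below the limit can still carry a GPT label, so this only tells you when GPT is mandatory; fdisk -l remains the way to see what a given disk actually uses.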

Recover a lost VMFS partition on an MBR disk

You can use KB 1002281 (Recovering a lost partition table on a VMFS volume), which explains the required steps:

    • Log in to the ESX host service console. For ESXi, see Tech Support Mode for Emergency Support (1003677).
    • Run the command: fdisk -l
    • Identify the affected disk (its partition table will be missing) and take note of its name (something starting with /dev/disks/…).
    • Start fdisk with this command and press Enter: fdisk -u /dev/disks/…
    • Create the partition:
        1. Press n and press Enter to create a new partition.
        2. Press p and press Enter to make it a primary partition.
        3. Press 1 and press Enter to make it the first partition.
        4. Type 128 and press Enter to start the partition at sector 128, which keeps it aligned.
        5. Press Enter again to accept the default value for the last sector.
        6. Change the partition to type fb (VMFS):
            1. Press t and press Enter. Partition 1 is automatically selected.
            2. Type fb and press Enter.
    • Press w and press Enter to save.
    • Run vmkfstools -V and press Enter to discover the VMFS.
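Because fdisk is started with -u, the value 128 is a sector number rather than a cylinder. As a small worked example, assuming 512-byte sectors, the sketch below converts the start sector into a byte offset; 2048 is included only for comparison, since it is the start sector the article gives for VMFS-5 on GPT. The function name is hypothetical.

```python
SECTOR_SIZE = 512  # bytes per logical sector (assumption)


def start_offset_bytes(start_sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset on the disk at which a partition starting at start_sector begins."""
    return start_sector * sector_size


for label, sector in (("MBR / VMFS-3 start", 128), ("GPT / VMFS-5 start", 2048)):
    kib = start_offset_bytes(sector) // 1024
    print(f"{label}: sector {sector} -> {kib} KiB")
# MBR / VMFS-3 start: sector 128 -> 64 KiB
# GPT / VMFS-5 start: sector 2048 -> 1024 KiB
```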

Recover a lost VMFS partition on a GPT disk

partedUtil is a non-interactive command, so it can be a little more complicated. For more information on this command, see KB 1036609 (Using the partedUtil command line utility on ESXi/ESX).

To recover your partition you first need some information, which you can still obtain with fdisk -l.

In that output you will notice that the second disk is empty, and you can read both the name of the disk and its usable sectors. You can also get the usable sectors by querying the disk with the partedUtil getUsableSectors command.

Those numbers are really important: to find the last usable sector you have to take the difference between them (in the example, 20971520 - 34 = 20971486).
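The subtraction can be cross-checked against the standard GPT layout. The sketch below assumes 512-byte sectors and a table of 128 entries of 128 bytes each (the fdisk output later in this page reports exactly those values); gpt_usable_range is a hypothetical helper, and the numbers reported by partedUtil getUsableSectors on the real disk should always take precedence.

```python
SECTOR_SIZE = 512      # bytes per logical sector (assumption)
GPT_ENTRIES = 128      # entries in the partition table (assumption)
GPT_ENTRY_SIZE = 128   # bytes per entry (assumption)


def gpt_usable_range(total_sectors: int) -> tuple:
    """Return (first_usable, last_usable) sector for a standard GPT layout."""
    entry_sectors = GPT_ENTRIES * GPT_ENTRY_SIZE // SECTOR_SIZE   # 32 sectors
    # Front of disk: protective MBR (1) + GPT header (1) + entry array.
    first_usable = 1 + 1 + entry_sectors
    # End of disk: last LBA is total - 1; the backup header and the backup
    # entry array sit just before the end.
    last_usable = (total_sectors - 1) - 1 - entry_sectors
    return first_usable, last_usable


print(gpt_usable_range(20971520))    # (34, 20971486)
print(gpt_usable_range(629145600))   # (34, 629145566)
```

Both results match the figures in this page: the 20971520-sector example disk, and the 629145600-sector LUN from the field report further down.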

Now you can use partedUtil setptbl to create your partition and the syntax is:

partedUtil setptbl diskName label "partNum startSector endSector type/guid attr"

The diskName is the one from the fdisk -l output, and the label is simply gpt. The more complex part is the list of partitions, which are specified as quoted strings. Each string encapsulates a 5-tuple composed of the partition number, starting sector, ending sector, type ID, and attributes:

    • The startSector is usually 2048 (the alignment used in VMFS-5). Note: Volumes that are upgraded from VMFS-3 to VMFS-5 continue to have the VMFS partition starting at sector 128, rather than at sector 2048.
    • The endSector is calculated from the difference of the two values that you get from partedUtil getUsableSectors. Note: the first number is usually 34.
    • The partition type identifies the purpose of a partition, and may be represented by either a decimal identifier (for example, 251) or a UUID (for VMFS, AA31E02A400F11DB9590000C2911D1B8). Partitions created on ESXi 5.0 and higher with the gpt disklabel must be specified using the GUID.
    • The partition attribute is a number which identifies properties of the partition. A common attribute is 128 = 0x80, which indicates that the partition is bootable. Otherwise, most partitions have an attribute value of 0.

So in the previous example, the command to rebuild the missing partition is the following (note that it applies only to this example: don’t use it as-is on your running system):

partedUtil setptbl /dev/disks/mpx.vmhba1:C0:T1:L0 gpt "1 2048 20971486 AA31E02A400F11DB9590000C2911D1B8 0"
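To make the 5-tuple easier to see, here is a minimal sketch that assembles the command string from its parts. It only builds and prints text and runs nothing on a host; build_setptbl and its parameter names are hypothetical, and the default type GUID is the VMFS value quoted above.

```python
VMFS_GUID = "AA31E02A400F11DB9590000C2911D1B8"  # VMFS partition type, as given in the article


def build_setptbl(disk: str, start: int, end: int,
                  part_num: int = 1, type_guid: str = VMFS_GUID, attr: int = 0) -> str:
    """Build a partedUtil setptbl command line for a single GPT partition."""
    if not 0 < start < end:
        raise ValueError("start must be positive and smaller than end")
    # The quoted string is the 5-tuple: number, start, end, type, attribute.
    spec = f"{part_num} {start} {end} {type_guid} {attr}"
    return f'partedUtil setptbl {disk} gpt "{spec}"'


command = build_setptbl("/dev/disks/mpx.vmhba1:C0:T1:L0", start=2048, end=20971486)
print(command)
```

Printing the command first and reviewing it by eye, rather than executing it from a script, fits the caution that follows: a wrong start or end sector cannot be undone.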

Caution: There is no facility to undo a partition table change other than creating a new partition table. Ensure that you have a backup before making any change. Ensure that there is no active I/O to a partition prior to modifying it.

Andrea Mauro

A real error and its repair

By Silvio Garbes

After several tests I made some changes to the disk blocks and recovered the LUN's partition.

Reference: http://vinfrastructure.it/en/2013/01/recovering-a-lost-partition-table-with-a-vmfs-volume/

Resolution:

November 26, 2013

Some of the commands used.

List partitions: fdisk -l

Disk size: partedUtil getUsableSectors /dev/disks/naa.600601605f601c00401e2a838d0ee311

Repair command: partedUtil setptbl /dev/disks/naa.600601605f601c00401e2a838d0ee311 gpt "1 2048 629145566 AA31E02A400F11DB9590000C2911D1B8 0"

Before:

Found valid GPT with protective MBR; using GPT

Disk /dev/disks/naa.600601605f601c00401e2a838d0ee311: 629145600 sectors, 600M

Logical sector size: 512

Disk identifier (GUID): 75c2e656-7fac-465e-baac-decbdbf481fc

Partition table holds up to 128 entries

First usable sector is 34, last usable sector is 629145566

Number Start (sector) End (sector) Size Code Name

1 2048 8388641 8190K 0700

2 8390656 16777249 8190K 0700

3 16779264 629145566 583M 0700

After:

Found valid GPT with protective MBR; using GPT

Disk /dev/disks/naa.600601605f601c00401e2a838d0ee311: 629145600 sectors, 600M

Logical sector size: 512

Disk identifier (GUID): 75c2e656-7fac-465e-baac-decbdbf481fc

Partition table holds up to 128 entries

First usable sector is 34, last usable sector is 629145566

Number Start (sector) End (sector) Size Code Name

1 2048 629145566 599M 0700
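As a sanity check on the two listings, the sketch below re-enters the start and end sectors by hand and verifies that every partition lies inside the usable range without overlapping its neighbour. The helper names are hypothetical; the numbers are copied from the output above.

```python
FIRST_USABLE = 34
LAST_USABLE = 629145566

# (number, start sector, end sector), copied from the two listings.
BEFORE = [(1, 2048, 8388641), (2, 8390656, 16777249), (3, 16779264, 629145566)]
AFTER = [(1, 2048, 629145566)]


def layout_problems(parts, first_usable, last_usable):
    """Return a list of human-readable problems found in a partition layout."""
    problems = []
    previous_end = -1
    for number, start, end in sorted(parts, key=lambda p: p[1]):
        if start > end:
            problems.append(f"partition {number}: start after end")
        if start < first_usable or end > last_usable:
            problems.append(f"partition {number}: outside usable range")
        if start <= previous_end:
            problems.append(f"partition {number}: overlaps previous partition")
        previous_end = max(previous_end, end)
    return problems


def sector_count(part):
    """Number of sectors covered by one (number, start, end) entry."""
    _, start, end = part
    return end - start + 1


print(layout_problems(BEFORE, FIRST_USABLE, LAST_USABLE))   # []
print(layout_problems(AFTER, FIRST_USABLE, LAST_USABLE))    # []
print(sector_count(AFTER[0]))                               # 629143519
```

Both layouts pass these structural checks, which shows the limit of such a check: the table before the repair was well formed but described three partitions instead of the single one that runs from sector 2048 to the last usable sector.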

Problem:

Last Saturday (November 23, 2013) we had a scheduled shutdown of the servers, and when they came back up VMware did not detect one datastore. The logs follow below. I tried to contact VMware, but the contract has expired. We started the renewal quote today. Do you know this problem?

2013-11-25T17:17:02.257Z cpu2:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.287Z cpu2:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.292Z cpu2:5833)LinBlock: LinuxBlockCompleteCommand:857: This message has repeated 1536 times: Command 0x9e (0x41240078bac0) failed H:0x0 D:0x2

2013-11-25T17:17:02.295Z cpu2:5833)Vol3: 692: Couldn't read volume header from control: Not supported

2013-11-25T17:17:02.295Z cpu2:5833)Vol3: 692: Couldn't read volume header from control: Not supported

2013-11-25T17:17:02.295Z cpu2:5833)FSS: 4972: No FS driver claimed device 'control': Not supported

2013-11-25T17:17:02.301Z cpu2:5833)FSS: 4972: No FS driver claimed device 'naa.600601605f601c00401e2a838d0ee311:1': Not supported

2013-11-25T17:17:02.308Z cpu2:5833)FSS: 4972: No FS driver claimed device 'naa.600601605f601c00401e2a838d0ee311:2': Not supported

2013-11-25T17:17:02.324Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.327Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.330Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.333Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.337Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.340Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.345Z cpu3:5833)FSS: 4972: No FS driver claimed device 'mpx.vmhba0:C0:T0:L0': Not supported

2013-11-25T17:17:02.472Z cpu3:5833)VC: 1547: Device rescan time 77 msec (total number of devices 12)

2013-11-25T17:17:02.472Z cpu3:5833)VC: 1550: Filesystem probe time 78 msec (devices probed 8 of 12)

2013-11-25T17:17:02.524Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.537Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.571Z cpu3:5833)Vol3: 692: Couldn't read volume header from control: Not supported

2013-11-25T17:17:02.571Z cpu3:5833)Vol3: 692: Couldn't read volume header from control: Not supported

2013-11-25T17:17:02.571Z cpu3:5833)FSS: 4972: No FS driver claimed device 'control': Not supported

2013-11-25T17:17:02.577Z cpu3:5833)FSS: 4972: No FS driver claimed device 'naa.600601605f601c00401e2a838d0ee311:1': Not supported

2013-11-25T17:17:02.582Z cpu3:5833)FSS: 4972: No FS driver claimed device 'naa.600601605f601c00401e2a838d0ee311:2': Not supported

2013-11-25T17:17:02.599Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.602Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.605Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.608Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.611Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.614Z cpu3:5833)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:02.620Z cpu3:5833)FSS: 4972: No FS driver claimed device 'mpx.vmhba0:C0:T0:L0': Not supported

2013-11-25T17:17:02.624Z cpu3:5833)VC: 1547: Device rescan time 21 msec (total number of devices 12)

2013-11-25T17:17:02.624Z cpu3:5833)VC: 1550: Filesystem probe time 79 msec (devices probed 8 of 12)

2013-11-25T17:17:03.336Z cpu7:6850)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:03.339Z cpu7:6850)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:03.378Z cpu6:6850)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:03.381Z cpu6:6850)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:03.384Z cpu7:6850)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:03.387Z cpu7:6850)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:03.390Z cpu7:6850)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:03.394Z cpu7:6850)<3>ata1.00: bad CDB len=16, scsi_op=0x9e, max=12

2013-11-25T17:17:03.399Z cpu7:6850)FSS: 4972: No FS driver claimed device 'mpx.vmhba0:C0:T0:L0': Not supported

2013-11-25T17:17:03.416Z cpu7:6850)FSS: 4972: No FS driver claimed device 'naa.600601605f601c00401e2a838d0ee311:2': Not supported

2013-11-25T17:17:03.416Z cpu7:6850)Vol3: 692: Couldn't read volume header from control: Not supported

2013-11-25T17:17:03.416Z cpu7:6850)Vol3: 692: Couldn't read volume header from control: Not supported

2013-11-25T17:17:03.416Z cpu7:6850)FSS: 4972: No FS driver claimed device 'control': Not supported

2013-11-25T17:17:03.424Z cpu7:6850)FSS: 4972: No FS driver claimed device 'naa.600601605f601c00401e2a838d0ee311:1': Not supported

2013-11-25T17:17:03.442Z cpu7:6850)VC: 1547: Device rescan time 26 msec (total number of devices 12)

2013-11-25T17:17:03.442Z cpu7:6850)VC: 1550: Filesystem probe time 83 msec (devices probed 8 of 12)

2013-11-25T17:19:35.707Z cpu0:4653)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x412400800500, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2013-11-25T17:19:35.707Z cpu0:4653)ScsiDeviceIO: 2331: Cmd(0x412400800500) 0x1a, CmdSN 0x4cb1 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
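With this much repetition it helps to reduce the log to the devices that no filesystem driver claimed. The sketch below is one way to do that with a regular expression; LOG_SAMPLE holds four lines copied from the log above, and unclaimed_devices is a hypothetical helper, not part of any VMware tool.

```python
import re
from collections import Counter

# Four lines copied from the vmkernel log above.
LOG_SAMPLE = """\
2013-11-25T17:17:02.295Z cpu2:5833)FSS: 4972: No FS driver claimed device 'control': Not supported
2013-11-25T17:17:02.301Z cpu2:5833)FSS: 4972: No FS driver claimed device 'naa.600601605f601c00401e2a838d0ee311:1': Not supported
2013-11-25T17:17:02.308Z cpu2:5833)FSS: 4972: No FS driver claimed device 'naa.600601605f601c00401e2a838d0ee311:2': Not supported
2013-11-25T17:17:02.345Z cpu3:5833)FSS: 4972: No FS driver claimed device 'mpx.vmhba0:C0:T0:L0': Not supported
"""

UNCLAIMED = re.compile(r"No FS driver claimed device '([^']+)'")


def unclaimed_devices(log_text: str) -> Counter:
    """Count how often each device appears in 'No FS driver claimed device' lines."""
    return Counter(UNCLAIMED.findall(log_text))


counts = unclaimed_devices(LOG_SAMPLE)
for device, hits in sorted(counts.items()):
    print(f"{hits:3d}  {device}")
```

In the full log the entries ending in :1 and :2 are partitions of the affected LUN (the number after the colon is the partition number), so they are the ones to compare against the partition table shown under "Before".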