Creating a New ASM Disk Header After Existing One Is Corrupted



Posted: October 9, 2013

Doc ID: Note:417687.1    Type: BULLETIN
Last Revision Date: 12-FEB-2008    Status: PUBLISHED

In this Document

Purpose

Scope and Application

***INTERNAL ONLY*** Creating a New ASM Disk Header After Existing One Is Corrupted

@ Oracle Confidential (INTERNAL). Do not distribute to customers.

@ Reason: This has the potential to permanently destroy a diskgroup if not done right.


Applies to:

Oracle Server – Enterprise Edition – Version: 10.2.0.2 to 10.2.0.4

Information in this document applies to any platform.

10.2 ASM External Redundancy Diskgroup

Purpose

The purpose of this bulletin is to provide steps to create a new ASM disk header if the existing one is corrupt.

Scope and Application

This document is INTERNAL ONLY and is intended for individuals with extensive ASM experience and source code access. It should only be used as a LAST RESORT when a diskgroup with external redundancy cannot mount because of a corrupt disk header. Re-creating the diskgroup and restoring the data via RMAN is a much better option; use this note only in extreme circumstances where a restore would be nearly impossible (for example, a corrupt backup or a database of several TB). Also note that this procedure only addresses the first 4k of the disk (the header). If there are problems with the data itself, this note cannot fix or address them.

INTERNAL ONLY Creating a New ASM Disk Header After Existing One Is Corrupted

Make sure you have the correct kfed version in place:

- If you are on 10.2.0.2, apply the patch for Bug 5039964.

- If you are on 10.2.0.3 or above, continue on…

For this test we have 3 ASM disks in an external redundancy diskgroup. For the test we will wipe out the header of ASM disk 3 (data03):

/ocfs02/asm/data01

/ocfs02/asm/data02

/ocfs02/asm/data03

1. Make sure all ASM instances are shut down.

2. Make a backup of the first 4k of the bad disk with dd:

dd if=<bad disk> of=<file> bs=4096 count=1
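Where you want the backup to fail loudly on a short read instead of silently writing a truncated file, the same 4k copy can be sketched in Python. This is a minimal sketch, not part of the original procedure: the function name `backup_header` is hypothetical, and it assumes the disk path (for example a file-backed disk such as /ocfs02/asm/data03) is readable by the current user. It only reads from the disk.

```python
HEADER_BYTES = 4096  # the ASM disk header is the first 4k of the disk


def backup_header(device_path: str, backup_path: str, size: int = HEADER_BYTES) -> bytes:
    """Copy the first `size` bytes of device_path into backup_path.

    Same intent as: dd if=<bad disk> of=<file> bs=4096 count=1
    """
    with open(device_path, "rb") as src:
        header = src.read(size)
    # Refuse a partial backup rather than writing a short file
    if len(header) != size:
        raise IOError(f"short read: got {len(header)} of {size} bytes from {device_path}")
    with open(backup_path, "wb") as dst:
        dst.write(header)
    return header
```

For instance, `backup_header("/ocfs02/asm/data03", "data03_hdr.bak")` writes the 4k header into the current directory; the backup file name is only illustrative.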

3. Check the existing disks and see which one has "file 1 block 1":

To find the disk with f1b1, run:

kfed read <device name> | grep f1b1

Example:

$ kfed read /ocfs02/asm/data01 | grep f1b1

kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002

$ kfed read /ocfs02/asm/data02 | grep f1b1

kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000
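With more than a handful of disks, this comparison can be scripted. This is a minimal sketch, assuming the output of `kfed read <device name>` for each disk has already been saved as text; `f1b1_location` and `find_f1b1_disks` are hypothetical helpers that only parse that text and never call kfed themselves.

```python
import re
from typing import Dict, List, Optional

# Matches kfed lines such as: kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
F1B1_RE = re.compile(r"^\s*kfdhdb\.f1b1locn:\s*(\d+)\s*;")


def f1b1_location(kfed_output: str) -> Optional[int]:
    """Return kfdhdb.f1b1locn from captured `kfed read` output, or None if absent."""
    for line in kfed_output.splitlines():
        match = F1B1_RE.match(line)
        if match:
            return int(match.group(1))
    return None


def find_f1b1_disks(outputs: Dict[str, str]) -> List[str]:
    """Given {device path: kfed read output}, list the disks with a non-zero f1b1locn."""
    return sorted(
        device
        for device, text in outputs.items()
        if (f1b1_location(text) or 0) != 0
    )
```

Applied to the two outputs above, `find_f1b1_disks` returns `['/ocfs02/asm/data01']`.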

Since data01 has a non-zero value, data01 is the disk with "file 1 block 1". Confirm this by checking that you see "KFBTYP_LISTHEAD" in allocation unit 2 (aunum=2):

kfed read <device name> aunum=2 | grep kfbh.type

If the diskgroup uses a non-default allocation unit size, also specify it with AUSZ=#.

Example:

$ kfed read /ocfs02/asm/data01 aunum=2 | grep kfbh.type

kfbh.type: 5 ; 0x002: KFBTYP_LISTHEAD
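The aunum and blknum arguments correspond to a plain byte offset on the disk, which helps when cross-checking kfed against raw dd reads or scanning AUs by hand. This sketch assumes the usual 10.2 defaults of a 1 MB allocation unit and 4 KB metadata blocks; read the real values from kfdhdb.ausize and kfdhdb.blksize in a good disk header and pass them in when AUSZ was set. The function names and the output file name in the generated dd command are hypothetical.

```python
DEFAULT_AUSIZE = 1024 * 1024  # assumed 10.2 default AU size; confirm via kfdhdb.ausize
DEFAULT_BLKSIZE = 4096        # assumed metadata block size; confirm via kfdhdb.blksize


def block_offset(aunum: int, blknum: int = 0,
                 ausize: int = DEFAULT_AUSIZE, blksize: int = DEFAULT_BLKSIZE) -> int:
    """Byte offset of metadata block `blknum` inside allocation unit `aunum`."""
    if ausize % blksize != 0:
        raise ValueError("ausize must be a multiple of blksize")
    if not 0 <= blknum < ausize // blksize:
        raise ValueError("blknum is outside the allocation unit")
    return aunum * ausize + blknum * blksize


def dd_read_command(device: str, aunum: int, blknum: int = 0,
                    ausize: int = DEFAULT_AUSIZE, blksize: int = DEFAULT_BLKSIZE) -> str:
    """Build a dd command that copies that single block to a (hypothetically named) file."""
    # dd's skip= counts input blocks of size bs, not bytes
    skip = block_offset(aunum, blknum, ausize, blksize) // blksize
    return f"dd if={device} of=au{aunum}_blk{blknum}.bin bs={blksize} skip={skip} count=1"
```

For example, `dd_read_command("/ocfs02/asm/data01", 2)` yields `dd if=/ocfs02/asm/data01 of=au2_blk0.bin bs=4096 skip=512 count=1`, the raw block that the kfed command above decodes.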

If the lost disk is the "file 1 block 1" disk, scan every AU of the bad disk until you find a header that claims to be FILE_DIRECTORY (KFBTYP_FILEDIR). Once you find it, set f1b1locn to that AU number and continue. If the file directory cannot be found anywhere, there is no choice but to re-create the diskgroup and restore from a backup.
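Scanning every AU by hand is tedious, so here is a minimal sketch of the loop. It reads the kfbh.type byte (offset 0x002 in the kfed dumps in this note) at the start of each AU of the bad disk and yields the AU numbers whose type matches. KFBTYP_LISTHEAD = 5 is taken from the output above; the numeric code 4 for KFBTYP_FILEDIR is an assumption that should be checked against kfed's decoding on a healthy disk. The function name is hypothetical.

```python
import os
from typing import BinaryIO, Iterator

KFBH_TYPE_OFFSET = 0x002  # offset of kfbh.type, as printed in the kfed dumps in this note
KFBTYP_LISTHEAD = 5       # from the kfed output above
KFBTYP_FILEDIR = 4        # ASSUMPTION: confirm against kfed's own decoding before use


def scan_for_block_type(disk: BinaryIO, wanted: int,
                        ausize: int = 1024 * 1024) -> Iterator[int]:
    """Yield each AU number whose first block has kfbh.type == wanted."""
    disk.seek(0, os.SEEK_END)
    size = disk.tell()
    for aunum in range(size // ausize):  # only whole AUs are examined
        disk.seek(aunum * ausize + KFBH_TYPE_OFFSET)
        byte = disk.read(1)
        if byte and byte[0] == wanted:
            yield aunum
```

Opened in binary mode, for example `list(scan_for_block_type(open("/ocfs02/asm/data03", "rb"), KFBTYP_FILEDIR))`, it returns the candidate AU numbers; pass `ausize=` for a non-default AU size. A single matching byte can be a coincidence, so confirm each candidate with `kfed read <device name> aunum=<N>` before using it for f1b1locn.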

4. Use kfed to make a copy of the header of a good disk that IS NOT the disk containing f1b1 and is in the SAME diskgroup as the bad disk. In our example this is data02:

kfed read <device name> > fix.txt

Example:

$ kfed read /ocfs02/asm/data02 > fix.txt

5. Edit fix.txt and change the following fields to the proper values (use the ASM alert log for reference):

kfdhdb.dsknum

kfdhdb.dskname

kfdhdb.fgname

Example:

Check the alert log for proper names:

NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/ocfs02/asm/data01

NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/ocfs02/asm/data02

NOTE: cache opening disk 2 of grp 1: DATA_0002 path:/ocfs02/asm/data03

Old values from fix.txt:

kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0001 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0001 ; 0x068: length=9

New values in fix.txt:

kfdhdb.dsknum: 2 ; 0x024: 0x0002
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0002 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0002 ; 0x068: length=9
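The step 5 edits can be applied mechanically rather than in a text editor. This is a minimal sketch, assuming fix.txt keeps kfed's `<field>: <value> ; <offset>: <annotation>` line layout; `set_field` is a hypothetical helper. It also refreshes a trailing hex value or length=N annotation so the file stays self-consistent for review. It is an assumption that only the value before the semicolon matters when the file is written back, so review the result by eye before using it.

```python
import re
from typing import Union

# kfed text lines look like: <field>: <value> ; <offset>: <decoded annotation>
LINE_RE = re.compile(
    r"^(?P<field>\S+):(?P<pad>\s*)(?P<value>.*?)(?P<sp>\s*);"
    r"(?P<off>\s*0x[0-9a-fA-F]+:)(?P<rest>.*)$"
)


def set_field(text: str, field: str, value: Union[int, str]) -> str:
    """Return kfed text with `field` set to `value`, keeping the line layout intact."""
    out = []
    found = False
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("field") == field:
            found = True
            rest = m.group("rest")
            if isinstance(value, int):
                # Refresh a trailing hex annotation such as "0x0001", keeping its width
                hexm = re.fullmatch(r"\s*0x([0-9a-fA-F]+)\s*", rest)
                if hexm:
                    width = len(hexm.group(1))
                    rest = f" 0x{value:0{width}x}"
            else:
                # Refresh a "length=N" annotation for string fields such as dskname
                rest = re.sub(r"length=\d+", f"length={len(value)}", rest)
            line = f"{field}:{m.group('pad')}{value}{m.group('sp')};{m.group('off')}{rest}"
        out.append(line)
    if not found:
        raise KeyError(f"{field} not found in kfed output")
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Calling it three times on the old values (kfdhdb.dsknum to 2, kfdhdb.dskname and kfdhdb.fgname to DATA_0002) produces the new values shown above, leaving grptyp, hdrsts and grpname untouched.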



6. Find the disk directory by dumping aunum=2 and blknum=2 for the disk with f1b1:

kfed read <device name> aunum=2 blknum=2 | more

Example:

$ kfed read /ocfs02/asm/data01 aunum=2 blknum=2 | more
kfffde[0].xptr.au: 2 ; 0x4a0: 0x00000002
kfffde[0].xptr.disk: 2 ; 0x4a4: 0x0002
kfffde[0].xptr.flags: 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk: 42 ; 0x4a7: 0x2a
kfffde[1].xptr.au: 4294967295 ; 0x4a8: 0xffffffff
kfffde[1].xptr.disk: 65535 ; 0x4ac: 0xffff
kfffde[1].xptr.flags: 0 ; 0x4ae: L=0 E=0 D=0 S=0
kfffde[1].xptr.chk: 42 ; 0x4af: 0x2a

After the initial file directory header, you will see the extent map. If the diskgroup uses external redundancy, each entry refers to one extent of the file. For normal redundancy every pair of entries is an extent set; similarly, for high redundancy every three entries ([0], [1], [2]) form an extent set. The kfffde[1] entry, with au 4294967295 (0xffffffff) and disk 65535, is an unused slot. Here we see that the disk directory is at AU 2 on disk number 2. In this example it happened to be in AU 2, but it is not guaranteed to always be there.
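The grouping rule can be written down directly. In this minimal sketch, assuming the blknum=2 dump has been saved as text, `parse_extents` collects the (disk, au) pairs from the kfffde entries and stops at the first unused slot (au 0xffffffff), and `extent_sets` groups them one, two or three at a time for external, normal or high redundancy. Both names, and stopping at the first unused slot, are choices made for this illustration.

```python
import re
from typing import Dict, List, Tuple

# Matches e.g. "kfffde[0].xptr.au: 2 ; 0x4a0: 0x00000002"
XPTR_RE = re.compile(r"^\s*kfffde\[(\d+)\]\.xptr\.(au|disk):\s*(\d+)")
UNUSED_AU = 0xFFFFFFFF  # 4294967295, the au value of the unused kfffde[1] slot above
COPIES = {"external": 1, "normal": 2, "high": 3}


def parse_extents(dump: str) -> List[Tuple[int, int]]:
    """Return (disk, au) per extent pointer, in kfffde order, up to the first unused slot."""
    slots: Dict[int, Dict[str, int]] = {}
    for line in dump.splitlines():
        m = XPTR_RE.match(line)
        if m:
            slots.setdefault(int(m.group(1)), {})[m.group(2)] = int(m.group(3))
    extents = []
    for idx in sorted(slots):
        slot = slots[idx]
        if "au" not in slot or "disk" not in slot or slot["au"] == UNUSED_AU:
            break
        extents.append((slot["disk"], slot["au"]))
    return extents


def extent_sets(extents: List[Tuple[int, int]],
                redundancy: str = "external") -> List[List[Tuple[int, int]]]:
    """Group consecutive entries: 1 per set (external), 2 (normal), 3 (high)."""
    n = COPIES[redundancy]
    usable = len(extents) - len(extents) % n  # ignore a trailing partial set
    return [extents[i:i + n] for i in range(0, usable, n)]
```

On the dump above, `parse_extents` returns `[(2, 2)]`: disk 2, AU 2, the disk directory location.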



7. Once the disk directory location is found, find the information for your disk number:

kfed read <device name> aunum=2 blknum=0 | more

Example:

$ kfed read /ocfs02/asm/data02 aunum=2 blknum=0 | more
kfbh.type: 6 ; 0x002: KFBTYP_DISKDIR
...
kfddde[0].entry.incarn: 1 ; 0x024: A=1 NUMM=0x0
...
kfddde[2].dsknum: 2 ; 0x3b4: 0x0002
kfddde[2].state: 2 ; 0x3b6: KFDSTA_NORMAL
kfddde[2].ub1spare: 0 ; 0x3b7: 0x00
kfddde[2].dskname: DATA_0002 ; 0x3b8: length=9
kfddde[2].fgname: DATA_0002 ; 0x3d8: length=9
kfddde[2].crestmp.hi: 32885842 ; 0x3f8: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7
kfddde[2].crestmp.lo: 3860343808 ; 0x3fc: USEC=0x0 MSEC=0x20b SECS=0x21 MINS=0x39
kfddde[2].failstmp.hi: 0 ; 0x400: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[2].failstmp.lo: 0 ; 0x404: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0

The kfddde entries are the disk directory entries. Only entries whose entry.incarn field shows A=1 are allocated. You may find entries with dskname populated, but A=0 means that the entry was deleted.
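Picking the right entry and reading its creation timestamp can be sketched as follows, assuming the aunum=2 blknum=0 dump has been saved as text. `parse_disk_directory` groups the kfddde[i] fields and records the A flag from entry.incarn, and `find_allocated_entry` returns only an allocated (A=1) entry for the given disk number. `decode_crestmp` is just a plausibility check: its bit layout was inferred from the HOUR/DAYS/MNTH/YEAR and USEC/MSEC/SECS/MINS annotations kfed prints in the dump above, not from documentation. All three names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Matches e.g. "kfddde[2].dskname: DATA_0002 ; 0x3b8: length=9"
DDE_RE = re.compile(r"^\s*kfddde\[(\d+)\]\.([\w.]+):\s*(\S+)\s*;(.*)$")


def parse_disk_directory(dump: str) -> Dict[int, Dict[str, str]]:
    """Collect kfddde[i] fields per entry; add 'allocated' from entry.incarn's A flag."""
    entries: Dict[int, Dict[str, str]] = {}
    for line in dump.splitlines():
        m = DDE_RE.match(line)
        if not m:
            continue
        entry = entries.setdefault(int(m.group(1)), {})
        field, value, annotation = m.group(2), m.group(3), m.group(4)
        entry[field] = value
        if field == "entry.incarn":
            flag = re.search(r"\bA=(\d)", annotation)
            entry["allocated"] = "1" if flag and flag.group(1) == "1" else "0"
    return entries


def find_allocated_entry(entries: Dict[int, Dict[str, str]], dsknum: int) -> Dict[str, str]:
    """Return the allocated (A=1) entry for dsknum; entries with A=0 were deleted."""
    for idx in sorted(entries):
        entry = entries[idx]
        if entry.get("allocated") == "1" and entry.get("dsknum") == str(dsknum):
            return entry
    raise LookupError(f"no allocated disk directory entry for disk {dsknum}")


def decode_crestmp(hi: int, lo: int) -> Tuple[int, ...]:
    """Decode crestmp into (year, month, day, hour, min, sec, msec, usec).

    Bit layout inferred from kfed's annotations in the dump above (an assumption):
      hi = YEAR << 14 | MNTH << 10 | DAYS << 5 | HOUR
      lo = MINS << 26 | SECS << 20 | MSEC << 10 | USEC
    """
    return (hi >> 14, (hi >> 10) & 0xF, (hi >> 5) & 0x1F, hi & 0x1F,
            lo >> 26, (lo >> 20) & 0x3F, (lo >> 10) & 0x3FF, lo & 0x3FF)
```

For disk 2 in the example, `decode_crestmp(32885842, 3860343808)` gives `(2007, 3, 2, 18, 57, 33, 523, 0)`, that is 2007-03-02 18:57:33.523, matching kfed's own annotation. The raw crestmp.hi and crestmp.lo numbers are what step 8 carries over to fix.txt.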


8. Now go back to fix.txt and adjust crestmp.hi and crestmp.lo to match what the disk directory shows. If it