VIP Problem on Richmond1 (03)

Sunday 11 May, 2008 - 10:21

Searched MetaLink for vipca. Found the following among the hits:

The latter link pointed me to Server Control Utility Reference.

# ./srvctl config nodeapps -n richmond1 -a
VIP exists.: /richmond1-vip/192.168.100.33/255.255.255.0/eth0

Reviewing the logs. /u00/crs/oracle/product/10/app/log/richmond1/alertrichmond1.log shows:

CRS-1006:The OCR location /dev/raw/raw16 is inaccessible. Details in /u00/crs/oracle/product/10/app/log/richmond1/client/ocrconfig_3215.log.
# cat /u00/crs/oracle/product/10/app/log/richmond1/client/ocrconfig_3215.log
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2008-05-11 00:26:26.972: [ OCRCONF][3076411520]ocrconfig starts...
2008-05-11 00:26:26.973: [ OCRCONF][3076411520]Upgrading OCR data
2008-05-11 00:26:27.099: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT
2008-05-11 00:26:27.100: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT
2008-05-11 00:26:27.100: [ OCRRAW][3076411520]proprioini: both disks are not OCR formatted
2008-05-11 00:26:27.100: [ OCRRAW][3076411520]proprinit: Could not open raw device
2008-05-11 00:26:27.110: [ default][3076411520]a_init:7!: Backend init unsuccessful : [26]
2008-05-11 00:26:27.110: [ OCRCONF][3076411520]Exporting OCR data to [OCRUPGRADEFILE]
2008-05-11 00:26:27.110: [ OCRAPI][3076411520]a_init:7!: Backend init unsuccessful : [33]
2008-05-11 00:26:27.110: [ OCRCONF][3076411520]There was no previous version of OCR. error:[PROC-33: Oracle Cluster Registry is not configured]
2008-05-11 00:26:27.138: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT
2008-05-11 00:26:27.139: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT
2008-05-11 00:26:27.139: [ OCRRAW][3076411520]proprioini: both disks are not OCR formatted
2008-05-11 00:26:27.139: [ OCRRAW][3076411520]proprinit: Could not open raw device
2008-05-11 00:26:27.149: [ default][3076411520]a_init:7!: Backend init unsuccessful : [26]
2008-05-11 00:26:27.176: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT
2008-05-11 00:26:27.176: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT
2008-05-11 00:26:27.176: [ OCRRAW][3076411520]ibctx:1:ERROR: INVALID FORMAT
2008-05-11 00:26:27.176: [ OCRRAW][3076411520]proprinit:problem reading the bootblock or superbloc 22
2008-05-11 00:26:27.212: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT
2008-05-11 00:26:27.212: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT
2008-05-11 00:26:27.234: [ OCRRAW][3076411520]propriowv: Vote information on disk 0 [/dev/raw/raw1] is adjusted from [0/0] to [1/2]
2008-05-11 00:26:27.249: [ OCRRAW][3076411520]propriowv: Vote information on disk 1 [/dev/raw/raw16] is adjusted from [0/0] to [1/2]
2008-05-11 00:26:27.531: [ OCRRAW][3076411520]propriniconfig:No 92 configuration
2008-05-11 00:26:27.531: [ OCRAPI][3076411520]a_init:6a: Backend init successful
2008-05-11 00:26:27.852: [ OCRCONF][3076411520]Initialized DATABASE keys in OCR
2008-05-11 00:26:28.153: [ OCRCONF][3076411520]Successfully set skgfr block 0
2008-05-11 00:26:28.154: [ OCRCONF][3076411520]Exiting [status=success]...

However when I check the OCR via ocrcheck, I find that the log is incorrect (who would've thought so?):

]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
     Version : 2
     Total space (kbytes) : 264952
     Used space (kbytes) : 1164
     Available space (kbytes) : 263788
     ID : 1926765147
     Device/File Name : /dev/raw/raw1
         Device/File integrity check succeeded
     Device/File Name : /dev/raw/raw16
        Device/File integrity check succeeded
     Cluster registry integrity check succeeded

The following is a bit weird. I am trying to enable debugging on the vip resource, and crs stops and starts without reason:

# ./crsctl debug log res "ora.richmond1.vip:1"
error 9 encountered when connecting to CRSD
# ./crsctl check crs
CSS appears healthy
Cannot communicate with CRS
EVM appears healthy
# ./crsctl check crsd
CRS appears healthy
# ./crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
# ./crsctl debug log res "ora.richmond1.vip:1"
Set Resource Debug Module: ora.richmond1.vip Level: 1

alertrichmond1.log just shows the following entries without any preceding error messages:

2008-05-11 09:35:00.096
[crsd(2786)]CRS-1012:The OCR service started on node richmond1.
2008-05-11 09:35:01.460
[crsd(2786)]CRS-1201:CRSD started on node richmond1.

Later on, I found a core dump in /u00/crs/oracle/product/10/app/log/richmond1/crsd. It would appear that crs crashed and restarted automatically. The crsd.log in the same directory showed:

[ OCRUTL][396110768]u_check_keycomp:3': Case 3 or 4
[ OCRAPI][396110768]procr_create_key:invalid keyname [.USR_ORA_DEBUG] passed
2008-05-11 09:33:55.080: [ CRSOCR][396110768]0OCR api procr_create_key failed for key .USR_ORA_DEBUG. OCR error code = 8 OCR error msg: PROC-8: Cannot perform cluster registry operation because one of the parameters is invalid.
2008-05-11 09:34:59.127: [ default][3076412416][ENTER]0

When I try to start the vip resource,

# ./crs_start ora.richmond1.vip
Attempting to start `ora.richmond1.vip` on member `richmond1`
Start of `ora.richmond1.vip` on member `richmond1` failed.
CRS-1006: No more members to consider
CRS-0215: Could not start resource 'ora.richmond1.vip'.

The ora.richmond1.vip.log in /u00/crs/oracle/product/10/app/log/richmond1/racg shows:

Sun May 11 09:41:09 EST 2008 [ 12629 ] Checking interface existance
Sun May 11 09:41:09 EST 2008 [ 12629 ] Calling getifbyip
Sun May 11 09:41:09 EST 2008 [ 12629 ] getifbyip: started for
2008-05-11 09:41:18.664: [ RACG][3076420800] [12625][3076420800][ora.richmond1.vip]: 192.168.100.33
Sun May 11 09:41:09 EST 2008 [ 12629 ] Completed getifbyip
Sun May 11 09:41:09 EST 2008 [ 12629 ] Calling getifbyip -a
Sun May 11 09:41:09 EST 2008 [ 12629 ] getifbyip: started for 192.168.100.33
Sun May 11 09:41:09 EST 2008 [ 12629 ] Comp
2008-05-11 09:41:18.664: [ RACG][3076420800] [12625][3076420800][ora.richmond1.vip]: leted getifbyip
Sun May 11 09:41:12 EST 2008 [ 12629 ] Completed with initial interface test
Sun May 11 09:41:12 EST 2008 [ 12629 ] Interface tests
Sun May 11 09:41:12 EST 2008 [ 12629 ] checkIf: start for if=eth0
Sun May 11 09:41:12 EST 2008 [ 12629 ] /s
2008-05-11 09:41:18.664: [ RACG][3076420800] [12625][3076420800][ora.richmond1.vip]: bin/mii-tool eth0 error
Sun May 11 09:41:12 EST 2008 [ 12629 ] defaultgw: started
Sun May 11 09:41:12 EST 2008 [ 12629 ] defaultgw: completed with 192.168.101.254
Sun May 11 09:41:18 EST 2008 [ 12629 ] checkIf: ping and RX packets checked if=eth0 failed
2008-05-11 09:41:18.664: [ RACG][3076420800] [12625][3076420800][ora.richmond1.vip]:
Interface eth0 checked failed (host=richmond1.yaocm.id.au)
Sun May 11 09:41:18 EST 2008 [ 12629 ] checkIf: end for if=eth0
Invalid parameters, or failed to bring up VIP (host=richmond1.yaocm.id.au)

Here the error message is pointing to the default gateway on eth1.

Updated the gateway parameter in /etc/sysconfig/network-scripts/ifcfg-eth1 to 192.168.101.1 and restarted the network service.

The vip resource still fails to come online. The log shows:

2008-05-11 10:08:45.873: [ RACG][3076424896] [17350][3076424896][ora.richmond1.vip]: [ 17620 ] Interface tests
Sun May 11 10:08:39 EST 2008 [ 17620 ] checkIf: start for if=eth0
Sun May 11 10:08:39 EST 2008 [ 17620 ] /sbin/mii-tool eth0 error
Sun May 11 10:08:39 EST 2008 [ 17620 ] defaultgw: started
Sun May 11 10:08:39 EST 2008 [ 17620 ]
2008-05-11 10:08:45.873: [ RACG][3076424896] [17350][3076424896][ora.richmond1.vip]: defaultgw: completed with 192.168.101.1
Sun May 11 10:08:45 EST 2008 [ 17620 ] checkIf: ping and RX packets checked if=eth0 failed
Interface eth0 checked failed (host=richmond1.yaocm.id.au)
Sun May 11 10:08:45 EST 2008 [ 17620 ] checkIf: end for if=eth0

It is the wrong default gateway!

# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.100.0 * 255.255.255.0 U 0 0 0 eth0
192.168.101.0 * 255.255.255.0 U 0 0 0 eth1
169.254.0.0 * 255.255.0.0 U 0 0 0 eth1
default richmond1-priv 0.0.0.0 UG 0 0 0 eth1

Now to figure out how to change the default gateway.