Sunday 11 May, 2008 - 10:21
Searched MetaLink for vipca. Found the following among the hits:
The latter link pointed me to Server Control Utility Reference.
# ./srvctl config nodeapps -n richmond1 -aVIP exists.: /richmond1-vip/192.168.100.33/255.255.255.0/eth0Reviewing the logs. /u00/crs/oracle/product/10/app/log/richmond1/alertrichmond1.log shows:
CRS-1006:The OCR location /dev/raw/raw16 is inaccessible. Details in /u00/crs/oracle/product/10/app/log/richmond1/client/ocrconfig_3215.log.# cat /u00/crs/oracle/product/10/app/log/richmond1/client/ocrconfig_3215.logOracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.2008-05-11 00:26:26.972: [ OCRCONF][3076411520]ocrconfig starts...2008-05-11 00:26:26.973: [ OCRCONF][3076411520]Upgrading OCR data2008-05-11 00:26:27.099: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT2008-05-11 00:26:27.100: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT2008-05-11 00:26:27.100: [ OCRRAW][3076411520]proprioini: both disks are not OCR formatted2008-05-11 00:26:27.100: [ OCRRAW][3076411520]proprinit: Could not open raw device2008-05-11 00:26:27.110: [ default][3076411520]a_init:7!: Backend init unsuccessful : [26]2008-05-11 00:26:27.110: [ OCRCONF][3076411520]Exporting OCR data to [OCRUPGRADEFILE]2008-05-11 00:26:27.110: [ OCRAPI][3076411520]a_init:7!: Backend init unsuccessful : [33]2008-05-11 00:26:27.110: [ OCRCONF][3076411520]There was no previous version of OCR. error:[PROC-33: Oracle Cluster Registry is not configured]2008-05-11 00:26:27.138: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT2008-05-11 00:26:27.139: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT2008-05-11 00:26:27.139: [ OCRRAW][3076411520]proprioini: both disks are not OCR formatted2008-05-11 00:26:27.139: [ OCRRAW][3076411520]proprinit: Could not open raw device2008-05-11 00:26:27.149: [ default][3076411520]a_init:7!: Backend init unsuccessful : [26]2008-05-11 00:26:27.176: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT2008-05-11 00:26:27.176: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT2008-05-11 00:26:27.176: [ OCRRAW][3076411520]ibctx:1:ERROR: INVALID FORMAT2008-05-11 00:26:27.176: [ OCRRAW][3076411520]proprinit:problem reading the bootblock or superbloc 222008-05-11 00:26:27.212: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT2008-05-11 00:26:27.212: [ OCRRAW][3076411520]propriogid:1: INVALID FORMAT2008-05-11 00:26:27.234: [ OCRRAW][3076411520]propriowv: Vote information on disk 0 [/dev/raw/raw1] is adjusted from [0/0] to [1/2]2008-05-11 00:26:27.249: [ OCRRAW][3076411520]propriowv: Vote information on disk 1 [/dev/raw/raw16] is adjusted from [0/0] to [1/2]2008-05-11 00:26:27.531: [ OCRRAW][3076411520]propriniconfig:No 92 configuration2008-05-11 00:26:27.531: [ OCRAPI][3076411520]a_init:6a: Backend init successful2008-05-11 00:26:27.852: [ OCRCONF][3076411520]Initialized DATABASE keys in OCR2008-05-11 00:26:28.153: [ OCRCONF][3076411520]Successfully set skgfr block 02008-05-11 00:26:28.154: [ OCRCONF][3076411520]Exiting [status=success]...However when I check the OCR via ocrcheck, I find that the log is incorrect (who would've thought so?):
]# ./ocrcheckStatus of Oracle Cluster Registry is as follows : Version : 2 Total space (kbytes) : 264952 Used space (kbytes) : 1164 Available space (kbytes) : 263788 ID : 1926765147 Device/File Name : /dev/raw/raw1 Device/File integrity check succeeded Device/File Name : /dev/raw/raw16 Device/File integrity check succeeded Cluster registry integrity check succeededThe following is a bit weird. I am trying to enable debugging on the vip resource, and crs stops and starts without reason:
# ./crsctl debug log res "ora.richmond1.vip:1"error 9 encountered when connecting to CRSD# ./crsctl check crsCSS appears healthyCannot communicate with CRSEVM appears healthy# ./crsctl check crsdCRS appears healthy# ./crsctl check crsCSS appears healthyCRS appears healthyEVM appears healthy# ./crsctl debug log res "ora.richmond1.vip:1"Set Resource Debug Module: ora.richmond1.vip Level: 1alertrichmond1.log just shows the following entries without any preceding error messages:
2008-05-11 09:35:00.096[crsd(2786)]CRS-1012:The OCR service started on node richmond1.2008-05-11 09:35:01.460[crsd(2786)]CRS-1201:CRSD started on node richmond1.Later on, I found a core dump in /u00/crs/oracle/product/10/app/log/richmond1/crsd. It would appear that crs crashed and restarted automatically. The crsd.log in the same directory showed:
[ OCRUTL][396110768]u_check_keycomp:3': Case 3 or 4[ OCRAPI][396110768]procr_create_key:invalid keyname [.USR_ORA_DEBUG] passed2008-05-11 09:33:55.080: [ CRSOCR][396110768]0OCR api procr_create_key failed for key .USR_ORA_DEBUG. OCR error code = 8 OCR error msg: PROC-8: Cannot perform cluster registry operation because one of the parameters is invalid.2008-05-11 09:34:59.127: [ default][3076412416][ENTER]0When I try to start the vip resource,
# ./crs_start ora.richmond1.vipAttempting to start `ora.richmond1.vip` on member `richmond1`Start of `ora.richmond1.vip` on member `richmond1` failed.CRS-1006: No more members to considerCRS-0215: Could not start resource 'ora.richmond1.vip'.The ora.richmond1.vip.log in /u00/crs/oracle/product/10/app/log/richmond1/racg shows:
Sun May 11 09:41:09 EST 2008 [ 12629 ] Checking interface existanceSun May 11 09:41:09 EST 2008 [ 12629 ] Calling getifbyipSun May 11 09:41:09 EST 2008 [ 12629 ] getifbyip: started for2008-05-11 09:41:18.664: [ RACG][3076420800] [12625][3076420800][ora.richmond1.vip]: 192.168.100.33Sun May 11 09:41:09 EST 2008 [ 12629 ] Completed getifbyipSun May 11 09:41:09 EST 2008 [ 12629 ] Calling getifbyip -aSun May 11 09:41:09 EST 2008 [ 12629 ] getifbyip: started for 192.168.100.33Sun May 11 09:41:09 EST 2008 [ 12629 ] Comp2008-05-11 09:41:18.664: [ RACG][3076420800] [12625][3076420800][ora.richmond1.vip]: leted getifbyipSun May 11 09:41:12 EST 2008 [ 12629 ] Completed with initial interface testSun May 11 09:41:12 EST 2008 [ 12629 ] Interface testsSun May 11 09:41:12 EST 2008 [ 12629 ] checkIf: start for if=eth0Sun May 11 09:41:12 EST 2008 [ 12629 ] /s2008-05-11 09:41:18.664: [ RACG][3076420800] [12625][3076420800][ora.richmond1.vip]: bin/mii-tool eth0 errorSun May 11 09:41:12 EST 2008 [ 12629 ] defaultgw: startedSun May 11 09:41:12 EST 2008 [ 12629 ] defaultgw: completed with 192.168.101.254Sun May 11 09:41:18 EST 2008 [ 12629 ] checkIf: ping and RX packets checked if=eth0 failed2008-05-11 09:41:18.664: [ RACG][3076420800] [12625][3076420800][ora.richmond1.vip]:Interface eth0 checked failed (host=richmond1.yaocm.id.au)Sun May 11 09:41:18 EST 2008 [ 12629 ] checkIf: end for if=eth0Invalid parameters, or failed to bring up VIP (host=richmond1.yaocm.id.au)Here the error message is pointing to the default gateway on eth1.
Updated the gateway parameter in /etc/sysconfig/network-scripts/ifcfg-eth1 to 192.168.101.1 and restarted the network service.
The vip resource still fails to come online. The log shows:
2008-05-11 10:08:45.873: [ RACG][3076424896] [17350][3076424896][ora.richmond1.vip]: [ 17620 ] Interface testsSun May 11 10:08:39 EST 2008 [ 17620 ] checkIf: start for if=eth0Sun May 11 10:08:39 EST 2008 [ 17620 ] /sbin/mii-tool eth0 errorSun May 11 10:08:39 EST 2008 [ 17620 ] defaultgw: startedSun May 11 10:08:39 EST 2008 [ 17620 ]2008-05-11 10:08:45.873: [ RACG][3076424896] [17350][3076424896][ora.richmond1.vip]: defaultgw: completed with 192.168.101.1Sun May 11 10:08:45 EST 2008 [ 17620 ] checkIf: ping and RX packets checked if=eth0 failedInterface eth0 checked failed (host=richmond1.yaocm.id.au)Sun May 11 10:08:45 EST 2008 [ 17620 ] checkIf: end for if=eth0It is the wrong default gateway!
# routeKernel IP routing tableDestination Gateway Genmask Flags Metric Ref Use Iface192.168.100.0 * 255.255.255.0 U 0 0 0 eth0192.168.101.0 * 255.255.255.0 U 0 0 0 eth1169.254.0.0 * 255.255.0.0 U 0 0 0 eth1default richmond1-priv 0.0.0.0 UG 0 0 0 eth1Now to figure out how to change the default gateway.