Richmond Cluster (13)

Thursday 08 May, 2008 - 19:40

I think I have found the problem described in Richmond Cluster (12): the TCP/IP parameters were not set correctly. The specific one was net.ipv4.ip_local_port_range, which had the values 32768 61000. The recommended values from Clusterware Installation for Linux (pp. 2-37 to 2-38) are 1024 65000. I checked the other TCP/IP parameters and found that they were wrong as well:

net.core.rmem_default = 65535

net.core.wmem_default = 65535

net.core.rmem_max = 131071

net.core.wmem_max = 131071

My reasoning was that the script was trying to establish communications on port 6200, which requires root privileges under the current settings.
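To see where port 6200 sits relative to the configured range, a small helper can parse the two numbers stored in /proc/sys/net/ipv4/ip_local_port_range. This is a sketch only, and the function names are my own: it reports whether a port lies inside the local port range and says nothing by itself about privileges.

```python
from pathlib import Path

PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")

def parse_port_range(text):
    """Turn the 'low high' pair stored in ip_local_port_range into two ints."""
    low, high = (int(field) for field in text.split())
    return low, high

def port_in_range(port, port_range):
    """True if port lies inside the inclusive (low, high) range."""
    low, high = port_range
    return low <= port <= high

# Only read the live value on a Linux host where the /proc file exists.
if __name__ == "__main__" and PORT_RANGE_FILE.exists():
    current = parse_port_range(PORT_RANGE_FILE.read_text())
    print(current, "contains 6200:", port_in_range(6200, current))
```

With the old setting (32768, 61000), port 6200 falls outside the range; with 1024 65000 it falls inside.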

Updated the networking parameters on both richmond1 and richmond2:

$ su -

# cat >>/etc/sysctl.conf

net.ipv4.ip_local_port_range = 1024 65000

net.core.rmem_default = 1048576

net.core.wmem_default = 262144

net.core.rmem_max = 1048576

net.core.wmem_max = 262144

# sysctl -p
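One side effect of cat >> is that it only appends: if /etc/sysctl.conf already set any of these keys, or the step is run twice, the file ends up with duplicate entries. sysctl -p applies the lines in order, so the last assignment for a key is the one that sticks. The sketch below (the helper name is my own) reports the value each key ends up with; it handles only plain "key = value" lines and '#' or ';' comments.

```python
from pathlib import Path

def effective_settings(conf_text):
    """Return {key: value} for sysctl.conf text; a later line overrides an earlier one."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "key = value" line
        # Collapse internal whitespace so "1024   65000" compares as "1024 65000".
        settings[key.strip()] = " ".join(value.split())
    return settings

CONF = Path("/etc/sysctl.conf")

if __name__ == "__main__" and CONF.exists():
    for key, value in effective_settings(CONF.read_text()).items():
        print(f"{key} = {value}")
```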

Looks like I cannot rely on cluvfy for everything.
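Since cluvfy did not flag these settings, a small check of my own can compare the live kernel values against the ones used in this post. This is a minimal sketch: it maps each key to its file under /proc/sys (net.core.rmem_max becomes /proc/sys/net/core/rmem_max), and treating the buffer sizes as minimums rather than exact values is my assumption.

```python
from pathlib import Path

# Values taken from this post; treating buffer sizes as minimums is an assumption.
RECOMMENDED_MINIMUMS = {
    "net.core.rmem_default": 1048576,
    "net.core.wmem_default": 262144,
    "net.core.rmem_max": 1048576,
    "net.core.wmem_max": 262144,
}
RECOMMENDED_PORT_RANGE = (1024, 65000)
KEYS = list(RECOMMENDED_MINIMUMS) + ["net.ipv4.ip_local_port_range"]

def read_live(root="/proc/sys"):
    """Read each key from /proc/sys, e.g. net.core.rmem_max -> /proc/sys/net/core/rmem_max."""
    return {key: Path(root, *key.split(".")).read_text() for key in KEYS}

def find_problems(current):
    """Return (key, actual, expected) tuples for every setting that misses the target."""
    problems = []
    for key, minimum in RECOMMENDED_MINIMUMS.items():
        actual = int(current[key])  # int() tolerates the trailing newline
        if actual < minimum:
            problems.append((key, actual, minimum))
    port_range = tuple(int(f) for f in current["net.ipv4.ip_local_port_range"].split())
    if port_range != RECOMMENDED_PORT_RANGE:
        problems.append(("net.ipv4.ip_local_port_range", port_range, RECOMMENDED_PORT_RANGE))
    return problems

if __name__ == "__main__" and Path("/proc/sys/net/core").is_dir():
    for key, actual, expected in find_problems(read_live()):
        print(f"{key}: have {actual}, want {expected}")
```

Run against the old values listed earlier, it would report all five settings; after sysctl -p with the new values, it should report nothing.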

Clicked retry in OUI. Failed at the same point again.

Stopped crs on both systems:

# cd /u00/crs/oracle/product/10/app/bin

# ./crsctl stop crs

Stopping resources.

Error while stopping resources. Possible cause: CRSD is down.

Stopping CSSD.

Unable to communicate with the CSS daemon.

Started crs on both nodes:

# ./crsctl start crs

Attempting to start CRS stack

The CRS stack will be started shortly

However, when I check the status of CRS, I get the following:

# ./crsctl check crs

Failure 1 contacting CSS daemon

Cannot communicate with CRS

Cannot communicate with EVM
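Because crsctl start crs returns before the stack is up ("will be started shortly"), a single check straight afterwards can be premature. A minimal polling sketch, assuming Python 3.7+ and the crsctl path used above: it treats the stack as up once the output is non-empty and contains none of the failure phrases seen here. That test is my own heuristic, not a documented crsctl contract.

```python
import subprocess
import time

CRSCTL = "/u00/crs/oracle/product/10/app/bin/crsctl"
# Failure phrases copied from the output above; the list is a heuristic.
FAILURE_MARKERS = ("Failure", "Cannot communicate")

def crs_looks_up(output):
    """True if there is some output and none of the known failure phrases appear."""
    return bool(output.strip()) and not any(m in output for m in FAILURE_MARKERS)

def run_check():
    """Run 'crsctl check crs' and return its combined stdout and stderr."""
    result = subprocess.run([CRSCTL, "check", "crs"], capture_output=True, text=True)
    return result.stdout + result.stderr

def wait_for_crs(check=run_check, timeout=600, interval=15, sleep=time.sleep):
    """Poll until the check looks healthy or about `timeout` seconds have passed."""
    waited = 0
    while True:
        output = check()
        if crs_looks_up(output):
            return True, output
        if waited >= timeout:
            return False, output
        sleep(interval)
        waited += interval
```

The check and sleep functions are parameters so the loop can be exercised without a cluster; in this case it would simply have timed out.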

The system log (/var/log/messages) shows:

logger: Oracle CSS daemon failed to start up. Check CRS logs for diagnostics.

On richmond2, /u00/crs/oracle/product/10/app/log/richmond2/alertrichmond2.log shows:

[cssd(19779)]CRS-1604:CSSD voting file is offline: /dev/raw/raw2. Details in /u00/crs/oracle/product/10/app/log/richmond2/cssd/ocssd.log.

[cssd(19779)]CRS-1604:CSSD voting file is offline: /dev/raw/raw17. Details in /u00/crs/oracle/product/10/app/log/richmond2/cssd/ocssd.log.

[cssd(19779)]CRS-1604:CSSD voting file is offline: /dev/raw/raw32. Details in /u00/crs/oracle/product/10/app/log/richmond2/cssd/ocssd.log.
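With all three voting files reported offline, it helps to pull the device names out of the alert log in one pass. A minimal sketch: the regular expression is fitted to the three CRS-1604 lines quoted above and may need adjusting for other message formats.

```python
import re
from pathlib import Path

ALERT_LOG = Path("/u00/crs/oracle/product/10/app/log/richmond2/alertrichmond2.log")

# Fitted to the CRS-1604 lines above: capture the path before ". Details".
OFFLINE_RE = re.compile(
    r"CRS-1604:CSSD voting file is offline: (\S+?)\.(?=\s|$)", re.MULTILINE
)

def offline_voting_files(log_text):
    """Return each offline voting file once, in the order first reported."""
    return list(dict.fromkeys(OFFLINE_RE.findall(log_text)))

if __name__ == "__main__" and ALERT_LOG.exists():
    for device in offline_voting_files(ALERT_LOG.read_text(errors="replace")):
        print("offline:", device)
```

On the lines above it yields /dev/raw/raw2, /dev/raw/raw17 and /dev/raw/raw32, which are then the devices to check for permissions and ownership.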

On richmond2, /u00/crs/oracle/product/10/app/log/client/css.log shows:

[ CSSCLNT][3076425056]clsssInitNative: connect failed, rc 9

The logs on richmond1 are not that helpful.

I decided to deinstall Clusterware and reinstall it.