2017-11-28 Recreate REDFERN Cluster on VICTORIA

Overview

After discovering why ASM and CRS failed to start up, I decided to rebuild the REDFERN cluster from scratch.

References

Procedure

Using Previous Procedures

I only re-initialised the root_disk in both VMs in order to get a clean installation.

This time, I selected the minimal development environment in order to install Perl, and I created douglas as an admin user. I used visudo so that admin users (the wheel group) do not need a password for sudo.
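
For reference, the equivalent command-line setup is roughly this (a sketch only, assuming the stock wheel group on Oracle Linux 7, run as root):

useradd -m -G wheel douglas
passwd douglas
visudo    # uncomment or add:  %wheel ALL=(ALL) NOPASSWD: ALL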

Restart the Installation

I did the following:

    1. Installed the pre-installation RPM as described in Install GI 12.1.0.2.
    2. Set the UDEV settings correctly as per Correct UDEV Settings.
    3. Rebooted both systems.
    4. Used NFS for the Oracle software (see Use NFS for Oracle Software).
    5. Overwrote the old OCR disk header with dd if=/dev/zero of=/dev/xvdh bs=8K count=12800 (a quick check of the result is sketched after this list).
    6. Started the GI installation as described in Install GI 12.1.0.2.
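
A quick sanity check that the dd in step 5 really zeroed the start of the old OCR disk would be something like:

sudo dd if=/dev/xvdh bs=8192 count=1 2>/dev/null | od -A d -t x1
# expect one row of zeros, a '*' repeat marker, and the final offset 0008192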

Installer Failure

At Step 7: Network Interface Usage, the following message appeared:

Cause - Installer has detected that network interface eth0 does not maintain connectivity on all cluster nodes.

Action - Ensure that the chosen interface has been configured across all cluster nodes.

Additional Information:

Summary of the failed nodes

redfern2 - PRVG-11850 : The system call "connect" failed with error "113" while executing exectask on node "redfern2"

No route to host

- Cause: An attempt to execute exectask on the specified node failed.

- Action: Examine the accompanying error message for details or contact Oracle Support Services.

redfern1 - PRVG-11850 : The system call "connect" failed with error "113" while executing exectask on node "redfern1"

No route to host

- Cause: An attempt to execute exectask on the specified node failed.

- Action: Examine the accompanying error message for details or contact Oracle Support Services.
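
Error 113 is EHOSTUNREACH ("No route to host"). On a flat subnet where the nodes can otherwise see each other, that usually points at a host firewall rejecting the TCP connection rather than a real routing problem. A quick manual check from redfern1 would look something like this (6200 is just an arbitrary example port):

ping -c 2 redfern2        # ICMP should still answer
nc -zv redfern2 22        # sshd is normally allowed through firewalld, so this should connect
nc -zv redfern2 6200      # other ports are rejected with "No route to host" if firewalld's default REJECT rule is the culprit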

Investigate Failure

Following the advice in [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. (Doc ID 1427202.1), I ran the following commands:

cd /opt/share/Software/grid/linuxamd64_12102/grid
./runcluvfy.sh comp nodecon -i eth0 -n redfern1,redfern2 -verbose

The output was:

Verifying node connectivity

Checking node connectivity...

Checking hosts config file...
  Node Name                             Status
  ------------------------------------  ------------------------
  redfern2                              passed
  redfern1                              passed

Verification of the hosts config file successful

Interface information for node "redfern2"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   192.168.1.141   192.168.1.0     UNKNOWN         UNKNOWN         00:16:3E:00:00:12 1500

Interface information for node "redfern1"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   192.168.1.140   192.168.1.0     UNKNOWN         UNKNOWN         00:16:3E:00:00:0E 1500

Check: Node connectivity using interfaces on subnet "192.168.1.0"

Check: Node connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  redfern1[192.168.1.140]         redfern2[192.168.1.141]         yes
Result: Node connectivity passed for subnet "192.168.1.0" with node(s) redfern1,redfern2

Check: TCP connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  redfern1 : 192.168.1.140        redfern1 : 192.168.1.140        passed
  redfern2 : 192.168.1.141        redfern1 : 192.168.1.140        failed

ERROR:
PRVG-11850 : The system call "connect" failed with error "113" while executing exectask on node "redfern2"
No route to host
  redfern1 : 192.168.1.140        redfern2 : 192.168.1.141        failed

ERROR:
PRVG-11850 : The system call "connect" failed with error "113" while executing exectask on node "redfern1"
No route to host
  redfern2 : 192.168.1.141        redfern2 : 192.168.1.141        passed
Result: TCP connectivity check failed for subnet "192.168.1.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.1.0".
Subnet mask consistency check passed.

Result: Node connectivity check failed

Verification of node connectivity was unsuccessful on all the specified nodes.
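
Given that ICMP and same-node TCP work but cross-node TCP fails, the obvious suspect is a host firewall. It can be confirmed on each node with something like:

sudo systemctl is-active firewalld
sudo firewall-cmd --list-all      # shows the active zone and which services/ports are allowed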

Solution: Disable Firewall

Following part of Oracle RAC12c on OL 7 using Virtualbox, I ran the following commands on both REDFERN1 and REDFERN2:

sudo systemctl stop firewalld
sudo systemctl disable firewalld

The output was:

Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
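
To confirm on both nodes that the firewall is now stopped and will stay off across reboots, something like the following can be run:

systemctl is-active firewalld     # should report "inactive"
systemctl is-enabled firewalld    # should report "disabled"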

Validate Network Connectivity

Following the advice in [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. (Doc ID 1427202.1), I ran the following commands:

cd /opt/share/Software/grid/linuxamd64_12102/grid
./runcluvfy.sh comp nodecon -i eth0 -n redfern1,redfern2 -verbose

The output was:

Verifying node connectivity

Checking node connectivity...

Checking hosts config file...
  Node Name                             Status
  ------------------------------------  ------------------------
  redfern2                              passed
  redfern1                              passed

Verification of the hosts config file successful

Interface information for node "redfern2"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   192.168.1.141   192.168.1.0     UNKNOWN         UNKNOWN         00:16:3E:00:00:12 1500

Interface information for node "redfern1"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   192.168.1.140   192.168.1.0     UNKNOWN         UNKNOWN         00:16:3E:00:00:0E 1500

Check: Node connectivity using interfaces on subnet "192.168.1.0"

Check: Node connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  redfern1[192.168.1.140]         redfern2[192.168.1.141]         yes
Result: Node connectivity passed for subnet "192.168.1.0" with node(s) redfern1,redfern2

Check: TCP connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  redfern1 : 192.168.1.140        redfern1 : 192.168.1.140        passed
  redfern2 : 192.168.1.141        redfern1 : 192.168.1.140        passed
  redfern1 : 192.168.1.140        redfern2 : 192.168.1.141        passed
  redfern2 : 192.168.1.141        redfern2 : 192.168.1.141        passed
Result: TCP connectivity check passed for subnet "192.168.1.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.1.0".
Subnet mask consistency check passed.

Result: Node connectivity check passed

Checking multicast communication...

Checking subnet "192.168.1.0" for multicast communication with multicast group "224.0.0.251"...
Check of subnet "192.168.1.0" for multicast communication with multicast group "224.0.0.251" passed.

Check of multicast communication passed.

Verification of node connectivity was successful.

Restart GI Installation

Followed the procedure in Install GI 12.1.0.2. However, Step 18: Install Product failed with:

Cause - Installer has failed to execute the specified script on one or more nodes. This might be because of exception occurred while executing the script on nodes.

Action - Review the log files '/opt/app/oraInventory/logs/installActions2017-11-28_08-10-33PM.log' and '/opt/app/grid_infra/12.1.0/grid/cfgtoollogs/crsconfig/rootcrs_<nodename>_<timestamp>.log' for further details.

More Details

Execution of GI Install script is failed on nodes : [redfern1]

Exception details
 - PRCZ-2009 : Failed to execute command "/opt/app/grid_infra/12.1.0/grid/root.sh" as root within 3,600 seconds on nodes "redfern1"
 - PRCZ-2009 : Failed to execute command "/opt/app/grid_infra/12.1.0/grid/root.sh" as root within 3,600 seconds on nodes "redfern1"

Execution status of failed node:redfern1

Errors :
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /opt/app/grid_infra/12.1.0/grid
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /opt/app/grid_infra/12.1.0/grid/crs/install/crsconfig_params
2017/11/28 20:45:27 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2017/11/28 20:46:24 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2017/11/28 20:46:26 CLSRSC-363: User ignored prerequisites during installation
OLR initialization - successful
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
2017/11/28 20:47:10 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'redfern1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'redfern1'
CRS-2676: Start of 'ora.mdnsd' on 'redfern1' succeeded
CRS-2676: Start of 'ora.evmd' on 'redfern1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'redfern1'
CRS-2676: Start of 'ora.gpnpd' on 'redfern1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'redfern1'
CRS-2672: Attempting to start 'ora.gipcd' on 'redfern1'
CRS-2676: Start of 'ora.cssdmonitor' on 'redfern1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'redfern1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'redfern1'
CRS-2672: Attempting to start 'ora.diskmon' on 'redfern1'
CRS-2676: Start of 'ora.diskmon' on 'redfern1' succeeded
CRS-2676: Start of 'ora.cssd' on 'redfern1' succeeded

ASM created and started successfully.

Disk Group VOTE created successfully.

2017/11/28 20:53:03 CLSRSC-12: The ASM resource ora.asm did not start
2017/11/28 20:53:03 CLSRSC-258: Failed to configure and start ASM
Died at /opt/app/grid_infra/12.1.0/grid/crs/install/crsinstall.pm line 2017.
The command '/opt/app/grid_infra/12.1.0/grid/perl/bin/perl -I/opt/app/grid_infra/12.1.0/grid/perl/lib -I/opt/app/grid_infra/12.1.0/grid/crs/install /opt/app/grid_infra/12.1.0/grid/crs/install/rootcrs.pl -auto -lang=en_AU.UTF-8' execution failed

Standard output : (same root.sh output as shown in the Errors section above)
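
The next step will be to work out why ora.asm would not start. A sketch of the first checks to run on redfern1 (the Grid home path is taken from the log above; the ASM alert log location is assumed):

export GRID_HOME=/opt/app/grid_infra/12.1.0/grid
sudo $GRID_HOME/bin/crsctl check has          # is OHASD up?
sudo $GRID_HOME/bin/crsctl check crs          # CRS/CSS/EVM status
sudo $GRID_HOME/bin/crsctl stat res -t -init  # state of ora.asm and the rest of the lower stack
# ASM alert log (ADR location assumed):
#   <ORACLE_BASE>/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log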