Cache Fusion is a new parallel database architecture for exploiting clustered computers to achieve scalability for all types of applications. Cache Fusion is a shared cache architecture that uses the high-speed, low-latency interconnects available today on clustered systems to maintain database cache coherency. Database blocks are shipped across the interconnect to the node where access to the data is needed. This is accomplished transparently to the application and users of the system. Because Cache Fusion uses at most a three-way protocol, it scales easily to clusters with a large number of nodes.
Additional information can be found at:
Note 139436.1 Understanding 9i Real Application Clusters Cache Fusion
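For example, a minimal sketch to observe inter-instance block transfers from SQL*Plus (the gv$instance_cache_transfer view exists in 10g and later; its exact columns vary by release):
select * from gv$instance_cache_transfer;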
Each node of a cluster that is being used for a clustered database will typically have the database and Oracle RAC software loaded on it, but not the actual datafiles (these need to be available via shared disk). For example, if you wish to run Oracle RAC on 2 nodes of a 4-node cluster, you would need to install the clusterware on all 4 nodes and Oracle RAC on the 2 nodes; Oracle RAC would only need to be licensed on the two nodes running the Oracle RAC database. Note that using a clustered file system or NAS storage can provide a configuration that does not necessarily require the Oracle binaries to be installed on all nodes.
With Oracle RAC 11g Release 2, if you are using policy-managed databases, you should have the Oracle RAC binaries accessible on all nodes in the cluster.
For the Cluster Synchronization Service (CSS), the master can be found by searching ORACLE_HOME/log/<nodename>/cssd/ocssd.log, where ORACLE_HOME is the home for the Oracle Clusterware (this is the Grid Infrastructure home in Oracle Database 11g Release 2).
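For example, a hedged sketch (the exact message text varies by version; "master node number" is the string commonly seen in CSS reconfiguration messages):
grep -i "master node number" $ORACLE_HOME/log/<nodename>/cssd/ocssd.log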
For the master of an enqueue resource with Oracle RAC, you can select from v$ges_resource; the MASTER_NODE column shows the mastering node.
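For example, from SQL*Plus:
select resource_name, master_node from v$ges_resource;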
No. You will have to stop the Oracle Clusterware stack on the node on which you need to stop the Oracle ASM instance. Use either "crsctl stop cluster -n node_name" or "crsctl stop crs" for this purpose.
How to recover:
In $ORACLE_HOME/dbs
. oraenv (enter <instance_name> when prompted)
sqlplus "/ as sysdba"
startup nomount
create pfile='recoversp' from spfile
/
shutdown immediate
quit
Now edit the newly created pfile to change the offending parameter to something sensible.
Then:
sqlplus "/ as sysdba"
startup pfile='recoversp' (or whatever you called it in step one).
create spfile='+DATA/GASM/spfileGASM.ora' from pfile='recoversp'
/
N.B. The name of the spfile is in your original init<instance_name>.ora, so adjust to suit.
shutdown immediate
startup
quit
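To verify that the instance is once again using the spfile in ASM, you can check after the restart:
sqlplus "/ as sysdba"
show parameter spfile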
If you already have an ASM instance/diskgroup then the following creates a RAC database on that diskgroup (run as the Oracle user):
$ORACLE_HOME/bin/dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbName $SID -sid $SID -sysPassword $PASSWORD -systemPassword $PASSWORD -sysmanPassword $PASSWORD -dbsnmpPassword $PASSWORD -emConfiguration LOCAL -storageType ASM -diskGroupName $ASMGROUPNAME -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates -nodeinfo $NODE1,$NODE2 -characterset WE8ISO8859P1 -obfuscatedPasswords false -sampleSchema false -oratabLocation /etc/oratab
The following will create an ASM instance and one diskgroup (run as the ASM/Oracle user):
$ORA_ASM_HOME/bin/dbca -silent -configureASM -gdbName NO -sid NO -emConfiguration NONE -diskList $ASM_DISKS -diskGroupName $ASMGROUPNAME -nodeinfo $NODE1,$NODE2 -obfuscatedPasswords false -oratabLocation /etc/oratab -asmSysPassword $PASSWORD -redundancy $ASMREDUNDANCY
where ASM_DISKS = '/dev/sda1,/dev/sdb1' and ASMREDUNDANCY='NORMAL'
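For example, a hypothetical environment setup before running the commands above (every value is a placeholder; adjust to your site):
export SID=ORCL PASSWORD=change_me
export NODE1=racnode1 NODE2=racnode2
export ASMGROUPNAME=DATA
export ASM_DISKS='/dev/sda1,/dev/sdb1'
export ASMREDUNDANCY=NORMAL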
OCR is the Oracle Cluster Registry; it holds all the cluster-related information, such as instances and services. The OCR file format is binary, and starting with 10.2 it is possible to mirror it. The location of the file(s) is recorded in /etc/oracle/ocr.loc, in the ocrconfig_loc and ocrmirrorconfig_loc variables.
Obviously, if you only have one copy of the OCR and it is lost or corrupt, then you must restore a recent backup; see the ocrconfig utility for details, specifically the -showbackup and -restore flags. Until a valid backup is restored, the Oracle Clusterware will not start up due to the corrupt/missing OCR file.
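For example (the backup file name below is hypothetical; use one listed by -showbackup):
#ocrconfig -showbackup
#ocrconfig -restore $ORA_CRS_HOME/cdata/<clustername>/backup00.ocr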
The interesting discussion is what happens if you have the OCR mirrored and one of the copies gets corrupt. You would expect that everything will continue to work seamlessly. Well, almost. The real answer depends on when the corruption takes place.
If the corruption happens while the Oracle Clusterware stack is up and running, the corruption will be tolerated and the Oracle Clusterware will continue to function without interruption, despite the corrupt copy. The DBA is advised to repair the hardware/software problem that prevents OCR from accessing the device as soon as possible; alternatively, the DBA can replace the failed device with another healthy device using the ocrconfig utility with the -replace flag.
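For example, to replace a failed OCR mirror with a healthy device (the device name is hypothetical):
#ocrconfig -replace ocrmirror /dev/raw/raw2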
If, however, the corruption happens while the Oracle Clusterware stack is down, then it will not be possible to start it up until the failed device comes back online or some administrative action is taken using the ocrconfig utility with the -overwrite flag. When the Clusterware attempts to start you will see messages similar to:
total id sets (1), 1st set (1669906634,1958222370), 2nd set (0,0) my votes (1), total votes (2)
2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprioini:disk 0 (/dev/raw/raw1) doesn't have enough votes (1,2)
2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprseterror: Error in accessing physical storage [26]
This is because the software can't determine which OCR copy is the valid one. In the above example one of the OCR mirrors was lost while the Oracle Clusterware was down. There are three ways to fix this failure:
a) Fix whatever (hardware/software) problem prevents OCR from accessing the device.
b) Issue "ocrconfig -overwrite" on any one of the nodes in the cluster. This command will overwrite the vote check built into OCR when it starts up. Basically, if OCR device is configured with mirror, OCR assign each device with one vote. The rule is to have more than 50% of total vote (quorum) in order to safely make sure the available devices contain the latest data. In 2-way mirroring, the total vote count is 2 so it requires 2 votes to achieve the quorum. In the example above there isn't enough vote to start if only one device with one vote is available. (In the earlier example, while OCR is running when the device is down, OCR assign 2 vote to the surviving device and that is why this surviving device now with two votes can start after the cluster is down). See warning below
c) This method is not recommended to be performed by customers. It is possible to manually modify ocr.loc to delete the failed device and restart the cluster. OCR won't do the vote check if the mirror is not configured. See the warning below.
EXTREME CAUTION should be exercised if choosing option b or c above, since data loss can occur if the wrong file is manipulated; please contact Oracle Support for assistance before proceeding.
What are Oracle Clusterware processes for 10g on Unix and Linux
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd) — The crsd process manages cluster resources (which could be a database, an instance, a service, a listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user.
Event manager daemon (evmd) — A background process that publishes events that CRS creates.
Process Monitor Daemon (OPROCD) — This process monitors the cluster and provides I/O fencing. OPROCD performs its check and stops running; if the wake-up is beyond the expected time, OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.
RACG (racgmain, racgimon) —Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.
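A quick way to see which of these daemons are running on a node (the binaries are typically named ocssd.bin, crsd.bin, evmd.bin, oprocd and racgmain):
ps -ef | egrep 'ocssd|crsd|evmd|oprocd|racg'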
What are Oracle database background processes specific to RAC
•LMS—Global Cache Service Process
•LMD—Global Enqueue Service Daemon
•LMON—Global Enqueue Service Monitor
•LCK0—Instance Enqueue Process
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.
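For example, a minimal sketch listing these processes in a running instance (paddr != '00' filters to started background processes):
select name, description from v$bgprocess where paddr != '00' and (name like 'LM%' or name like 'LCK%');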
What are Oracle Clusterware Components
Voting Disk — Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.
Oracle Cluster Registry (OCR) — Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.
How do you troubleshoot node reboot
Please check the following Metalink notes:
Note 265769.1 Troubleshooting CRS Reboots
Note 559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware node evictions
How do you backup the OCR
There is an automatic backup mechanism for the OCR. The default location is: $ORA_CRS_HOME/cdata/<clustername>/
To display backups :
#ocrconfig -showbackup
To restore a backup :
#ocrconfig -restore <backup_file_name>
With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export <file_name> -s online
and use the -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can do a manual backup of the OCR with the command:
# ocrconfig -manualbackup
How do you backup voting disk
#dd if=voting_disk_name of=backup_file_name
How do I identify the voting disk location
#crsctl query css votedisk
How do I identify the OCR file location
check /var/opt/oracle/ocr.loc or /etc/oracle/ocr.loc (depends upon platform)
or
#ocrcheck
Is ssh required for normal Oracle RAC operation ?
"ssh" are not required for normal Oracle RAC operation. However "ssh" should be enabled for Oracle RAC and patchset installation.
What is SCAN?
Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is that clients using SCAN do not need to be changed if you add or remove nodes in the cluster.
SCAN uses IP addresses that are not assigned to any particular interface; Oracle Clusterware is in charge of them and directs requests to the appropriate servers in the cluster. VIPs are the IP addresses that CRS maintains for each node. It must do this so that failover is possible: in case one node fails, the VIP can be failed over to another node.
The main purpose of SCAN is ease of management and connection. For instance, you can add new nodes to the cluster without changing your client tnsnames.ora, because Oracle automatically distributes requests based on the SCAN IPs, which point to the underlying VIPs. SCAN listeners form the bridge between clients and the underlying, VIP-dependent local listeners.
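For example, a hypothetical tnsnames.ora entry using the SCAN (host name and service name are placeholders):
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = mycluster-scan.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = orcl.example.com))
  )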
What is the purpose of Private Interconnect ?
Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.
Why do we have a Virtual IP (VIP) in Oracle RAC?
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to some other node. The new node re-ARPs the world, indicating a new MAC address for the IP; subsequent packets sent to the VIP go to the new node, which sends error RST packets back to the clients. This results in the clients getting errors immediately.
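To see how the VIP is configured for a given node (the node name is a placeholder):
srvctl config nodeapps -n racnode1 -a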
What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report?
This is most likely due to a fault in the interconnect network.
Check the output of netstat -s.
If you see "fragments dropped" or "packet reassemblies failed", work with your system administrator to find the fault in the network.
How many nodes are supported in a RAC Database?
10g Release 2 supports 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.
Srvctl cannot start an instance; I get the error PRKP-1001 / CRS-0215, however sqlplus can start it on both nodes. How do you identify the problem?
Set the environment variable SRVM_TRACE to TRUE and start the instance with srvctl. You will then get a detailed error stack.
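For example (database and instance names are placeholders):
export SRVM_TRACE=TRUE
srvctl start instance -d ORCL -i ORCL1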
What is the purpose of the ONS daemon?
The Oracle Notification Service (ONS) daemon is a daemon started by the CRS clusterware as part of the nodeapps. There is one ONS daemon started per clustered node.
The Oracle Notification Service daemon receives a subset of published clusterware events via the local evmd and racgimon clusterware daemons and forwards those events to application subscribers and to the local listeners.
This is in order to facilitate:
a. the FAN (Fast Application Notification) feature, allowing applications to respond to database state changes.
b. the 10gR2 Load Balancing Advisory, the feature that permits load balancing across different RAC nodes depending on the load on each node. The RDBMS MMON process creates an advisory for the distribution of work every 30 seconds and forwards it via racgimon and ONS to the listeners and to applications.
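You can check the local ONS daemon with the onsctl utility shipped under the CRS home (the path varies by release; a sketch):
$ORA_CRS_HOME/opmn/bin/onsctl ping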
What are the major RAC wait events?
In a RAC environment the buffer cache is global across all instances in the cluster, and hence the processing differs. The most common wait events related to this are gc cr request and gc buffer busy.
gc cr request: the time it takes to retrieve the data from the remote cache.
Reason: RAC traffic using a slow connection, or inefficient queries (poorly tuned queries increase the number of data blocks requested by an Oracle session; the more blocks requested, the more often a block must be read from a remote instance via the interconnect).
gc buffer busy: the time the remote instance spends locally accessing the requested data block.
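For example, a minimal sketch to see the cumulative global cache waits on an instance (column names as in 10g):
select event, total_waits, time_waited from v$system_event where event like 'gc%' order by time_waited desc;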