Padstow (08)
Monday 21 July, 2008 - 08:34
The padstow cluster had a few problems over the past week:
The DATA disk group got corrupted again (the same tablespace was affected - SYSAUX).
The cluster had timing problems - at one point the padstow2 node was nine seconds ahead of padstow1.
The Grid Control agent was not collecting data or picking up targets on either node of the cluster.
Now I know there are some errors that ASM cannot protect against. I had to do a point-in-time recovery (PITR) because the archive logs were not duplexed across the DATA and FRA disk groups. At least I am getting practice with RMAN backups and restores.
To overcome the timing problems, I decided to go back to using NTP with gridctrl as the local NTP server. Although the other nodes list gridctrl as a peer (via ntpq -p), they still insist on using the local clock as the timing source.
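One way to see which source ntpd has actually selected is to look for the '*' tally code in the first column of the ntpq peer listing. The function below is only a rough sketch: the name sync_source is made up for illustration, and it assumes the usual column layout of `ntpq -pn` output (numeric addresses, tally code as the first character of the line).

```shell
# Print the peer that ntpd has selected as its synchronisation source.
# Reads `ntpq -pn` output on stdin; the selected peer is the line whose
# first character is the '*' tally code. Prints "none" if no peer is selected.
sync_source() {
    awk '/^\*/ { print substr($1, 2); found = 1 }
         END   { if (!found) print "none" }'
}

# Typical use on a cluster node:
#   ntpq -pn | sync_source
# A result of 127.127.1.0 is the local clock driver, not a real server.
```

If this prints 127.127.1.0 on padstow1 or padstow2, the node is still synchronising to its own local clock rather than to gridctrl.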
The implementation procedure for NTP I have been using is:
Edit /etc/ntp.conf to add the line "server gridctrl":
vi /etc/ntp.conf
Get the ntp service to recognise the new NTP server:
service ntpd restart
Check the time:
ntptime
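Since the edit has to be repeated on every node, the vi step could be scripted. The following is a minimal sketch rather than the procedure I actually ran: add_ntp_server is a hypothetical helper name, and the host name is treated as a plain word (a name containing dots would be interpreted as a regular expression by grep).

```shell
# Append "server <host>" to an ntp.conf-style file unless it is already there.
# Usage: add_ntp_server <conf-file> <host>
add_ntp_server() {
    conf=$1
    host=$2
    # Only append when no existing "server <host>" line is present,
    # so running the function twice does not duplicate the entry.
    if ! grep -Eq "^[[:space:]]*server[[:space:]]+$host([[:space:]]|\$)" "$conf"; then
        printf 'server %s\n' "$host" >> "$conf"
    fi
}

# Typical use on a cluster node (as root):
#   add_ntp_server /etc/ntp.conf gridctrl && service ntpd restart
```

Making the edit idempotent means the same command can be rerun on padstow1 and padstow2 without checking the file by hand first.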
The Grid Control agent took several reinstallation attempts before all the targets were detected, and I am still getting data collection errors. At least I did not have to recreate the cluster from scratch to get this far.