After a power outage, the machines will come back up in random order. In principle, the boot order should be:
hepcms-hn (gateway 10.1.0.1, NIS server, /home)
r720-datanfs (/data server)
siab-1 (/data2 server)
All worker and interactive nodes.
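One way to confirm that the servers are reachable in the expected order is a small probe loop. This is only a sketch: the function name first_unreachable and the PROBE variable are hypothetical, the host names are the ones listed above, and a single ping is assumed to be an adequate sign that a machine is up.

```shell
# Hypothetical helper, not part of the cluster tooling.
# PROBE is the command used to test one host; the default (an assumption)
# sends a single ping and waits up to 2 seconds for a reply.
PROBE=${PROBE:-"ping -c 1 -W 2"}

# Walk the hosts in the given order and print the first one that fails
# the probe. Returns 0 if every host answered, 1 otherwise.
first_unreachable() {
    for host in "$@"; do
        if ! $PROBE "$host" >/dev/null 2>&1; then
            echo "$host"
            return 1
        fi
    done
    return 0
}
```

For example, `first_unreachable hepcms-hn r720-datanfs siab-1` prints the name of the first server in the list that still needs attention, or nothing if all three answer.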
All local nodes (10.1.0.xx) connect to the outside world through hepcms-hn. Make sure iptables is running on hepcms-hn to direct privnet (10.1.0.xx) traffic to the outside world.
service iptables restart
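For private-network traffic to reach the outside world, the kernel on hepcms-hn also has to forward packets. The sketch below checks that flag. It assumes the gateway relies on standard Linux IPv4 forwarding, which these notes do not state explicitly, and the function name ip_forwarding_enabled is hypothetical.

```shell
# Hypothetical helper: succeed only if IPv4 forwarding is switched on.
# The optional argument overrides the flag file (assumed default below).
ip_forwarding_enabled() {
    flag_file=${1:-/proc/sys/net/ipv4/ip_forward}
    # The kernel exposes the setting as a single "0" or "1".
    [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}
```

A possible use is `ip_forwarding_enabled || echo "IPv4 forwarding is off on this host"`. If the gateway uses NAT (again an assumption), `iptables -t nat -S POSTROUTING` lists the rules that should be present after the restart.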
Check that ypbind is running on all computers and that /home, /data and /data2 are mounted properly.
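The check above can be scripted on each node. The following is a minimal sketch, not an existing cluster script: the function names missing_mounts and check_ypbind are hypothetical, and it assumes /proc/mounts reflects what is currently mounted.

```shell
# Print every required mount point that is absent from a mounts table.
# Usage: missing_mounts [MOUNTS_FILE]   (defaults to /proc/mounts)
missing_mounts() {
    mounts_file=${1:-/proc/mounts}
    for mp in /home /data /data2; do
        # Field 2 of each /proc/mounts line is the mount point.
        if ! awk -v mp="$mp" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"; then
            echo "$mp"
        fi
    done
}

# Succeed only if a process named exactly "ypbind" is running.
check_ypbind() {
    pgrep -x ypbind >/dev/null
}
```

Running `missing_mounts` with no argument prints nothing when all three file systems are mounted, and `check_ypbind || echo "ypbind is not running"` reports a stopped NIS client.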
Some of the machines may not reboot automatically, most probably because a disk failed to mount. On the console, edit /etc/fstab and comment out all local data disks (/dev/sdb, /dev/sdc, ...), /home, /data and /data2, then reboot. Mount the disks manually, then edit /etc/fstab again and re-enable the entries for the disks that have no problems.
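The fstab edit can be previewed before touching the real file. The sketch below only prints a modified copy to standard output. The function name comment_out_data_mounts is hypothetical, and the pattern /dev/sd[b-z] rests on the assumption that /dev/sda is the system disk and every other sd device is a data disk.

```shell
# Print FSTAB_FILE with the data-disk and network-mount entries commented out.
# Usage: comment_out_data_mounts FSTAB_FILE
# The input file itself is never modified.
comment_out_data_mounts() {
    awk '
        # Leave lines that are already comments untouched.
        /^[[:space:]]*#/ { print; next }
        # Assumed data disks (/dev/sdb onward) and the three shared mounts.
        $1 ~ /^\/dev\/sd[b-z]/ || $2 == "/home" || $2 == "/data" || $2 == "/data2" {
            print "#" $0
            next
        }
        { print }
    ' "$1"
}
```

For instance, `comment_out_data_mounts /etc/fstab > /tmp/fstab.new` writes a candidate file that can be reviewed with diff before it replaces /etc/fstab.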
Check that /DataCampusBackup and /CampusBackup are mounted on hepcms-hn.umd.edu. If not, check /etc/resolv.conf and add nameserver 128.8.74.2 if it is missing.
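The nameserver check can be done without opening an editor. This is a sketch under the assumption that the resolver file uses the usual "nameserver ADDRESS" lines; the function name has_nameserver is hypothetical.

```shell
# Succeed only if RESOLV_FILE already lists the given nameserver address.
# Usage: has_nameserver RESOLV_FILE IP
has_nameserver() {
    awk -v ip="$2" '$1 == "nameserver" && $2 == ip { found = 1 } END { exit !found }' "$1"
}
```

A possible use on hepcms-hn is `has_nameserver /etc/resolv.conf 128.8.74.2 || echo "nameserver 128.8.74.2" >> /etc/resolv.conf`. The two backup mounts can be tested with `mountpoint -q /DataCampusBackup` and `mountpoint -q /CampusBackup`, each of which exits non-zero when the directory is not a mount point.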
service ganglia-monitor restart
Make sure all the disks needed for /hadoop storage are mounted properly.
Start hadoop namenode.
Make sure /mnt/hadoop is mounted on all worker and interactive nodes.
Open https://hepcms-ovirt.umd.edu/ovirt-engine/webadmin/?locale=en_US#vms-general
(username: admin, password: the root password)
hepcms-namenode
hepcms-secondarynamenode
Start hadoop datanodes on all worker nodes.
Check the progress on http://localhost:8001/dfshealth.html#tab-datanode
On hepcms-se2, cmsd@clustered and xrootd@clustered may not start. If /var/run/xrootd is missing, create the directory /var/run/xrootd and change its ownership to xrootd.
mkdir /var/run/xrootd
chown xrootd:xrootd /var/run/xrootd
systemctl start cmsd@clustered
systemctl status cmsd@clustered
systemctl start xrootd@clustered
systemctl status xrootd@clustered
Authentication on hepcms-se2 was broken because the CRL was corrupted. Deleted the obsolete files and reinstalled the CA certificates:
yum reinstall igtf-ca-certs
The CRLs were not being updated. Reinstalling fetch-crl did not help. Deleted the out-of-date files in /var/cache/fetch-crl/ and updated the CRLs:
systemctl enable fetch-crl-cron
systemctl start fetch-crl-cron
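To find the out-of-date cache files before deleting them, something like the following may help. It is a sketch only: the function name list_stale_crl_files is hypothetical, and the 7-day default threshold is an assumption, not a value taken from these notes.

```shell
# List regular files in the fetch-crl cache older than a given number of days.
# Usage: list_stale_crl_files [CACHE_DIR] [MAX_AGE_DAYS]
list_stale_crl_files() {
    cache_dir=${1:-/var/cache/fetch-crl}
    max_age_days=${2:-7}    # assumed threshold; adjust as needed
    # -mtime +N matches files whose age in whole days is greater than N.
    find "$cache_dir" -type f -mtime +"$max_age_days"
}
```

Review the output of `list_stale_crl_files` first, and only then remove the files it lists. Running `fetch-crl` by hand afterwards should refresh the CRLs without waiting for the cron job.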