Link to the UMD T3 site configuration for the CE, SE, GridFTP and squid machines:
https://gitlab.cern.ch/SITECONF/T3_US_UMD
CLEANUP SYSTEM DISK:
Clean up the /tmp area:
systemctl start systemd-tmpfiles-clean
Delete all files in /var/log/condor/.
Check whether cvmfs is writing files under /; if it is, clean the cvmfs cache with the command:
cvmfs_config wipecache
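Before cleaning, it can help to see how full the root filesystem is. The following is a minimal sketch, assuming a POSIX shell and df; the function name fs_usage_pct and the 90% threshold are illustrative choices, not part of the site setup.

```shell
#!/bin/sh
# Print the usage percentage (digits only) of the filesystem holding a path.
fs_usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical threshold; adjust as needed.
THRESHOLD=90
pct=$(fs_usage_pct /)
if [ "$pct" -ge "$THRESHOLD" ]; then
    echo "/ is ${pct}% full: consider cleaning /tmp, /var/log/condor and the cvmfs cache"
else
    echo "/ is ${pct}% full: below ${THRESHOLD}%"
fi
```

du -xsh /tmp /var/log/condor shows how much space each of those areas is using before anything is deleted.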
https://www.tecmint.com/find-user-account-info-and-login-details-in-linux/
commands:
$ last -a
$ lastlog
$ w
$ who
$ users
$ getent passwd username
$ ypmatch username passwd
Service tag:
dmidecode -t 1
dmidecode -s system-serial-number
Hadoop:
The fsimage is stored in /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/, so it is part of daily backup.
Checkpoints are stored on /data/osg/hadoop/
ls -Flart /data/osg/hadoop/
drwxr-xr-x 3 hdfs users 20 Aug 1 16:09 checkpoint2/
drwxr-xr-x 3 hdfs users 20 Aug 1 16:09 checkpoint1/
Disk usage (on datanode, namenode, secondary-namenode):
hdfs dfs -ls /cms/store
hdfs dfs -du -s -h /cms/store
hdfs dfs -mkdir /cms/store/xxxx
hdfs dfs -chown user /cms/store/xxxx
hdfs dfs -ls /
bad blocks: (on hepcms-in1.umd.edu)
hdfs fsck / | egrep -v '^\.+$' | grep -v eplica >& hadoop_bad_block_20231010.txt
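The filter in the command above drops the progress lines made only of dots and every line mentioning replicas, so that mostly corrupt or missing blocks remain. The sketch below wraps the same filter in a function and date-stamps the report name. The function name filter_fsck and the fallback message are illustrative; hdfs is only called if it is installed on the node.

```shell
#!/bin/sh
# Keep only the interesting lines of `hdfs fsck /` output read from stdin:
# drop progress lines made only of dots and lines mentioning replicas.
filter_fsck() {
    grep -Ev '^\.+$' | grep -v eplica
}

# Date-stamped report name, in the same style as hadoop_bad_block_20231010.txt
report="hadoop_bad_block_$(date +%Y%m%d).txt"

# Only call hdfs when it is available on this node.
if command -v hdfs >/dev/null 2>&1; then
    hdfs fsck / 2>&1 | filter_fsck > "$report"
    echo "wrote $report"
else
    echo "hdfs not found; would write $report"
fi
```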
Secondary-namenode
hepcms-secondary-namenode.privnet
service hadoop-hdfs-secondarynamenode status/start/stop
tail -100 /scratch/hadoop/hadoop-hdfs/hadoop-hdfs-secondarynamenode-hepcms-secondary-namenode.privnet.log
Removing datanode from hadoop
Log in to hepcms-namenode and become root.
cd /etc/hadoop/conf
vi hosts_exclude # add (or uncomment) the name of the node to be excluded
hdfs dfsadmin -refreshNodes
Check the progress from your local machine: run ssh -L 8001:hepcms-namenode:50070 bhatti@hepcms-in1.umd.edu, then open http://localhost:8001 in a browser.
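The exclude-file edit can also be scripted so that a node is never listed twice. This is a minimal sketch, assuming hosts_exclude holds one hostname per line; the function name add_exclude and the node name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Append a hostname to an exclude file unless it is already listed.
add_exclude() {
    host=$1
    file=$2
    if grep -qxF "$host" "$file" 2>/dev/null; then
        echo "$host already listed in $file"
    else
        echo "$host" >> "$file"
        echo "added $host to $file"
    fi
}

# Usage (as root on the namenode; the node name is a placeholder):
#   add_exclude node-to-remove.privnet /etc/hadoop/conf/hosts_exclude
#   hdfs dfsadmin -refreshNodes
#   hdfs dfsadmin -report    # lists the decommission status of each datanode
```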
CVMFS:
cvmfs_config wipecache
https://cvmfs.readthedocs.io/en/stable/cpt-configure.html
DNS Server:
Currently the private (local) DNS name servers run on siab-1.privnet (10.1.0.100) and hepcms-in1.privnet (10.1.0.14).
The old name server ran on hepcms-foreman and should be obsolete, except maybe for virtual machines running on ovirt.
The name server requires the following packages.
yum install bind bind-utils
Here are the instructions.
https://www.digitalocean.com/community/tutorials/how-to-configure-bind-as-a-private-network-dns-server-on-centos-7
To add a new IP address to the network, the following three files must be edited.
/etc/named.conf
/var/named/dynamic/db.privnet
/var/named/dynamic/db.1.10.in-addr.arpa
After editing these files, restart the service.
systemctl restart named
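A new host needs a forward record and a matching reverse (PTR) record. The sketch below derives the PTR owner name from an IPv4 address and wraps the BIND syntax checkers in a function to run before the restart. The function names are illustrative, and the zone names privnet and 1.10.in-addr.arpa are assumptions inferred from the zone file names above.

```shell
#!/bin/sh
# Build the reverse-lookup (PTR) owner name for an IPv4 address.
ptr_name() {
    echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Syntax-check the configuration and both zone files (run as root on the DNS server).
# Zone names are assumed from the file names; adjust if named.conf differs.
check_named_config() {
    named-checkconf /etc/named.conf &&
    named-checkzone privnet /var/named/dynamic/db.privnet &&
    named-checkzone 1.10.in-addr.arpa /var/named/dynamic/db.1.10.in-addr.arpa
}

ptr_name 10.1.0.14
```

Running check_named_config before systemctl restart named catches a typo in a zone file before the server goes down with it.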
DHCP Server
The network card IP address should match the IP of the local subnet. It is a static IP in the same subnet.
https://www.tecmint.com/install-dhcp-server-in-centos-rhel-fedora/
Edit a copy of the dhcpd.service file to specify the network interface to be used by the DHCP server.
cp /usr/lib/systemd/system/dhcpd.service /etc/systemd/system/
vi /etc/systemd/system/dhcpd.service
ExecStart=/usr/sbin/dhcpd -f -cf /etc/dhcp/dhcpd.conf -user dhcpd -group dhcpd --no-pid your_interface_name(s)
systemctl --system daemon-reload
systemctl restart dhcpd
The assigned IP addresses are stored in /var/lib/dhcpd/dhcpd.leases
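To get a quick IP-to-MAC listing out of the leases file, something like the following may be enough. It is a sketch that assumes the usual dhcpd.leases layout (a "lease IP {" line followed by a "hardware ethernet MAC;" line); the function name list_leases is illustrative.

```shell
#!/bin/sh
# Print "IP MAC" for every lease block in a dhcpd.leases-style file.
list_leases() {
    awk '
        $1 == "lease"                        { ip = $2 }
        $1 == "hardware" && $2 == "ethernet" { mac = $3; sub(/;$/, "", mac); print ip, mac }
    ' "$1"
}

leases=/var/lib/dhcpd/dhcpd.leases
if [ -r "$leases" ]; then
    list_leases "$leases" | sort -u
else
    echo "$leases not readable on this machine"
fi
```

The leases file is a log, so the same address can appear several times; sort -u collapses identical pairs.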
NIS Server (passwords)
yum install -y ypserv ypbind
scp bhatti@hepcms-hn.umd.edu:/var/yp/securenets /var/yp/
scp bhatti@hepcms-hn.umd.edu:/var/yp/ypservers /var/yp/ # edit it to change the server name
# copy files from old server 10.1.0.1
/usr/lib64/yp/ypinit -s 10.1.0.1
#cd to /var/yp
# Convert the map files with host names to ASCII
/usr/lib64/yp/makedbm -u nishepcms.privnet/hosts.byname > tmphosts.byname
# edit to change the source name; save the result as tmphosts.bynameEdited
emacs-24.3 tmphosts.byname &
#
/usr/lib64/yp/makedbm - tmphosts.byname < tmphosts.bynameEdited
# copy it back
cp tmphosts.byname nishepcms.privnet/hosts.byname
#
The instructions above were for the migration from SL6 to Alma8. To rebuild hn after a crash, copy the relevant files from the backup to the newly built machine and restart the service.
The relevant files are under /etc/ and /var/yp (/etc/passwd, /etc/group, /etc/shadow, ...).
#
systemctl restart ypserv
systemctl enable ypserv
systemctl status ypserv
ypwhich # name of new server
On each client node, edit /etc/yp.conf and change the IP address of the server.
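The client-side change can be previewed before touching the file. The sketch below replaces an IP address only where it appears as a whole word, so 10.1.0.1 does not match inside 10.1.0.100. The function name preview_ip_change is illustrative and the new server address in the usage comment is a placeholder.

```shell
#!/bin/sh
# Print a file with every whitespace-separated field equal to OLD replaced by NEW.
# Nothing is written back; redirect the output once the preview looks right.
preview_ip_change() {
    awk -v old="$1" -v new="$2" '
        { for (i = 1; i <= NF; i++) if ($i == old) $i = new; print }
    ' "$3"
}

# Usage on a client (NEW_SERVER_IP is a placeholder for the new NIS server):
#   preview_ip_change 10.1.0.1 NEW_SERVER_IP /etc/yp.conf
#   systemctl restart ypbind && ypwhich
```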
Disks not visible
Reboot the Server.
During boot, press Ctrl + R when the PERC RAID Controller message appears.
You will enter the PERC H710/H730 Configuration Utility.
If you want to use RAID (recommended for redundancy):
Select "Virtual Disk Management"
Press F2 on the controller and choose Create New VD (Virtual Disk).
Select the disks and configure the RAID level:
RAID 0 (for individual disks)
RAID 1 (for mirroring, 2 disks)
RAID 5 (for redundancy, 3+ disks)
RAID 6 (higher redundancy, 4+ disks)
RAID 10 (best performance, 4+ disks in pairs)
Set the stripe size (default is fine for most cases).
Confirm and save the configuration.
Exit and reboot. Linux should now see the virtual RAID volume.
LVM
Combine multiple disks into one large volume, e.g. combine /dev/sdb, /dev/sdc and /dev/sdd into /data.
#Create physical volumes on top of /dev/sdb, /dev/sdc, and /dev/sdd:
pvcreate /dev/sdb /dev/sdc /dev/sdd
# combine into a single volume group named data
vgcreate data /dev/sdb /dev/sdc /dev/sdd
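# use all available free space in a single logical volume
lvcreate -n data -l 100%FREE data
# create the ext4 filesystem (the type used in the fstab line) and the mount point
mkfs.ext4 /dev/data/data
mkdir /data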
# find UUID
blkid /dev/data/data
# add following line in /etc/fstab (UUID from blkid command)
UUID=e1929239-5087-44b1-9396-53e09db6eb9e /data ext4 defaults 0 0
# mount the volume on /data as described in fstab
mount -a
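The fstab line can be generated rather than typed by hand, which avoids copy errors in the UUID. This is a minimal sketch; the function name fstab_entry is illustrative, and the UUID in the demonstration call is the one shown above.

```shell
#!/bin/sh
# Print an fstab line for a given UUID, mount point and filesystem type.
fstab_entry() {
    printf 'UUID=%s %s %s defaults 0 0\n' "$1" "$2" "$3"
}

# Demonstration with the UUID from the notes above.
fstab_entry e1929239-5087-44b1-9396-53e09db6eb9e /data ext4

# On the real machine (as root), take the UUID directly from blkid:
#   fstab_entry "$(blkid -s UUID -o value /dev/data/data)" /data ext4 >> /etc/fstab
```

After mount -a, df -h /data should show the combined size of the three disks.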
Setting up new PowerEdge Server
The new PowerEdge R750 has 2 disks (~1 TB) on the backplane and 12 disks (8 TB) in front.
Remove all disks from the RAID setup.
Create a virtual disk from the two backplane disks, using RAID-1.
The 12 front disks will be managed directly by the OS (Linux).
Install CentOS 7 from USB.
Setting up new Switch/privnet network
https://www.ismoothblog.com/2019/07/access-cisco-switch-serial-console-linux.html
Install minicom to connect to serial port via RJ45-serial port connector
dmesg | grep tty
It is a managed switch. Connect it to the serial port of a computer using the RJ-45-to-serial cable that comes with the system.
Turn off the DHCP server.
Set the IP address and network mask of the switch (10.1.0.250 255.255.255.0).
Set the default gateway.
Set up an account (admin) with privilege level 15,
and save this configuration to the start-up file.
After this you can access the switch through the web interface: open 10.1.0.250 in any browser.
Remember that the IP address is set on each individual computer. The computer announces its MAC address and IP address to the switch, and the switch stores them in its local table.
Setting up gateway
One of the computers (hepcms-hn) needs to be set up as a gateway for privnet. The following commands may be useful.
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -A INPUT -s 10.1.0.1/32 -p tcp -m tcp --dport 8649 -m state --state NEW -j ACCEPT
iptables -A INPUT -i p3p1 -j ACCEPT
iptables -A FORWARD -i p3p1 -j ACCEPT
iptables -t nat -A POSTROUTING -o enp0s25 -j MASQUERADE
iptables-save
https://www.baeldung.com/linux/network-gateway#:~:text=IP%20masquerading%20is%20a%20technique,the%20gateway's%20external%20IP%20address.
Assuming our Linux machine has two network interfaces – one for the internal network (eth0) and another for the external network (eth1) – we can use the iptables commands to complete the configuration:
iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
iptables -A FORWARD -i eth1 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -j ACCEPT
The last step is to configure client systems on our internal network to use the gateway, either as a default gateway to access the internet or a gateway to a different network.
In the commands below the client Ethernet interface is eno1 (or em1, depending on the node) and the gateway's internal IP address is 10.1.0.40; the ip command temporarily sets a default gateway:
# ip route add default via 10.1.0.40 dev eno1
# ip route add default via 10.1.0.40 dev em1
Similarly, if we are using our gateway to reach a remote network rather than the internet, and assuming the remote network address is 10.10.10.0/24, we can use the ip command to set a temporary static route through our gateway:
# ip route add 10.10.10.0/24 via 192.168.0.1
However, we can lose these routes if we restart the system. Therefore, we should make the changes persistent.
After activating IP forwarding and configuring the gateway and clients, it’s time to test the connectivity. We can connect devices from the internal network to the Linux machine and verify they can reach the internet.
There are multiple tools and utilities available to Linux users to verify connectivity, including ping, traceroute, netcat, etc.
For example, to ensure we can reach the internet from our client, we can utilize the ping command:
$ ping google.com
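The text above mentions activating IP forwarding on the gateway but does not show it. The sketch below checks the current state and lists, as comments, the standard sysctl commands to enable it. The function name forwarding_state and the file name 99-ip-forward.conf are illustrative choices.

```shell
#!/bin/sh
# Report whether IPv4 forwarding is on, reading the kernel flag
# (or another file passed as $1, which is handy for testing).
forwarding_state() {
    f=${1:-/proc/sys/net/ipv4/ip_forward}
    if [ "$(cat "$f" 2>/dev/null)" = "1" ]; then
        echo enabled
    else
        echo disabled
    fi
}

echo "IPv4 forwarding is $(forwarding_state)"

# To enable it on the gateway (as root):
#   sysctl -w net.ipv4.ip_forward=1                                     # until next reboot
#   echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-ip-forward.conf   # persistent
#   sysctl --system                                                     # reload all sysctl files
```

Without forwarding enabled, the MASQUERADE and FORWARD rules have no effect, because the kernel drops packets that are not addressed to the gateway itself.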
JUPYTER NOTEBOOK
#Install Anaconda3 under /usr/local/anaconda3 as root, i.e. download the appropriate installer from https://www.anaconda.com/products/distribution
#and run
bash Anaconda3-2022.05-Linux-x86_64.sh
#Setup environment
eval "$(/usr/local/anaconda3/bin/conda shell.bash hook)"
#Install tensorflow
pip3 install tensorflow
# delete the first line from a text file
sed -i '1d' file
OMSA Commands:
omreport storage vdisk controller=0
omreport storage controller=0
omreport storage pdisk controller=0
omreport storage pdisk controller=0 pdisk=0:1:4
omconfig storage pdisk action=blink controller=0 pdisk=0:1:4
omconfig storage pdisk action=unblink controller=0 pdisk=0:1:4
omconfig chassis leds led=identify flash=on
omconfig chassis leds led=identify flash=off
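The pdisk report is long; a one-line-per-disk summary is easier to scan for a failed drive. The sketch below assumes the report prints "ID : ...", "Status : ..." and "State : ..." lines for each disk, which should be checked against the real output on the machine. The function name pdisk_summary is illustrative.

```shell
#!/bin/sh
# Condense `omreport storage pdisk` output (read from stdin) to "ID Status State".
# Assumes "Key : Value" lines with keys ID, Status and State for each disk.
pdisk_summary() {
    awk '
        function emit() { if (id != "") print id, status, state }
        $1 == "ID"     { emit(); id = $3; status = "?"; state = "?" }
        $1 == "Status" { status = $3 }
        $1 == "State"  { state = $3 }
        END            { emit() }
    '
}

# Usage on a server with OMSA installed:
#   omreport storage pdisk controller=0 | pdisk_summary
```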
Condor commands
condor_ssh_to_job (connect to running job)