Lewis Qualkenbush - HEP T3 Cluster Management
February - April 2018
An outline of the "research" done regarding how the UMD HEP T3 cluster works, including a somewhat detailed guide to creating your own cluster and installing software on it. All work done during this time period was in collaboration with Sam Schoberg and under the mentorship of Dr. Shabnam Jabeen. Thank you to Dr. Jabeen and the third floor of PSC for allowing us to use some machines and space for the Spring 2018 semester.
Resources
Learning Phase
Initial Research
25 January, 2018
Today we begin by finding web pages associated with the project we are working on. I am going to find some information on the structure of the WLCG.
To begin, here is a public website by the people at WLCG themselves:
http://wlcg-public.web.cern.ch/
This initial page outlines the need for a system of this size when dealing with the data produced by LHC experiments.
Introduction to WLCG:
Specifically pertaining to the tier system:
http://wlcg-public.web.cern.ch/tier-centres
We will be working on a Tier-3 system, which in essence means we are not formally part of the WLCG, but are able to use its resources locally for individual analyses. To provide an overview of the rest of the system:
Tier 0 - Two computing centers that capture the initial raw data and perform some initial processing. One is located at CERN in Geneva and the other in Budapest. These centers are responsible for distributing data to the Tier-1 systems.
Tier 1 - 13 large centers that reprocess the data and manage both raw LHC data and reconstructed output, and that also distribute data to the Tier-2 systems.
Tier 2 - About 160 universities and scientific institutes that provide computing power for specific tasks.
Specifically pertaining to structure and networking:
http://wlcg-public.web.cern.ch/structure
The LHC Optical Private Network connects Tier-0 to Tier-1 systems on a private and dedicated optical-fibre network. The LHC Open Network Environment is used for communication between all other tiers. It is also worth noting that most systems within the network use ROOT for physics analysis. Data transfer is managed by the Grid File Transfer Service.
26 January, 2018
We met with Dr. Jabeen today, and she began explaining what the cluster does and some of the software used to make sure that it runs correctly and efficiently. This software includes Apache Hadoop and Puppet, which I will be teaching myself about in the coming week.
29 January, 2018
Introduction to Hadoop:
Today I started reading about Apache Hadoop, specifically how to install and manipulate it.
An overview of the software is located here:
http://hadoop.apache.org/docs/current/
Helpful video outlining the basics of Hadoop:
https://www.youtube.com/watch?v=NeqC6t1J1dw&t=21s
To begin, Hadoop consists of two main parts: HDFS (the Hadoop Distributed File System) and the MapReduce framework. HDFS allows the nodes to speak to each other in order to store data properly. Hadoop is important in that it greatly reduces the time it takes for data to be written, because it takes a large amount of data and spreads it across different nodes in the cluster. It also saves replicas of that data on separate nodes in case one node dies. This is usually configured so that two copies exist on different nodes within one rack and another copy exists on a different rack. Not only does this framework allow for a quicker writing process and safer data storage, but it also benefits analysis in that different parts of the data are stored on different nodes and can be managed or analyzed separately yet concurrently.
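As a concrete illustration of working with HDFS (a hypothetical sketch; it assumes Hadoop is installed and the HDFS daemons are running, and the file and directory names are made up):
hdfs dfs -mkdir -p /user/lewis           # create a directory in HDFS
hdfs dfs -put results.root /user/lewis/  # copy a local file into HDFS, where it is split into blocks and replicated across nodes
hdfs dfs -ls /user/lewis                 # list the directory
hdfs dfsadmin -report                    # show the datanodes and how the data is distributed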
31 January, 2018
Today I began looking into Puppet. They actually provide what they call a "Learning VM" that teaches you the basics of their software by completing different tasks in their specific environment. In order to do this however, you must already have some sort of VM software downloaded. I am currently in the process of getting a VM with Linux 7 set up, and I may ask some questions about how worthwhile this may be at the team meeting we have today.
Introduction to Puppet:
Information on the "Learning VM":
https://puppet.com/download-learning-vm
Video they provide outlining the basics of their own software:
https://www.youtube.com/watch?v=QFcqvBk1gNA
What I have learned about the software so far is that it is an automation tool. Instead of an administrator configuring every machine separately, they can use Puppet to automate these tasks. Puppet also has its own declarative configuration language that I will need to become familiar with, and which I assume is covered in the "Learning VM".
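To give a flavor of what that looks like, here is a minimal sketch of my own (not something taken from the Learning VM; the package and service names are just examples). A Puppet "manifest" declares the state you want, and puppet apply enforces it on the local machine:
cat > example.pp <<'EOF'
# make sure the ntp package is installed and its service is running
package { 'ntp':
  ensure => installed,
}
service { 'ntpd':
  ensure  => running,
  enable  => true,
  require => Package['ntp'],
}
EOF
puppet apply example.pp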
Cluster VMs
7 February, 2018
Sam is having trouble getting his VM working as we think he has not been given certain permissions. I am having some trouble following the pictures provided to make the VM.
8 February, 2018
Today before class, Dr. Jabeen helped me set up my VM. I needed to make sure that the name of my machine as well as its IP address were both unique. I still have some questions pertaining to the setup, however.
14 February, 2018
Today we continued to change our configuration settings in order to see what may actually let our machines build. Dr. Jabeen advised that we begin to take a look at some python and shell scripts used in cluster management.
15 February, 2018
Before my presentation today, I read through the .sh script provided to us. It dealt mostly with a command called omreport, which is included with OMSA (OpenManage Server Administrator), developed by Dell. This command allows an admin to check on the status of a node as well as check system component information.
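For reference, a few example omreport invocations (this assumes OMSA is installed on the node, and the exact subcommands available can vary between OMSA versions):
omreport system summary                 # overall system and firmware information
omreport chassis temps                  # temperature probe readings
omreport storage pdisk controller=0     # status of the physical disks on controller 0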
Personal VMs
21 February, 2018
After having noticed that the machines had not built after a week, we learned that the cluster was facing some issues. Apparently there are problems with how files are being allocated to different nodes, which then leads to an excess of jobs on a specific node. Nonetheless, it was decided that rather than chasing VMs on the cluster, we would set up our own machines with Scientific Linux 7 using VirtualBox. After these machines are set up we can begin to run management scripts in order to learn exactly what they do.
CronTab:
Crontab is short for cron table. cron is a daemon used for running scripts or commands at specific times and intervals defined by an administrator. Each field of a crontab entry stands for a different part of the schedule (minute, hour, day of month, month, day of week), and the symbol * (wildcard) means run for every possible value of that field. Every command run through cron runs in the background. This tool is very useful for automation, system management, and monitoring.
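For example, a crontab entry lists the five time fields followed by the command to run; the script paths here are hypothetical:
# edit the current user's crontab with: crontab -e
*/15 * * * * /usr/local/bin/check_nodes.sh >> /var/log/check_nodes.log 2>&1   # every 15 minutes
0 3 * * 0 /usr/local/bin/weekly_report.sh                                     # 3:00 AM every Sunday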
22 February, 2018
Today I will begin setting up my VM.
Virtual Machine setup with SL7 Image:
*Aside - Read up on how .iso files and VirtualBox work beforehand
Begin by making sure that the image you want to use (I am using SL7.3) is downloaded to your computer
Install VirtualBox on your computer, which is available here: https://www.virtualbox.org/wiki/Downloads
After basic installation, click on the New icon in the top left; here is where you can name your machine
For the Operating System, choose Linux as the type, and Red Hat (as Scientific Linux is RHEL-based) as the version
After this you will be taken to the memory allocation step. It is important you take into account how much memory you will need on your machine, and how much memory your host has
Next is disk creation. Choose to have a virtual hard disk created for you, then on the next page choose VDI (VirtualBox Disk Image), and after that choose dynamically allocated (this makes things convenient). You can then choose how much space on your machine you want to put towards this virtual machine. Just like the memory, take into account how much space your host machine has. This step just tells VirtualBox how much of the host's disk space can be set aside for the virtual machine's disk
You are then able to create the machine, and a new icon should pop up on your VirtualBox window to show your new VM
Right click on the machine, and then choose Start > Normal Start. A new window will pop up and ask which image you want to use for this machine. Choose your preferred .iso
You should now have a working virtual machine with an SL7 image
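For anyone who prefers the command line, roughly the same machine can be created with VBoxManage. This is only a sketch (it assumes a 2 GB RAM / 20 GB disk machine, and the VM name and file paths are examples):
VBoxManage createvm --name sl7-node --ostype RedHat_64 --register
VBoxManage modifyvm sl7-node --memory 2048 --cpus 1
VBoxManage createmedium disk --filename ~/VirtualBox\ VMs/sl7-node/sl7-node.vdi --size 20480   # 20 GB, dynamically allocated by default
VBoxManage storagectl sl7-node --name SATA --add sata
VBoxManage storageattach sl7-node --storagectl SATA --port 0 --device 0 --type hdd --medium ~/VirtualBox\ VMs/sl7-node/sl7-node.vdi
VBoxManage storageattach sl7-node --storagectl SATA --port 1 --device 0 --type dvddrive --medium ~/Downloads/SL-7.3-x86_64-DVD.iso
VBoxManage startvm sl7-node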
28 February, 2018
I have begun work on my personal SL7 machine after the successful setup. I do want to note that the .iso I installed with includes the graphical interface and is set to default runlevel 5. In order to make the machine less memory-intensive, I decided to change the runlevel to 3 (multi-user, text-only terminal).
Switching runlevels in RHEL 7:
*Aside - Do not follow these steps if you are not comfortable with working solely with the CLI (Command Line Interface), however I believe the command startx should return you to the graphical environment if need be.
You can begin by checking your runlevel using runlevel or who -r
In short, there are a series of .target files that "emulate" different runlevels. Your default runlevel should be 5, called graphical.target, but we want to deal with multi-user.target for runlevel 3.
To set the default target for when the machine boots, enter systemctl set-default multi-user.target (this is the systemd equivalent of setting the default runlevel)
If you would like to switch targets without rebooting, you can use systemctl isolate multi-user.target
Feel free to use the free command and check how much memory you now have available.
Personal Cluster
7 March, 2018
Introduction to NIS:
NIS (Network Information Service), previously called Yellow Pages or YP, was created by Sun Microsystems for large system management. It is useful in that the NIS master server keeps master copies of certain important /etc files (such as passwd, group, and hosts) and distributes them to the rest of the nodes on the system, referred to as clients. You can read more about NIS here: http://www.linux-nis.org/
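Once a client is bound to an NIS domain, a couple of quick commands show it working (a sketch assuming the client's ypbind is already configured):
ypwhich        # prints which NIS server this client is bound to
ypcat passwd   # dumps the passwd map distributed by the NIS master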
Introduction to Ganglia:
Ganglia is a monitoring tool for HPC systems. It makes analytics about a cluster easy to read through the graphs and information it captures about each node and the cluster as a whole. You can read more about Ganglia here: http://ganglia.sourceforge.net/
Introduction to ClusterShell:
ClusterShell, as its name implies, is a shell framework created specifically for HPC purposes. It can be used interactively through the clush command, and makes grouping nodes together quite easy. You can read more about ClusterShell here: https://clustershell.readthedocs.io/en/latest/intro.html
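As a quick taste (assuming ClusterShell is installed; the node names below are just examples), clush runs a command on several nodes at once and collects the output:
clush -w hadoop-master,hadoop-slave-[1-2] uptime   # run uptime on all three nodes
clush -w hadoop-slave-[1-2] -b 'df -h'             # -b gathers identical output from different nodes together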
14 March, 2018
Today I am beginning installation of the VDE, or Virtual Distributed Ethernet. The basic premise is that we want to assign static IP addresses on a virtualized ethernet switch. Sam and I will be researching exactly how this is done, and will try to spend some time setting it up. You can read more about VDE here: http://wiki.virtualsquare.org/wiki/index.php/VDE_Basic_Networking
The only problem with VDE networking is that the virtualization is somewhat complicated conceptually. The benefit of using it, however, would be simpler networking as well as the ability for our own personal laptops to run dedicated VMs as nodes on our cluster.
27 March, 2018
We are now dealing with the installation of SL7 onto a spare machine in order to create a head node. The image therefore needs to be put onto some sort of media that the machine can boot from. I had the complete SL7 .iso downloaded onto my personal computer, so I put that file onto a 16 GB USB drive I had. I thought at first to use a DVD-R; however, its capacity (about 4.7 GB) is less than the size of the .iso (about 6.9 GB), and I didn't have any dual-layer discs at my disposal.
Putting a .iso on a USB:
*Aside - It helps to start off by becoming the super-user (sudo su) for these steps
*Aside - Make sure to be very careful with the dd command, as you can easily overwrite your hard drive or something else of equal importance
To begin, make sure that your media is big enough to hold the .iso file you want to use. I personally opted for a 16 GB USB stick, given my .iso file was about 7 GB
After you insert the USB into your machine, you want to go to the /dev directory and find which device file is your media. My USB happened to be sdc, however you can use either parted -s /dev/<device filename> print or smartctl -a /dev/<device filename> to check
After you are certain which sd* is your specific device file, use the command dd if=<iso image name> of=/dev/<device filename>, which writes the .iso image directly onto that media, which you can then boot from on your machine
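As a worked example with my own file and device names filled in (double-check the device letter on your machine before running anything like this):
dd if=SL-7.3-x86_64-DVD.iso of=/dev/sdc bs=4M   # bs=4M just makes the copy faster
sync                                            # flush all writes before removing the USB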
28 March, 2018
We have decided to abandon the idea of using a VDE switch given that the specifics are too complicated for our purposes. We want to move on to learning more about how HPC software works so that we can better grasp the overall purpose of these systems.
2 April, 2018
Sam found a website called Cognitive Class that not only provides a free online class on how to use Hadoop, but also offers a trial of a small cluster on IBM's cloud that can be used for practicing Hadoop commands and seeing how the software generally runs. You can sign up for Cognitive Class here: https://cognitiveclass.ai/
11 April, 2018
We have decided to return to the idea of setting up a cluster with three nodes. Our process is documented in the next section.
Cluster Setup
This is a tutorial for a small cluster setup using a Scientific Linux 7 image.
Introduction
Our cluster comprises three nodes, one of which is considered the head node. A head node within a cluster is comparable to the brain of an organism; it plans and decides how the other parts of the body will work. Our head node is bare metal, meaning it is not virtual. Using bare metal systems is much simpler and is advised if you have the means to do so. In most cases this is not an option; in our situation, the two other compute nodes are virtual machines.
Operating System:
As stated above, it was part of the guidelines for this cluster that our machines use an SL7 image as used on the real HEP cluster (even though the current incarnation uses SL6). Our head node and compute nodes all use an SL7.3 image available here: http://ftp.scientificlinux.org/linux/scientific/7.3/x86_64/iso/
I would suggest using either the "DVD" (6.9 GB) or "Everything" (7.8 GB) .iso files, in order to avoid using a live DVD (which would mean downloading packages from a repository after initial setup).
Our compute nodes have hosts that both run Windows 10.
Head Node:
For initial head node setup, I copied the .iso to a USB, which was outlined on 27 March, 2018.
Turn off the machine if it is not already powered down
Insert the USB or whatever media you are using into the machine
Turn the machine back on and begin to press the F12 button on your keyboard until an options menu appears
Choose the media your image is stored on and then press enter to get setup for that image started
During initial image setup, you should be able to choose your language, set up automatic partitioning, add a root password, and create users. Setup was normal for this machine
You should have SL7 now working on this machine
Compute Nodes:
For both compute nodes, I used a VirtualBox VM, in which setup was outlined on 22 February, 2018.
After successfully creating two Virtual Machines with the same image as the head node, you are ready to start networking so that your nodes are able to communicate with each other.
Networking
Networking was by far the most difficult part of this setup, since Sam and I had no previous experience working with computer networks. If you follow what we did, however, it will definitely be an easier process for you.
To begin, clusters like the one we built are typically called Beowulf clusters. This is because we are using off-the-shelf hardware for both the machines and the networking. More specifically, we are using a regular ethernet switch for communication between the nodes. "Real" clusters are more likely to use an interconnect called InfiniBand for communication, however we did not have access to this very low-latency hardware. In any case, your cluster must use at least ethernet to communicate, so make sure you have access to a network as well as a network switch and some ethernet cables.
Head Node:
The great thing about having a bare metal system is that ethernet interfaces are relatively simple to configure. Let's also assume that there is already another machine on the network you are going to be using, as was the case for us. You should note that on that other machine, let's call it machine X, there is an ethernet cable plugged into the back of the machine. You should take another ethernet cable, plug it into the back of the head node, and then plug the other side of the cable into the switch, which should be found at the other end of machine X's ethernet cable. After you have done this, there is some information you need for setting up the ethernet interface on the head node.
This information includes:
A new IP address that is different from other IP addresses on the network
The subnet mask of the network
The gateway of the network
The IP of a DNS (domain name) server for your network
For the last three bullets, you should be able to find this information in the network settings of machine X. The subnet mask should look something like 255.255.0.0 or 255.255.255.0, and tells you what portion of the IP address signifies the network and what portion signifies a host machine. The gateway and DNS server should also be easy enough to find; they help the system with routing and name resolution. Now, for deciding on a new IP address, you have to take into account the subnet mask. For example, if machine Y has the static IP 1.2.3.5 and the subnet mask is 255.255.255.0, you can use an IP address for the head node like 1.2.3.6 or 1.2.3.55. Also make sure that your new IP address is different from the others on the network. If you are still confused about IP addresses and masks, here is some material: https://support.microsoft.com/en-us/help/164015/understanding-tcp-ip-addressing-and-subnetting-basics
Once you have this information, go onto your head node and click on the top right of the desktop, where the time is shown, to access your settings. There should be a path to your connection settings. You will notice that your IP address is probably set to automatic, or DHCP (Dynamic Host Configuration Protocol), meaning that a new IP address is assigned to you from a pool each time you start up your ethernet connection (or after something called a lease time has run out). You want to use a static IP, which will never change, in order to simplify cluster setup. Therefore, you must choose manual, which will allow you to input the networking information listed above. You can then connect, and your bare metal head node should then be able to access the internet. You can check this by pinging Google's DNS server (ping 8.8.8.8) or by checking that you do have an ethernet interface (usually named something like enp4s0) using either the ip addr or ifconfig commands.
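If you would rather do this from a terminal instead of the desktop settings, NetworkManager's nmcli can set the same information. The connection name, addresses, gateway, and DNS below are placeholders for whatever you found on machine X:
nmcli connection show                                  # find the name of your wired connection
nmcli connection modify enp4s0 ipv4.method manual ipv4.addresses 192.168.1.50/24 ipv4.gateway 192.168.1.1 ipv4.dns 192.168.1.1
nmcli connection up enp4s0                             # bring the connection back up with the new settings
ip addr show                                           # confirm the static address was assigned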
Compute Nodes:
Setting up a network interface on the compute nodes is rather similar to the head node, but differs in a few ways because the compute nodes are virtual machines. Virtual machines usually connect to the outside world through a virtual ethernet connection between the host machine (the machine the virtual machine is running on top of) and the VM. Packets travel to and from the virtual machine through the host, making connecting to the virtual machine a little more difficult.
Before we get into the technical details of how to assign a virtual machine a static IP on its host's network, let's talk about something called secure shell. Secure shell, or SSH, is a means by which a person on one computer can remotely access another computer. This service is extremely important when talking about clusters, as our cluster in particular relies completely on SSH for communication between nodes. A typical way to SSH into another computer is through the command ssh <username>@<IP address>. However, this is only possible when you know the specific IP address of the remote host. The problem is that a virtual machine is not usually set up with a static IP on the network; however, this does not mean the VM cannot be remotely accessed using SSH. You can assign a port number to the virtual machine within the machine's settings, which tells the host that it needs to listen on that specific port for traffic destined for the VM. Say we assigned the VM port 3022 on the host; we would then be able to remotely access the VM from another machine using the command ssh -p 3022 <username>@<Host IP Address>. However, this is not good enough for our purposes. We want our machine to have a specific static IP on the network because, trust me, ports are no fun to deal with.
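For completeness, here is roughly how that port-forwarding approach looks with VirtualBox's default NAT networking (the VM name and host port are examples); we did not end up using it, but it shows the contrast with what follows:
VBoxManage modifyvm sl7-node --natpf1 "guestssh,tcp,,3022,,22"   # forward host port 3022 to guest port 22
ssh -p 3022 <username>@<Host IP Address>                         # reach the VM by going through the host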
In this case, we will use what is called a bridged network. With bridged networking, the network effectively sees two different IP addresses at the host machine. One of those IP addresses signifies the host machine itself as the destination, while the other signifies the virtual machine. When the host gets a packet going to the IP assigned to the VM, it routes that packet through the virtual ethernet connection between the host and the virtual machine, which simplifies the process going forward.
Now that we have gotten this out of the way, for each virtual machine, go to the machine's settings and find the tab for networking. For adapter number one, choose bridged networking. Now restart your VM and go into the SL7 networking settings. You can then follow the same steps as for the head node regarding the IP address and the other information, making sure to assign an IP address not already in use. After you have done this, try SSH-ing from the head node to one of the compute nodes using their static IP and vice versa.
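The same bridged setting can also be made from the host's command line with VBoxManage (the VM name and host adapter name are examples; on a Windows 10 host the adapter name is whatever Windows calls your ethernet card):
VBoxManage modifyvm sl7-node --nic1 bridged --bridgeadapter1 "Intel(R) Ethernet Connection"
# then, after setting the static IP inside SL7, test from the head node:
ssh <username>@<compute node static IP>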
Hadoop
For our Hadoop setup, we mainly followed this tutorial: https://linode.com/docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/, however I will be going through all of the steps as I did some things differently.
Introduction:
As noted previously in this logbook, Hadoop is useful for reading, writing, and backing up data on a cluster. We can start with the naming of different nodes on a Hadoop system. Our head node is considered a namenode, meaning it makes decisions for the Hadoop Distributed File System (HDFS). This is also where most of the setup happens. The compute nodes are called datanodes, since they are used to both process and store data, but do not make overall decisions about the system.
Notes:
Hadoop setup is not always going to be the same process every time; make sure to keep working at it and read errors carefully if you run into problems
We used Hadoop v2.8.3 as it seemed to be the easiest to set up for our purposes
If you are not used to using the vi text editor (or emacs/nano), try a short tutorial online to get used to editing files inside of the terminal
This tutorial assumes that Java is already installed properly on all nodes
If you ever have a problem with permissions, feel free to sudo su or su root in order to bypass this, however use that power responsibly
Setup:
For each step, there will be a label for that step to be done on either the namenode, datanodes, or all nodes.
All Nodes: In order to simplify the process, we are going to create hadoop users on all three nodes and set the passwords to all be the same
adduser hadoop
passwd hadoop
All Nodes: We want these hadoop users to be at an administrator level
su root
sudo visudo
Append "hadoop ALL=(ALL) ALL" to the end of the file and then write
All Nodes: We want to be able to easily reference the static IPs of other nodes on the cluster. This is done by editing the /etc/hosts file.
vi /etc/hosts
For each node on the cluster, append a line "<node IP address> <what you want to name that node>"
We used the name "hadoop-master" for the namenode and "hadoop-slave-1"/"hadoop-slave-2" for the data nodes
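For example, the lines we appended looked something like this (the addresses here are placeholders, not our real ones):
192.168.1.50 hadoop-master
192.168.1.51 hadoop-slave-1
192.168.1.52 hadoop-slave-2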
Namenode: We are going to make an SSH key so that the nodes can communicate without using a password
ssh-keygen -b 4096
You should also set a passphrase when prompted, but remember it as you will need it later
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop-master
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop-slave-1
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop-slave-2
Namenode: Download the Hadoop binary from the website (we used v2.8.3, results may vary) and unpack it
cd ~
Download here: http://hadoop.apache.org/releases.html or use command wget http://apache.mindstudios.com/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz
tar -xvzf hadoop-2.8.3.tar.gz
mv hadoop-2.8.3 hadoop
Namenode: We want to edit the configuration .xml files (found in ~/hadoop/etc/hadoop/) so that the cluster works properly; add all of the properties from the tutorial within each <configuration> section. A minimal sketch of the first two files is given after this step.
vi ~/hadoop/etc/hadoop/core-site.xml
vi ~/hadoop/etc/hadoop/hdfs-site.xml
vi ~/hadoop/etc/hadoop/mapred-site.xml
vi ~/hadoop/etc/hadoop/yarn-site.xml
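To give an idea of what goes into these files, here is a minimal sketch of the first two (the full set of properties we used came from the Linode tutorial linked above, and the data directories are only examples). core-site.xml points every node at the namenode, and hdfs-site.xml sets the replication factor and storage directories:
cat > ~/hadoop/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>  <!-- the namenode's address -->
  </property>
</configuration>
EOF
cat > ~/hadoop/etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>  <!-- one copy on each datanode -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/data/nameNode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data/dataNode</value>
  </property>
</configuration>
EOF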
Namenode: Edit the slaves file to include the names of the two datanodes
vi /home/hadoop/hadoop/etc/hadoop/slaves
Append "hadoop-slave-1" on one line and "hadoop-slave-2" on the next, then write
All Nodes: Edit the environment so that the variables and the binary search path are correctly configured
cd /etc/profile.d
Use update-alternatives --display java to get the java installation path and then leave off the /bin/java
vi hadoopSetup.sh
source /etc/profile
cd /home/hadoop
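For reference, a hadoopSetup.sh along these lines is what we are after (a sketch only; the JAVA_HOME path is just an example of what update-alternatives might report on your machine):
# /etc/profile.d/hadoopSetup.sh
export HADOOP_HOME=/home/hadoop/hadoop
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk       # installation path from update-alternatives, minus /bin/java
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin   # so hdfs, start-dfs.sh, etc. can be found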
Namenode: Copy the hadoop binary to the datanodes
scp hadoop-2.8.3.tar.gz hadoop-slave-1:/home/hadoop
scp hadoop-2.8.3.tar.gz hadoop-slave-2:/home/hadoop
Datanodes: Unpack the binary
tar -xvzf hadoop-2.8.3.tar.gz
mv hadoop-2.8.3 hadoop
Namenode: Copy the hadoop config files to the datanodes
scp /home/hadoop/hadoop/etc/hadoop/* hadoop-slave-1:/home/hadoop/hadoop/etc/hadoop/
scp /home/hadoop/hadoop/etc/hadoop/* hadoop-slave-2:/home/hadoop/hadoop/etc/hadoop/
All Nodes: Change the owner of the ~/hadoop directory
sudo chown -R hadoop /home/hadoop/hadoop/
Namenode: Start the filesystem and hadoop server
hdfs namenode -format
start-dfs.sh
Use the ssh passphrase you assigned earlier when asked
At this point, Hadoop should run properly on your cluster. You are able to check on the cluster from the command line using "hdfs dfsadmin -report".
You can stop the server using "stop-dfs.sh". A web user interface is available at "http://<namenode IP address>:50070".
NIS
Most, if not all, of the work on the cluster regarding NIS can be attributed to Sam, and therefore I will link his logbook in this section. Head down to his cluster setup section to read more: Sam Schoberg
Ganglia
For Ganglia setup, I pretty much followed this tutorial completely: https://www.tecmint.com/install-configure-ganglia-monitoring-centos-linux/
(These instructions refer to pictures from the tutorial site given above)
Introduction:
Ganglia is a rather useful tool for monitoring the components of a cluster and how much of their resources are being used.
Notes:
I set up Hadoop before Ganglia, and therefore some potential networking and permission issues had already been taken care of
If you make a mistake in the config files, make sure to restart the services after you fix the mistake
Setup:
Head node:
Update the OS and download the ganglia packages
yum update
yum install epel-release
yum install ganglia rrdtool ganglia-gmetad ganglia-gmond ganglia-web
Set up username and password for ganglia monitoring site
htpasswd -c /etc/httpd/auth.basic adminganglia
Edit configuration files
vi /etc/httpd/conf.d/ganglia.conf
vi /etc/ganglia/gmetad.conf (Add the following lines to the file)
gridname "<Location Name>"
data_source "<Cluster Name>" 60 <Head Node IP>:8649
data_source "<Cluster Name>" 60 <Slave 1 IP>
data_source "<Cluster Name>" 60 <Slave 2 IP>
vi /etc/ganglia/gmond.conf (make sure the cluster, udp_send_channel, and udp_recv_channel blocks match the images given in the tutorial; a rough sketch is included after this list)
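Roughly, the blocks in question look like the following (only a sketch; the cluster name is a placeholder, and 239.2.11.71 is Ganglia's default multicast address, though your configuration may use a different one):
cluster {
  name = "<Cluster Name>"        # should match the data_source name in gmetad.conf
  owner = "unspecified"
}
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}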
Change firewall settings to get site running
firewall-cmd --add-port=8649/udp
firewall-cmd --add-port=8649/udp --permanent
setsebool -P httpd_can_network_connect 1
Restart the services
systemctl restart httpd gmetad gmond
systemctl enable httpd gmetad gmond
Check that the site is working
Access http://<Head Node IP>/ganglia
Login with the username and password created in above steps
Data nodes:
Update the OS and download the ganglia package
yum update
yum install epel-release
yum install ganglia-gmond
Edit the configuration file
vi /etc/ganglia/gmond.conf (make sure to set the specific multicast IP address)
Restart the monitoring service
systemctl restart gmond
systemctl enable gmond
You should now have a working Ganglia monitoring site to see the performance of your nodes.
Conclusion
There exist many more pieces of software that may help with management of your cluster such as ClusterShell, Puppet, and Foreman. We would have loved to implement these services, but I am afraid there just was not enough time. Thank you for reading this logbook and good luck on whatever project you have in store. Cluster management may be a little difficult and confusing at the beginning, but stick to it and it will be rewarding.
You can contact me on Twitter @lqualkenbush99 if you have any questions about our setup.