Virtualization

Virtualization has been around for quite a few years; IBM did it on their mainframes. With advances in chip technology it has become feasible to run virtual machines on PCs. One of the first players to provide virtualization on the PC was VMware. They built a user-space application with a GUI, running on Windows systems, to create, manage and run virtual machines. Early versions of VMware were provided free without support, and only ran on Windows. I used VMware to host early versions of Slackware Linux on my PC and laptop. Other players jumped into the market, and today we have a broad collection of virtualization products to choose from.

As I do a majority of my work on Linux, I needed a virtualization package that was supported on Linux. Qemu is an excellent virtualization package. Like VMware, it runs as a user-space application. What sets Qemu apart is that it is more than a virtualization application; it's an instruction translator. This means that Qemu can host not just operating systems written for the i386 architecture, but also PowerPC, SPARC, and others. The developer of Qemu also built an excellent I/O peripheral virtualization package. Qemu, like VMware, provides full virtualization: the guest OS is completely isolated from the host OS. Full virtualization is great in that you can boot and install the guest OS from CD-ROM, or CD-ROM ISO images, as well as boot and install from the network using PXE. The downside to full virtualization is performance. When the Linux kernel was at 2.4, the timer interrupt rate, known as "ticks", was set to 100, meaning 100 interrupts per second. Ticks are used by the scheduler to interrupt program flow and service asynchronous events, and the interrupt rate matters, at a minimum, for keeping the system clock in sync. When the Linux 2.6 kernel came out, the tick rate was increased to 1000. As a result, a guest running a 2.6 kernel would generate timer interrupts so quickly that the host OS was unable to keep up with the rate, and the guest clocks would drift out of sync. Red Hat has attempted to "hack" a solution to this by adding a clock divider option at kernel boot time. I consider this a "band-aid" rather than a solid fix. The ultimate fix would be faster hardware. Another solution is para-virtualization.
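For the record, that divider is just a kernel boot parameter added to the guest's grub.conf entry. On a RHEL 5 guest it looks something like the stanza below; the kernel version and root device are placeholders, and divider=10 cuts the 1000Hz tick rate back down to 100Hz:

title Red Hat Enterprise Linux Server (2.6.18-8.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/VolGroup00/LogVol00 divider=10
        initrd /initrd-2.6.18-8.el5.img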

With para-virtualization, the guest OS is modified to use hooks into the host OS for privileged calls. This means the guest OS must be similar to the host OS, and must have had its kernel and libraries modified. The end result is near-native performance for the guest OS and, thus, no clock drifting. There are tradeoffs with para-virtualization. For one, you don't use a boot loader to start the guest OS. Instead, the host OS must be configured with the kernel and initrd files to boot on behalf of the guest OS. The virtualization package called Xen provides support for para-virtualization. If your host system has one of the newer CPUs that support VT (virtualization technology), then Xen will also support full virtualization. Up until recently, you needed full virtualization in order to run Windows as a guest OS.

As my environment is mainly Linux, it makes sense for me to use para-virtualization. It's possible for my host OS to be running OpenSuSE and still support a guest OS that is running Debian or Red Hat. As long as I have access to the guest OS kernel and initrd files, and the guest kernel was compiled with Xen support, I can run para-virtualized.

The Xen folks did some pretty smart things when they designed it. For one, they used an existing library for virtualizing the peripherals: the Qemu virtual I/O package was included. This means that if I have a Qemu virtual disk, I can use Xen to boot it. The VMware folks didn't do this, and their virtual disk structures are proprietary. In later years, VMware added Perl tools to access the contents of a guest virtual disk, but they're clumsy. With Xen, I can use a loopback mount to access the guest virtual disk. Of course, you don't want the host OS to mount the guest virtual disk while the guest is running, but this opens up some great options when I need to clone a virtual guest.

The mainline kernel maintained by Linus Torvalds has also joined the virtualization wave: recent kernel releases include KVM (the Kernel-based Virtual Machine). KVM requires a CPU with VT capabilities, and it too reuses the existing Qemu peripheral support. If a shop is looking into virtualization, it would be wise to go with Xen, as this path will progress naturally to the use of KVM and you'll be able to retain your investment in the guest virtual machines. The Xen package is open source and is included for free in major distributions such as SuSE (Novell) and Red Hat. The company that created Xen was purchased by Citrix, so it's anybody's guess where Xen will go. Both SuSE and Red Hat provide support for Linux guests under Xen. I'm not aware of a limit on Xen guests on SuSE, but Red Hat allows up to 4 RHEL guests without additional license fees. If you're going to use VMware, prepare to cough up bucks for support of those guests.
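If you're not sure whether your CPU has the VT extensions, one quick check is to look for the vmx (Intel) or svm (AMD) flags in /proc/cpuinfo:

# egrep -c '(vmx|svm)' /proc/cpuinfo

A non-zero count means the hardware can do full virtualization under Xen or KVM (though the feature may still need to be enabled in the BIOS).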

The installation of the virtual guest will vary based on which virtualization package you choose. If you're using Xen on a SuSE or Red Hat platform, both vendors include, for free, GUI and command-line tools to install guests in either para-virtual or full-virtual mode. If you're looking for performance, you'll choose para-virtual mode, and both vendors make installing a guest of the same OS distribution trivial. For example, I can create a SuSE para-virtual guest using either the YaST2 administrative tool or the command line. During guest creation I can specify an auto-installation file (AutoYaST) and make the client install completely lights-out. Red Hat offers the same.
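On the Red Hat side the command-line tool is virt-install, and a lights-out para-virtual install looks roughly like the line below. This is only a sketch: the install server URL and kickstart file are placeholders, and the exact flags vary by virt-install version.

# virt-install --paravirt --name rhel01 --ram 512 --file /opt/xen/images/rhel01/root.img --file-size 4 --location http://installserver/rhel5 --extra-args "ks=http://installserver/ks/rhel01.cfg"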

What I think is a significant feature of using Xen in para-virtual mode is the accessibility of the guest virtual disks. I can use the host OS to create a virtual disk as a plain file and work with it through a loopback device. For example:

# mkdir -p /opt/xen/images

# cd /opt/xen/images

# mkdir deb01

# cd deb01

# dd if=/dev/zero of=./root.img bs=4096 count=512K

# mke2fs -jF ./root.img

# mkdir ./mnt

# mount -o loop ./root.img ./mnt

This sequence creates my Xen images directory (a good practice), then a subdirectory, deb01, to represent the Debian guest I'm creating. The dd line creates a file, ./root.img, that is 2GB in size and full of zeros. I then put an ext3 filesystem on it with the mke2fs command, create a mount point, ./mnt, and loopback mount ./root.img on ./mnt. Now my ./mnt directory can be populated with my Linux OS files. I can do the same thing to create a swap area for the guest:

# dd if=/dev/zero of=./swap.img bs=4096 count=512K

# mkswap ./swap.img
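For a Debian guest, one way to populate the mounted ./mnt tree is with debootstrap run from the host. This is just a sketch (the release name, architecture and mirror are examples), and you'd still need to adjust /etc/fstab, /etc/hostname and the network configuration inside the image afterwards:

# debootstrap --arch i386 etch ./mnt http://ftp.debian.org/debian

# umount ./mnt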

Once you've populated your root.img with all the necessary Linux files, you create your Xen configuration file. I put mine in the /opt/xen/images/deb01 directory and called it deb01. Here's my deb01 Xen configuration file:

memory = "128"

name = "deb01"

vif = ['mac=00:16:3E:17:35:50']

disk = ['file:/opt/xen/images/deb01/root.img,sda1,w', 'file:/opt/xen/images/deb01/swap.img,sdb1,w' ]

root = "/dev/sda1 ro"

kernel="/opt/xen/images/deb01/vmlinuz-2.6.18-6-xen-686"

ramdisk="/opt/xen/images/deb01/initrd.img-2.6.18-6-xen-686"

This configuration file declares that the guest has 128MB of RAM; the guest name, "deb01"; the vif (virtual interface, i.e. the NIC) with a MAC address of 00:16:3E:17:35:50 (note: all Xen vif MAC addresses must begin with 00:16:3E); and two disks: ./root.img, which the guest sees as /dev/sda1, and ./swap.img, which the guest sees as /dev/sdb1. The vif MAC address must be unique for each guest. Here's a script that will generate a random MAC address for a Xen vif:

#! /bin/bash

python -c 'import random; r=random.randint; print "00:16:3E:%02X:%02X:%02X" % (r(0, 0x7f), r(0, 0xff), r(0, 0xff))'

The root line declares which virtual disk holds the root filesystem, the kernel line tells Xen on the host where the guest kernel is, and the ramdisk line tells Xen where the guest initrd is. I can then start the virtual guest with:

# xm create -c deb01

This tells Xen to start the guest in console mode (-c), which connects the guest's /dev/console device to your current terminal. Console mode looks a lot like a telnet session from the host OS to the guest OS, so you close it out just as you would a telnet session, with ^]. Notice that the guest kernel and initrd must be accessible to the host OS, and that both were built with Xen support.
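Once the guest is running, the usual xm subcommands handle day-to-day management. For example:

# xm list

# xm console deb01

# xm shutdown deb01

xm list shows the running domains, xm console re-attaches to a guest's console, and xm shutdown asks the guest to shut down cleanly.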

I've created a script to automate the creation of my Debian Xen guests. I have a master root.img file that gets copied to the guest directory, then my script loopback mounts the root.img and sets the hostname and IP address for that guest. My script takes about 2 minutes to build a new Debian guest.
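I won't reproduce my exact script here, but the idea looks roughly like the sketch below. The master image path and swap size are my own conventions, and the hostname/IP stamping assumes Debian-style config files inside the image; a Red Hat guest would need /etc/sysconfig edits instead.

#!/bin/bash
# Sketch: clone a master Debian root image into a new Xen guest directory,
# then loopback mount it and stamp in the hostname and IP address.
# Usage: ./mkguest.sh <name> <ip-address>

[ -z "$2" ] && { echo "usage: $0 name ip-address"; exit 1; }

NAME=$1
IP=$2
BASE=/opt/xen/images
MASTER=$BASE/master/root.img        # pre-built master image (assumption)

mkdir -p $BASE/$NAME/mnt
cp $MASTER $BASE/$NAME/root.img
dd if=/dev/zero of=$BASE/$NAME/swap.img bs=4096 count=128K    # 512MB swap
mkswap $BASE/$NAME/swap.img

mount -o loop $BASE/$NAME/root.img $BASE/$NAME/mnt

# Debian keeps the hostname and interface settings in these files
echo $NAME > $BASE/$NAME/mnt/etc/hostname
sed -i "s|^\([[:space:]]*\)address .*|\1address $IP|" $BASE/$NAME/mnt/etc/network/interfaces

umount $BASE/$NAME/mnt
# (you'd also copy in a Xen config file for the guest and give it a fresh MAC address)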

Resizing a Guest Virtual Disk

One of the key reasons I like Xen so much is its ability to use the underlying OS filesystem without imposing any proprietary formatting on virtual disks. I generally just create a large file, format it with an ext3 filesystem, then copy my root filesystem onto it. This has the advantage of making the virtual disk easily accessible from the host OS. That being the case, it's extremely easy for me to increase the size of a guest's virtual disk:

    1. Ensure the guest VM is down.

    2. Create a file of zeros to be appended to the existing virtual disk image. I use something like:

       # dd if=/dev/zero of=./more_space bs=4096 count=512k

    3. Append the new file (more_space) to the existing virtual disk image, for example:

       # cat ./more_space >> ./root.img

    4. Now repair the existing root.img:

       # e2fsck -f ./root.img

    5. Now resize the existing root.img:

       # resize2fs -f ./root.img

    6. And I just like to run a final fsck to ensure everything is clean:

       # e2fsck -f ./root.img

And there you have it. I created the "more_space" file, 2GB in size, and appended it to the existing root.img. The fsck is required prior to running the resize2fs command. I've created a script that performs this automatically, so now I can resize all my guest VMs' virtual disks in batch.
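Again, just a sketch of the idea; the image layout under /opt/xen/images and the 2GB growth increment are my own conventions, and the guests must all be shut down first:

#!/bin/bash
# Sketch: grow every guest's root.img under /opt/xen/images by 2GB.

BASE=/opt/xen/images

for img in $BASE/*/root.img; do
    echo "Growing $img"
    dd if=/dev/zero bs=4096 count=512K >> $img   # append 2GB of zeros to the image
    e2fsck -f -y $img                            # repair before resizing
    resize2fs -f $img                            # grow the filesystem to fill the file
    e2fsck -f -y $img                            # final sanity check
done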