Install Cassandra 0.7

posted Jan 17, 2011, 3:13 PM by Sameer Farooqui   [ updated Jan 28, 2011, 11:29 AM ]
This post guides you through some preliminary work required before installing Cassandra as well as the actual Cassandra installation.

Preliminary steps: Download Cassandra, log directories, JNA & MX4J

Download & Extract Cassandra
Log into the Ubuntu VM and go to the home directory: cd ~

Download Cassandra 0.7: wget http://apache.mirrors.pair.com//cassandra/0.7.0/apache-cassandra-0.7.0-bin.tar.gz

Extract Cassandra: tar -xzf apache-cassandra-0.7.0-bin.tar.gz

A new folder named "apache-cassandra-0.7.0" should now be in your home directory.
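
If you would like to confirm which folder an archive will create before extracting it, the following is a minimal sketch. The helper name archive_top_dir is hypothetical and not part of Cassandra; it only lists the archive and extracts nothing.

```shell
# Hypothetical helper: print the top-level directory inside a .tar.gz archive.
archive_top_dir() {
    # tar -tzf lists the archive contents without extracting anything;
    # cut keeps only the first path component of each entry.
    tar -tzf "$1" | cut -d/ -f1 | sort -u | head -n 1
}

# Usage (assumes the tarball from the wget step is in the current directory):
# archive_top_dir apache-cassandra-0.7.0-bin.tar.gz
```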

Here is a quick explanation of what the extracted Cassandra subdirectories contain:

bin: Cassandra executables, nodetool (utility to see if cluster is properly configured and to maintain cluster), command-line client (CLI)

conf: files for configuring Cassandra, such as cassandra.yaml (the main configuration file, which replaces storage-conf.xml as of 0.7), authentication settings and log4j (logging levels)

interface: has cassandra.thrift (RPC client API) and Avro

javadoc: documentation website made with JavaDoc (it's basically the comments stored in the code by developers)

lib: external libraries. Also has Thrift and Avro RPC libraries.


Log Directories
First create two new directories for Cassandra.

The -p option tells mkdir to create any missing intermediate directories in the path.

The chown command changes the owner of a file or directory to the specified user. With the -R flag, it also descends recursively into the directory's contents.

(Note: replace techlabs below with your username.)

sudo mkdir -p /var/log/cassandra
sudo chown -R techlabs /var/log/cassandra
sudo mkdir -p /var/lib/cassandra
sudo chown -R techlabs /var/lib/cassandra
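
To double-check the result, a small sketch like the one below reports whether each directory exists and is writable by the user you are logged in as. The helper name dir_is_writable is made up for this example.

```shell
# Hypothetical helper: succeed only if the directory exists and the
# current user can write to it.
dir_is_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Report on the two directories created above.
for d in /var/log/cassandra /var/lib/cassandra; do
    if dir_is_writable "$d"; then
        echo "$d: ok"
    else
        echo "$d: missing or not writable"
    fi
done
```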



JNA
JNA provides Java programs easy access to native shared libraries without writing anything but Java code.

Note from Cassandra developers for why JNA is needed:
"Linux aggressively swaps out infrequently used memory to make more room for its file system buffer cache. Unfortunately, modern generational garbage collectors like the JVM's leave parts of its heap un-touched for relatively large amounts of time, leading Linux to swap it out. When the JVM finally goes to use or GC that memory, swap hell ensues.

Setting swappiness to zero can mitigate this behavior but does not eliminate it entirely. Turning off swap entirely is effective. But to avoid surprising people who don't know about this behavior, the best solution is to tell Linux not to swap out the JVM, and that is what we do now with mlockall via JNA.

Because of licensing issues, we can't distribute JNA with Cassandra, so you must manually add it to the Cassandra lib/ directory or otherwise place it on the classpath. If the JNA jar is not present, Cassandra will continue as before.
"

Get JNA with:
cd ~
wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb


To install:
techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
(Reading database ... 44334 files and directories currently installed.)
Preparing to replace libjna-java 3.2.4-2 (using libjna-java_3.2.7-0~nmu.2_amd64.deb) ...
Unpacking replacement libjna-java ...
Setting up libjna-java (3.2.7-0~nmu.2) ...



The deb package installs the JNA jar file to /usr/share/java/jna.jar, but Cassandra only loads it if it's on the classpath. The easy way to do this is to create a symlink in your Cassandra lib directory (note: replace /home/techlabs with your home dir location):
ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib
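
A dangling symlink fails silently (Cassandra simply starts without JNA), so it may be worth verifying the link after creating it. The sketch below wraps both steps; link_jar is a hypothetical helper name, and it expects an absolute path to the jar.

```shell
# Hypothetical helper: symlink a jar into a lib directory and confirm
# that the link resolves to a real file.
link_jar() {
    jar="$1"
    lib_dir="$2"
    [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
    # -f replaces an existing link of the same name, so re-running is safe
    ln -sf "$jar" "$lib_dir/$(basename "$jar")"
    # -e follows the symlink, so this fails if the link is dangling
    [ -e "$lib_dir/$(basename "$jar")" ]
}

# Usage (adjust the second path if you extracted Cassandra elsewhere):
# link_jar /usr/share/java/jna.jar "$HOME/apache-cassandra-0.7.0/lib"
```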

Research:
http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/


MX4J
MX4J is used for monitoring in Cassandra. MX4J is a project to build an Open Source implementation of the Java(TM) Management Extensions (JMX).

Download the .tar.gz from the following page and move it to your home dir:
http://sourceforge.net/projects/mx4j/files/MX4J%20Binary/3.0.2/

Note, you can download the file to your Windows machine and use WinSCP to transfer it to your home dir in Ubuntu.

MX4J does not need to be installed. It simply provides libraries in the form of jars that can be used to develop JMX applications.

Extract MX4J: tar -xzf mx4j-3.0.2.tar.gz

Using WinSCP, copy the mx4j-tools.jar file from /home/techlabs/mx4j-3.0.2/lib to /home/techlabs/apache-cassandra-0.7.0/lib. Note, replace /home/techlabs with your home dir location.
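
If you are already logged into the Ubuntu VM, a plain cp does the same job as WinSCP. This is a sketch only: the variable names MX4J_LIB and CASSANDRA_LIB and the function copy_mx4j_tools are made up for this example, and the default paths assume both archives were extracted in your home directory as described in this post.

```shell
# Assumed locations, based on the paths used in this post; override as needed.
MX4J_LIB="${MX4J_LIB:-$HOME/mx4j-3.0.2/lib}"
CASSANDRA_LIB="${CASSANDRA_LIB:-$HOME/apache-cassandra-0.7.0/lib}"

# Hypothetical helper: copy the jar and list it to confirm it arrived.
copy_mx4j_tools() {
    cp "$MX4J_LIB/mx4j-tools.jar" "$CASSANDRA_LIB/" &&
        ls -l "$CASSANDRA_LIB/mx4j-tools.jar"
}

# Usage:
# copy_mx4j_tools
```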

In the next section, when you start Cassandra, you should see the following message in the startup log:

INFO 21:47:35,120 mx4j successfuly loaded
HttpAdaptor version 3.0.2 started on port 8081


More info on MX4J: http://wiki.apache.org/cassandra/Operations#Monitoring_with_MX4J



Start Cassandra

Before you start Cassandra for the first time, it is a good idea to reboot so that Ubuntu's memory is cleared. After installing JNA & MX4J, about 250 MB of RAM will be tied up, which can cause problems for Cassandra if the VM has only 512 MB in total. You can check memory consumption with the command: free -m

After a fresh reboot, only about 93 MB of memory should be used, leaving about 400 MB free for Cassandra to launch (assuming 512 MB total available to VM).
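
If you only want the two numbers that matter here, the output of free -m can be filtered with awk. The sketch below is one way to do it; mem_summary is a hypothetical helper name, and it assumes the usual layout in which the "Mem:" row lists total and used as its first two figures.

```shell
# Hypothetical helper: read `free -m` output on stdin and print the
# total and used figures (in MB) from the "Mem:" row.
mem_summary() {
    awk '/^Mem:/ { print "total=" $2 " used=" $3 }'
}

# Usage:
# free -m | mem_summary
```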

Navigate to the Cassandra folder:
techlabs@cassandraN1:~$ cd apache-cassandra-0.7.0/
techlabs@cassandraN1:~/apache-cassandra-0.7.0$ pwd
/home/techlabs/apache-cassandra-0.7.0



Note: although Cassandra can be configured to start without root permissions, we will start it as root for now. If it is not started with sudo, you will see the following warning:

WARN 16:26:59,104 Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.



In order to avoid the above warning, start Cassandra as follows:

techlabs@cassandraN1:~/apache-cassandra-0.7.0$ sudo bin/cassandra -f
[sudo] password for techlabs:
 INFO 16:29:40,735 Heap size: 255787008/255787008
 INFO 16:29:46,425 JNA mlockall successful
 INFO 16:29:46,544 Loading settings from file:/home/techlabs/apache-cassandra-0.7.0/conf/cassandra.yaml
 INFO 16:29:46,836 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO 16:29:47,181 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1295310587181.log
 INFO 16:29:47,739 Opening /var/lib/cassandra/data/system/LocationInfo-e-2
 INFO 16:29:47,792 Opening /var/lib/cassandra/data/system/LocationInfo-e-1
 INFO 16:29:47,963 Couldn't detect any schema definitions in local storage.
 INFO 16:29:47,964 Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
 INFO 16:29:47,979 Replaying /var/lib/cassandra/commitlog/CommitLog-1295310419506.log
 INFO 16:29:47,981 Finished reading /var/lib/cassandra/commitlog/CommitLog-1295310419506.log
 INFO 16:29:47,981 Log replay complete
 INFO 16:29:48,018 Cassandra version: 0.7.0
 INFO 16:29:48,018 Thrift API version: 19.4.0
 INFO 16:29:48,034 Loading persisted ring state
 INFO 16:29:48,041 Starting up server gossip
 INFO 16:29:48,066 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1295310587181.log', position=148)
 INFO 16:29:48,073 Enqueuing flush of Memtable-LocationInfo@2133251039(29 bytes, 1 operations)
 INFO 16:29:48,074 Writing Memtable-LocationInfo@2133251039(29 bytes, 1 operations)
 INFO 16:29:48,121 Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-3-Data.db (149 bytes)
 INFO 16:29:48,402 Using saved token 19850580716313504697781756261255509748
 INFO 16:29:48,404 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1295310587181.log', position=444)
 INFO 16:29:48,405 Enqueuing flush of Memtable-LocationInfo@27134372(53 bytes, 2 operations)
 INFO 16:29:48,406 Writing Memtable-LocationInfo@27134372(53 bytes, 2 operations)
 INFO 16:29:48,496 Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-4-Data.db (301 bytes)
 INFO 16:29:48,499 Compacting [org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-1-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-2-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-3-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-4-Data.db')]
 INFO 16:29:48,718 Compacted to /var/lib/cassandra/data/system/LocationInfo-tmp-e-5-Data.db.  1,224 to 654 (~53% of original) bytes for 3 keys.  Time: 217ms.
 INFO 16:29:48,769 mx4j successfuly loaded
HttpAdaptor version 3.0.2 started on port 8081
 INFO 16:29:48,818 Binding thrift service to localhost/127.0.0.1:9160
 INFO 16:29:48,823 Using TFramedTransport with a max frame size of 15728640 bytes.
 INFO 16:29:48,831 Listening for thrift clients...



Make sure there are no errors or warnings in the startup log at this time.
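
Scanning a long startup log by eye is error-prone, so a grep can help. The sketch below prints any WARN or ERROR lines and reports failure if it finds one. The helper name log_problems is hypothetical, and the log file path in the usage line is an assumption; check the log4j settings in the conf directory for the actual location on your system.

```shell
# Hypothetical helper: print WARN and ERROR lines from a log file and
# return non-zero if any were found.
log_problems() {
    # Log lines start with a right-aligned level, e.g. " INFO", " WARN", "ERROR"
    if grep -E '^ *(WARN|ERROR) ' "$1"; then
        return 1
    fi
    return 0
}

# Usage (the path is an assumption, not confirmed by this post):
# log_problems /var/log/cassandra/system.log && echo "startup log is clean"
```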


Put Cassandra in background

We started Cassandra in the foreground with -f so that we can see any output from the startup.

Next, suspend the running foreground job with Ctrl+z (this only pauses the job; it does not terminate it).

Run the jobs command to verify that Cassandra is indeed stopped (notice the job numbers on the left):
techlabs@cassandraN1:~/apache-cassandra-0.7.0$ jobs
[1]+  Stopped                 sudo bin/cassandra -f



Then restart the suspended Cassandra job in background:

techlabs@cassandraN1:~/apache-cassandra-0.7.0$ bg %1
[1]+ sudo bin/cassandra -f &

techlabs@cassandraN1:~/apache-cassandra-0.7.0$ jobs
[1]+  Running                 sudo bin/cassandra -f &



If needed, bring the background job to foreground: fg %1

If needed, kill a job with: kill %job# (example: kill %2)

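Note: If you didn't start Cassandra from the same terminal that you are trying to kill it from, you can use the method below to terminate it. It finds the Java process listening on port 8080 (the default JMX port in Cassandra 0.7) and kills it by PID.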

techlabs@cassandraN1:~/apache-cassandra-0.7.0$ sudo netstat -anp | grep 8080
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      5445/java
techlabs@cassandraN1:~/apache-cassandra-0.7.0$ sudo kill 5445

You can check that Cassandra has stopped by running the netstat command again. You should get no output.
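
Rather than reading the PID off the netstat output by hand, you could extract it with awk. This is a sketch under the assumption that the output has the same columns as the example above; pid_on_port is a hypothetical helper name.

```shell
# Hypothetical helper: read `netstat -anp` lines on stdin and print the PID
# of the process listening on the given port.
pid_on_port() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, parts, "/")   # last column looks like "5445/java"
            print parts[1]
        }'
}

# Usage:
# sudo netstat -anp | pid_on_port 8080
```

The helper only prints the PID. Passing it to sudo kill is left as a separate, deliberate step.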

Ctrl+c kills a foreground job (it sends a SIGINT signal to the process).


You now have a running one-node Cassandra cluster!


