FASTlib

Tutorial

The FASTlib core, the library you are about to download, is meant to provide the basic tools needed to start hands-on with machine learning. Its design goals are to have a low learning curve, allow distributed development, be as fast as possible, yet avoid most of the worst problems of developing in a low-level language. It's very new and researchy, so it is possible that there will be a fair number of bugs.

Right now, it looks like FASTlib's core has the following:

  • Reading and writing datasets (ARFF and CSV)
  • Automating all sorts of experiments and result gathering
  • Debugging tools
  • Matrices and vectors, with dense matrix operations thanks to BLAS and LAPACK
  • Miscellaneous collections classes and math routines
  • THOR, for parallelizing dual-tree algorithms (may eventually be moved separate from FASTlib core)

Contents

This Document

We have four main types of documentation that come with the FASTlib, for different purposes.

  • This Tutorial - you've never seen anything before, and you want to understand it. This is in-depth, covering higher-level concepts.
  • Lecture slides - This is the lecture from class that overviews some of the most useful parts of FASTlib for getting started coding. Get them here and the accompanying Kernel PCA example.
  • Cookbook - If you're working on the code and don't know how to do X, see the FASTlib Cookbook. This is very shallow over low-level details.
  • Doxygen - The most complete browsable documentation will always be the Doxygen -- it's a lot like Javadoc, and great if you know generally what you are looking for, or just want to grok how code is structured. For a version generated from this release of FASTlib, see the Doxygen and note the instructions on how to generate up-to-date documentation.

Getting Started

Before installation

Systems that we have tested:

  • Linux: 32-bit x86, 64-bit x86, and Itanium
  • Cygwin (x86 Windows)
  • MacOSX (you will need g77, see http://hpc.sourceforge.net/ and search for g77; let us know if there are any further problems)

Tools you'll need:

  • required: gcc C compiler, g++ C++ compiler, and g77 FORTRAN compiler
  • required: python 2.2 or later, check by typing python -V, earlier versions will NOT work
  • optional: doxygen (to generate local copies of documentation)

Installation

Download fastlib here

Let's now assume that FASTlib is located at $FASTLIBPATH. That is, if I extracted mlpack.tar.bz2 to /net/hc293/garryb/fastlib2, then whenever I see "$FASTLIBPATH" in this tutorial I will assume /net/hc293/garryb/fastlib2. Make sure this is an absolute path (it will start with a "/" at the beginning).

After this, you will need to make sure your environment variables are set up properly. Simply add $FASTLIBPATH/script into your $PATH environment variable. First find out what shell you are using by typing

echo $0

and it will be bash, ksh, csh, or tcsh. If you are using the bash or ksh shell, you should add this to your ~/.bashrc (for bash) or ~/.kshrc (for ksh), or ~/.profile if the other doesn't exist, substituting $FASTLIBPATH accordingly:

export PATH="$FASTLIBPATH/script:$PATH"

If you are using csh or tcsh, put the following in your ~/.cshrc file: setenv PATH "$FASTLIBPATH/script:""$PATH"

Close your terminal and re-open it. Check to see if FASTlib is working by typing:

fl-build  # that is lowercase "FL-BUILD"

It should give you the help message for FASTlib's build system. If it doesn't, type:

echo $PATH

and make sure the $FASTLIBPATH/script is there and is spelled exactly correctly. Try typing the export or setenv command by hand and see if it starts working afterwards. Consult your friend who knows Unix when necessary.

Now, change into your $FASTLIBPATH, and change into examples/platonic_allnn. This is where the example code is, and you can try building it by:

fl-build allnn_main

If this does not work, email rriegel [at] cc and tell what kind of system you are trying this on or anything else you think would be interesting (really old versions of python, etc). If it worked, there will be a ./allnn_main executable. Now, try:

./allnn_main --r=../../fastlib/data/test/fake.csv

This finds the (degenerate) nearest neighbors of points in fake.csv, which are the points themselves except when multiple points occur at the same location. You can see the results in output.txt.

Code Organization

FASTlib's core has a major C++ part, but smaller python parts with a few shell scripts. These are laid out in the following way:

  • fastlib/ wraps the rest of the library into one convenient include
    • base/ - compiler abstractions, debugging, and memory management
    • col/ - templated storage types (dynamic arrays, heaps, etc.)
    • data/ - dataset types and utilities
    • fx/ - FASTexec client library for managing parameters, timers and results, via the data store (actually, this is C so that you can use it in non-FASTlib code)
    • file/ - file reading, writing and tokenization
    • la/ - linear algebra routines (mostly a wrapper for BLAS/LAPACK)
    • math/ - a collection of math utilities, to be extended as needed
    • par/ - rudimentary parallelization utilities
    • sparse/ - sparse linear algebra routines (mostly a wrapper for Trilinos)
      • trilinos/ - includes needed for Trilinos
    • thor/ - templated, parallelized algorithm for GNPs
    • tree/ - utilities to build and manage kd-trees, among others
  • Community-built C++
    • contrib/ (user directory)
  • Other
    • script/ has scripts that will be in your $PATH variable, and a python mini-library used by the scripts
    • bin/ contains files generated by the build system - deleting this is equivalent to make clean
    • bin_keep/ contains other files generated by the build system but aren't meant to be cleaned, such as the linear algebra package

While we do all the tests we can to make sure changes doesn't break anything, we cannot guarantee against the occasional build break. Emails are highly appropriate in this case :-)

Building

Using the build tool

As stated earlier, building is done via the fl-build tool, which stands for FASTlib build. This tool reads through very short files which just have a list of sources (.cc files), headers (.h files), and other sub-packages it depends on, and produces a Makefile, which it runs automatically.

There are a handful of compilation modes supported by fl-build, specified by the --mode=mode parameter. Each mode has a use, and enables certain flags:

  • verbose: tracking the execution of a program when it's infeasible to step through manually. disables optimizations, enables printing of verbosity messages
  • debug: allow best use with gdb for tracking bugs. disables optimization, enables all debug checks, enables highest level of gdb.
  • check (default mode): regular development. enables code optimizations and leaves in debug checks, at a 25% or so penalty. gdb symbols are compiled in but may be inaccurate due to optimizations.
  • fast: timing runs, for the fairest timing comparisons. debug symbols still enabled but may be inaccurate.
  • unsafe: optimizations that might alter correctness, and often will slow the program down.
  • profile: speed profiling. compile with --mode=profile, run your program, and run gprof ./binaryfile | less to see what the bottleneck is.

(For the curious: The command line flags are listed in script/buildsys.py)

Thus, we could re-build the example, but turn off all debug checks:

fl-build allnn_main --mode=debug

To add custom flags, add --cflags. To get increased performance on Pentium 4 and define a macro called COAGULATE, you would use:

fl-build mybinary --mode=fast --cflags="-march=pentium4 -DCOAGULATE"

Writing build files

You probably want to skip this subsection until you start writing new code.

The fl-build tool looks in the current directory for a file called build.py. The build file is executed as straight Python code, with access to a few specific functions defined by the build system, that correspond to "meta build rules". In processing these, the build system recursively pulls build files from other directories and resolves the dependencies. A Makefile is then created in your current directory, which fl-build automatically runs for you.

Here is a sample build rule that will build an executable test compiled from test.cc and test.h, and link it against the fastlib front-end library:

binrule(
name = "test",
sources = ["test.cc"],
headers = ["test.h"],
deplibs = ["fastlib:fastlib"]
)

Each string you see is actually processed by the build system, which figures out what you are referring to. If there is no colon ":" character, it looks for a file in the same directory as the build file. If it finds a ":", such as "fastlib:fastlib", it looks in the directory "fastlib" for a rule named "fastlib" -- the left part of the colon is the directory, the right part is the rule name. For example, the full path to the rule for the main executable in u/example would be "u/example:main".

Next, suppose you want to create a small reusable library. Taking a look at the build file examples/platonic_allnn/build.py,

librule(
name = "allnn", # this line can be safely omitted
# (since this is examples/platonic_allnn, the build system
# will automatically name it "platonic_allnn" if you omit it -- most
# other sub-packages will omit the name)
#sources = ["helper.cc"], # Any .c or .cc files where library functions are defined. This line can be omitted if there are no .cc or .c files.
headers = ["allnn.h"], # include files part of the 'lib'
deplibs = ["fastlib:fastlib"] # depends on fastlib core
)

binrule(
name = "allnn_main", # the executable name
sources = ["allnn_main.cc"], # compile allnn_main.cc
headers = [], # no extra headers
deplibs = [":allnn"] # depends on allnn in this folder
)


Use of C++

FASTlib is C++, but only to an extent. If you are familiar with C, you will have no problem. The things we use from C++ is:

  • Templates, for pluggability (used very extensively in THOR)
  • Classes, for coupling data and operations over the data
  • Destructors, to avoid unnecessary clean-up code, and to better support templates

FASTlib mostly avoids inheritance, virtual functions, and operator overloading. It also notably avoids most constructors due to their pickiness about when they are called. Here's an example of what this looks like. In particular, we'll create a multiplication table:

Matrix mult_table;

mult_table.Init(10, 10); // make it 10 rows by 10 columns

for (index_t i = 0; i < 10; i++) {
for (index_t j = 0; j < 10; j++) {
mult_table.set(i, j, i * j);
}
}
// the matrix does not have to be freed

Some quick notes before we move on. The matrix must be initialized before it is usable, but keep in mind everything is freed by default. Caution -- if you declare a matrix and never initialize it, the program will crash at the end of the function. In debug mode, many FASTlib classes will let you know that this is happening.

Next, the index_t type is usually an regular int, but is wired through the system to become 64-bit if you tell it to.

Copying and Aliasing

C++ is infamous for its desire to make copies of everything. If you forget to pass a parameter as a constant reference (const Classname&), everything will be copied, but sometimes, there is just no way to get around of it. By avoiding constructors, we find copying is usually not needed. To avoid accidental copies of your class, put FORBID_COPY at the beginning like this:

class X {
FORBID_COPY(X);
...
}

Sometimes, though, you really need to make copies, or at least things like copies. Many classes support a Copy method to explicitly do so, through the Matrix and Vector.

The Vector and Matrix classes support a concept of aliasing. A vector can be a vector on its own, or it could also point to some column within a matrix; a matrix can also refer to a subset of another matrix's columns. The concept is simple, but there is one caveat: without garbage collection, somebody eventually has to free the memory. Each Vector and Matrix then knows whether it is the owner of the memory it points to, and if it is, it will free the data automatically; otherwise, it is only an alias.

An example is shown here:

   Matrix original;
original.Init(5, 5);
...
for (int j = 0; j < 3; ++) {
Matrix weak_alias;
weak_alias.Alias(original); // this matrix now aliases the original
weak_alias.set(j, j, 999.0); // this modifies the original matrix
// the destructor of weak_alias is called, but the memory is not freed
}
Matrix newowner;
newowner.Own(&original); // newowner
Matrix copy;
copy.Copy(original); // a completely new copy
original.Destruct(); // newowner is still valid
original.Init(99, 99);

Debugging

C++, and equally C, can sometimes make it easy to shoot yourself in the foot. To help you avoid this, we made debugging an important part of FASTlib.

The build system by default will enable all checks in the form of #ifdef DEBUG (we'll get back to this later). These checks and safeguards, such as bounds checking and memory poisoning, are present in nearly all core FASTlib classes. If you are debugging and find 0xDEADBEEF, 2146666666, or 0x7FF388AA, or NaN, it probably means you forgot to initialize something; if you go beyond the end of a Vector, it will let you know. In practice, these have a 20% or so performance penalty -- as a habit, we've found it's just fine to leave these on until you want to run timing experiments. In machine learning code, since you can sometimes generate seemingly good results from garbage data, these types of safeguards are almost essential.

To use debugging yourself, plaster your code with the following:

  • DEBUG_ASSERT_MSG(condition, format, ...) - If the condition is false, your program will print the formatted message to stderr and die.
  • DEBUG_GOT_HERE(3) - If verbosity is > 3 and verbose-mode is compiled on, it will print the function and line number.

You can safely leave these in at no cost in non debug mode. To set verbosity level to 3.0, you would specify the command line argument: --debug/verbosity_level=3.0

See the Doxygen for base/debug.h for more information.

FASTexec - Command-line parameters and experimentation

We felt it is important that machine learning researchers can run a lot of experiments without too much trouble. Parameter passing is integral to the experimentation process, so we unified these.

The core idea behind the system is that within one run of your program, a hierarchial data store is created. In some ways it is similar to a "Windows registry" except it only has a very brief existence. In fact, it more like an XML document tree, except without attributes, and has a "stateless" output format.

Basic command-line parameters

Let's look at a non-trivial example. We're measuring how different strided memory access patterns affect cache performance (something you probably won't care about, but is trivial to code):

#include "fastlib/fastlib.h"
int main(int argc, char *argv[]) {
fx_init(argc, argv); // initialize command-line parameters and the data store
int count = fx_param_int_req( // get a REQUIRED integer parameter
NULL, // directly from command line (not from a sub module)
"count"); // called "count", as in --count=num
int stride = fx_param_int(NULL, "stride", 1); // defaults to stride 1
double factor = fx_param_double(NULL, "factor", 1.0); // default value 1.0

ArrayList<double> values;
values.Init(count);

fx_timer_start(NULL, "strides"); // timers have names
// Let's see how stride affects cache performance
for (int s = 0; s < stride; s++) {
for (int j = s; j < count; j += stride) {
values[j] = factor * j;
}
}
fx_timer_stop(NULL, "strides");
fx_format_result(NULL, "success", "%d", 1); // store some result
fx_done();
}


The useful functions we used were fx_param_int (to get an integer), fx_param_double (to get a double-precision value), fx_timer_{start|stop} to have timers, and fx_done() to output the results. I ran this example with the parameters:

./test --count=10000 --factor=2.0

and the following text resulted:

/info/system/node/name watson.cc.gatech.edu
/info/system/arch/name x86_64
/info/system/kernel/name Linux
/info/system/kernel/release 2.6.9-55.0.6.ELsmp
/info/system/kernel/build %231%20SMP%20Thu%20Aug%2023%2011%3A13%3A21%20EDT%202007
/info/rusage/self/WARNING your_OS_might_not_support_all_of_these
/info/rusage/self/utime 0.000999
/info/rusage/self/stime 0.001999
/info/rusage/self/minflt 418
/info/rusage/self/majflt 0
/info/rusage/self/maxrss 0
/info/rusage/self/ixrss 0
/info/rusage/self/idrss 0
/info/rusage/self/isrss 0
/info/rusage/self/nswap 0
/info/rusage/self/inblock 0
/info/rusage/self/oublock 0
/info/rusage/self/msgsnd 0
/info/rusage/self/msgrcv 0
/info/rusage/self/nsignals 0
/info/rusage/self/nvcsw 1
/info/rusage/self/nivcsw 0
/info/rusage/children/WARNING your_OS_might_not_support_all_of_these
/info/rusage/children/utime 0.000000
/info/rusage/children/stime 0.000000
/info/rusage/children/minflt 0
/info/rusage/children/majflt 0
/info/rusage/children/maxrss 0
/info/rusage/children/ixrss 0
/info/rusage/children/idrss 0
/info/rusage/children/isrss 0
/info/rusage/children/nswap 0
/info/rusage/children/inblock 0
/info/rusage/children/oublock 0
/info/rusage/children/msgsnd 0
/info/rusage/children/msgrcv 0
/info/rusage/children/nsignals 0
/info/rusage/children/nvcsw 0
/info/rusage/children/nivcsw 0
/params/count 10000
/params/factor 2.0
/params/debug/print_warnings 1
/params/debug/abort_on_nonfatal 0
/params/debug/pause_on_nonfatal 0
/params/debug/print_notify_locs 0
/params/debug/noisy 0
/params/fx/timing 0
/params/stride 1
/timers/default/wall/sec 0.000240
/timers/default/user 0.000000
/timers/default/sys 0.000000
/timers/strides/wall/sec 0.000034
/timers/strides/user 0.000000
/timers/strides/sys 0.000000
/results/success 1

We admit this is a lot of text, but when you later comb through the results, you can select only the portions you care about. Briefly, each portion of the tree:

  • params/ - Parameters you were called with. Default parameters are explicitly stored in case you later change what the default values are.
  • timers/ - Each timer. There will always be a default timer, which runs between fx_init and fx_done.
  • results/ - Results that you decided to emit.
  • info/rusage/ - System information related to rusage. See the manpage for getrusage(2).
  • info/system/ - Information on the system run on. See manpage for uname(2).

This output could indeed just be an s-expression, or it could be an attribute-less XML file. We chose this path-value dump format because it is stateless -- i.e. anyone can process it with a little bit of grep. Corruption in the file will only affect it until the next newline.

Running automated experiments and collecting results

Running experiments and collecting results is made simpler using FASTexec. After using the FASTexec code to store output variables in the datastore, you can use FASTexec to run multiple experiments.

If you type the command:

fx-run | less

you will see the help screen for fx-run. This program will run your executable with all combinations of the parameters you give it, and save results in a directory called fx that is created in the same directory you run the tests. After that, gather results using fx-csv to make an Excel-compatible comma-separated-values file, or fx-latex to create a properly escaped LaTeX table.

You can try an example with the all nearest neighbors classifier example in u/example. We just consider with time statistics since the output doesn't vary per leaf size of the trees used:

  cd $FASTLIBPATH/examples/platonic_allnn && fl-build allnn_main
fx-run allnn_k ./allnn_main --r=../../fastlib/data/test/fake.csv --allnn/leaf_size=10,15,20,25,30
fx-csv allnn_k ./alnn_main /params/allnn/leaf_size /timers/default/wall/sec
fx-latex allnn_k ./alnn_main /params/allnn/leaf_size /timers/default/wall/sec --preview

Little gotchas

Success and failure

The success_t type in base/common.h defines our standard for indicating success or failure. Rather than assuming 1 or 0 or -1 indicates something or other, we explicitly return SUCCESS_PASS (succeeded), SUCCESS_FAIL (failed), or SUCCESS_WARN (something was suboptimal).

Basic types

You will notice heavy use of index_t (in base/scale.h) rather than integers. For practical purposes, index_t is a signed integer -- signed so you can loop over >= 0. On 32-bit and 64-bit Intel/AMD machines, this will be 32-bits, which is the fastest int for both systems and is relatively compact. But if you want to operate on datasets larger than a few gigabytes, you can define the SCALE_LARGE macro, and these will instantly switch to the largest size your computer can address.

Also, the base/basic_types.h file (it will be in bin/ since it is auto-generated) defines standard int16, int32, int64, uint16, uint32, uint64 types.

Printf versus Streams

We operate under the assumption that more of our audience is familiar with C I/O than with C++ I/O, especially when it comes to fancy floating-point formatting.

The caveat is that printf only works with native types like int, short, long. To print an index_t, you use:

printf("Iteration %"LI"d complete.", some_index_var);   // no special formatting
printf("Iteration %04"LI"d complete.", some_index_var); // with formatting

The LI macro is a string that will have the suitable "l" modifier for index_t (see man sprintf for more details). For other types:

  • int16 and uint16: L16
  • int32 and uint32: L32
  • int64 and uint64: L64

Alternately, you can just cast the variable to a native type like (int) (short) or (long), if you want to be lazy.

Coding style

In short, the funny usage of capitalization and underscores does have a method to the madness. Please see Coding Style Page for a full description. Some pocket-book examples:

  • variables are lower_with_underscore_and_descriptive_when_possible
  • namespaces are lowercase
  • classes are InitialCamelCase
  • member methods which perform an actual interesting action are InitialCamelCase
  • member methods which are just getters or setters are short_and_lowercase
  • private members variables are lower_with_trailing_underscore_
  • sub-packages intended for incorporation into non-fastlib C code look more like traditional C

After a while it will make sense. The inconsistency in member methods looks very odd at first, but you must admit it kind of helps you mentally understand this very non-Demeterish example:

warehouse.carboard_box().contents().bottom().AttachTape(
samsclub.utilities_department().shelf(3).item("packaging tape").Purchase());


Please adhere to this when making FASTlib code.

Subpages (2): Coding Style Cookbook

Attachments (2)

  • fastlib_tutorial.pdf - on Oct 14, 2008 10:38 PM by Parikshit Ram (version 1)
    130k View Download
  • kernel_matrix_algs.tar.bz2 - on Oct 14, 2008 10:38 PM by Parikshit Ram (version 1)
    3k Download

FASTlib group

For reporting bugs or asking questions or having discussions, go to this group:
Google Groups
FASTlib_forum
Visit this group

Documentation

For the current documentation of the whole library follow this link
Doxygen

Recent site activity