Written by: Jarrett Egertson on 12/15/2010 jegertso <a .t> u.washington.edu
Goal
The goal of this tutorial is to demonstrate how to incorporate ProteoWizard into a Linux project and use it to read and write data. It is aimed at Linux users with moderate programming expertise. Please feel free to add comments below or e-mail me at jegertso <a .t> u.washington.edu.
The Final Product
A Linux binary called pwiz_example that reads a file in any ProteoWizard-supported format, prints the first 5 peaks (m/z and intensity) from each of the first two spectra in the file, and then writes the file out as mzML.
Delivered to the end-user:
The final product delivered to the user will be a .zip archive. To use the software, the user downloads the .zip archive, uncompresses it, and runs one command (sh quickbuild.sh) to build the binary.
How you (the developer) will make this happen:
We will use the boost-build system to build our project. The project will contain the ProteoWizard source code in a compressed .tar.gz archive. When the project is built, the source folder will be decompressed and ProteoWizard code will be built directly into the binary. There will be a section at the end of this tutorial for advanced users who wish to reference the ProteoWizard SVN repository directly from an SVN project. By doing this, the ProteoWizard code is automatically updated on SVN check out. (edit: I haven't written this section yet)
There are two ways to use this tutorial:
One is to create the project manually, from scratch, by following steps 0-6. You will create all directories, download the ProteoWizard source code and libraries, and add the build scripts and project source code by hand. If you'd like to take the easy way out...
The other way is to follow step 0 of the tutorial, download the complete project here:
Then unzip the file and run sh quickbuild.sh in the pwiz_project directory. This should build a binary in pwiz_project/bin.
Step 0: Make sure you have g++ installed
g++ is the C++ compiler from the GNU Compiler Collection (GCC).
A lot of Linux distributions come with g++ installed already, but just to be safe...
The easiest way to check this is to run the command:
If this command returns nothing, you will have to install g++. Most distributions come with a package manager that can facilitate this installation.
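The original command is not reproduced here. One portable way to make this check, offered as a sketch rather than the author's exact command, is the POSIX `command -v` builtin, which prints the path of a program found on the PATH and prints nothing otherwise. The helper name `have_tool` is made up for this example.

```shell
#!/bin/sh
# have_tool NAME: succeeds if NAME is found on the PATH.
# (Hypothetical helper for illustration; not part of the original project.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool g++; then
    echo "g++ found at: $(command -v g++)"
else
    echo "g++ not found; install it with your distribution's package manager"
fi
```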
Step 1: Create the directory structure for the project
In this step, we'll create a directory hierarchy for our project, and download the ProteoWizard source code and libraries it depends on.
The main directory for our project will be called pwiz_project and it will have four subdirectories:
bin - Compiled binaries will be placed in this directory
project_src - The source code for our project
pwiz_libraries - A directory containing all of the libraries ProteoWizard depends on (ex. boost) as well as the source for boost-build itself
pwiz_src - A directory that will contain the ProteoWizard source code
pwiz_project itself will contain two important scripts:
quickbuild.sh - the user calls this to build the project (ex.
clean.sh - the user calls this to clean the project (ex.
Please make sure the full path to your project does not contain any spaces.
Create these directories. We'll deal with the bin and project_src directories later. For now, let's prepare the pwiz_libraries and pwiz_src directories.
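The directory layout described above can be created in one go. This is a minimal sketch that assumes you run it from the directory that should contain pwiz_project. The check for spaces in the path is an addition for convenience, and the variable name `project_path` is arbitrary.

```shell
#!/bin/sh
# Create the project skeleton relative to the current directory.
mkdir -p pwiz_project/bin \
         pwiz_project/project_src \
         pwiz_project/pwiz_libraries \
         pwiz_project/pwiz_src

# Warn if the full path to the project contains a space.
project_path="$(cd pwiz_project && pwd)"
case "$project_path" in
    *" "*) echo "warning: project path contains spaces: $project_path" ;;
    *)     echo "project created at: $project_path" ;;
esac
```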
We'll access the ProteoWizard svn repository to download the source for the ProteoWizard dependencies and populate the pwiz_libraries directory.
From the pwiz_project directory:
To populate pwiz_src we'll download a compressed .tar.gz archive containing the ProteoWizard source code. I like to put this archive in the pwiz_project directory, and then have boost-build extract it at build time. This saves space and, if the project is checked into a source repository, allows for much faster check outs.
The source archives can be found at:
Under the "Artifacts" column of the most recent successful build, click view. You'll be presented with a page containing a set of ProteoWizard subset source archives. Download the "without-ltv" archive; the suffix stands for without libraries, tests, or vendor support. Vendor support won't work on Linux. If you want to try building this on Windows, get the "without-lt" archive instead.
At the time of writing this tutorial, the most recent source archive is
I downloaded the archive to my pwiz_project directory.
Step 2: Write the source code for your project
The project is a pretty simple one, so it will contain one source file called pwiz_example.cpp. The file will be located in the pwiz_project/project_src directory. The source file can be found in the attachments, with comments. The file can include files from the ProteoWizard source directory, for example, the first line:
Step 3: Write the boost-build scripts for your project
The boost-build scripts are written in the boost-build language, and are analogous to Makefiles in Linux, but are much more powerful. The syntax for these files is covered in the boost-build documentation. We'll write two .jam files.
One will be in pwiz_project/project_src and is called Jamfile.jam. This file allows the boost-build software to find the necessary libraries and header files for the pwiz_example binary, and it defines build options such as linking type and threading. In general, a large project will have many Jamfiles, each directing boost-build on how to build a sub-component of the project. The ProteoWizard source code contains numerous Jamfiles.
The second will be in the pwiz_project directory and is called Jamroot.jam. Each boost-build project must have a Jamroot.jam file at the project root. This is the file read by boost-build's interpreter (bjam) to determine how to build the entire project. Our Jamroot.jam file serves two main purposes:
1) To extract the ProteoWizard source archive (or determine that it has already been extracted), as well as other source archives in pwiz_libraries
2) To direct boost-build to build and install our pwiz_example program to the bin directory.
Both of these files are attached at the end of this page separately, as well as being included in the compressed project archive also attached.
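To make the first purpose concrete, here is the "extract the archive unless it has already been extracted" idea expressed as a plain shell function. This is only an illustration of the logic: in the actual project the work is done inside Jamroot.jam, not in shell. The function name `extract_once` and the `.extracted` marker file are assumptions made for this sketch.

```shell
#!/bin/sh
# extract_once ARCHIVE DEST: unpack a .tar.gz into DEST unless already done.
# A marker file inside DEST records that a previous extraction completed.
# (Hypothetical helper; the real project does this from Jamroot.jam.)
extract_once() {
    archive="$1"
    dest="$2"
    marker="$dest/.extracted"

    if [ -f "$marker" ]; then
        echo "already extracted: $dest"
        return 0
    fi
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    touch "$marker"
    echo "extracted $archive into $dest"
}
```

Calling the function twice with the same arguments unpacks the archive the first time and only reports "already extracted" the second time, which is the behavior that keeps repeated builds fast.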
Step 4: Write a shell script for the user to build the project with
At this point, the project could be built if the user had the boost-build system installed on their computer. The interpreter is called bjam. However, the source code for bjam is included in the pwiz_project/pwiz_libraries directory. A script in pwiz_project, quickbuild.sh, will check to see if bjam has been compiled, and if not, build it. Then, it will use this copy of bjam to interpret our build scripts from Step 3, and build/install the project. Arguments passed to quickbuild.sh get passed on to bjam. For example, to build the project using 4 threads:
quickbuild.sh is attached below separately, and as part of the project archive.
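The check-then-build logic described in this step might look roughly like the following. It is a sketch, not the attached quickbuild.sh: the path in `BJAM` is a hypothetical location, and `build_bjam` is a placeholder to be replaced with whatever commands compile bjam from the sources in pwiz_libraries.

```shell
#!/bin/sh
# Hypothetical location of the bundled bjam binary; adjust to the real layout.
BJAM="${BJAM:-pwiz_libraries/boost-build/bjam}"

# Placeholder: replace the body with the commands that compile bjam
# from the sources in pwiz_libraries.
build_bjam() {
    echo "build_bjam: not implemented in this sketch" >&2
    return 1
}

# quickbuild ARGS...: build bjam if needed, then forward all arguments to it.
quickbuild() {
    if [ ! -x "$BJAM" ]; then
        echo "bjam not found at $BJAM; building it first" >&2
        build_bjam || return 1
    fi
    "$BJAM" "$@"
}
```

With `quickbuild "$@"` as the last line of the script, an invocation such as `sh quickbuild.sh -j4` would hand `-j4` to bjam, whose `-j` option sets the number of parallel jobs.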
Step 5: Write a shell script for the user to clean the project with
It is good practice to have a shell script that the user can use to clean the project. This script is called clean.sh and is located in pwiz_project.
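A cleaning step along these lines could be sketched as follows. Which directories the real clean.sh removes is not stated here, so this example assumes that the contents of bin and the extracted pwiz_src tree are the build products. The function name `clean_project` and the guard on project_src are illustrative choices.

```shell
#!/bin/sh
# clean_project ROOT: remove build products under ROOT.
# (A sketch; the set of directories cleaned is an assumption.)
clean_project() {
    root="${1:-.}"
    # Refuse to run anywhere that does not look like the project root.
    if [ ! -d "$root/project_src" ]; then
        echo "clean: $root does not look like pwiz_project" >&2
        return 1
    fi
    # Remove compiled binaries and the extracted source tree,
    # then recreate the empty directories.
    rm -rf "$root/bin" "$root/pwiz_src"
    mkdir -p "$root/bin" "$root/pwiz_src"
    echo "cleaned $root"
}
```

The guard matters because the function uses `rm -rf`: running it from the wrong directory should fail loudly rather than delete an unrelated bin directory.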
Step 6: Build the project!
Everything is now in place, and the project is ready to be built. If you have not been creating the project manually, you can download the attached project archive, extract it, and run sh quickbuild.sh from the pwiz_project directory (make sure you're in pwiz_project, not pwiz_project/pwiz_src !). Then, try running the pwiz_example binary in pwiz_project/bin.
For the time being, I'm hosting the full project archive myself, since it exceeds Google's attachment size limit. Download it at:
-Thank you to Godwin Yung for testing this tutorial and to Matt Chambers for creating the pwiz-integration example, on which this example is heavily based.