WaveletSAT (SciVis 2013 Code)


Introduction

WaveletSAT is a powerful tool for region histogram query. WaveletSAT is designed to compress integral histograms, which is presented by Fatih Porikli in IEEE CVPR 2005. For each grid point in a Cartesian grid, its integral histogram is the histogram of the region bound by this point and the origin. By storing the integral histograms for all grid points, the region histogram of arbitrary axis-aligned regions can be queried by combining the integral histogram of its corners. (Integral histograms can be considered as an extension of sum area tables, where the area sum is extended to histograms).

Integral histograms are widely used in computer vision and image processing. Nevertheless, extending integral histograms to large images, videos or 3D grids will be challenging because of the storage consumption (each grid point is associated with one histogram, which contains multiple bins).

That's why WaveletSAT is developed. This package provides the core algorithm of WaveletSAT with the following features

  • A GPU-based implementation.
  • An optimized implementation with small memory footprints (when the grid points are processed in scan-line-order).

Source


The latest code can be checked out from the host page https://github.com/recheliu/wavelet-sat on GitHub.


Usage

Define a customized encoder

To implement an WaveletSAT-based encoder, inherit the class WaveletSAT::CWaveletSATGPUEncoder<DT, ST, BT, WT> and override the method _MapValueToBins to map the value to histogram bins. One example is the sample SimpleND.h.

virtual
void
_MapValueToBins
(
    const vector<size_t>& vuPos,
    const DT& value,
    vector< pair<BT, ST> >& vpBins,
    void *_Reserved = NULL
)
{
    DT clampedValue = min(max(value, valueMin), valueMax);

    size_t uBin = (size_t)floorf((float)(uNrOfBins * (clampedValue - valueMin))/(float)(valueMax - valueMin));
    uBin = min(uBin, uNrOfBins - 1);
    vpBins.clear();
    vpBins.push_back(pair<BT, ST>((BT)uBin, (ST)1));
}

Use the encoder

On example to use the defined encoder is SimpleBD.h, which uses SimpleNDEncoder. The basic procedure is listed as follows:

  • Setup the data size and allocate it:
cSimpleND._Set(vuDimLengths, (WaveletSAT::typeBin)uNrOfBins);
cSimpleND._Allocate();
The following are not related to the kernel of WaveletSAT but are used to map the value to bins for SimpleND.h:

cSimpleND._SetHistogram(dValueMin, dValueMax);
cSimpleND._SetData(&vdData);
cSimpleND._SetInteger(CSimpleND<double>::WITH_CONTOUR_SPECTRUM, iIsUsingContourSpectrum);
  • Scan and pass the grid point value to the encoder:
for(size_t i = 0; i < uNrOfValues; i++)
{  
    // Setup the coordinate. Here is our example.
    vector<size_t> vuPos;
for(size_t
    d = 0, uCoord = i;
d < nin->dim;
uCoord /= (size_t)nin->axis[d].size, d++)

        vuPos.push_back(uCoord % (size_t)nin->axis[d].size);

cSimpleND._AddValue(vuPos, vdData[i]);
}
  • Finalize the encoding and save the result:
cSimpleND._Finalize();
cSimpleND
._SaveFile(szNcFilePathPrefix);

Query

Later the WaveletSAT coefficient can be loaded to the decoder CSATSepDWTDecoder. It is used in the sample application SimpleNDQuery. The basic procedure is listed as follows:
  • Load the file:
cSimpleNDFile._LoadFile(szNcFilePath);
  • Define two vectors as the region to query:
vector<size_t> vuLeft;
vector<size_t>
vuRight;
// Setup the region.
  • Query the histogram of the region:
vector<WaveletSAT::typeSum> vdH;
vdH.assign(uNrOfBins, (WaveletSAT::typeSum)0);
cSimpleNDFile._GetRegionSums(vuLeft, vuRight, vdH);
PS. In the library, another class SimpleNDFile.h, is inherited from the class CSATSepDWTDecoder. This is actually not needed, but it allows me to used a single interface to compare WaveletSAT and integral histograms.

Installation

Prerequisites

Configuration w/ CMake

After you check out the folder to a local folder <cmake_src_dir> (e.g. d:/gpu-dist).

  1. Open cmake-gui. Put the box "Where is the source code" as <cmake_src_dir> and specify where you want to put the make filesin the box "Where to build the binaries". (This folder is called <cmake_bin_dir> hereafter. I usually use as <cmake_src_dir>/build/<compiler>/<arch>. For instance, for Visual Studio 2010 and x64 architecture, I will use <cmake_src_dir>/build/vs2010/x64).
  2. Set up the install path in CMAKE_INSTALL_PREFIX. I will suggest the value as <cmake_src_dir>/install/vs2010/x64.
  3. Set LIB_DIR as the installation path of 3rd-party-lib-for-recheliu.
  4. Set MYLIB_DIR as the installation path of recheliu-mylib.
  5. Set NETCDF_DIR as the installation path of your NetCDF4.
  6. Set HDF5_DIR as the installation path of your HDF5.
  7. (RECOMMENDED) If the machine support CUDA, set WITH_CUDA as ON to use the parallel implementation.
  8. ''Configure'' and then ''Generate''.

Visual studio

  1. If 3rd-party-lib-for-recheliu and recheliu-mylib are built as static libraries, enter advanced mode and add "/NODEFAULTLIB:LIBCMT /NODEFAULTLIB:LIBCMTD" to CMAKE_EXE_LINKER_FLAGS. Don't forget to "configure'' and ''Generate.'
  2. Open <cmake_bin_dir>/GPUDist.sln to build the libraries.
  3. Build the project INSTALL to install the libraries to CMAKE_INSTALL_PREFIX.

gcc

Enter the folder <cmake_bin_dir> and then run make and make install to build it.


Reproduction of the result in SciVis '13

Please check out the version r117, which is the one for the SciVis '13 paper. If there is any issue to build it for Linux, try r130, which is similar to the original implemantion with minor changes.

with revision > 130


If the version is newer than r130 and you want to test the SciVis result, please follow the instructions here:

  1. When using CMake to configure the project, go to the Advanced mode and add "-DWITH_DYNAMIC_ARRAY_ALLOCATION=0" to both CMAKE_C_FLAGS and CMAKE_CXX_FLAGS.
  2. For the GPU-based implementation, for now, please change the source code CudaDWT.h by changing the following line:
typedef unsigned long long typeKey;   

to

typedef unsigned long typeKey;   

The encoding algorithm can be tested by the program SimpleNDEncoder. A sample usage is below:

SimpleNDEncoder --n-bins 32 --vol-filepath D:\data\volumes\rawfiles\simulated\neghip.nhdr --nc-filepath-prefix D:\data\volumes\waveletsat\neghip.z_1

The input file is specified by the argument --vol-filepath <vol-filepath> . Currently it only supports NRRD format (fortunately, University of Utah releases a repo of volume data sets in NRRD format, as described in the later section Data Sets). The output is specified by the argument --nc-filepath-prefix <nc-filepath-prefix>. The output file will be <nc-filepath-prefix>.wnc4. (The character 'w' means WaveletSAT. The number '4' means NetCDF4).

Additional arguments:
  • --netcdf-deflate-level <zip-level>: The zip algorithm level.
  • --is-using-gpus <boolean>: Use GPUs or not?
  • --timing-printing-level <level>: Specify the level to print the timing.
  • --size-of-full-arrays <number-in-MB> Specify the size to store the full arrays.


The reference approach: out-of-core integral histograms


In the SciVis '13 paper, WaveletSAT is compared with the conventional integral histograms. As storing the entire integral histogram is difficult, the referenced approach iterates through the bins to compute the bin SAT and store it to  NetCDF file before the next bin.

To build the reference approach, when configuring in CMake, set WITH_SAT_FILE as ON. Then the solution or makefile will be generated with the referenced approach BUT without WaveletSAT. (As a result, it is recommended to put the make files in different projects).

The referenced approach can be also tested by the program SimpleNDEncoder. Here is a sample usage:

SimpleNDEncoder --n-bins 32 --vol-filepath D:\data\volumes\rawfiles\medical\Foot.nhdr --size-of-full-arrays 0 --nc-filepath-prefix D:\data\volumes\waveletsat\Foot.b_32.z_1 --netcdf-deflate-level 1 --is-comp-bins-only true

Verification of Accuracy


If WITH_SAT_FILE as ON, the program SimpleNDEncoder can be used to compute and store the bin index per grid point. The bin indices can be used by the program SimpleNDQuery to compute the ground truth so we can evaluate the accuracy of WaveletSAT. To compute the bin indices, add the option --is-comp-bins-only true to SimpleNDEncoder. The bin indices will be stored in the file <nc_filepath_prefix>.bin.nc4 where <nc_filepath_prefix> is the value of the argument --nc-filepath-prefix.

To verify the accuracy, run the program SimpleNDQuery as follows:

SimpleNDQuery --is-verbose true --nc-filepath D:\data\volumes\waveletsat\hydrogenAtom.tmp.z_1.wnc4 --bin-filepath D:\data\volumes\waveletsat\hydrogenAtom.b_32.z_1.bin.nc4 --query-win-length 16 --size-of-full-arrays 0 --is-testing-query true --n-testing-values 512

Here is the explanation of the arguments
  • --size-of-full-arrays: Should be 0 for the referenced approach. For WaveletSAT, it means the size of full arrays to store the wavelet coefficients.
  • --n-testing-values: The number of points to test. The program randomly picks points to tests its region histogram.
  • --query-win-length: The window length to test.

The program will query the region histograms and compare the result with the brute force approach. The returned RMSE should be 0 for all queries.

Data sets


The data sets HydrogenAtom and Foot can be downloaded from the project page of Revisiting Histograms and Isosurface Statistics. by Scheidegger et al.

The data sets Ocean and MJO are not available to the public.

New features


After the SciVis '13 submission, the implementation of WaveletSAT is improved with the following features:
  1. Support 2D and 3D contour spectrum algorithms presented by Bajaj et al. (The 3D algorithm is simplified by Xin Tong at The Ohio State University).
  2. Support the dynamic allocation of coefficient arrays. In the SciVis '13 paper, the arrays are pre-allocated, and thus the initialization time can be long.
  3. Now the coefficients can be written to the file during the encoding. Because the written arrays can be released from the memory, the memory requirement can be further reduced.

Publications


T.-Y. Lee and H.-W. Shen
Efficient Local Statistical Analysis via Integral Histograms with Discrete Wavelet Transform
IEEE Transactions on Visualization and Computer Graphics, 19(12):2693-2701, Dec., 2013.
Special Issue for IEEE SciVis '13, acceptance rate: 31/126 = 24%.
preprint, errata, video, slides


Errata

The x-axis ticks of charts in Figure 14 mismatch the curve markers. The reason is that my original MATLAB script called set(gca, 'XTickLabel', bin_labels) without calling set(gca, 'XTick', bins). Consequently, the tick locations were still chosen by MATLAB, and the manually-specified tick labels are incorrectly applied to the automatically-chosen locations. The solution is calling set(gca, 'XTick', bins) alone to manually decide which tick should be plotted. Calling set(gca, 'XTickLabel', bin_labels) is unnecessary (since the labels are still numbers).

Fortunately, this bug is only in the script to plot charts of Figure 14. They are corrected in the errata.

Note that now the new charts were plotted by matplotlib in Python since my MATLAB license was expired. The consequence is that the charts look different from others generated by MATLAB. Once I can re-generate the image via MATLAB, I will check with TVCG how to correct them.

Acknowledgements

I conducted this work when I was a post doc at The Ohio State University with Prof. Han-Wei Shen.

This work was supported by the following grants:
  • NSF grant IIS-1017635
  • NSF grant IIS-1065025
  • US Department of Energy DOESC0005036
  • Battelle Contract No. 137365
  • Department of Energy SciDAC grant DE-FC02-06ER25779
It is also related to the DoE ASCR project SDMAV: Information-theoretic Framework for Visualization, which involves the research teams from OSU, ANL, and NYU-Poly. Before WaveletSAT, this project has a preliminary publication about parallel histogram computation in IEEE LDAV '12, as listed in the references.

References


C. L. Bajaj, V. Pascucci, and D. R. Schikore
The contour spectrum
In Vis '97: Proceedings of the IEEE Visualization, pp. 167 - 173, 1997

F. Porikli
Integral histogram: a fast way to extract histograms in Cartesian spaces
In CVPR ’05: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pp. 829 - 836, 2005.

Carlos E. Scheidegger, John M. Schreiner, Brian Duffy, Hamish Carr, Claudio T. Silva
Revisiting Histograms and Isosurface Statistics
IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 6, pp. 1659 - 1666, November/December, 2008.

A. Chaudhuri, T.-Y. Lee, B. Zhou, C. Wang, T. Xu, H.-W. Shen, T. Peterka and Y.-J. Chiang

Scalable Computation of Distributions from Large Scale Data Sets

In LDAV ‘12: Proccedings of the IEEE Symposium on Large-Scale Data Analysis and Visualization, pp. 113 - 120, 2012.
Comments