
Once your package is tested, cleaned up, and ready for release, the last step is to produce some performance statistics: benchmarks. To do that, we use the benchmarking functionality of the standard testing package.

General Rules of Thumb

  1. Avoid global variables in the benchmark script. The benchmark file is compiled into the same package, so any global it declares is shared with the whole package, including the main source code.
  2. Anything private in the benchmark script should start with the prefix "benchmark", similar to the test script convention, as in benchmarkYourNameHere.
  3. Anything public in the benchmark script should be the benchmark functions themselves, as in func BenchmarkFunctionA(b *testing.B).
  4. Always save the output and use it. Otherwise the compiler may optimize away a function call whose result is never used (dead code elimination), and you end up measuring nothing.

Writing Benchmark Script

Benchmark code is essentially test code. Ideally, you would write the benchmarks inside the test script.

However, from experience, the test script alone is already bloated. Hence, I prefer to create a separate benchmark script next to it. This spares me the scrolling nightmare every time I need to alter the benchmark code.


Like the test script, the benchmark script's filename must end with the _test.go suffix, because go test only treats files ending in _test.go as test files. My usual practice is to append _benchmark_test.go instead, which gives the following cleaner file structure:

  • sourceCode.go - the source code
  • sourceCode_test.go - the test script for the source code
  • sourceCode_benchmark_test.go - the benchmark script for the source code

Writing Benchmarks

Inside the sourceCode_benchmark_test.go script, you write the benchmark functions. The basic format is as follows:

func BenchmarkYourFunction(b *testing.B) {
        ... // prepare
        for i := 0; i < b.N; i++ {
                ... // your function here
        }
        ... // output dumping
}

  • Start the function name with the Benchmark prefix.
  • Take the benchmark parameter (b *testing.B).
  • Inside the function, place the call to your function inside the benchmark loop (for i := 0; i < b.N; i++) and capture its output for the output dumping afterwards.
    • Do not forget the loop. Many beginners do, yet the loop is what lets the framework run your function b.N times and quantify its performance.
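To make the template concrete, here is a minimal sketch. The function under test (sumSlice) and its input are hypothetical. A _test.go file only runs under go test, so the sketch also includes a small helper, runBenchmark (also hypothetical). It drives a benchmark function through testing.Benchmark, which the standard testing package provides for running benchmarks outside go test. testing.Init must be called first in that case. Setting the test.benchtime flag to "1000x" is the in-program equivalent of -benchtime 1000x, and b.ResetTimer discards the time spent in the prepare step.

```go
package main

import (
	"flag"
	"testing"
)

// sink receives the benchmark result (the "output dumping" step).
var sink int

// sumSlice is a hypothetical function under test.
func sumSlice(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// BenchmarkSumSlice follows the template: prepare, loop b.N times, dump output.
func BenchmarkSumSlice(b *testing.B) {
	// prepare: build the input once, before the measured loop
	xs := make([]int, 1000)
	for i := range xs {
		xs[i] = i
	}
	b.ResetTimer() // discard the time spent preparing

	var r int
	for i := 0; i < b.N; i++ {
		r = sumSlice(xs) // your function here
	}
	sink = r // output dumping
}

// runBenchmark (hypothetical helper) runs fn outside "go test";
// benchtime uses the same syntax as the -benchtime flag, e.g. "1s" or "1000x".
func runBenchmark(fn func(*testing.B), benchtime string) testing.BenchmarkResult {
	testing.Init() // registers the test.* flags; needed outside go test
	if err := flag.Set("test.benchtime", benchtime); err != nil {
		panic(err)
	}
	return testing.Benchmark(fn)
}
```

In a real sourceCode_benchmark_test.go you would keep only BenchmarkSumSlice, with the function under test living in sourceCode.go, and let go test -bench=. drive it.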

A good example is:

func BenchmarkAlgoSearchShort(b *testing.B) {
        var r bool
        for i := 0; i < b.N; i++ {
                r = fragmentSearch(&patShort, &txtShort, sensitivity)
        }
        res = r
}

A common beginner mistake is to leave out the loop entirely:

func BenchmarkAlgoSearchShort(b *testing.B) {
        // wrong: runs once, no matter how large b.N is
        r := fragmentSearch(&patShort, &txtShort, sensitivity)
        res = r
}
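To see why the loop matters, the sketch below counts how many times a hypothetical work function runs during the final measured run. work is a stand-in for fragmentSearch. The loop version calls it b.N times. The loopless version calls it exactly once however large b.N grows, so its reported ns/op is one call's cost divided by b.N and says nothing about the function. Both private helpers follow the benchmarkXxx naming rule. runBenchmark is a hypothetical helper that drives them through testing.Benchmark so the sketch runs outside go test.

```go
package main

import (
	"flag"
	"testing"
)

var (
	calls int  // calls to work during the latest measured run
	res   bool // package-level sink for the result
)

// work is a hypothetical stand-in for fragmentSearch that counts its calls.
func work() bool {
	calls++
	return calls%2 == 0
}

// benchmarkWithLoop has the correct shape: the call sits inside the b.N loop.
func benchmarkWithLoop(b *testing.B) {
	calls = 0
	var r bool
	for i := 0; i < b.N; i++ {
		r = work()
	}
	res = r
}

// benchmarkWithoutLoop is the beginner mistake: b.N is never used.
func benchmarkWithoutLoop(b *testing.B) {
	calls = 0
	r := work()
	res = r
}

// runBenchmark (hypothetical helper) runs fn outside "go test";
// benchtime uses -benchtime syntax, e.g. "1000x" for a fixed b.N.
func runBenchmark(fn func(*testing.B), benchtime string) testing.BenchmarkResult {
	testing.Init() // registers the test.* flags; needed outside go test
	if err := flag.Set("test.benchtime", benchtime); err != nil {
		panic(err)
	}
	return testing.Benchmark(fn)
}
```

With a fixed b.N of 1000, the loop version ends with calls == 1000 while the loopless version ends with calls == 1.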

One Benchmark Per Case Study

Next, write one benchmark for each case study, which means every combination of algorithm and input needs its own benchmark function. The example below avoids repeating the loop by wrapping it in a private function, benchmarkAlgos. Each case study then calls it in the same way:

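package main

import (
        "testing"
)

const (
        benchmarkSensitivity = uint(3)
        benchmarkPatShort    = "ABC"
        // ... the remaining pattern and text constants are not shown
)

func benchmarkAlgos(selector int, pat []byte, txt []byte, b *testing.B) bool {
        r := false
        for i := 0; i < b.N; i++ {
                switch selector {
                case 0:
                        r = fragmentSearch(&pat, &txt, benchmarkSensitivity)
                case 1:
                        r = fragmentSearchX2(&pat, &txt)
                case 2:
                        r = fragmentSearch2(&pat, &txt, benchmarkSensitivity)
                }
        }
        return r
}

func BenchmarkAlgoSearchShort(b *testing.B) {
        _ = benchmarkAlgos(0, ..., b) // short pattern and text (not shown)
}

func BenchmarkAlgoSearchMedium(b *testing.B) {
        _ = benchmarkAlgos(0, ..., b) // medium pattern and text (not shown)
}

func BenchmarkAlgoSearchLong(b *testing.B) {
        _ = benchmarkAlgos(0, ..., b) // long pattern and text (not shown)
}

func BenchmarkAlgoX2SearchShort(b *testing.B) {
        _ = benchmarkAlgos(1, ..., b) // short pattern and text (not shown)
}

func BenchmarkAlgoX2SearchMedium(b *testing.B) {
        _ = benchmarkAlgos(1, ..., b) // medium pattern and text (not shown)
}

func BenchmarkAlgoX2SearchLong(b *testing.B) {
        _ = benchmarkAlgos(1, ..., b) // long pattern and text (not shown)
}

func BenchmarkAlgo2SearchShort(b *testing.B) {
        _ = benchmarkAlgos(2, ..., b) // short pattern and text (not shown)
}

func BenchmarkAlgo2SearchMedium(b *testing.B) {
        _ = benchmarkAlgos(2, ..., b) // medium pattern and text (not shown)
}

func BenchmarkAlgo2SearchLong(b *testing.B) {
        _ = benchmarkAlgos(2, ..., b) // long pattern and text (not shown)
}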


  • To guard against compiler optimizations, save the output not only inside the function but all the way up to a package-level variable. Here is an example from Dave Cheney:

var result int

func BenchmarkFibComplete(b *testing.B) {
        var r int
        for n := 0; n < b.N; n++ {
                // always record the result of Fib to prevent
                // the compiler eliminating the function call.
                r = Fib(10)
        }
        // always store the result to a package level variable
        // so the compiler cannot eliminate the Benchmark itself.
        result = r
}
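The per-case duplication above can also be expressed with sub-benchmarks. This is a different technique from the switch-based helper: b.Run (available since Go 1.7) runs one named sub-benchmark per table entry, and go test reports each one as BenchmarkSearch/Short, BenchmarkSearch/Medium, and so on. The sketch below uses hypothetical inputs and bytes.Contains as a stand-in for fragmentSearch. The lastN map exists only so the run can be checked. runBenchmark is a hypothetical helper that drives the benchmark through testing.Benchmark outside go test.

```go
package main

import (
	"bytes"
	"flag"
	"testing"
)

// resSearch is a package-level sink for the search result.
var resSearch bool

// lastN records the final b.N of each sub-benchmark, only so the run can be checked.
var lastN = map[string]int{}

// searchCases holds hypothetical inputs for the short, medium and long case studies.
var searchCases = []struct {
	name     string
	pat, txt []byte
}{
	{"Short", []byte("ABC"), []byte("xxABCxx")},
	{"Medium", []byte("ABC"), bytes.Repeat([]byte("x"), 1000)},
	{"Long", []byte("ABC"), bytes.Repeat([]byte("x"), 100000)},
}

// BenchmarkSearch runs one sub-benchmark per case study instead of one
// top-level function per case; go test names them BenchmarkSearch/Short etc.
func BenchmarkSearch(b *testing.B) {
	for _, c := range searchCases {
		c := c // capture the loop variable (needed before Go 1.22)
		b.Run(c.name, func(b *testing.B) {
			var r bool
			for i := 0; i < b.N; i++ {
				r = bytes.Contains(c.txt, c.pat) // stand-in for fragmentSearch
			}
			resSearch = r // package-level sink, as recommended above
			lastN[c.name] = b.N
		})
	}
}

// runBenchmark (hypothetical helper) runs fn outside "go test";
// benchtime uses -benchtime syntax, e.g. "100x" for a fixed b.N.
func runBenchmark(fn func(*testing.B), benchtime string) testing.BenchmarkResult {
	testing.Init() // registers the test.* flags; needed outside go test
	if err := flag.Set("test.benchtime", benchtime); err != nil {
		panic(err)
	}
	return testing.Benchmark(fn)
}
```

With go test, a single case can be selected with a slash-separated pattern such as -bench=Search/Long. A small side benefit of this layout is that no switch statement runs inside the measured loop.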

Run Benchmark

Now that we have the benchmark script, we can run the benchmarks. The command takes several forms depending on which profiles you are interested in.

Basic CPU and Memory Profiling

The simplest form measures time and memory allocations per operation. We also need to explicitly tell go test to skip the regular tests. -run takes a regular expression, and none is a pattern that in practice matches no test name. The command is:

$ go test -run=none -bench=. -benchmem

This produces the following statistics:

goos: linux
goarch: amd64
pkg: gosandbox
BenchmarkAlgoSearchShort-8             100000000      12.8 ns/op       0 B/op       0 allocs/op
BenchmarkAlgoSearchMedium-8             50000000      32.4 ns/op       0 B/op       0 allocs/op
BenchmarkAlgoSearchLong-8               50000000      32.6 ns/op       0 B/op       0 allocs/op
BenchmarkAlgoX2SearchShort-8             5000000       252 ns/op     176 B/op       3 allocs/op
BenchmarkAlgoX2SearchMedium-8              20000     91206 ns/op  352240 B/op      16 allocs/op
BenchmarkAlgoX2SearchLong-8                20000     91688 ns/op  352241 B/op      16 allocs/op
BenchmarkAlgo2SearchShort-8             10000000       152 ns/op       0 B/op       0 allocs/op
BenchmarkAlgo2SearchMedium-8            10000000       219 ns/op       0 B/op       0 allocs/op
BenchmarkAlgo2SearchLong-8              10000000       222 ns/op       0 B/op       0 allocs/op

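Each line gives the following values:

  • the benchmark name, with a -N suffix for GOMAXPROCS (here 8)
  • the number of iterations (b.N)
  • the time per operation
  • the bytes allocated per operation
  • the number of allocations per operation

This is useful for a quick comparison. However, it does not tell you where the time or memory goes, so it is not very insightful for optimization.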
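Each column corresponds to a method or field of testing.BenchmarkResult, the value returned by testing.Benchmark. The sketch below uses a hypothetical workload that allocates one 64-byte slice per iteration, and lays its result out in the same column order. b.ReportAllocs is the per-benchmark equivalent of -benchmem. The helpers columns and runBenchmark are hypothetical. Note that NsPerOp returns a whole number of nanoseconds, whereas go test prints fractional values such as 12.8 ns/op.

```go
package main

import (
	"flag"
	"fmt"
	"testing"
)

// sinkBuf keeps each allocation reachable so it cannot be optimized away.
var sinkBuf []byte

// benchmarkAlloc is a hypothetical workload: one 64-byte allocation per iteration.
func benchmarkAlloc(b *testing.B) {
	b.ReportAllocs() // per-benchmark equivalent of -benchmem
	for i := 0; i < b.N; i++ {
		sinkBuf = make([]byte, 64)
	}
}

// columns (hypothetical) lays out a result in the order of the go test columns.
func columns(r testing.BenchmarkResult) string {
	return fmt.Sprintf("%d iterations\t%d ns/op\t%d B/op\t%d allocs/op",
		r.N, r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp())
}

// runBenchmark (hypothetical helper) runs fn outside "go test";
// benchtime uses -benchtime syntax, e.g. "1s" or "100000x".
func runBenchmark(fn func(*testing.B), benchtime string) testing.BenchmarkResult {
	testing.Init() // registers the test.* flags; needed outside go test
	if err := flag.Set("test.benchtime", benchtime); err != nil {
		panic(err)
	}
	return testing.Benchmark(fn)
}
```

For example, fmt.Println(columns(runBenchmark(benchmarkAlloc, "1s"))) prints one line whose fields follow the same order as the table above.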

Advanced Profiling

With the basics covered, it is time for advanced profiling. Go can profile not only CPU and memory usage but also goroutine blocking, mutex contention, and more (see go help testflag). Since we already enabled memory allocation statistics, let us enable everything for the sake of the learning experience:

$ go test -run=none \
        -bench=<Benchmark function name> \
        -benchmem \
        -benchtime 1s \
        -blockprofile /tmp/gobenchmark_block.out \
        -cpuprofile /tmp/gobenchmark_cpu.out \
        -memprofile /tmp/gobenchmark_mem.out \
        -mutexprofile /tmp/gobenchmark_mutex.out

  • -bench : a regular expression selecting the benchmarks to run
  • -benchmem : enable memory allocation statistics
  • -benchtime : how long to run each benchmark, either as a duration (s for seconds, m for minutes, h for hours, etc.) or as a fixed iteration count such as 100x
  • -blockprofile : profile goroutine blocking. It takes the file path to write the profile to.
  • -cpuprofile : profile CPU usage. It takes the file path to write the profile to.
  • -memprofile : profile memory allocations. It takes the file path to write the profile to.
  • -mutexprofile : profile mutex contention. It takes the file path to write the profile to.

With all the files ready, it is time to process the data. The easiest way is to render it graphically with the go tool pprof command, which needs the Graphviz software to draw the graphs. On Debian Stretch, install it with:

$ sudo apt install graphviz -y

Here is how to turn each profile into an SVG image:

$ go tool pprof -output /tmp/gobenchmark_cpu.svg -svg /tmp/gobenchmark_cpu.out
$ go tool pprof -output /tmp/gobenchmark_mem.svg -svg /tmp/gobenchmark_mem.out
$ go tool pprof -output /tmp/gobenchmark_block.svg -svg /tmp/gobenchmark_block.out
$ go tool pprof -output /tmp/gobenchmark_mutex.svg -svg /tmp/gobenchmark_mutex.out

With the images ready, open them in the default program. On Debian:

$ xdg-open /tmp/gobenchmark_cpu.svg &> /dev/null
$ xdg-open /tmp/gobenchmark_mem.svg &> /dev/null
$ xdg-open /tmp/gobenchmark_block.svg &> /dev/null
$ xdg-open /tmp/gobenchmark_mutex.svg &> /dev/null

You should see a diagram like the following:

Go Benchmark Profile Diagram

Calculating Delta

Additional tools can calculate the delta between two benchmark results. One of them is benchcmp, which you can install using go get:

$ go get

It calculates the delta between the old and the new results for every statistic. For example, this command:

$ 2>&1 benchcmp /tmp/gobenchmark_old.log /tmp/gobenchmark.log

gives:

benchmark                         old ns/op     new ns/op     delta
BenchmarkAlgoSearchShort-8        12.9          12.9          +0.00%
BenchmarkAlgoSearchMedium-8       33.0          32.5          -1.52%
BenchmarkAlgoSearchLong-8         34.1          32.5          -4.69%
BenchmarkAlgoX2SearchShort-8      270           255           -5.56%
BenchmarkAlgoX2SearchMedium-8     104180        100286        -3.74%
BenchmarkAlgoX2SearchLong-8       111012        104952        -5.46%
BenchmarkAlgo2SearchShort-8       154           152           -1.30%
BenchmarkAlgo2SearchMedium-8      215           215           +0.00%
BenchmarkAlgo2SearchLong-8        216           215           -0.46%

benchmark                         old allocs     new allocs     delta
BenchmarkAlgoSearchShort-8        0              0              +0.00%
BenchmarkAlgoSearchMedium-8       0              0              +0.00%
BenchmarkAlgoSearchLong-8         0              0              +0.00%
BenchmarkAlgoX2SearchShort-8      3              3              +0.00%
BenchmarkAlgoX2SearchMedium-8     16             16             +0.00%
BenchmarkAlgoX2SearchLong-8       16             16             +0.00%
BenchmarkAlgo2SearchShort-8       0              0              +0.00%
BenchmarkAlgo2SearchMedium-8      0              0              +0.00%
BenchmarkAlgo2SearchLong-8        0              0              +0.00%

benchmark                         old bytes     new bytes     delta
BenchmarkAlgoSearchShort-8        0             0             +0.00%
BenchmarkAlgoSearchMedium-8       0             0             +0.00%
BenchmarkAlgoSearchLong-8         0             0             +0.00%
BenchmarkAlgoX2SearchShort-8      176           176           +0.00%
BenchmarkAlgoX2SearchMedium-8     352243        352243        +0.00%
BenchmarkAlgoX2SearchLong-8       352243        352243        +0.00%
BenchmarkAlgo2SearchShort-8       0             0             +0.00%
BenchmarkAlgo2SearchMedium-8      0             0             +0.00%
BenchmarkAlgo2SearchLong-8        0             0             +0.00%
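The delta column is the relative change from the old value to the new one, in percent. Below is a minimal sketch of that arithmetic. It assumes delta = (new - old) / old * 100 and treats equal values (including 0 to 0) as +0.00%. This is a hypothetical re-implementation for illustration, not benchcmp's own code. It reproduces the rows above: for example, 33.0 to 32.5 gives -1.52%.

```go
package main

import (
	"fmt"
	"math"
)

// delta (hypothetical) returns the relative change from before to after, in percent.
// Equal values, including 0 -> 0, give 0; a change away from 0 is reported as +Inf.
func delta(before, after float64) float64 {
	if before == after {
		return 0
	}
	if before == 0 {
		return math.Inf(1) // relative change from zero is undefined
	}
	return (after - before) / before * 100
}

// formatDelta renders a delta with an explicit sign and two decimals, e.g. "-1.52%".
func formatDelta(before, after float64) string {
	return fmt.Sprintf("%+.2f%%", delta(before, after))
}
```

For instance, formatDelta(104180, 100286) yields "-3.74%", matching the BenchmarkAlgoX2SearchMedium-8 row.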

The only catch is that you need to keep the previous results every time you run a benchmark.

Bundling Together

Now that we have all the commands we need for advanced profiling, it is time to bundle them together. I usually keep the following function in my ~/.bashrc:

gobenchmark() {
        # Argument handling (reconstructed; the defaults are assumptions):
        #   -r  open the results in their default programs afterwards
        #   $1  pattern passed to -bench (default ".")
        #   $2  duration passed to -benchtime (default "1s")
        local open="false"
        if [ "$1" == "-r" ]; then
                open="true"
                shift
        fi
        local arg="${1:-.}"
        local timeout="${2:-1s}"

        # keep the previous run so benchcmp can compare against it
        if [ -f "/tmp/gobenchmark.log" ]; then
                mv "/tmp/gobenchmark.log" "/tmp/gobenchmark_old.log"
        fi

        go test -run=none \
                -bench="$arg" \
                -benchmem \
                -benchtime "$timeout" \
                -blockprofile /tmp/gobenchmark_block.out \
                -cpuprofile /tmp/gobenchmark_cpu.out \
                -memprofile /tmp/gobenchmark_mem.out \
                -mutexprofile /tmp/gobenchmark_mutex.out \
                | tee /tmp/gobenchmark.log
        # $? would be tee's exit status; PIPESTATUS[0] is go test's
        if [ "${PIPESTATUS[0]}" != 0 ]; then
                return 1
        fi

        # render every profile as an SVG graph, silencing all output
        go tool pprof \
                -output /tmp/gobenchmark_cpu.svg \
                -svg /tmp/gobenchmark_cpu.out \
                > /dev/null 2>&1
        go tool pprof \
                -output /tmp/gobenchmark_mem.svg \
                -svg /tmp/gobenchmark_mem.out \
                > /dev/null 2>&1
        go tool pprof \
                -output /tmp/gobenchmark_block.svg \
                -svg /tmp/gobenchmark_block.out \
                > /dev/null 2>&1
        go tool pprof \
                -output /tmp/gobenchmark_mutex.svg \
                -svg /tmp/gobenchmark_mutex.out \
                > /dev/null 2>&1

        # compute the delta against the previous run, if benchcmp is installed
        if type benchcmp > /dev/null 2>&1 && [ -f "/tmp/gobenchmark_old.log" ]; then
                benchcmp \
                        /tmp/gobenchmark_old.log \
                        /tmp/gobenchmark.log \
                        > /tmp/gobenchmark_delta.log 2>&1
                if [ "$open" == "true" ]; then
                        xdg-open /tmp/gobenchmark_delta.log &> /dev/null
                fi
        fi

        if [ "$open" == "true" ]; then
                xdg-open /tmp/gobenchmark_cpu.svg &> /dev/null
                xdg-open /tmp/gobenchmark_mem.svg &> /dev/null
                xdg-open /tmp/gobenchmark_block.svg &> /dev/null
                xdg-open /tmp/gobenchmark_mutex.svg &> /dev/null
                xdg-open /tmp/gobenchmark.log &> /dev/null
        fi
}
export -f gobenchmark

Then, whenever I want to run a benchmark without opening the results, I can just run one of these:

$ gobenchmark
$ gobenchmark .
$ gobenchmark ./...
$ gobenchmark "${HOME}/Document/myproject/..."
$ gobenchmark . 20m
$ gobenchmark ./... 20m
$ gobenchmark "${HOME}/Document/myproject/..." 20m

Otherwise, if I want the results opened in their default programs, I add -r:

$ gobenchmark -r
$ gobenchmark -r .
$ gobenchmark -r ./...
$ gobenchmark -r "${HOME}/Document/myproject/..."
$ gobenchmark -r . 20m
$ gobenchmark -r ./... 20m
$ gobenchmark -r "${HOME}/Document/myproject/..." 20m

That's all there is to benchmarking in Go.