Benchmarks

A common use case for testing is the need to compare multiple sets of things in aggregate over time (usually infrequently, like daily, weekly or even monthly).  This can be a collection of URLs, browsers or even locations.  WebPagetest itself is good at one-off tests and WPT Monitor is very good at monitoring a specific set of URLs, but neither does a good job of filling the need for benchmarks with aggregate reports.  The HTTP Archive is a closer fit but operates at a much larger scale and is not integrated (long term it might make sense to integrate it).

This is a plan for a lightweight, easy to configure and incredibly flexible way to extend WebPagetest to include benchmarking support.

Requirements
  • Simple to configure large comparisons (not necessarily requiring a configuration UI - config files are fine)
  • Support comparing URL sets
  • Support comparing different scripts
  • Support comparing different locations
  • Support custom configuration of all of the test options on a per-set basis
  • Present aggregate results
  • Allow for drill-down into individual tests
Non-Requirements
  • No user permission requirements for viewing results - all are public
  • No setup UI requirements
Configuration

There will be a directory "benchmarks" under /settings where all of the benchmarks are configured.  The list of current benchmarks will be in benchmarks.txt (sorted in presentation order with one benchmark name per line).
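For example, a benchmarks.txt listing two benchmarks (hypothetical names) would simply be:

example
ecommerce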

Each benchmark will have a configuration file "<benchmark>.php" within the "benchmarks" directory that MUST include the following definitions:
<?php
$title = "<benchmark friendly title>";
$description = "<descriptive text explaining the benchmark>";    // optional
$configurations = array();    // ordered list of the configurations to compare (short names)
$configurations['config1'] = array();    // array of configuration options for each configuration
$configurations['config2'] = array();
....
?>

Each configuration array will contain the following information:

...['title'] = "<series title>";
...['description'] = "<descriptive text>";    // optional
...['settings'] = array("key" => "value", "key2" => "value2");    // array of test options for the given configuration
...['locations'] = array("label" => "location ID");                  // array of test locations (full set of urls will be run across all locations) - includes browser and connectivity
...['url_file'] = "<urls.txt>";    // local file with list of URLs to test (should prefix it with the benchmark name if it isn't intended to be shared across multiple benchmarks)
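
Putting those pieces together, a minimal example configuration file (all of the names, settings and location IDs below are hypothetical, and the ShouldExecute function described further down would also be embedded in the same file) might look like:

<?php
// settings/benchmarks/example.php - hypothetical example configuration
$title = "Example Benchmark";
$description = "Daily comparison of a set of news sites in IE 8 and Chrome";
$configurations = array();
$configurations['ie8'] = array(
    'title' => 'IE 8',
    'settings' => array('runs' => 3),    // hypothetical test options
    'locations' => array('Dulles IE8' => 'Dulles_IE8.DSL'),    // hypothetical location ID
    'url_file' => 'example_urls.txt'
);
$configurations['chrome'] = array(
    'title' => 'Chrome',
    'settings' => array('runs' => 3),
    'locations' => array('Dulles Chrome' => 'Dulles_Chrome.DSL'),
    'url_file' => 'example_urls.txt'
);
?>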

The URL file should have one URL per line, and each URL can optionally be prefixed by a label (if there is a label then there should be a tab between the label and the URL):
AOL    www.aol.com
Yahoo    www.yahoo.com

Instead of individual URLs, scripts can also be specified.  If a script is specified, it should start with "script:"; newlines should be indicated with \n and tabs with \t:
AOL Navigation    script:logdata\t0\nnavigate\twww.aol.com\nlogdata\t1\nnavigate\tautos.aol.com

If labels are specified then the results will also be aggregated by label for each configuration.

There should also be a function embedded within the file that determines if the benchmark should execute (times are in time() seconds and are UTC):

function <benchmark>ShouldExecute($last_execute_time, $current_time) {
    $should_run = false;
    if (!$last_execute_time || 
        ($current_time > $last_execute_time && 
        $current_time - $last_execute_time > 86400)) {
        $should_run = true;
    }
    return $should_run;
}

The example runs the benchmark daily, but the function can contain logic to run on a specific day, at a particular time of day, etc.
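
As a sketch of that flexibility, a hypothetical benchmark named "example" that should only run on Mondays (UTC), and at most once a week, could implement the check like this:

function exampleShouldExecute($last_execute_time, $current_time) {
    $should_run = false;
    // only run on Mondays (UTC) and no more than once every 6 days
    if (gmdate('N', $current_time) == 1 &&
        (!$last_execute_time || $current_time - $last_execute_time > 6 * 86400)) {
        $should_run = true;
    }
    return $should_run;
}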

Results

The aggregate results and state information for each benchmark will be stored in /results/benchmarks/<benchmark>:

state.json - tracks state - particularly when the benchmark was last run, whether it is currently testing, etc. - as well as a list of the individual test runs and the test IDs for any tests that are currently running.
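
The exact layout of state.json isn't fixed by this plan; as an illustration (all field names hypothetical), it could look something like:

{
    "last_run": 1325721600,
    "running": true,
    "runs": ["120101_0000", "120102_0000"],
    "tests": ["120102_XX_1", "120102_XX_2"]
}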

There will be several directories under the benchmark directory:

data/ - Raw test data for each run, with each run in a separate file named YYMMDD_HHMM.json.gz containing a gzip-encoded JSON serialization of the result data:
- the data is an array at the top level with each test run as an element
- each test run is an array of data, combining the page-data as well as entries for the test ID, run, cached state, URL tested and label
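
As an illustration of that layout (the field and metric names are examples only; the real page-data fields come from WebPagetest itself), a single entry might look like:

[
    {
        "id": "120102_XX_1",
        "run": 1,
        "cached": 0,
        "url": "www.aol.com",
        "label": "AOL",
        "loadTime": 3123,
        "TTFB": 512,
        "bytesIn": 425311
    }
]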

aggregate/ - Aggregate stats calculated from the raw data of each run.  Each stat is in a separate file but includes data from all of the runs.
- info.json - tracks state of the individual test runs that have already been aggregated as well as the current version of aggregation (in case we need to re-aggregate everything)
- <metric>.json.gz - json-encoded array of aggregations for the given metric.  There will be array elements for each configuration + cache state combination for each benchmark run and each entry will contain the following aggregates:
  • avg - arithmetic mean
  • geo-mean - geometric mean
  • stddev - standard deviation
  • median - 50th percentile
  • 75pct - 75th percentile
  • 95pct - 95th percentile
  • count - number of records aggregated
- <metric>.labels.json.gz - For tests that include labels for the URLs, each label will be aggregated individually (and will match the layout of the other metric aggregations) and the label will be stored as an additional entry with each record.
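
As a rough sketch (the aggregate names come from the list above; the identifying fields are illustrative), one record in a <metric>.json.gz file could look like:

{
    "run": "120102_0000",
    "configuration": "chrome",
    "cached": 0,
    "avg": 3250.4,
    "geo-mean": 2987.1,
    "stddev": 812.6,
    "median": 3050,
    "75pct": 3710,
    "95pct": 4820,
    "count": 250
}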

Code

The code for the benchmark operation and UI will be under /benchmarks:

cron.php - Cron job for managing the benchmarks - should be requested every 15 minutes or so (see the example crontab entry below)
index.php - Main benchmark results UI (summary charts)
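
For example, assuming the site is reachable at www.example.com (hypothetical host and path), a crontab entry like the following would request cron.php every 15 minutes:

*/15 * * * * wget -q -O /dev/null http://www.example.com/benchmarks/cron.php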