This webpage contains the semi-automatic validation framework described in [1], as well as all the gem5 patches required to serve as a case study for the framework.
The patches proposed in the paper are available at:
https://gem5-review.googlesource.com/c/public/gem5/+/28053
https://gem5-review.googlesource.com/c/public/gem5/+/27972
https://gem5-review.googlesource.com/c/public/gem5/+/28052
The patch for Top-Down performance counter information is available here.
The validation framework files are available here.
[1] Juan M. Cebrian, Adrián Barredo, Helena Caminal, Miquel Moretó, Marc Casas, and Mateo Valero (2020). Semi-automatic validation of cycle-accurate simulation infrastructures: The case for gem5-x86. Future Generation Computer Systems, 112.
https://doi.org/10.1016/j.future.2020.06.035
The validation framework is a set of scripts that gather and plot the information needed to compare two or more systems. A summary of how the framework works is shown in the figure above. The next sections show how to extract performance counter information from a real system and from the gem5 simulator, in order to build the databases that the framework later compares.
We need the PAPI library to extract performance counter information from the real hardware. You can install it from your distribution's repository or build it manually as follows.
cd /tmp
wget http://icl.utk.edu/projects/papi/downloads/papi-6.0.0.1.tar.gz
tar -xzvf papi-6.0.0.1.tar.gz
cd papi-6.0.0.1/src
./configure --prefix=$HOME/extras/papi --with-components="rapl coretemp powercap"
make; make install
# Test if it works
$HOME/extras/papi/bin/papi_avail > papi_avail_output
$HOME/extras/papi/bin/papi_native_avail > native_avail_output
# If no counters are available, lower the perf_event_paranoid level (warning: -1 removes the restrictions for every user; check whether another level suits you better)
sudo sh -c 'echo -1 >/proc/sys/kernel/perf_event_paranoid'
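# Set your environment variables (this is an example; adjust the paths to your own directories,
# e.g. use lib instead of lib64 if that is where your build installed the libraries)
export LD_RUN_PATH="$HOME/extras/papi/lib64:$LD_RUN_PATH"
export LD_LIBRARY_PATH="$HOME/extras/papi/lib64:$LD_LIBRARY_PATH"
export PATH="$HOME/extras/papi/include:$HOME/extras/papi/bin:$PATH"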
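Before lowering the paranoid level, it can help to check its current value. The following is a minimal Python sketch and is not part of the framework: the function names are hypothetical, and the level descriptions paraphrase the Linux kernel documentation for perf_event_paranoid (some distributions add stricter levels).

```python
from pathlib import Path

# Standard Linux location of the setting changed by the sudo command above.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_paranoid(path=PARANOID_PATH):
    """Return the current level as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_paranoid(level):
    """Summarize what unprivileged users may measure at a given level."""
    if level is None:
        return "unknown (file missing or unreadable)"
    if level <= -1:
        return "no restrictions"
    if level == 0:
        return "CPU-wide events allowed, raw tracepoints restricted"
    if level == 1:
        return "per-process kernel and user measurements only"
    return "user-space measurements only (or stricter)"


if __name__ == "__main__":
    level = read_paranoid()
    print(f"perf_event_paranoid = {level}: {describe_paranoid(level)}")
```

If papi_avail still reports no usable counters at the level shown, the restriction is probably elsewhere (for example, a virtual machine that does not expose the PMU).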
To make performance counter extraction easier, we rely on a "hook" library. This is a "dummy" library that is called at the beginning and at the end of the region of interest of a benchmark. That is, the function calls are there, but the functions are empty by default. When we run the application, we replace this dummy library with different versions of the library using LD_PRELOAD. We do this so that the measurements skip the initialization and finalization phases. Each version of the library extracts one set of performance counters, without the need to rebuild the binary.
Hook libraries for the Skylake X i7 7820X processor are provided as an example (folder hooks_latest).
First, modify the Makefile to point to the correct directories (variables PREFIX, PAPI and LIBTOOL).
Second, since our applications execute in a couple of seconds, we do not use sampling; instead, we run the whole application once per performance counter group, where each group contains counters that can be read simultaneously. Edit src/hooks.c to modify the existing groups or add new ones as needed. There are currently 21 groups in the file; if you add performance counter groups, you also need to modify the Makefile to build and install them.
To identify performance counters that can be read simultaneously, you can use the "papi_event_chooser" tool, which should be in $HOME/extras/papi/bin if you installed PAPI manually. For example:
./papi_event_chooser NATIVE UOPS_EXECUTED:THREAD_CYCLES_GE_1 UOPS_EXECUTED:THREAD_CYCLES_GE_2 CYCLE_ACTIVITY:STALLS_MEM_ANY EXE_ACTIVITY:BOUND_ON_STORES
will output the list of performance counters that can be read in parallel with the 4 aforementioned counters.
After the groups have been created for your system, just "make; make install" to build all the hooks libraries.
We will use the parsec-3.0 benchmarks built with the gcc-hooks configuration, which already includes calls to the hooks library.
Build all benchmarks with gcc-hooks:
./parsecmgmt -a build -p parsec.blackscholes parsec.bodytrack parsec.canneal parsec.dedup parsec.facesim parsec.ferret parsec.fluidanimate parsec.freqmine parsec.streamcluster parsec.swaptions parsec.vips parsec.x264 -c gcc-hooks
Place the script "parsec_bin/gem5_validation_realsystem_run_1p.sh" in the parsec/bin folder.
Edit the script to set the installation folders of the PAPI library ("${HOME}/extras/papi/lib" by default) and of the hooks libraries ("${HOME}/hooks_latest/inst" by default). Please note that this example is for one core; for more than one core, modify the taskset command to bind the threads according to your core count, e.g., taskset -c 0-3 for 4 cores. If you have more or fewer than 21 performance counter groups, modify the script accordingly.
Run the script and you will end up with a folder named after the current date (for example, "2020-09-04:17:46:09") that stores all the raw data used to build the real hardware database.
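The run strategy described above (one full run per event group, each with a different hooks library preloaded) can be sketched in Python as follows. This is not the provided shell script: the function name, the hooks library file name pattern, the benchmark command and the 1-based numbering of groups and iterations are all assumptions made for illustration.

```python
import os


def plan_runs(bench_name, bench_argv, hooks_dir, n_groups=21, n_iters=1,
              cores="0", input_size="simlarge", vector_ext="scalar"):
    """List (env, argv, output_name) tuples, one per event group and iteration."""
    runs = []
    for group in range(1, n_groups + 1):  # assumption: groups are numbered from 1
        # Hypothetical library name; use whatever your hooks Makefile installs.
        hook_lib = os.path.join(hooks_dir, f"libhooks_group_{group}.so")
        for iteration in range(1, n_iters + 1):
            env = {"LD_PRELOAD": hook_lib}
            # Bind the run to the chosen cores, e.g. "0" or "0-3".
            argv = ["taskset", "-c", cores] + list(bench_argv)
            output_name = (f"output_{bench_name}_{input_size}_{vector_ext}"
                           f"_eventgroup_{group}_iter_{iteration}")
            runs.append((env, argv, output_name))
    return runs


# "./ferret" stands in for the real benchmark command line.
runs = plan_runs("parsec.ferret", ["./ferret"], "/home/user/hooks_latest/inst")
print(len(runs), runs[0][2])
```

Each tuple could then be executed with subprocess.run(argv, env={**os.environ, **env}), redirecting the output to a file named output_name inside the dated results folder.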
The folder "hooks_example" contains two files, random_writes_nohooks.cpp and random_writes_hooks.c, so that users can meld/diff them and see what they need to add to their own applications in order to use the hooks libraries. Then simply implement something similar to the "parsec_bin/gem5_validation_realsystem_run_1p.sh" script. If you do not want to modify the Python scripts that build the database, please try to use the same naming convention in your output files. That is: "output_BENCHNAME_INPUTSIZE_VECTOREXT_eventgroup_N_iter_Y", for example "output_parsec.ferret_simlarge_scalar_eventgroup_1_iter_1".
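To check that your own output files follow this convention, a small Python sketch such as the one below can help. It is only an illustration, not the parser used by the database scripts; the function name is hypothetical, and the pattern assumes that INPUTSIZE and VECTOREXT contain no underscores.

```python
import re

# output_BENCHNAME_INPUTSIZE_VECTOREXT_eventgroup_N_iter_Y
OUTPUT_NAME = re.compile(
    r"^output_(?P<benchmark>.+)_(?P<input_size>[^_]+)_(?P<vector_ext>[^_]+)"
    r"_eventgroup_(?P<group>\d+)_iter_(?P<iteration>\d+)$"
)


def parse_output_name(name):
    """Split an output file name into its fields, or return None if it does not match."""
    match = OUTPUT_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["group"] = int(fields["group"])
    fields["iteration"] = int(fields["iteration"])
    return fields


print(parse_output_name("output_parsec.ferret_simlarge_scalar_eventgroup_1_iter_1"))
```

A file name that returns None here is likely to be skipped or misread by a parser that expects the default convention.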
We are going to follow the same strategy in gem5, to measure stats only in the region of interest. The same binaries as in real hardware are used, using LD_PRELOAD to create checkpoints or reset stats based on our goals.
The folder "gem5/src_kvm" contains the hook library code for gem5. When built with the default settings, it creates a dummy hooks library that does nothing when called. The output of make is stored in "gem5/src_kvm/.libs/libhooks.so.0.0.0". This file can be renamed to "libhooks.so".
Then users must edit the "config.h" and enable the "ENABLE_M5_TRIGGER" flag (uncomment the define). This will build a hooks library that calls "dumpresetstats" in "/sbin/m5". Make sure the tool is located in that folder within the gem5 ISO. This file can be renamed as "libhooks_stats.so".
Finally, disable "ENABLE_M5_TRIGGER" again and enable the ENABLE_M5_CKPTS flag. This will build a hooks library that creates a checkpoint whenever called. Make sure the m5 tool is located in "/sbin/m5" within the gem5 ISO. This file can be renamed as "libhooks_chkpoint.so".
Now, when running gem5 with our "rS" script, we can use LD_PRELOAD to specify what the hooks should do at the beginning of the region of interest: nothing ("libhooks.so"), reset the stats ("libhooks_stats.so"), or create a checkpoint to later run in detailed (O3) mode ("libhooks_chkpoint.so"). For example, for the blackscholes benchmark, our "create_blackscholes_chkp.rS" looks like this:
cd /parsec/parvec/scalar/
/sbin/m5 resetstats
export LD_PRELOAD=/m5/lib_x86/libhooks_chkpoint.so
./blackscholes 1 /parsec/inputs/blackscholes/in_64K.txt /parsec/inputs/blackscholes/prices.txt
/sbin/m5 exit
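Since the three hooks libraries differ only in the file that is preloaded, the rS scripts can be generated from a single template. The following Python sketch illustrates this under stated assumptions: the function name, the action labels and the default library directory (taken from the example above) are hypothetical, and the framework itself does not ship such a generator.

```python
# Library names as suggested in the text above.
HOOK_LIBRARIES = {
    "none": "libhooks.so",                 # dummy hooks, do nothing
    "stats": "libhooks_stats.so",          # dump/reset stats at the hooks
    "checkpoint": "libhooks_chkpoint.so",  # create a checkpoint at the hooks
}


def render_rs_script(workdir, command, action="checkpoint", lib_dir="/m5/lib_x86"):
    """Return the text of an rS script that preloads the chosen hooks library."""
    lines = [
        f"cd {workdir}",
        "/sbin/m5 resetstats",
        f"export LD_PRELOAD={lib_dir}/{HOOK_LIBRARIES[action]}",
        command,
        "/sbin/m5 exit",
    ]
    return "\n".join(lines) + "\n"


script = render_rs_script(
    "/parsec/parvec/scalar/",
    "./blackscholes 1 /parsec/inputs/blackscholes/in_64K.txt "
    "/parsec/inputs/blackscholes/prices.txt",
)
print(script)
```

Calling the function with action="stats" produces the equivalent script for a run that measures the region of interest instead of creating a checkpoint.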
The dictionary files are wrappers that map each performance counter event to the name it has on each architecture. If a specific performance counter is not available in your system, simply write 'none'. The dictionary files allow for a certain degree of complexity; for example, you can write basic formulas. A formula must always begin with "@", followed by the formula within parentheses. The first element of the formula is the arithmetic operation to perform, and the remaining elements are the operands. For example, the following formula:
'@(-,topdown_memory_bound,@(+,topdown_l1_bound,topdown_l2_bound))'
will compute (topdown_memory_bound - (topdown_l1_bound + topdown_l2_bound)). In gem5 dictionaries, one may want to aggregate the stats of several cores, which can easily be done with formulas inside the dictionary files. For example, in a 4-core gem5 stat file, we can compute CLOCKS by adding the cycles of all 4 cores:
'CLOCKS' : '@(+,system.cpu0.numCycles,system.cpu1.numCycles,system.cpu2.numCycles,system.cpu3.numCycles)',
Examples for these dictionary files are included in the "validation_framework_scripts/dictionaries" folder.
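To make the formula syntax concrete, the following Python sketch evaluates a dictionary entry against a table of raw counter values. It is an illustration of the notation only, not the framework's own evaluator: the function names are hypothetical, only "+" and "-" appear in the examples above (so "*" and "/" are assumptions), and propagating 'none' as None is a design choice of this sketch.

```python
import operator
from functools import reduce

# Only "+" and "-" appear in the documented examples; "*" and "/" are assumptions.
OPERATIONS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def evaluate(entry, stats):
    """Evaluate a counter name, 'none', or an '@(op,a,b,...)' formula."""
    entry = entry.strip()
    if entry == "none":
        return None  # counter not available on this system
    if not entry.startswith("@"):
        return float(stats[entry])  # plain counter name
    if not (entry.startswith("@(") and entry.endswith(")")):
        raise ValueError(f"malformed formula: {entry}")
    op, *operands = split_top_level(entry[2:-1])
    if op not in OPERATIONS or not operands:
        raise ValueError(f"unsupported formula: {entry}")
    values = [evaluate(operand, stats) for operand in operands]
    if any(value is None for value in values):
        return None
    return reduce(OPERATIONS[op], values)  # left to right over all operands


stats = {"topdown_memory_bound": 10, "topdown_l1_bound": 3, "topdown_l2_bound": 2}
formula = "@(-,topdown_memory_bound,@(+,topdown_l1_bound,topdown_l2_bound))"
print(evaluate(formula, stats))
```

With these made-up values the formula evaluates to 10 - (3 + 2). The same function handles the four-operand CLOCKS example, because the operation is folded over all operands from left to right.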
We provide two example parsing files, one for real hardware ("build_database_1p.py") and one for gem5 ("build_database_gem5.py"). We also provide output files from both our Skylake X i7 7820X (folder "2020-09-04:17:46:09") and gem5 with and without fixes (folders "outs_parsec_1p_skl_gem520_fix" and "outs_parsec_1p_skl_gem520" respectively).
Real system
To build the database with real hardware performance counter information, modify the file "build_database_1p.py" to match your naming convention. By default, this naming convention is "output_BENCHNAME_INPUTSIZE_VECTOREXT_eventgroup_N_iter_Y".
# This dictionary maps each benchmark we want to parse to the name used in the output files (BENCHNAME); for example, "blackscholes" is named parsec.blackscholes in our file naming.
benchmarks_keys = { 'blackscholes': 'parsec.blackscholes', 'bodytrack': 'parsec.bodytrack', 'canneal': 'parsec.canneal', 'dedup': 'parsec.dedup', 'facesim': 'parsec.facesim', 'ferret': 'parsec.ferret', 'fluidanimate': 'parsec.fluidanimate', 'freqmine': 'parsec.freqmine', 'streamcluster': 'parsec.streamcluster', 'swaptions': 'parsec.swaptions', 'x264': 'parsec.x264' }
# This should match the INPUTSIZE in our naming convention.
inputs = "simlarge"
# This should match VECTOREXT in our naming convention
configurations = "scalar"
# max_events and event_groups should not be modified unless you added new event groups to the hooks library. If you did, change the number of event groups from 21 to the number you are using.
# Finally, we specify the benchmarks that we want to parse from all possible keys.
benchmarks = "blackscholes bodytrack canneal dedup facesim ferret fluidanimate freqmine streamcluster swaptions x264"
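Before running the parser, it can be useful to verify that the results folder contains one file per benchmark, event group and iteration. The sketch below derives the expected file names from settings like the ones above; it is a hypothetical helper, not part of "build_database_1p.py", and it assumes that event groups and iterations are numbered from 1.

```python
def expected_outputs(benchmarks, benchmarks_keys, inputs, configurations,
                     event_groups=21, iterations=1):
    """List the output file names implied by the parser settings."""
    names = []
    for bench in benchmarks.split():
        bench_name = benchmarks_keys[bench]  # e.g. 'parsec.blackscholes'
        for group in range(1, event_groups + 1):
            for iteration in range(1, iterations + 1):
                names.append(f"output_{bench_name}_{inputs}_{configurations}"
                             f"_eventgroup_{group}_iter_{iteration}")
    return names


# Reduced settings for illustration; use your own values from the script.
keys = {"blackscholes": "parsec.blackscholes", "x264": "parsec.x264"}
names = expected_outputs("blackscholes x264", keys, "simlarge", "scalar")
present = set(names[:-1])  # pretend the last file is missing from the folder
missing = [name for name in names if name not in present]
print(missing)
```

In practice, present would be built from os.listdir() on the dated results folder, and any name left in missing points to a run that has to be repeated.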
Once everything has been set up, just run the parsing script to build the database. For example:
python build_database_1p.py 2020-09-04\:17\:46\:09/ dictionaries/skylakex_i77820x.dic -o results_parsec_skylakex_1p.db
You can use the "-v" parameter to view a complete log of everything that the script is doing.
gem5
As with the real system script, "build_database_gem5.py" must also be modified to match your naming convention. This script follows a slightly different naming convention: "{BENCHNAME}_{GEM5CONFIG}_{INPUTSIZE}_{GEM5BIGCORES}_{GEM5SMALLCORES}_{NTHREADS}p_{VECTOREXT}_stats_gem5.txt". For example, "parsec.blackscholes_O3_Skylake_simlarge_1_0_1p_scalar_stats_gem5.txt". You probably do not need to change the number of big and small cores, as this is specific to our simulator version.
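Because GEM5CONFIG may itself contain underscores (as in "O3_Skylake"), splitting the file name on "_" is not enough. The following Python sketch shows one way to recover the fields with a regular expression; the function name is hypothetical, and the pattern assumes that BENCHNAME, INPUTSIZE and VECTOREXT contain no underscores.

```python
import re

# {BENCHNAME}_{GEM5CONFIG}_{INPUTSIZE}_{GEM5BIGCORES}_{GEM5SMALLCORES}_{NTHREADS}p_{VECTOREXT}_stats_gem5.txt
GEM5_NAME = re.compile(
    r"^(?P<benchmark>[^_]+)_(?P<gem5_config>.+)_(?P<input_size>[^_]+)"
    r"_(?P<big_cores>\d+)_(?P<small_cores>\d+)_(?P<threads>\d+)p"
    r"_(?P<vector_ext>[^_]+)_stats_gem5\.txt$"
)


def parse_gem5_name(name):
    """Split a gem5 stats file name into its fields, or return None."""
    match = GEM5_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("big_cores", "small_cores", "threads"):
        fields[key] = int(fields[key])
    return fields


print(parse_gem5_name(
    "parsec.blackscholes_O3_Skylake_simlarge_1_0_1p_scalar_stats_gem5.txt"))
```

For the example name, the configuration field comes out as "O3_Skylake" and the remaining fields are anchored from the right-hand side of the name.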
Once everything has been set up, just run the parsing script to build the database.
python build_database_gem5.py outs_parsec_1p_skl_gem520 dictionaries/gem5_mesi_3lvl.dic -o results_parsec_gem5_1p.db
Note: Use the dictionary that matches your gem5 configuration; we provide examples for MESI Two Level and MESI Three Level. Use the "-v" parameter for verbose mode.
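For reference, a gem5 stats file is plain text with one "name value # description" entry per line, grouped into dumps delimited by "Begin Simulation Statistics" and "End Simulation Statistics" markers. The sketch below reads such text into one dictionary per dump. It is a simplified illustration rather than the code in "build_database_gem5.py": the function name and the sample values are made up, and only the first value on each line is kept.

```python
SAMPLE = """\
---------- Begin Simulation Statistics ----------
sim_seconds                      0.001234      # Number of seconds simulated
system.cpu0.numCycles                1000      # number of cpu cycles simulated
system.cpu1.numCycles                2000      # number of cpu cycles simulated
---------- End Simulation Statistics   ----------
"""


def parse_gem5_stats(text):
    """Return a list with one {stat_name: value} dict per statistics dump."""
    dumps, current = [], None
    for raw in text.splitlines():
        if "Begin Simulation Statistics" in raw:
            current = {}
            dumps.append(current)
            continue
        if "End Simulation Statistics" in raw:
            current = None
            continue
        if current is None:
            continue  # text outside a dump
        fields = raw.split("#", 1)[0].split()  # drop the trailing description
        if len(fields) >= 2:
            try:
                current[fields[0]] = float(fields[1])
            except ValueError:
                pass  # skip entries whose first value is not numeric
    return dumps


dumps = parse_gem5_stats(SAMPLE)
total_cycles = sum(value for name, value in dumps[0].items()
                   if name.endswith(".numCycles"))
print(total_cycles)
```

Keeping the dumps separate matters when the stats hooks library is used, since every call to the hooks may produce its own dump in the same file.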
The validation framework contains a set of scripts that allow for drawing figures based on "python-pychart" and "python-scipy" libraries. More specifically, the scripts:
"plot_stacked_bars.py" draws data in stacked bars from a single dictionary file
"plot_stacked_bars_sidebyside.py" compares side by side all the data contained in the dictionaries
"plot_stacked_bars_twosources.py" draws data in stacked bars from two dictionary files
"plot_stacked_bars_threesources.py" draws data in stacked bars from three sources
All scripts must be edited to match the naming convention used by the user. Each script prints usage information when executed. The keyword "dummy" acts like a "*" in an SQL query, but must be defined in the application list. The file "print_log" stores the verbose output of the script. To ease this process, we provide a set of bash scripts that serve as examples of how to use the "plot_stacked_bars.py" and "plot_stacked_bars_sidebyside.py" scripts. More specifically, the scripts "00_drawplots_topdown_gem5.sh", "00_drawplots_topdown_skylake.sh" and "00_gem5_validation_drawplots_dictionary_sidebyside" generate a web interface with information extracted from the databases. These figures can be used to validate the simulation infrastructure.
For example:
./00_gem5_validation_drawplots_dictionary_sidebyside gem5_example_side_by_side dictionaries/skylakex_i77820x.dic results_parsec_skylakex_1p.db dictionaries/gem5_mesi_3lvl.dic results_parsec_gem5_1p.db
./00_drawplots_topdown_skylake.sh test_data_skl dictionaries/skylakex_i77820x.dic results_parsec_skylakex_1p.db
./00_drawplots_topdown_gem5.sh gem5_example dictionaries/gem5_mesi_3lvl.dic results_parsec_gem5_1p.db
The folders gem5_example_side_by_side, test_data_skl and gem5_example each contain an "index.html" file that can be used to browse the data.
The three-source script can be used, for example, as follows (note that the databases are not drawn in command-line order: the real system database, given first on the command line, is placed between the other two in the figure):
python plot_stacked_bars_threesources.py dummy simlarge 1 scalar dictionaries/skylakex_i77820x.dic results_parsec_skylakex_1p.db dictionaries/gem5_mesi_3lvl.dic results_parsec_gem5_1p.db results_parsec_gem5_1p_fix.db -k TOPDOWN_RETIRING,TOPDOWN_BADSPEC,TOPDOWN_FRONTEND,TOPDOWN_BACKEND -l print_log -catx GO,HW,GF -ylabel "Top-Down Uops Breakdown" -barlabels Retiring,Bad,Frontend,Backend -groupby benchmark -usertable formulas -glabels BS,CA,DE,FA,FE,FL,FR,ST,SW -outformat pdf -t > topdown_compare_all_simlarge_scalar.pdf