BTR 2.0 (parallel) tests
To evaluate the running speed of BTR-parallel 2.0, several tests were performed, and the calculation time was compared with that of the last non-parallel version, BTR 1.7.
The following two systems were used for the test sessions:
1. Quad-core: Intel Xeon CPU E5345 (2.33 GHz, Quad-core), RAM 2GB; Windows Vista
2. Single-core: Intel Pentium M (1.4GHz), RAM 1.24GB; Windows XP Professional SP2
The injector geometry and beam optics are the same in all tests:
HNB Line geometry, Deuterium beam: 1MeV, 40 MW, bi-gauss 5/5 mrad + halo (15%, 15 mrad), vertical steering -9 mrad, 1280 beamlets.
The options and main parameters of beam tracing used in the test calculations are shown in Table 1.
Tab. 1 Beam tracing options and main parameters used for BTR test calculations
The performance results for the single-core and quad-core machines are shown in Tables 2-a and 2-b.
Note. In all tests the performance was measured with track visualization disabled.
Tab. 2 BTR speed (Million Particles per minute) and average memory capture (MBytes per Million Particles) in the test calculations. N is the number of worker threads in BTR-parallel execution.
a) Single-core machine 1.4GHz, RAM 1.24GB.
Table 2-a indicates that BTR 2.0 with N = 1 is more efficient than BTR 1.7 even on a single-processor machine. The gain ranges from 1.5 to 4 times, depending on the BTR input settings. Multi-threaded running (N > 1) brings no benefit on a single-processor system: running 4 threads reduced the performance by 20% compared with N = 1.
Note. BTR 2.0 outperforms BTR 1.7 as long as the free memory is at least twice the memory captured for particle storage. If the free memory is less than needed, BTR 2.0 (or some of its worker threads) can stop running. This is also true for execution on multi-processor systems.
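The memory condition in the note can be checked before a run with a few lines of arithmetic. The sketch below is only an illustration, not part of BTR: the function names are hypothetical, the factor of two is read as "free memory at least twice the particle storage", and the storage estimate uses the upper bound of 30 MB per million particles reported for BTR 2.0 in these tests.

```python
# Hypothetical helper, not part of BTR: rough memory headroom check.
# Assumption: particle storage is bounded by 30 MB per million particles,
# the upper limit reported for BTR 2.0 in these tests.
MB_PER_MILLION_PARTICLES = 30.0


def particle_storage_mb(total_particles, mb_per_million=MB_PER_MILLION_PARTICLES):
    """Estimate the memory (MB) captured for storing the given number of particles."""
    return total_particles / 1_000_000 * mb_per_million


def has_memory_headroom(total_particles, free_memory_mb, safety_factor=2.0):
    """Return True if free memory is at least safety_factor times the particle storage."""
    return free_memory_mb >= safety_factor * particle_storage_mb(total_particles)


if __name__ == "__main__":
    # Illustrative input: 10 million particles on a machine with 1000 MB free.
    print(particle_storage_mb(10_000_000))        # estimated storage in MB
    print(has_memory_headroom(10_000_000, 1000))  # True if the run should fit
```

With these assumptions, 10 million particles need about 300 MB of storage, so at least 600 MB of free memory would be required.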
b) Quad-core machine 2.33 GHz, RAM 2GB.
Table 2-b indicates that BTR-parallel 2.0 on the quad-core machine is 7-11 times faster than BTR 1.7 run on the same system. The throughput of BTR 2.0 with 4 threads is 800,000 – 5,000,000 particles/minute. The speed depends on the ion-to-atom ratio and on the ion tracing step. The highest processing speed (5 million particles/minute) is, as expected, achieved in Test #1, where the accelerated ions are neutralized at the Neutralizer exit and the atoms are ray-traced to the Duct exit without re-ionization. The average memory capture of BTR 2.0 does not exceed 30 MB per million particles, regardless of the machine used and the number of threads launched.
Table 2-b also shows that the BTR 1.7 speed on the 2.33 GHz computer ranges from 140,000 to 470,000 particles/minute (i.e. processing 1 million particles takes 2 – 7 minutes). This range differs by a factor of 1.5 – 2 from the values obtained with earlier BTR 1.7 releases (2008), because some BTR 1.7 modules were optimized or replaced with new ones during the development of BTR 2.0.
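The conversion between speed (particles per minute) and time per million particles quoted above is a simple reciprocal. A minimal sketch, with a hypothetical function name, reproduces the 2 – 7 minute range from the BTR 1.7 speeds given in the text:

```python
# Hypothetical helper: convert a processing speed into minutes per million particles.
def minutes_per_million(particles_per_minute):
    """Return the time in minutes needed to process one million particles."""
    if particles_per_minute <= 0:
        raise ValueError("speed must be positive")
    return 1_000_000 / particles_per_minute


if __name__ == "__main__":
    # BTR 1.7 speed range on the 2.33 GHz machine, taken from the text.
    for speed in (140_000, 470_000):
        print(f"{speed} particles/min -> {minutes_per_million(speed):.1f} min per million")
```

The two ends of the range give about 7.1 and 2.1 minutes per million particles, consistent with the rounded 2 – 7 minutes stated above.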
To make speed comparisons between BTR versions and between machines clearer, the Status window of BTR 2.0 now displays the Total Particles amount (instead of the Source Particles amount shown in former BTR versions) and the average processing Time per Beamlet. This lets the user see the actual number of particles to be processed for the selected input and evaluate the code performance on the current machine after tracing only about a dozen beamlets, without completing the calculation for the whole beam.
These data can serve as performance criteria when testing BTR 2.0 on different computer systems and when comparing it with other particle-tracing codes.
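As an illustration of how the two Status window values could be turned into a performance estimate, the sketch below extrapolates the full-beam run time and the resulting speed from the average Time per Beamlet. It is not BTR code: the function name is hypothetical, the time per beamlet is assumed to be wall-clock time, all beamlets are assumed to cost about the same, and the sample values are invented. Only the beamlet count of 1280 comes from the test setup.

```python
# Hypothetical estimate, not part of BTR: extrapolate full-beam performance
# from the average Time per Beamlet measured on a small sample of beamlets.
# Assumptions: wall-clock time per beamlet, roughly equal cost per beamlet.
def estimate_run(avg_seconds_per_beamlet, total_particles, n_beamlets=1280):
    """Return (total run time in minutes, speed in particles per minute)."""
    if avg_seconds_per_beamlet <= 0 or n_beamlets <= 0:
        raise ValueError("time per beamlet and beamlet count must be positive")
    total_minutes = avg_seconds_per_beamlet * n_beamlets / 60.0
    speed = total_particles / total_minutes
    return total_minutes, speed


if __name__ == "__main__":
    # Invented sample values: 0.75 s per beamlet, 8 million Total Particles.
    minutes, speed = estimate_run(0.75, 8_000_000)
    print(f"estimated run time: {minutes:.1f} min")
    print(f"estimated speed: {speed:.0f} particles/min")
```

For the invented sample values this gives a 16-minute run at 500,000 particles per minute. The estimate is only as good as the assumption that the sampled beamlets are representative of the whole beam.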