Efficient Timer for Delphi and Freepascal
Description:
It's a timer by Amine Moulay Ramdane based on the Time Stamp Counter (TSC) register of x86 CPUs and on the RDTSCP instruction. It is portable across Windows, Linux and Mac OS X.
To understand the essence of measuring time on a computer, note for example that the C++ <chrono> library provides no direct way to retrieve the CPU frequency. <chrono> deals primarily with time-related operations, such as measuring durations and computing time points, and in practice its best resolution is the nanosecond. That is not always good enough, since you may also need accuracy in CPU cycles (ticks), which my new StopWatch provides. I invite you to study my new StopWatch to see how to implement a good one, and to read my previous thoughts below to understand my views:
The RDTSCP instruction reads the time stamp counter only after all earlier instructions have executed. It also returns the value of IA32_TSC_AUX, which the operating system typically initializes with the processor ID, so a measurement can detect whether the thread moved to another core between two readings. This makes RDTSCP better suited than RDTSC to accurate timing in multicore/threaded environments. But RDTSCP is available only on newer CPUs, so I will now document how to use CPU affinity in Windows and Linux to solve the following problem with the RDTSC instruction, which is the one older CPUs support:
- Multicore/Threaded environments: If your system has multiple cores or threads, using rdtsc may not provide synchronized timing across different cores or threads. This can lead to inconsistent and unreliable timing measurements.
I have just added to my new StopWatch a PreciseSleep() function that is more accurate than the Sleep() functions of Windows and Linux. I think this is now the final version of the StopWatch source code: I have tested it on older and newer CPUs, and on both Windows and Linux, and I think it is working correctly.
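To illustrate how a sleep can be made more accurate than the operating system's Sleep(), here is a minimal sketch in Python, chosen only for brevity. It is not the source of PreciseSleep(); the function name precise_sleep, the spin_margin parameter and the "coarse sleep plus busy-wait" technique are assumptions made for the example.

```python
import time

def precise_sleep(seconds: float, spin_margin: float = 0.002) -> None:
    """Wait at least `seconds`: coarse OS sleep first, then a short busy-wait.

    Hypothetical helper for illustration; spin_margin (in seconds) is how much
    of the wait is spent spinning instead of sleeping.
    """
    deadline = time.perf_counter_ns() + round(seconds * 1_000_000_000)
    coarse = seconds - spin_margin
    if coarse > 0:
        time.sleep(coarse)  # the OS scheduler handles most of the wait
    while time.perf_counter_ns() < deadline:
        pass  # spin on the high-resolution counter for the last stretch

start = time.perf_counter_ns()
precise_sleep(0.005)
elapsed_ms = (time.perf_counter_ns() - start) / 1_000_000
print(f"requested 5.000 ms, measured {elapsed_ms:.3f} ms")
```

The busy-wait costs CPU time, so a larger spin_margin improves accuracy at the price of more spinning.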
To learn how to use it, and to gain a deep understanding of the StopWatch, I invite you to read my previous thoughts below:
Let me explain more about the essence of measuring time on a computer. When you measure time, you also depend on the CPU frequency, so you have to obtain it. On new CPUs the frequency can change dynamically, so you have two options. First, you can disable CPU frequency scaling in the BIOS, take your exact measurements, and then enable it again. Second, you can get a decent approximation without disabling CPU frequency scaling and still benchmark your code, as I explain below.
New CPUs are also multicore, so you have to know how to set the CPU affinity before timing with the StopWatch, and I explain how below.
You can get good microsecond accuracy and decent nanosecond accuracy with the RDTSC instruction, and CPU-tick accuracy with the RDTSCP instruction. I also explain the implementation of a StopWatch in much more depth in my thoughts below, so I invite you to read them to understand my views on how to implement a good StopWatch:
I have just updated my StopWatch to support both the RDTSCP and RDTSC instructions: when the CPU supports RDTSCP the StopWatch uses it, and when the CPU is older and does not support it, the StopWatch falls back to RDTSC. RDTSC is not a serializing instruction, so I have added the necessary memory barriers around it. RDTSCP waits for all earlier instructions to execute before it reads the counter.
I invite you to read all my previous thoughts below to deeply understand the StopWatch:
Now I have to explain something important. For a deep understanding of the StopWatch, you have to know that the RDTSC instruction is supported by the great majority of x86 and x64 CPUs, but it is not a serializing instruction: it is subject to out-of-order execution, which may affect its accuracy. That is why I have added further memory barriers around it, and I now think it is working correctly. There is another instruction, RDTSCP, that waits for all earlier instructions to execute before reading the counter and is therefore much less affected by out-of-order execution, but it is available only on newer x86 and x64 CPUs, so I support it too. I think you can now be confident in my updated StopWatch, and I think it is an interesting one because it shows how to implement a good StopWatch from the low-level layers up.
I invite you to read my previous thoughts below to gain a deep understanding of the StopWatch:
I think that my new StopWatch can give a decent approximation even if you don't disable CPU frequency scaling in the BIOS, and here is why:
When benchmarking a CPU under a heavy workload, it is generally expected that frequency scaling changes will be relatively small or negligible. This is because the frequency scaling mechanism typically aims to maximize performance during such scenarios.
Under heavy load, the CPU frequency scaling algorithm often increases the CPU frequency to provide higher processing power and performance. The goal is to fully utilize the CPU's capabilities for the benchmarking workload.
In these cases, frequency scaling changes are generally designed to be minimal to avoid introducing significant variations in performance. The CPU frequency may remain relatively stable or vary within a relatively small range during the benchmarking process.
Considering these factors, when benchmarking under heavy workload conditions, the impact of frequency scaling changes on timing measurements using RDTSC is typically limited. As a result, RDTSC can provide a reasonable approximation of timing for benchmarking purposes.
I invite you to read my previous thoughts below to understand my views on the StopWatch:
I have just updated my new StopWatch, and it now also includes the correct memory barriers for earlier 32-bit Delphi versions such as Delphi 7. You can download it from the web link below, and I invite you to read my previous thoughts below to understand my views on the StopWatch:
I have just updated my new StopWatch. The first problem is:
- Instruction reordering: The rdtsc instruction itself is not a serializing instruction, which means that it does not necessarily prevent instruction reordering. In certain cases, the CPU may reorder instructions, leading to inaccuracies in timing measurements.
I have used memory barriers to solve the above problem.
And here is the second problem:
- CPU frequency scaling: Modern CPUs often have dynamic frequency scaling, where the CPU frequency can change based on factors such as power management and workload. This can result in variations in the time measurement based on the CPU's operating frequency.
You can disable CPU frequency scaling in the BIOS to solve the above problem, and then do your timing with my StopWatch.
And for the following third problem:
- Multicore/Threaded environments: If your system has multiple cores or threads, using rdtsc may not provide synchronized timing across different cores or threads. This can lead to inconsistent and unreliable timing measurements.
You can set the CPU affinity to solve the third problem.
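As an illustration of setting the CPU affinity around a timed region, here is a minimal sketch in Python, chosen only for brevity. It is not part of the StopWatch. The helper name pinned_to_one_cpu is hypothetical, and the sketch relies on os.sched_setaffinity, which Python provides on Linux only. On Windows the equivalent API is SetThreadAffinityMask, which this sketch does not call.

```python
import os
from contextlib import contextmanager

@contextmanager
def pinned_to_one_cpu():
    """Pin the caller to a single CPU for the duration of the with-block.

    Hypothetical helper. Yields the chosen CPU number, or None on platforms
    where os.sched_setaffinity is not available.
    """
    if not hasattr(os, "sched_setaffinity"):
        yield None
        return
    original = os.sched_getaffinity(0)  # pid 0 refers to the caller
    cpu = min(original)                 # any allowed CPU would do
    os.sched_setaffinity(0, {cpu})
    try:
        yield cpu
    finally:
        os.sched_setaffinity(0, original)  # always restore the old mask

with pinned_to_one_cpu() as cpu:
    # The code to be timed would run here, on one core only.
    print("pinned to CPU:", cpu)
```

Restoring the original mask in the finally clause keeps the rest of the program free to run on all cores again.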
I have now updated my new StopWatch with the necessary memory barriers, so you can be confident in it.
My updated StopWatch now uses memory barriers correctly and avoids the overflow problem of the Time Stamp Counter (TSC). It supports timing in microseconds, nanoseconds and CPU clocks, and it is object oriented. It supports both 32-bit x86 and 64-bit x64 CPUs and both the Delphi and Freepascal compilers, and it works on both Windows and Linux. What is good about my new StopWatch is that it shows how to implement one from the low-level layers, in assembler.
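Reporting a measurement in microseconds or nanoseconds means dividing a tick count by the counter frequency. Here is a minimal sketch of that conversion in Python, chosen only for brevity. It is not the StopWatch's code: the function names are hypothetical, and the 3 GHz frequency is an assumed value for the example.

```python
NS_PER_SECOND = 1_000_000_000
US_PER_SECOND = 1_000_000

def ticks_to_ns(ticks: int, freq_hz: int) -> int:
    """Convert a tick count to whole nanoseconds, given ticks per second."""
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    # Multiply before dividing so integer arithmetic loses no precision.
    return ticks * NS_PER_SECOND // freq_hz

def ticks_to_us(ticks: int, freq_hz: int) -> int:
    """Convert a tick count to whole microseconds, given ticks per second."""
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    return ticks * US_PER_SECOND // freq_hz

freq = 3_000_000_000       # assumed counter frequency: 3 GHz
elapsed_ticks = 6_000_000  # end reading minus start reading

print(ticks_to_us(elapsed_ticks, freq))  # 2000
print(ticks_to_ns(elapsed_ticks, freq))  # 2000000
print(ticks_to_ns(1, freq))              # 0
```

The last line shows why a tick count carries more information than a nanosecond value: at the assumed 3 GHz, one tick is shorter than a nanosecond and rounds down to zero.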
Other than that, read my previous thoughts below to understand my views:
Now we have to attain a "deep" understanding of the StopWatch. I have just discovered that the following StopWatch: https://www.davdata.nl/math/timer.html , by the following engineer from Amsterdam: https://www.davdata.nl/math/about.html , is not working correctly. He calls the function GetTickCount() in the constructor, and that is a bug: when the tick count in milliseconds returned by GetTickCount() reaches its maximum value, high(dword), which happens after about 49.7 days of uptime, it wraps around to zero and starts counting up again, because the tick count is stored in a fixed-size 32-bit data type. So his way of timing in milliseconds in the constructor is not safe.
This engineer's StopWatch does effectively avoid the overflow problem of the Time Stamp Counter (TSC), since it uses an int64 on the 32-bit x86 architecture in the Intel assembler function getCPUticks(), which I understand, and by my calculation this int64 can count for up to 29318.9829 years. Even so, I think his StopWatch is not working correctly, for the reason I give above.
The second problem is that the accuracy of the timing obtained from his code, which uses the rdtsc instruction in assembler, depends on various factors, including the hardware and software environment. Directly using rdtsc for timing may not provide the desired accuracy, for several reasons:
- CPU frequency scaling: Modern CPUs often have dynamic frequency scaling, where the CPU frequency can change based on factors such as power management and workload. This can result in variations in the time measurement based on the CPU's operating frequency.
- Instruction reordering: The rdtsc instruction itself is not a serializing instruction, which means that it does not necessarily prevent instruction reordering. In certain cases, the CPU may reorder instructions, leading to inaccuracies in timing measurements.
- Multicore/Threaded environments: If your system has multiple cores or threads, using rdtsc may not provide synchronized timing across different cores or threads. This can lead to inconsistent and unreliable timing measurements.
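The GetTickCount() wrap-around described above can be illustrated with a little arithmetic. The following is a minimal sketch in Python, chosen only for brevity; the function names are hypothetical and the tick values are made up for the example. It shows that a plain subtraction fails across the wrap, while a subtraction modulo 2^32 still gives the right interval, as long as the counter wrapped at most once.

```python
TICK_MODULUS = 1 << 32  # GetTickCount() returns an unsigned 32-bit value

def naive_elapsed_ms(start_tick: int, end_tick: int) -> int:
    """Plain subtraction: wrong if the counter wrapped between the readings."""
    return end_tick - start_tick

def wrapping_elapsed_ms(start_tick: int, end_tick: int) -> int:
    """Subtraction modulo 2**32: correct across a single wrap-around."""
    return (end_tick - start_tick) % TICK_MODULUS

start = 0xFFFFFF00  # 256 ms before the counter wraps to zero
end = 0x00000100    # 256 ms after the wrap

print(naive_elapsed_ms(start, end))     # -4294966784
print(wrapping_elapsed_ms(start, end))  # 512

# The counter wraps once every 2**32 milliseconds:
print(round(TICK_MODULUS / 1000 / 86400, 1), "days")  # 49.7 days
```

In Pascal, the same effect is obtained when both readings and their difference are kept in an unsigned 32-bit type with overflow checking disabled.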
I have thought about it more, and I think I will not support ARM in my new StopWatch, since ARM processors have no equivalent of the x86 Time Stamp Counter (TSC) that is compatible across earlier 32-bit and 64-bit CPUs. ARM has several important weaknesses in this respect, and here is an important one:
There is no single generic method that can be universally applied to all Arm processors for measuring time in CPU clocks. The available timing mechanisms and registers can vary significantly across different Arm processor architectures, models, and specific implementations.
In general, Arm processors provide various timer peripherals or system registers that can be used for timing purposes. However, the specific names, addresses, and functionalities of these timers can differ between different processors.
To accurately measure time in CPU clocks on a specific Arm processor, you would need to consult the processor's documentation or technical reference manual. These resources provide detailed information about the available timers, their registers, and how to access and utilize them for timing purposes.
It's worth noting that some Arm processors may provide performance monitoring counters (PMCs) that can be used for fine-grained timing measurements. However, the availability and usage of PMCs can also vary depending on the specific processor model.
Therefore, to achieve accurate and reliable timing measurements in CPU clocks on a particular Arm processor, it's crucial to refer to the documentation and resources provided by the processor manufacturer for the specific processor model you are targeting.
You can download the zip files from the following web link:
https://drive.google.com/drive/folders/1plkj32zOl0sGvo_Uw2El13SjqXSAkNiE?usp=sharing
Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/
Required FPC switches: -O3 -Sd (-Sd selects Delphi mode)
Required Delphi XE-XE7 and Tokyo switch: -$H+
You can configure it as follows inside the defines.inc file:
{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems
{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems
- Platform: Windows, Unix and Linux (on x86)