I’ve recently implemented high-precision timing measurements in the critical path of a few projects, where the Time Stamp Counter (TSC) comes in handy. This post covers the implementation of a TSC-based clock and some considerations when using it.
The Time Stamp Counter (TSC)
The TSC is a 64-bit register on x86 processors that counts CPU cycles since reset. It’s accessible via the RDTSC instruction, making it a popular choice for high-resolution timing on x86 platforms.
Why Use TSC?
- High Resolution: The TSC ticks at roughly the CPU clock rate (a fixed nominal rate on invariant-TSC processors), providing fine-grained timing.
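- Low Overhead: Reading the TSC is a single instruction that typically costs a few tens of cycles and involves no system call, which makes it cheaper than most OS timing functions.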
- Monotonic: On most modern processors, the TSC is monotonic and consistent across cores.
Invariant TSC
An important concept when working with TSC is the invariant TSC. Most modern processors implement an invariant TSC, which provides several advantages for timing measurements. The invariant TSC runs at a constant rate regardless of CPU power state or frequency changes, making it more reliable for timing measurements. On multi-core and multi-socket systems, the invariant TSC is typically synchronized across all cores and sockets, ensuring consistent readings. Additionally, the invariant TSC continues to increment even when the core is in a deep sleep state, maintaining timing consistency.
To check if your processor supports invariant TSC, you can use the CPUID instruction. The invariant TSC feature is indicated by bit 8 of the EDX register when CPUID is executed with EAX = 80000007H.
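As a minimal sketch of that check, assuming a GCC or Clang toolchain on x86 (the `<cpuid.h>` helper is compiler-specific, and `has_invariant_tsc` is a name made up for this example):

```cpp
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction (assumed toolchain)

// Hypothetical helper: true if CPUID.80000007H:EDX[8] (invariant TSC) is set.
bool has_invariant_tsc() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not supported.
    if (__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx) == 0) {
        return false;
    }
    return (edx & (1u << 8)) != 0;
}
```

On Linux, the same bit is usually reported as the `constant_tsc` and `nonstop_tsc` flags in `/proc/cpuinfo`, which can serve as a quick cross-check.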
TSC-based Clock Implementation
The implementation is based on Intel’s whitepaper, with modifications for better performance. I will avoid rehashing information already in the whitepaper and recommend reading it to understand the implementation better.
Key points about this implementation:
- We use MFENCE and LFENCE as cheaper alternatives to CPUID for serializing.
- The TSC value is combined directly into a 64-bit value, simplifying register handling.
- For end measurements, RDTSCP is used instead of RDTSC for better ordering guarantees.
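These points could be sketched roughly as follows. This is an illustration rather than the exact code from this post: it assumes GCC or Clang on x86-64, uses the `<x86intrin.h>` intrinsics (which already return the counter as one 64-bit value), and the names `tsc_begin` and `tsc_end` are hypothetical.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtsc, __rdtscp, _mm_mfence, _mm_lfence (GCC/Clang)

// Hypothetical start-of-interval read.
inline std::uint64_t tsc_begin() {
    _mm_mfence();                 // earlier loads and stores become globally visible
    _mm_lfence();                 // earlier instructions complete before RDTSC runs
    std::uint64_t t = __rdtsc();  // intrinsic returns EDX:EAX as a single 64-bit value
    _mm_lfence();                 // measured code cannot start before the read
    return t;
}

// Hypothetical end-of-interval read.
inline std::uint64_t tsc_end() {
    unsigned int aux = 0;              // receives IA32_TSC_AUX; unused here
    std::uint64_t t = __rdtscp(&aux);  // waits for prior instructions to execute
    _mm_lfence();                      // later instructions cannot start before the read
    return t;
}
```

The fence placement follows the ordering recipe in Intel's instruction reference for RDTSC and RDTSCP. Whether every fence is needed depends on how much reordering the measurement can tolerate.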
Calibration
To convert TSC ticks to real time, we need to know the TSC frequency:
This method calculates the TSC frequency by comparing TSC readings with wall clock time over a known interval (one second in this example).
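One way this comparison could look, as a sketch under a few assumptions: GCC or Clang on x86-64, `std::chrono::steady_clock` standing in for the wall clock, and a hypothetical function name `estimate_tsc_hz`.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <x86intrin.h>  // __rdtsc (GCC/Clang)

// Hypothetical helper: estimates TSC ticks per second by sleeping for
// `interval` and dividing the TSC delta by the elapsed steady_clock time.
double estimate_tsc_hz(std::chrono::milliseconds interval = std::chrono::milliseconds(1000)) {
    using clock = std::chrono::steady_clock;
    const auto wall_start = clock::now();
    const std::uint64_t tsc_start = __rdtsc();
    std::this_thread::sleep_for(interval);  // may oversleep; we measure actual elapsed time
    const std::uint64_t tsc_stop = __rdtsc();
    const auto wall_stop = clock::now();
    const double seconds =
        std::chrono::duration<double>(wall_stop - wall_start).count();
    return static_cast<double>(tsc_stop - tsc_start) / seconds;
}
```

A longer interval reduces the relative error from the clock reads themselves, at the cost of slower startup. Running the calibration once and caching the result is the usual trade-off.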
Taking a Start Measurement
We first take a measurement representing the start time:
Measuring Duration
To measure duration, we convert TSC ticks to real time:
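The conversion itself is a division by the calibrated frequency. A minimal sketch, assuming the frequency is held as a `double` in Hz and using a hypothetical helper name `ticks_to_duration`:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: converts a TSC tick delta into nanoseconds.
// `tsc_hz` is assumed to come from an earlier calibration step.
std::chrono::nanoseconds ticks_to_duration(std::uint64_t start_ticks,
                                           std::uint64_t end_ticks,
                                           double tsc_hz) {
    const double ticks = static_cast<double>(end_ticks - start_ticks);
    const double ns = ticks * 1e9 / tsc_hz;  // ticks / (ticks per second) * 1e9
    return std::chrono::nanoseconds(static_cast<std::int64_t>(ns));
}
```

For example, with a 3 GHz TSC, a delta of 300 ticks corresponds to 100 ns. Doing the arithmetic in floating point keeps the sketch short. A fixed-point multiply-and-shift is a common alternative when the conversion itself sits on the hot path.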
Considerations
While TSC-based timing can be precise, there are some caveats:
- CPU Frequency Changes: On systems without invariant TSC, the TSC rate might change.
- Multi-Socket Systems: TSCs might not be synchronized across multiple CPU sockets.
- Virtualization: Virtual machines might not provide reliable TSC readings.
Full Implementation
Here’s the complete TscDurationClock class: