In other articles, I’ve explored Building a tickless Ubuntu Kernel. In this one I’ll walk through booting into it with CPUs isolated and kernel noise offloaded from the isolated CPUs, followed by running a task on one of those CPUs. In my experiments, these changes remove almost all external sources of jitter from threads running on isolated CPUs. That said, there still seems to be a source of extremely rare jitter that I haven’t tracked down yet. I may update this post when I find it, or write another if the culprit turns out to be the application itself.
CPU Details
The first step is to collect information about the CPUs available on the machine. This helps decide which CPUs to isolate.
The following commands provide various levels of useful information about the CPUs.
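A minimal sketch of such commands, assuming a typical Ubuntu install where lscpu (util-linux) and nproc (coreutils) are available. The /proc/cpuinfo field names in the last command are the x86 ones and differ on other architectures.

```shell
# Summary: sockets, cores per socket, threads per core, NUMA nodes, caches.
lscpu

# One row per logical CPU with its core, socket and NUMA node.
lscpu --extended

# Number of logical CPUs available to the current process.
nproc

# Raw per-CPU details straight from the kernel (x86 field names).
grep -E '^(processor|physical id|core id)' /proc/cpuinfo
```

Hyperthread siblings share a core id, which matters when choosing CPUs to isolate: isolating only one sibling still leaves the other competing for the same physical core.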
To get the CPU topology, including the topology across multiple NUMA nodes, use:
This will output the results straight to the terminal instead of a GUI.
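One way to get a terminal-only topology view is sketched below. It assumes hwloc is installed (on Ubuntu the hwloc-nox package ships lstopo-no-graphics, the text-mode variant of lstopo); that choice of tool is an assumption. The sysfs loop is a dependency-free fallback.

```shell
# Text rendering of packages, NUMA nodes, caches, cores and PUs.
# Assumes hwloc is installed; skipped silently otherwise.
if command -v lstopo-no-graphics >/dev/null 2>&1; then
    lstopo-no-graphics
fi

# CPUs the kernel currently has online, e.g. 0-11.
cat /sys/devices/system/cpu/online

# Kernel's view of which CPUs belong to each NUMA node.
for node in /sys/devices/system/node/node*; do
    if [ -r "$node/cpulist" ]; then
        echo "${node##*/}: $(cat "$node/cpulist")"
    fi
done
```

If the numactl package is installed, `numactl --hardware` gives a similar per-node summary along with memory sizes.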
Isolate CPUs and Boot
After deciding which CPUs to isolate, add the isolation parameters to the kernel command line in /etc/default/grub. Here I isolate CPUs 8, 9, 10, and 11:
isolcpus: Isolates the listed CPUs from the general scheduler, so tasks only run on them when explicitly placed there.
nohz_full: Configures full dynticks. When there’s a single runnable task on the CPU, the kernel stops sending timer ticks to that CPU.
rcu_nocbs: Offloads RCU callbacks from the specified CPUs onto other available CPUs. This setting is not strictly needed when nohz_full is set, since nohz_full already offloads RCU callbacks on those CPUs.
rcu_nocb_poll: Removes the need for the CPUs with offloaded RCU callbacks to wake the RCU offload threads. This is handled by a timer instead.
irqaffinity: Sets the default IRQ affinity, i.e. the CPUs that handle interrupts.
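Putting the parameters above together, the relevant line in /etc/default/grub could look like the sketch below. Two things here are assumptions: the `quiet splash` prefix (the stock Ubuntu desktop default) and the choice of CPUs 0-7 as housekeeping CPUs for `irqaffinity`.

```shell
# /etc/default/grub (sketch): isolate CPUs 8-11 and keep IRQs on CPUs 0-7.
# "quiet splash" and the 0-7 housekeeping range are assumptions.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=8-11 nohz_full=8-11 rcu_nocbs=8-11 rcu_nocb_poll irqaffinity=0-7"
```

After saving the file, `sudo update-grub` regenerates the boot configuration and a reboot applies it. Once booted, `cat /proc/cmdline` should show the parameters, and `cat /sys/devices/system/cpu/isolated` should list 8-11.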
Affine the task to the CPU
The last step is to affine the task in question to an isolated CPU. One option is to use taskset:
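A small sketch of how taskset can be wrapped for this. The function names `run_pinned` and `show_affinity` are hypothetical helpers introduced for illustration; taskset itself sets the affinity through the sched_setaffinity syscall.

```shell
# Run a command pinned to one CPU.
# Usage: run_pinned <cpu> <command> [args...]
run_pinned() {
    cpu="$1"
    shift
    taskset -c "$cpu" "$@"
}

# Print the CPU list a running process is currently allowed on.
# Usage: show_affinity <pid>
show_affinity() {
    taskset -cp "$1"
}
```

With CPUs 8-11 isolated, `run_pinned 8 ./my_app` would start the hypothetical `./my_app` on CPU 8, and `show_affinity "$(pidof my_app)"` should then report an affinity list of 8. To re-pin a process that is already running, including all of its threads, `taskset -a -cp 8 <PID>` does the same job.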
The other option is to call sched_setaffinity from within the application itself.
Other helpful commands
To watch IRQs by core:
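As a sketch, the helper below filters /proc/interrupts down to the columns of chosen CPUs, which makes it easier to spot interrupts still landing on isolated cores. The function name `irqs_for_cpus` is hypothetical, and the optional file argument exists only so the parsing can be tried on a saved copy.

```shell
# Print the IRQ label and the interrupt counts for selected CPUs only.
# Usage: irqs_for_cpus "8 9 10 11" [file]   (file defaults to /proc/interrupts)
irqs_for_cpus() {
    awk -v want="$1" '
        NR == 1 {                        # header row: CPU0 CPU1 ...
            n = split(want, w, " ")
            for (i = 1; i <= NF; i++)
                for (j = 1; j <= n; j++)
                    if ($i == ("CPU" w[j])) col[++k] = i + 1
            next
        }
        {                                # data row: label, then one count per CPU
            line = $1
            for (c = 1; c <= k; c++) line = line " " $(col[c])
            print line
        }
    ' "${2:-/proc/interrupts}"
}
```

Running `irqs_for_cpus "8 9 10 11"` twice a few seconds apart shows which counters are still moving. For a live view of the full table, `watch -n 1 -d cat /proc/interrupts` refreshes every second and highlights changes.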
To inspect the process in detail with pidstat:
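A sketch of one useful pidstat invocation, assuming the sysstat package is installed. The wrapper name `watch_task` is hypothetical, and the flag selection is one reasonable choice rather than the only one.

```shell
# Per-thread (-t) CPU usage (-u) and context switches (-w) for one process.
# Usage: watch_task <pid> [interval_seconds] [count]
# Without a count, pidstat keeps sampling until interrupted.
watch_task() {
    pidstat -t -u -w -p "$1" "${2:-1}" ${3:+"$3"}
}
```

For example, `watch_task "$(pidof my_app)" 1 10` takes ten one-second samples of the hypothetical `my_app`. The CPU column shows where each thread ran, and a non-zero nvcswch/s (involuntary context switches) on an isolated CPU suggests something else is still being scheduled there.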
tuna is another very useful command that allows adjusting threads, processes, IRQs and more. For instance, to move all threads whose names match a pattern to CPUs 0-7:
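A sketch of such a move, assuming the classic tuna option syntax (`--threads`, `--cpus`, `--move`). Newer tuna releases reorganized the command line into subcommands, so it is worth checking `tuna --help` on the installed version first. The wrapper name `move_threads` is hypothetical.

```shell
# Move every thread whose name matches a glob onto the given CPUs.
# Assumes the classic tuna option syntax; needs root.
# Usage: move_threads <glob> <cpu-list>
move_threads() {
    tuna --threads="$1" --cpus="$2" --move
}
```

Called as root, `move_threads 'myapp*' 0-7` would push all threads matching the hypothetical pattern `myapp*` onto the housekeeping CPUs.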
Note: syscalls might have higher overhead with this setup. This is anecdotal and I have not verified it concretely yet.