Commit 57cb26a1 by FritzFlorian

Add more notes on banana pi kernel parameters and cgroups.

Run the following scripts:
- `sudo ./setup_cpu.sh`
- `sudo ./map_interrupts_core_0.sh`
- `sudo ./setup_rt.sh`
- `sudo ./setup_cgroups.sh`
Then start your tests manually mapped to cores 1 to 7. We also found that having any interactive sessions
open during the measurements can cause large latency spikes, so leave the system alone while the benchmarks run.
### Pin all other processes to core 0
To further reduce interference with our controlled benchmark environment we map all non-benchmark
processes to core 0 of the system, running our benchmarks on cores 1 to 7.
The system uses systemd, which makes this the simplest point to change the default process affinity:
edit the file `/etc/systemd/system.conf` and set `CPUAffinity=0`. This will make all processes forked
from systemd run on core 0. Benchmarks can then be manually mapped to different cores.
### Tuning kernel parameters
Some online references advise kernel parameter tweaks to achieve better latencies.
To change kernel parameters, edit the `/boot/armbianEnv.txt` file and add a line with
`extraargs=<your args>`.
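As a sketch, the resulting entry in `/boot/armbianEnv.txt` would look like the excerpt below (the exact file layout depends on the Armbian image; merge with any existing `extraargs` line instead of adding a second one):

```shell script
# /boot/armbianEnv.txt (excerpt) - extra kernel command line arguments
extraargs=mce=ignore_ce nosoftlockup nmi_watchdog=0 transparent_hugepage=never processor.max_cstate=1 idle=poll nohz=on nohz_full=1-7 isolcpus=1-7
```

A reboot is required before the new arguments take effect.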
Here are some good articles discussing jitter on Linux systems:
- https://www.codethink.co.uk/articles/2018/configuring-linux-to-stabilise-latency/ (General Tips and Measurements)
- https://access.redhat.com/sites/default/files/attachments/201501-perf-brief-low-latency-tuning-rhel7-v1.1.pdf (7 - Kernel Command Line)
- https://access.redhat.com/articles/65410 (Power Management/C-States)
- https://community.mellanox.com/s/article/rivermax-linux-performance-tuning-guide--1-x (General Tips)
We use the following settings:
```shell script
mce=ignore_ce nosoftlockup nmi_watchdog=0 transparent_hugepage=never processor.max_cstate=1 idle=poll nohz=on nohz_full=1-7 isolcpus=1-7
```
- ***mce=ignore_ce*** do not scan for corrected hardware errors. This reduces the jitter introduced by the periodic scans.
- ***nosoftlockup*** do not log backtraces for tasks hogging the CPU for too long. This, again, reduces jitter, and we do not need the feature in our controlled test environment.
- ***nmi_watchdog=0*** disables the NMI watchdog on architectures that support it. Essentially disables a non-maskable interrupt that is used to detect hanging/stuck systems. We do not need this check during our benchmarks. https://medium.com/@yildirimabdrhm/nmi-watchdog-on-linux-ae3b4c86e8d8
- ***transparent_hugepage=never*** do not scan for small pages to combine into hugepages. We have no issues with memory usage, so this spares us the periodic jitter.
- ***processor.max_cstate=1 idle=poll*** do not switch to CPU power saving modes (c-states). Just run all cores at full speed all the time (we do not care about energy during our tests).
- ***nohz=on nohz_full=1-7*** disable housekeeping OS ticks on our isolated benchmark cores. Core 0 will handle these when needed.
- ***isolcpus=1-7*** ignores cores 1 to 7 when scheduling processes (except processes that are explicitly bound to those cores).
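After rebooting it is worth checking that the arguments were actually applied; a small sketch using standard Linux interfaces (the `nohz_full` sysfs file only exists on kernels built with `CONFIG_NO_HZ_FULL`):

```shell script
#!/bin/bash
# Verify that the isolation-related arguments made it onto the kernel command line.
if grep -q "isolcpus=1-7" /proc/cmdline; then
  echo "core isolation active"
else
  echo "core isolation NOT active"
fi
# On kernels built with CONFIG_NO_HZ_FULL, this lists the tick-free cores:
cat /sys/devices/system/cpu/nohz_full 2>/dev/null
```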
### Pin all other processes to core 0 (cgroups)
We want to isolate our measurements to cores 1 to 7 and use core 0 for all non-benchmark-related processes.
isolcpus does some of this during startup (TODO: see if there are issues with task migration if we use isolcpus);
however, we need to configure the system to schedule our threads specifically onto the isolated cores.
cgroups is a tool well suited for this. See the tutorial for further information: https://github.com/lpechacek/cpuset/blob/master/doc/tutorial.txt
Essentially, we partition our cores into two isolated groups and then migrate all movable tasks away from
our benchmark cores, minimizing the influence of background tasks. Cgroups also interact nicely with
the real-time scheduler, as described here: https://www.linuxjournal.com/article/10165, because
they allow the scheduler to ignore the other cores in its decision making.
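As a sketch of what `cset shield` automates (assuming the `cpuset` package's `cset` tool from the tutorial above; check the flag spellings against the installed version), the two partitions can also be created manually:

```shell script
# Create a housekeeping set on core 0 and an exclusive benchmark set on cores 1-7.
sudo cset set --cpu=0 --set=system
sudo cset set --cpu=1-7 --cpu_exclusive --set=user
# Migrate all movable tasks from the root set into the housekeeping set.
sudo cset proc --move --fromset=root --toset=system
```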
Note the exclusive cpu groups in this output:
```shell script
florian@bananapim3:~$ cset set
cset:
Name CPUs-X MEMs-X Tasks Subs Path
------------ ---------- - ------- - ----- ---- ----------
root 0-7 y 0 y 116 2 /
user 1-7 y 0 n 0 0 /user
system 0 y 0 n 58 0 /system
```
Create a file called `setup_cgroups.sh` and make it executable with `chmod +x setup_cgroups.sh`:
```shell script
#!/bin/bash
sudo cset shield --cpu=1-7 -k on
```
This will isolate cores 1 to 7 for our benchmarks. To run the benchmarks on these cores use the following
or a similar command: `sudo chrt --fifo 90 cset shield -e --user=<user> <executable> -- <arguments>`
***BEFORE TESTS***: ***restart your system*** to make the configuration apply.
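When the measurements are finished, the shield can be removed again (assuming the same `cset` tool; this migrates tasks back and deletes the cpusets):

```shell script
# Tear down the shield created by setup_cgroups.sh.
sudo cset shield --reset
```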
### CPU frequency
```shell script
sysctl -w kernel.sched_rt_period_us=1000000
```
To run the tests use one of the following (or a similar command with a different RT policy):
`taskset FFFE chrt --fifo 80 sudo -u $SUDO_USER <benchmark>`
`sudo chrt --fifo 90 cset shield -e --user=<user> <executable> -- <arguments>`
This maps the process to all cores but core 0 and runs it using the desired real-time scheduling policy and priority.
Replace `--fifo` with `--rr` to use the round robin scheduler instead of the first-in-first-out one.
We found that interactive sessions can cause huge latency spikes even with this separation,
therefore we advise starting the benchmarks and then leaving the system alone until they are done.
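The two command variants above can be wrapped in a small helper. This is a hypothetical sketch (the function name is ours, and the mask `FFFE` and the priorities are taken from the commands above); it only builds the command string, so it can be inspected before running anything with root privileges:

```shell script
#!/bin/bash
# Hypothetical helper (not part of the repository scripts): build the command
# line for running a benchmark on all cores except core 0 under an RT policy.
build_benchmark_cmd() {
  local policy="$1" prio="$2"; shift 2
  local flag
  case "$policy" in
    fifo) flag="--fifo" ;;
    rr)   flag="--rr" ;;
    *)    echo "unknown policy: $policy" >&2; return 1 ;;
  esac
  # Mask FFFE excludes core 0; chrt applies the chosen RT policy and priority.
  echo "taskset FFFE chrt $flag $prio $*"
}

# Example: FIFO policy with priority 80.
build_benchmark_cmd fifo 80 ./benchmark
# -> taskset FFFE chrt --fifo 80 ./benchmark
```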