diff --git a/BANANAPI.md b/BANANAPI.md
index 4390425..02c8f8d 100644
--- a/BANANAPI.md
+++ b/BANANAPI.md
@@ -71,19 +71,68 @@ run the following scripts:***
 - `sudo ./setup_cpu.sh`
 - `sudo ./map_interrupts_core_0.sh`
 - `sudo ./setup_rt.sh`
+- `sudo ./setup_cgroups.sh`
 
-Then start your tests manually mapped to cores 1 to 7.
+Then start your tests manually mapped to cores 1 to 7. We also found that having any interactive sessions
+open during the measurements can cause latency spikes, so leave the system alone while the benchmarks run.
 
-### Pin all other processes to core 0
+### Tuning kernel parameters
 
-To further reduce inference with our controlled benchmark environment we map all non related
-processes to core 0 of the system, running our benchmarks on cores 1 to 7.
+Several online references advise kernel parameter tweaks to achieve better latencies.
+To change kernel parameters, edit the `/boot/armbianEnv.txt` file and add a line starting with
+`extraargs=`.
 
-The system uses system.d, which makes this the simplest point to change the default process affinity.
-Edit the file `/etc/systemd/system.conf` and set `CPUAffinity=0`. This will make all processes forked
-from system.d run on core 0. Benchmarks can then be manually mapped to different cores.
+Here are some good articles discussing jitter on Linux systems:
+- https://www.codethink.co.uk/articles/2018/configuring-linux-to-stabilise-latency/ (General Tips and Measurements)
+- https://access.redhat.com/sites/default/files/attachments/201501-perf-brief-low-latency-tuning-rhel7-v1.1.pdf (7 - Kernel Command Line)
+- https://access.redhat.com/articles/65410 (Power Management/C-States)
+- https://community.mellanox.com/s/article/rivermax-linux-performance-tuning-guide--1-x (General Tips)
+
+We use the following settings:
+```shell script
+mce=ignore_ce nosoftlockup nmi_watchdog=0 transparent_hugepage=never processor.max_cstate=1 idle=poll nohz=on nohz_full=1-7 isolcpus=1-7
+```
+
+- ***mce=ignore_ce*** do not scan for corrected hardware errors. This reduces the jitter introduced by the periodic scans.
+- ***nosoftlockup*** do not log backtraces for tasks hogging the CPU for too long. This, again, reduces jitter, and we do not need the check in our controlled test environment.
+- ***nmi_watchdog=0*** disables the NMI watchdog on architectures that support it. Essentially disables a non-maskable interrupt that is used to detect hanging/stuck systems. We do not need this check during our benchmarks. https://medium.com/@yildirimabdrhm/nmi-watchdog-on-linux-ae3b4c86e8d8
+- ***transparent_hugepage=never*** do not scan for small pages to combine into hugepages. We have no issues with memory usage, so we spare ourselves this periodic jitter.
+- ***processor.max_cstate=1 idle=poll*** do not switch to CPU power saving modes (C-states). Just run all cores at full speed all the time (we do not care about energy consumption during our tests).
+- ***nohz=on nohz_full=1-7*** disable housekeeping OS ticks on our isolated benchmark cores. Core 0 will handle these when needed.
+- ***isolcpus=1-7*** excludes cores 1 to 7 from general process scheduling (except for tasks explicitly bound to those cores).
+
+### Pin all other processes to core 0 (cgroups)
+
+We want to isolate our measurements to cores 1 to 7 and use core 0 for all non benchmark related processes.
+isolcpus does some of this during startup (TODO: see if there are issues with task migration if we use isolcpus);
+however, we still need to configure the system to schedule our threads specifically onto the isolated cores.
+
+cgroups is a tool well suited for this. See the tutorial for further information: https://github.com/lpechacek/cpuset/blob/master/doc/tutorial.txt
+Essentially, we can partition our cores into two isolated groups and then move all migratable tasks away from
+our benchmark cores, minimizing the influence of background tasks. Cgroups also interact nicely with
+the real time scheduler, as described here: https://www.linuxjournal.com/article/10165, because
+they allow the scheduler to be restricted so that it ignores the shielded cores in its decision making.
+Note the exclusive CPU groups in this output:
+```shell script
+florian@bananapim3:~$ cset set
+cset:
+         Name       CPUs-X    MEMs-X Tasks Subs Path
+ ------------ ---------- - ------- - ----- ---- ----------
+         root        0-7 y       0 y   116    2 /
+         user        1-7 y       0 n     0    0 /user
+       system          0 y       0 n    58    0 /system
+```
+
+Create a file called `setup_cgroups.sh` and make it executable with `chmod +x setup_cgroups.sh`:
+```shell script
+#!/bin/bash
+
+sudo cset shield --cpu=1-7 -k on
+```
+
+This will isolate cores 1 to 7 for our benchmarks. To run the benchmarks on these cores use the following
+or a similar command: `sudo chrt --fifo 90 cset shield -e --user= \-- `
 
-***BEFORE TESTS***: to make the config apply ***restart your system***
 
 ### CPU frequency
@@ -181,7 +230,9 @@ sysctl -w kernel.sched_rt_period_us=1000000
 
 To run the tests use the following (or a similar command with different rt policy):
 
-`taskset FFFE chrt --fifo 80 sudo -u $SUDO_USER `
+`sudo chrt --fifo 90 cset shield -e --user= \-- `
+
+This maps the process to all cores but core 0 and runs it using the desired real time scheduling policy and priority.
 
-This maps the process to all cores but core 0 and runs them using the round robin real time schedule.
-Rplace -rr with --fifo to use the first in first out scheduler.
+We found that interactive sessions can cause huge latency spikes even with this separation;
+therefore, we advise starting the benchmarks and then leaving the system alone until they are done.
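As a sanity check after rebooting, the `extraargs=` settings added in this patch can be verified against `/proc/cmdline`. The `check_cmdline` helper below is a hypothetical sketch, not part of the setup scripts; on the board you would call it with `"$(cat /proc/cmdline)"`:

```shell script
#!/bin/bash
# Hypothetical sanity check: confirm that all tuning flags made it into the
# kernel command line. On the board: check_cmdline "$(cat /proc/cmdline)"

REQUIRED_FLAGS="mce=ignore_ce nosoftlockup nmi_watchdog=0 transparent_hugepage=never processor.max_cstate=1 idle=poll nohz=on nohz_full=1-7 isolcpus=1-7"

check_cmdline() {
    local cmdline="$1" flag
    for flag in $REQUIRED_FLAGS; do
        # -F: fixed string (the '.' in processor.max_cstate is not a regex dot)
        # -w: whole-word match, so "nohz=on" does not match inside "nohz_full=1-7"
        if ! grep -qwF -- "$flag" <<<"$cmdline"; then
            echo "missing: $flag"
            return 1
        fi
    done
    echo "all tuning flags present"
}
```

The check reports the first missing flag, which makes a silently dropped `extraargs=` line (e.g. a typo in `/boot/armbianEnv.txt`) easy to spot before starting a long benchmark run.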
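The launch command in this patch nests two tools: `chrt --fifo 90` selects SCHED_FIFO at priority 90, and `cset shield -e` executes the given command inside the shielded cpuset (cores 1 to 7). The wrapper below is a sketch that only assembles and prints the command; the user name `florian` and the `./benchmark` binary are placeholder assumptions:

```shell script
#!/bin/bash
# Hypothetical wrapper around the launch command from the patch above.
# The user name and benchmark binary are placeholders, not fixed values.

build_launch_cmd() {
    local user="$1"; shift
    # chrt --fifo 90: SCHED_FIFO at priority 90; cset shield -e: run inside
    # the shielded cpuset; --user drops back from root to the given user.
    echo "sudo chrt --fifo 90 cset shield -e --user=$user -- $*"
}

# Example usage (prints the command instead of running it, so it is safe anywhere):
build_launch_cmd florian ./benchmark --iterations 1000
```

After the runs, `sudo cset shield --reset` should tear the shield down again and return all cores to normal scheduling.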