From c7ae7ee44ad5d410a621f6a30ec15feee496a0c1 Mon Sep 17 00:00:00 2001
From: Michael Schmid
Date: Tue, 9 Feb 2021 10:16:10 +0100
Subject: [PATCH] Added Odroid XU4 as another supported board

---
 BANANAPI.md | 239 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hardware.md | 241 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 241 insertions(+), 239 deletions(-)
 delete mode 100644 BANANAPI.md
 create mode 100644 Hardware.md

diff --git a/BANANAPI.md b/BANANAPI.md
deleted file mode 100644
index 62962b5..0000000
--- a/BANANAPI.md
+++ /dev/null
@@ -1,239 +0,0 @@
-# Setup BananaPI for benchmarking
-
-The goal of this documentation is to get a linux image running on a
-bananaPI board that allows for very isolated benchmarks showing full
-time distributions of the measurement runs.
-
-## Base Setup
-
-First step is to get a linux image running on the banana PI. We choose to use the
-[armbian](https://www.armbian.com/) project as it is the only one with support for the
-bananaPI m3. For this we generally
-[follow the instructions given](https://docs.armbian.com/Developer-Guide_Build-Preparation/),
-below are notes on what to do to get the rt kernel patch into it and to build.
-
-You can also use our pre-build image [on google drive](https://drive.google.com/open?id=1RiHymBO_XjOk5tMAL31iOSJGfncrWFQh)
-and skip the build process below. Just use etcher (https://www.balena.io/etcher/) or similar,
-flash an sd card and the PI should boot up. Default login is root/1234, follow the instructions,
-then continue with the isolating system setup steps for more accurate measurements.
-
-General Setup:
-- Setup an ubuntu bionic 18.04 virtual box VM
-- `# apt-get -y -qq install git`
-- `$ git clone --depth 1 https://github.com/armbian/build`
-- `$ cd build`
-- To verify the environment first go for a 'clean build' without patch: `# ./compile.sh`
-- Select the bananaPI m3 board and a minimal console build
-
-Apply RT Pach:
-- Find the current kernel version armbian is using, e.g. from the previous build logs
-- Download and unpack the matching rt path from https://mirrors.edge.kernel.org/pub/linux/kernel/projects/rt/
-- You should have a single .patch file, place it in build/userpatches/kernel/sunix-current/patch-5.4.28-rt19.patch
-- Re-run the `# ./compile.sh` script
-- Select BananaPI M3, Command Line Minimal and SHOW KERNEL CONFIG
-- The build should pick up the patch (and show it in the logs)
-- You will be ask to fill in some settings. Choose (4) fully preemptive at the first option
-- Fill out the other asked settings to your liking. To avoid issues just leave them at default.
-- You will then be in the kernel config window
-- Here disable the file systems AUFS and NFS in the settings (they cause build issues and we do not need them)
-- Store the settings and build the image
-- If successfull, the flashed image should show the preempt patch with `uname -a` and should have good latencies in cyclictest
-
-## Run project
-
-First setup some base dependencies for running the benchmark and tests:
-
-- `# apt-get install rt-tests`
-- `# apt-get install build-essential`
-- `# apt-get install cmake`
-- `# apt-get install git`
-- `# apt-get install cpuset`
-
-Next EMBB is required as a comparison in the benchmark suite. Install it using the following or similar
-(as described on their github page, https://github.com/siemens/embb):
-- `$ wget https://github.com/siemens/embb/archive/v1.0.0.zip`
-- `$ unzip v1.0.0.zip`
-- `$ cd embb-1.0.0`
-- `$ mkdir cmake-build-release`
-- `$ cd cmake-build-release`
-- `$ cmake ../`
-- `$ cmake --build .`
-- `# cmake --build . --target install`
-
-This are all dependencies needed for executing the benchmark project and pls itself.
-Follow the project specific instructions for how to use them.
-
-## Tweaking Scheduler, CPU and Interrupts
-
-We would like to get very little dispersion through system jitter. We recommend tweaking the
-scheduler, CPU and interrupt settings before running benchmarks.
-
-See the sub-sections below for the individual measures. ***Before running tests make sure to
-run the following scripts:***
-- `sudo ./setup_cpu.sh`
-- `sudo ./map_interrupts_core_0.sh`
-- `sudo ./setup_rt.sh`
-- `sudo ./setup_cgroups.sh`
-
-Then start your tests manually mapped to cores 1 to 7. We also found that having any interactive sessions
-open during the measurements (especially)
-
-### Tuning kernel parameters
-
-Some online references advice on some kernel parameter tweaks for getting better latencies.
-To change kernel parameters edit the `boot/armbianEnv.txt` file and add a line with
-`extraargs=`.
-
-Here are some good articles discussing jitter on linux systems:
-- https://www.codethink.co.uk/articles/2018/configuring-linux-to-stabilise-latency/ (General Tips and Measurements)
-- https://access.redhat.com/sites/default/files/attachments/201501-perf-brief-low-latency-tuning-rhel7-v1.1.pdf (7 - Kernel Command Line)
-- https://access.redhat.com/articles/65410 (Power Management/C-States)
-- https://community.mellanox.com/s/article/rivermax-linux-performance-tuning-guide--1-x (General Tips)
-
-We use the following settings:
-```shell script
-mce=ignore_ce nosoftlockup nmi_watchdog=0 transparent_hugepage=never processor.max_cstate=1 idle=poll nohz=on nohz_full=1-7
-```
-
-- ***mce=ignore_ce*** do not scan for hw errors. Reduce the jitter introduced by periodic runs
-- ***nosoftlockup*** do not log backtraces for tasks hogging the cpu over some time. This, again, reduces jitter and we do not need the function in our controlled test environment.
-- ***nmi_watchdog=0*** disables the nmi watchdog on architectures that support it. Esentially disables a non blockable interrup that is used to detect hanging/stuck systems. We do not need this check during our benchmarks. https://medium.com/@yildirimabdrhm/nmi-watchdog-on-linux-ae3b4c86e8d8
-- ***transparent_hugepage=never*** do not scan for small pages to combine to hugepages. We have no issues with memory usage, spare us of this periodic jitter.
-- ***processor.max_cstate=1 idle=poll*** do not switch to CPU power saving modes (c-states). Just run all cores at full speed all the time (we do not care about energy during our tests).
-- ***nohz=on nohz_full=1-7*** disable houskeeping os ticks on our isolated benchmark cores. core 0 will handle these when needed.
-
-### Pin all other processes to core 0 (crgoups)
-
-We want to isolate our measurements to cores 1 to 7 and use core 0 for all non benchmark related processes.
-isolcpus is often used for this, however, we found that it disables the scheduler from balancing tasks
-between the isolated cores. A better approach is to use cgroups.
-See the tutorial for further information: https://github.com/lpechacek/cpuset/blob/master/doc/tutorial.txt
-Essentially, we can partition our cores into two isolated groups, then map all tasks that can be moved away from
-our benchmark cores, to ensure low influence of background tasks. Cgroups also nicely interact with
-the real time scheduler, as described here https://www.linuxjournal.com/article/10165, because
-they allow to adapt the scheduler to ignore the other cores in its decision making process.
-Note the exclusive cpu groups in this output:
-```shell script
-florian@bananapim3:~$ cset set
-cset:
-         Name       CPUs-X    MEMs-X Tasks Subs Path
- ------------ ---------- - ------- - ----- ---- ----------
-         root        0-7 y       0 y   116    2 /
-         user        1-7 y       0 n     0    0 /user
-       system          0 y       0 n    58    0 /system
-```
-
-Create a file called 'setup_cgroups.sh' and modify it with 'chmod +x setup_cgroups.sh':
-```shell script
-#!/bin/bash
-
-sudo cset shield --cpu=1-7 -k on
-```
-
-This will isolate cores 1 to 7 for our benchmarks. To run the benchmarks on these cores use the following
-or a similar command: `sudo chrt --fifo 90 cset shield -e --user= \-- `
-
-
-### CPU frequency
-
-Limiting the frequency to 1GHz makes sure that the banana PI dose not throttle during the tests.
-Additionally, disabling any dynamic frequency scaling makes tests more reproducable.
-
-Create a file called 'setup_cpu.sh' and modify it with 'chmod +x setup_cpu.sh':
-```shell script
-#!/bin/bash
-
-echo "Writing frequency utils settings file..."
-echo "ENABLE=true
-MIN_SPEED=1412000
-MAX_SPEED=1412000
-GOVERNOR=performance" > /etc/default/cpufrequtils
-
-echo "Restarting frequency utils service..."
-systemctl restart cpufrequtils
-
-echo "Done!"
-echo "Try ./watch_cpu.sh to see if everything worked."
-echo "Test your cooling by stressing the cpu and watching the temperature output."
-```
-
-Create a file called 'watch_cpu.sh' and modify it with 'chmod +x watch_cpu.sh':
-````shell script
-#!/bin/bash
-
-echo "Min/Max Frequencies"
-cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_min_freq
-echo "-----"
-cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
-
-echo "Scaling Min/Max Frequencies"
-cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
-echo "-----"
-cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
-
-echo "Actual Frequencies"
-cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
-
-echo "Temps.."
-cat /sys/class/thermal/thermal_zone*/temp
-````
-
-***BEFORE TESTS***:
-To setup the CPU run ***`sudo ./setup_cpu.sh`*** before your tests. To see that the change worked
-and the temperatures hold stable use the `./watch_cpu.sh` script.
-
-### Map interrupts to core 0
-
-Interrupts can infer with our benchmarks. We therefore map them to core 0 if possible and run our tests on
-cores 1 to 7.
-
-Create a file called 'map_interrupts_core_0.sh' and modify it with 'chmod +x map_interrupts_core_0.sh':
-```shell script
-#!/bin/bash
-
-echo "Try to map interrupts to core 0."
-echo "Some might fail because they can not be mapped (e.g. core specific timers)."
-echo ""
-echo ""
-
-echo 1 > /proc/irq/default_smp_affinity
-for dir in /proc/irq/*/
-do
-  echo "Mapping $dir ..."
-  echo 1 > $dir/smp_affinity
-done
-```
-
-***BEFORE TESTS***: map the interrupts to core 0 using ***`sudo ./map_interrupts_core_0.sh`***
-
-### Full time slices to RT scheduler
-
-The RT scheduler in linux by default leaves some fraction of its scheduling time to non RT processes,
-leaving the system in a responsive state if a RT application eats all CPU. We do not want this, as we
-try to get a very predictable behavior in our RT scheduler.
-
-Create a file called 'setup_rt.sh' and modify it with 'chmod +x setup_rt.sh':
-```shell script
-#!/bin/bash
-
-sysctl -w kernel.sched_rt_runtime_us=1000000
-sysctl -w kernel.sched_rt_period_us=1000000
-````
-
-***BEFORE TESTS***: give full time slices to RT tasks ***`sudo ./setup_rt.sh`***
-
-## Running Tests
-
-***Before running tests make sure to run the following scripts:***
-- `sudo ./setup_cpu.sh`
-- `sudo ./map_interrupts_core_0.sh`
-- `sudo ./setup_rt.sh`
-
-To run the tests use the following (or a similar command with different rt policy):
-
-`sudo chrt --fifo 90 cset shield -e --user= \-- `
-
-This maps the process to all cores but core 0 and runs them using the desired real time schedule and priority.
-
-We found that interactive sessions can cause huge latency spices even with this separation,
-therefore we advise on starting the benchmarks and then leaving the system alone until they are done.
diff --git a/Hardware.md b/Hardware.md
new file mode 100644
index 0000000..4206139
--- /dev/null
+++ b/Hardware.md
@@ -0,0 +1,241 @@
+# Setup Hardware (Banana PI and Odroid XU4) for benchmarking
+
+The goal of this documentation is to get a linux image running on a
+bananaPI or Odroid XU4 board that allows for very isolated benchmarks showing full
+time distributions of the measurement runs.
+
+## Base Setup
+
+The first step is to get a linux image running on the hardware platforms. We choose to use the
+[armbian](https://www.armbian.com/) project as it is the only one with support for
+both boards. For this we generally
+[follow the instructions given](https://docs.armbian.com/Developer-Guide_Build-Preparation/);
+below are notes on how to get the rt kernel patch into it and how to build it.
+
+You can also use our pre-built image [on google drive](https://drive.google.com/open?id=1RiHymBO_XjOk5tMAL31iOSJGfncrWFQh)
+and skip the build process below. Just use etcher (https://www.balena.io/etcher/) or similar,
+flash an sd card and the PI should boot up. Default login is root/1234, follow the instructions,
+then continue with the isolating system setup steps for more accurate measurements.
+
+General Setup:
+- Set up an Ubuntu bionic 18.04 VirtualBox VM
+- `# apt-get -y -qq install git`
+- `$ git clone --depth 1 https://github.com/armbian/build`
+- `$ cd build`
+- To verify the environment first go for a 'clean build' without patch: `# ./compile.sh`
+- Select the bananaPI m3/odroid xu4 board and a minimal console build
+- NOTE: As of this writing, the legacy kernel (4.14.y) has to be selected for the odroid xu4 board
+
+Apply RT Patch:
+- Find the current kernel version armbian is using, e.g. from the previous build logs
+- Download and unpack the matching rt patch from https://mirrors.edge.kernel.org/pub/linux/kernel/projects/rt/
+- You should have a single .patch file, place it in build/userpatches/kernel/sunxi-current/patch-5.4.28-rt19.patch
+- Re-run the `# ./compile.sh` script
+- Select BananaPI M3/Odroid XU4, Command Line Minimal and SHOW KERNEL CONFIG
+- The build should pick up the patch (and show it in the logs)
+- You will be asked to fill in some settings. Choose (4) fully preemptive at the first option
+- Fill out the other requested settings to your liking. To avoid issues just leave them at default.
+- You will then be in the kernel config window
+  - BananaPI: Here disable the file systems AUFS and NFS in the settings (they cause build issues and we do not need them)
+  - Odroid: Disable Heterogeneous Multi-Processing (HMP) in the settings (does not work with preempt_rt yet)
+- Store the settings and build the image
+- If successful, the flashed image should show the preempt patch with `uname -a` and should have good latencies in cyclictest
+
+## Run project
+
+First set up some base dependencies for running the benchmark and tests:
+
+- `# apt-get install rt-tests`
+- `# apt-get install build-essential`
+- `# apt-get install cmake`
+- `# apt-get install git`
+- `# apt-get install cpuset`
+
+Next, EMBB is required as a comparison in the benchmark suite. Install it using the following or similar
+(as described on their github page, https://github.com/siemens/embb):
+- `$ wget https://github.com/siemens/embb/archive/v1.0.0.zip`
+- `$ unzip v1.0.0.zip`
+- `$ cd embb-1.0.0`
+- `$ mkdir cmake-build-release`
+- `$ cd cmake-build-release`
+- `$ cmake ../`
+- `$ cmake --build .`
+- `# cmake --build . --target install`
+
+These are all dependencies needed for executing the benchmark project and pls itself.
+Follow the project specific instructions for how to use them.
+
+## Tweaking Scheduler, CPU and Interrupts
+
+We would like to get very little dispersion through system jitter. We recommend tweaking the
+scheduler, CPU and interrupt settings before running benchmarks.
+
+See the sub-sections below for the individual measures. ***Before running tests make sure to
+run the following scripts:***
+- `sudo ./setup_cpu.sh`
+- `sudo ./map_interrupts_core_0.sh`
+- `sudo ./setup_rt.sh`
+- `sudo ./setup_cgroups.sh`
+
+Then start your tests manually mapped to cores 1 to 7. We also found that having any interactive sessions
+open during the measurements can cause large latency spikes, so leave the system alone while they run.
+
+### Tuning kernel parameters
+
+Some online references advise on kernel parameter tweaks for getting better latencies.
+To change kernel parameters edit the `/boot/armbianEnv.txt` file and add a line with
+`extraargs=`.
+
+Here are some good articles discussing jitter on linux systems:
+- https://www.codethink.co.uk/articles/2018/configuring-linux-to-stabilise-latency/ (General Tips and Measurements)
+- https://access.redhat.com/sites/default/files/attachments/201501-perf-brief-low-latency-tuning-rhel7-v1.1.pdf (7 - Kernel Command Line)
+- https://access.redhat.com/articles/65410 (Power Management/C-States)
+- https://community.mellanox.com/s/article/rivermax-linux-performance-tuning-guide--1-x (General Tips)
+
+We use the following settings:
+```shell script
+mce=ignore_ce nosoftlockup nmi_watchdog=0 transparent_hugepage=never processor.max_cstate=1 idle=poll nohz=on nohz_full=1-7
+```
+
+- ***mce=ignore_ce*** do not scan for hw errors. This reduces the jitter introduced by the periodic scans.
+- ***nosoftlockup*** do not log backtraces for tasks hogging the cpu over some time. This, again, reduces jitter and we do not need the function in our controlled test environment.
+- ***nmi_watchdog=0*** disables the nmi watchdog on architectures that support it. Essentially disables a non-maskable interrupt (NMI) that is used to detect hanging/stuck systems. We do not need this check during our benchmarks. https://medium.com/@yildirimabdrhm/nmi-watchdog-on-linux-ae3b4c86e8d8
+- ***transparent_hugepage=never*** do not scan for small pages to combine to hugepages. We have no issues with memory usage, so we spare ourselves this periodic jitter.
+- ***processor.max_cstate=1 idle=poll*** do not switch to CPU power saving modes (c-states). Just run all cores at full speed all the time (we do not care about energy during our tests).
+- ***nohz=on nohz_full=1-7*** disable housekeeping os ticks on our isolated benchmark cores. Core 0 will handle these when needed.
+
+### Pin all other processes to core 0 (cgroups)
+
+We want to isolate our measurements to cores 1 to 7 and use core 0 for all non benchmark related processes.
+isolcpus is often used for this, however, we found that it prevents the scheduler from balancing tasks
+between the isolated cores. A better approach is to use cgroups.
+See the tutorial for further information: https://github.com/lpechacek/cpuset/blob/master/doc/tutorial.txt
+Essentially, we can partition our cores into two isolated groups, then move all tasks that can be moved away from
+our benchmark cores, to ensure low influence of background tasks. Cgroups also interact nicely with
+the real time scheduler, as described here https://www.linuxjournal.com/article/10165, because
+they allow the scheduler to ignore the other cores in its decision making process.
+Note the exclusive cpu groups in this output:
+```shell script
+florian@bananapim3:~$ cset set
+cset:
+         Name       CPUs-X    MEMs-X Tasks Subs Path
+ ------------ ---------- - ------- - ----- ---- ----------
+         root        0-7 y       0 y   116    2 /
+         user        1-7 y       0 n     0    0 /user
+       system          0 y       0 n    58    0 /system
+```
+
+Create a file called 'setup_cgroups.sh' and make it executable with 'chmod +x setup_cgroups.sh':
+```shell script
+#!/bin/bash
+
+sudo cset shield --cpu=1-7 -k on
+```
+
+This will isolate cores 1 to 7 for our benchmarks. To run the benchmarks on these cores use the following
+or a similar command: `sudo chrt --fifo 90 cset shield -e --user= \-- `
+
+
+### CPU frequency
+
+Limiting the frequency to a fixed value (1412000 kHz, about 1.4 GHz, in the script below) makes sure that the banana PI does not throttle during the tests.
+Additionally, disabling any dynamic frequency scaling makes tests more reproducible.
+
+Create a file called 'setup_cpu.sh' and make it executable with 'chmod +x setup_cpu.sh':
+```shell script
+#!/bin/bash
+
+echo "Writing frequency utils settings file..."
+echo "ENABLE=true
+MIN_SPEED=1412000
+MAX_SPEED=1412000
+GOVERNOR=performance" > /etc/default/cpufrequtils
+
+echo "Restarting frequency utils service..."
+systemctl restart cpufrequtils
+
+echo "Done!"
+echo "Try ./watch_cpu.sh to see if everything worked."
+echo "Test your cooling by stressing the cpu and watching the temperature output."
+```
+
+Create a file called 'watch_cpu.sh' and make it executable with 'chmod +x watch_cpu.sh':
+````shell script
+#!/bin/bash
+
+echo "Min/Max Frequencies"
+cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_min_freq
+echo "-----"
+cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
+
+echo "Scaling Min/Max Frequencies"
+cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
+echo "-----"
+cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
+
+echo "Actual Frequencies"
+cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
+
+echo "Temps.."
+cat /sys/class/thermal/thermal_zone*/temp
+````
+
+***BEFORE TESTS***:
+To set up the CPU run ***`sudo ./setup_cpu.sh`*** before your tests. To see that the change worked
+and the temperatures hold stable use the `./watch_cpu.sh` script.
+
+### Map interrupts to core 0
+
+Interrupts can interfere with our benchmarks. We therefore map them to core 0 if possible and run our tests on
+cores 1 to 7.
+
+Create a file called 'map_interrupts_core_0.sh' and make it executable with 'chmod +x map_interrupts_core_0.sh':
+```shell script
+#!/bin/bash
+
+echo "Try to map interrupts to core 0."
+echo "Some might fail because they can not be mapped (e.g. core specific timers)."
+echo ""
+echo ""
+
+echo 1 > /proc/irq/default_smp_affinity
+for dir in /proc/irq/*/
+do
+  echo "Mapping $dir ..."
+  echo 1 > $dir/smp_affinity
+done
+```
+
+***BEFORE TESTS***: map the interrupts to core 0 using ***`sudo ./map_interrupts_core_0.sh`***
+
+### Full time slices to RT scheduler
+
+The RT scheduler in linux by default leaves some fraction of its scheduling time to non RT processes,
+leaving the system in a responsive state if an RT application eats all CPU. We do not want this, as we
+try to get very predictable behavior from our RT scheduler.
+
+Create a file called 'setup_rt.sh' and make it executable with 'chmod +x setup_rt.sh':
+```shell script
+#!/bin/bash
+
+sysctl -w kernel.sched_rt_runtime_us=1000000
+sysctl -w kernel.sched_rt_period_us=1000000
+```
+
+***BEFORE TESTS***: give full time slices to RT tasks ***`sudo ./setup_rt.sh`***
+
+## Running Tests
+
+***Before running tests make sure to run the following scripts:***
+- `sudo ./setup_cpu.sh`
+- `sudo ./map_interrupts_core_0.sh`
+- `sudo ./setup_rt.sh`
+
+To run the tests use the following (or a similar command with different rt policy):
+
+`sudo chrt --fifo 90 cset shield -e --user= \-- `
+
+This maps the process to all cores but core 0 and runs it using the desired real time scheduling policy and priority.
+
+We found that interactive sessions can cause huge latency spikes even with this separation,
+therefore we advise starting the benchmarks and then leaving the system alone until they are done.
--
libgit2 0.26.0