# Setup Hardware (Banana PI and Odroid XU4) for benchmarking

The goal of this documentation is to get a Linux image running on a
Banana Pi (or Odroid XU4) board that allows for highly isolated benchmarks showing the full
time distributions of the measurement runs.

## Base Setup

The first step is to get a Linux image running on the hardware platforms. We chose the
[armbian](https://www.armbian.com/) project, as it is the only one with support for
both boards. For this we generally
[follow the instructions given](https://docs.armbian.com/Developer-Guide_Build-Preparation/);
below are notes on how to add the RT kernel patch and build the image.

You can also use our pre-built image [on Google Drive](https://drive.google.com/open?id=1RiHymBO_XjOk5tMAL31iOSJGfncrWFQh)
and skip the build process below. Just use [Etcher](https://www.balena.io/etcher/) or a similar tool to
flash an SD card, and the board should boot up. The default login is root/1234; follow the first-login instructions,
then continue with the system isolation steps below for more accurate measurements.

General Setup:
- Set up an Ubuntu Bionic (18.04) VirtualBox VM
- `# apt-get -y -qq install git`
- `$ git clone --depth 1 https://github.com/armbian/build`
- `$ cd build`
- To verify the environment, first do a 'clean build' without the patch: `# ./compile.sh`
- Select the Banana Pi M3/Odroid XU4 board and a minimal console build
- NOTE: As of this writing, the legacy kernel (4.14.y) has to be selected for the Odroid XU4 board

Apply RT Patch:
- Find the kernel version Armbian is currently using, e.g. from the previous build logs
- Download and unpack the matching RT patch from https://mirrors.edge.kernel.org/pub/linux/kernel/projects/rt/
- You should have a single .patch file; place it in the matching userpatches directory, e.g. for the Banana Pi M3: build/userpatches/kernel/sunxi-current/patch-5.4.28-rt19.patch
- Re-run the `# ./compile.sh` script
- Select Banana Pi M3/Odroid XU4, Command Line Minimal and SHOW KERNEL CONFIG
- The build should pick up the patch (and show it in the logs)
- You will be asked to fill in some settings. For the first option (the preemption model), choose (4), the fully preemptible kernel
- Fill out the other requested settings to your liking; to avoid issues, just leave them at their defaults
- You will then be in the kernel config window
    - Banana Pi: disable the AUFS and NFS file systems in the settings (they cause build issues and we do not need them)
    - Odroid: disable Heterogeneous Multi-Processing (HMP) in the settings (it does not work with PREEMPT_RT yet)
- Store the settings and build the image
- If successful, `uname -a` on the flashed image should show the preempt (RT) patch, and cyclictest should report good latencies
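
Parts of the patch and verification steps above can be scripted. The following is a minimal sketch of hypothetical helper functions (`rt_series_url`, `install_rt_patch` and `is_rt_uname` are our own names, not part of Armbian or rt-tests), e.g. saved as `rt_helpers.sh` and sourced. It assumes the mirror keeps the patches of a kernel series in a `<major>.<minor>/` directory and that you downloaded the `.patch.xz` variant of the patch; adjust both if the mirror layout differs.

```sh
# rt_helpers.sh: hypothetical helpers for the RT patch steps (source this file).

# Mirror directory for the RT patches of a kernel series (assumed layout),
# e.g. 5.4.28 -> .../projects/rt/5.4/
rt_series_url() {
    local series
    series=$(echo "$1" | cut -d. -f1,2)
    echo "https://mirrors.edge.kernel.org/pub/linux/kernel/projects/rt/${series}/"
}

# Unpacks a downloaded patch-<version>-rt<N>.patch.xz into the Armbian userpatches
# directory, e.g.: install_rt_patch patch-5.4.28-rt19.patch.xz build sunxi-current
install_rt_patch() {
    local archive="$1" dest="$2/userpatches/kernel/$3" name
    name=$(basename "$archive" .xz)
    mkdir -p "$dest" || return 1
    if ! xz -dc "$archive" > "$dest/$name"; then
        rm -f "$dest/$name"   # do not leave a truncated patch behind
        return 1
    fi
    echo "Installed $dest/$name"
}

# Succeeds if a `uname -v` string reports an RT kernel. Depending on the
# kernel version this shows up as "PREEMPT RT" or "PREEMPT_RT".
is_rt_uname() {
    case "$1" in
        *"PREEMPT RT"* | *PREEMPT_RT*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On the flashed board, `is_rt_uname "$(uname -v)" && echo "RT kernel"` should print `RT kernel`. A quick latency check with cyclictest (from rt-tests) could then be `sudo cyclictest -m -S -p 80 -i 200 -d 0 -l 100000` (lock memory, one thread per core, priority 80, 200 µs interval); these values are only an example.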

## Run project

First, set up some base dependencies for running the benchmarks and tests:

- `# apt-get install rt-tests`
- `# apt-get install build-essential`
- `# apt-get install cmake`
- `# apt-get install git`
- `# apt-get install cpuset`

Next, EMBB is required as a comparison in the benchmark suite. Install it using the following or similar steps
(as described on their GitHub page, https://github.com/siemens/embb):
- `$ wget https://github.com/siemens/embb/archive/v1.0.0.zip`
- `$ unzip v1.0.0.zip`
- `$ cd embb-1.0.0`
- `$ mkdir cmake-build-release`
- `$ cd cmake-build-release`
- `$ cmake ../`
- `$ cmake --build .`
- `# cmake --build . --target install`

These are all the dependencies needed for executing the benchmark project and pls itself.
Follow the project-specific instructions for how to use them.
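
Before building, it can help to check that the tools from the packages above are actually on the `PATH`. A minimal sketch; the function name `missing_commands` is our own, and the command-to-package mapping in the usage example (`cyclictest` from rt-tests, `gcc`/`make` from build-essential, `cset` from cpuset) is the usual Debian/Ubuntu one.

```sh
# Prints each command that cannot be found in PATH; returns non-zero if any is missing.
missing_commands() {
    local status=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" > /dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return $status
}
```

For example: `missing_commands cyclictest gcc make cmake git cset || echo "Install the packages listed above first."`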

## Tweaking Scheduler, CPU and Interrupts

We want as little measurement dispersion from system jitter as possible. We therefore recommend tweaking the
scheduler, CPU and interrupt settings before running benchmarks.

See the sub-sections below for the individual measures. ***Before running tests make sure to
run the following scripts:***
- `sudo ./setup_cpu.sh`
- `sudo ./map_interrupts_core_0.sh`
- `sudo ./setup_rt.sh`
- `sudo ./setup_cgroups.sh`

Then start your tests manually, mapped to cores 1 to 7. We also found that having any interactive sessions
open during the measurements can cause latency spikes.
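
To avoid forgetting one of the steps, the four scripts can be run from a small wrapper. This is a hypothetical helper (`prepare_benchmark.sh` is not part of the original setup), sketched under the assumption that all four scripts live in one directory. It keeps going when a script reports an error, because `map_interrupts_core_0.sh` can legitimately fail for interrupts that cannot be moved, and lists all failures at the end.

```sh
#!/bin/bash
# prepare_benchmark.sh (hypothetical): runs the setup scripts in the order listed above.
# Usage: sudo ./prepare_benchmark.sh <directory containing the setup scripts>

SETUP_SCRIPTS="setup_cpu.sh map_interrupts_core_0.sh setup_rt.sh setup_cgroups.sh"

run_setup_scripts() {
    local dir="$1" failed="" script
    for script in $SETUP_SCRIPTS; do
        if [ ! -x "$dir/$script" ]; then
            echo "Missing or not executable: $dir/$script" >&2
            failed="$failed $script"
            continue
        fi
        echo "Running $script ..."
        # Do not abort: some interrupts cannot be remapped, which makes that script report an error
        "$dir/$script" || failed="$failed $script"
    done
    if [ -n "$failed" ]; then
        echo "Check the output of:$failed" >&2
        return 1
    fi
    echo "All setup scripts finished without errors."
}

# Only run when a directory is given, so the file can also be sourced.
if [ $# -eq 1 ]; then
    run_setup_scripts "$1"
fi
```

With the scripts in the current directory, `sudo ./prepare_benchmark.sh .` runs all four.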

### Tuning kernel parameters

Some online references advise kernel parameter tweaks for getting better latencies.
To change kernel parameters, edit the `/boot/armbianEnv.txt` file and add a line with
`extraargs=<your args>`.

Here are some good articles discussing jitter on linux systems:
- https://www.codethink.co.uk/articles/2018/configuring-linux-to-stabilise-latency/ (General Tips and Measurements)
- https://access.redhat.com/sites/default/files/attachments/201501-perf-brief-low-latency-tuning-rhel7-v1.1.pdf  (7 - Kernel Command Line)
- https://access.redhat.com/articles/65410 (Power Management/C-States)
- https://community.mellanox.com/s/article/rivermax-linux-performance-tuning-guide--1-x (General Tips)

We use the following settings: 
```
mce=ignore_ce nosoftlockup nmi_watchdog=0 transparent_hugepage=never processor.max_cstate=1 idle=poll nohz=on nohz_full=1-7
```

- ***mce=ignore_ce*** do not poll for corrected hardware (machine-check) errors. This reduces the jitter introduced by the periodic polling.
- ***nosoftlockup*** do not log backtraces for tasks hogging the CPU for a long time. Again, this reduces jitter, and we do not need this function in our controlled test environment.
- ***nmi_watchdog=0*** disables the NMI watchdog on architectures that support it. Essentially, this disables a non-maskable interrupt that is used to detect hanging/stuck systems. We do not need this check during our benchmarks. https://medium.com/@yildirimabdrhm/nmi-watchdog-on-linux-ae3b4c86e8d8
- ***transparent_hugepage=never*** do not scan for small pages to combine into huge pages. Memory usage is not an issue for us, so we avoid this periodic jitter.
- ***processor.max_cstate=1 idle=poll*** do not switch to CPU power-saving modes (C-states). Just run all cores at full speed all the time (we do not care about energy consumption during our tests).
- ***nohz=on nohz_full=1-7*** disable the housekeeping OS ticks on our isolated benchmark cores where possible; core 0 handles them when needed.
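
The parameters above can be written into `/boot/armbianEnv.txt` by hand, or with a small helper like the following sketch. `set_extraargs` is a hypothetical function of ours; note that it replaces an existing `extraargs=` line rather than merging into it, so carry over anything you already had there.

```sh
# set_extraargs FILE ARGS: sets the extraargs= line in an Armbian env file,
# replacing an existing extraargs= line or appending a new one.
set_extraargs() {
    local file="$1" args="$2" tmp
    tmp=$(mktemp) || return 1
    if grep -q '^extraargs=' "$file"; then
        awk -v line="extraargs=$args" '/^extraargs=/ { print line; next } { print }' "$file" > "$tmp"
    else
        cat "$file" > "$tmp"
        # Make sure the new line is not glued onto an unterminated last line
        if [ -s "$tmp" ] && [ -n "$(tail -c 1 "$tmp")" ]; then
            echo >> "$tmp"
        fi
        echo "extraargs=$args" >> "$tmp"
    fi
    # Write back through cat to keep the original file's owner and permissions
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

In a root shell that has sourced the function, `set_extraargs /boot/armbianEnv.txt "mce=ignore_ce nosoftlockup nmi_watchdog=0 transparent_hugepage=never processor.max_cstate=1 idle=poll nohz=on nohz_full=1-7"` applies our settings; after a reboot, `cat /proc/cmdline` shows whether they reached the kernel.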

### Pin all other processes to core 0 (cgroups)

We want to isolate our measurements to cores 1 to 7 and use core 0 for all non benchmark related processes.
`isolcpus` is often used for this; however, we found that it prevents the scheduler from balancing tasks
between the isolated cores. A better approach is to use cgroups (cpusets).
See the tutorial for further information: https://github.com/lpechacek/cpuset/blob/master/doc/tutorial.txt

Essentially, we partition our cores into two isolated groups, then move all tasks that can be migrated away from
our benchmark cores, to minimize the influence of background tasks. Cgroups also interact nicely with
the real-time scheduler, as described here https://www.linuxjournal.com/article/10165, because
they let the scheduler ignore the other cores in its decision-making process.
Note the exclusive CPU groups in this output:
```sh
florian@bananapim3:~$ cset set
cset: 
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root        0-7 y       0 y   116    2 /
         user        1-7 y       0 n     0    0 /user
       system          0 y       0 n    58    0 /system
```

Create a file called 'setup_cgroups.sh' and make it executable with 'chmod +x setup_cgroups.sh':

```sh
#!/bin/bash

# Shield cores 1-7 for the benchmarks; -k on also moves movable kernel threads to core 0
sudo cset shield --cpu=1-7 -k on
```

This will isolate cores 1 to 7 for our benchmarks. To run the benchmarks on these cores, use the following
or a similar command: `sudo chrt --fifo 90 cset shield -e --user=<user> <executable> -- <arguments>`
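
To check that the shield is in place, the `cset set` listing shown above can be parsed. Below is a minimal sketch with two hypothetical helpers. `cpuset_is_exclusive` assumes the column layout of that listing (name, CPU list, exclusive flag). `allowed_cpus` reads the `Cpus_allowed_list` field from `/proc/<pid>/status`, which shows the cores a running benchmark process may use.

```sh
# Reads `cset set` output on stdin and succeeds if the set NAME owns exactly
# the CPU list CPUS and is CPU-exclusive (the "y" right after the CPU list).
cpuset_is_exclusive() {
    awk -v name="$1" -v cpus="$2" '
        $1 == name && $2 == cpus && $3 == "y" { found = 1 }
        END { exit !found }'
}

# Prints the CPUs a process may run on (default: the calling shell).
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/${1:-$$}/status"
}
```

For example, `cset set | cpuset_is_exclusive user 1-7 && echo "shield active"`. While a benchmark started with `cset shield -e` is running, `allowed_cpus <pid>` should print `1-7`.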

### CPU frequency

Limiting the CPU frequency (the script below fixes it at 1412000 kHz, about 1.4 GHz) makes sure that the Banana Pi does not throttle during the tests.
Additionally, disabling any dynamic frequency scaling makes tests more reproducible.

Create a file called 'setup_cpu.sh' and make it executable with 'chmod +x setup_cpu.sh':

```sh
#!/bin/bash

# Pin min and max frequency to the same value (in kHz) and use the performance governor
echo "Writing frequency utils settings file..."
echo "ENABLE=true
MIN_SPEED=1412000
MAX_SPEED=1412000
GOVERNOR=performance" > /etc/default/cpufrequtils

echo "Restarting frequency utils service..."
systemctl restart cpufrequtils

echo "Done!"
echo "Try ./watch_cpu.sh to see if everything worked."
echo "Test your cooling by stressing the CPU and watching the temperature output."
```

Create a file called 'watch_cpu.sh' and make it executable with 'chmod +x watch_cpu.sh':

```sh
#!/bin/bash

echo "Min/Max Frequencies"
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_min_freq
echo "-----"
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq

echo "Scaling Min/Max Frequencies"
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
echo "-----"
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq

echo "Actual Frequencies"
# cpuinfo_cur_freq is usually readable only by root
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq

echo "Temperatures (millidegrees Celsius)"
cat /sys/class/thermal/thermal_zone*/temp
```

***BEFORE TESTS***:
To set up the CPU, run ***`sudo ./setup_cpu.sh`*** before your tests. To check that the change worked
and that the temperatures stay stable, use the `./watch_cpu.sh` script.
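
Instead of reading the raw `watch_cpu.sh` output, a check can compare the scaling limits against the value written by `setup_cpu.sh`. Below is a minimal sketch with hypothetical helpers (`check_fixed_freq` and `millideg_to_c` are our own names). It assumes the cpufreq sysfs layout used in `watch_cpu.sh` (values in kHz) and that thermal zones report millidegrees Celsius.

```sh
# Prints every CPU whose scaling_min_freq/scaling_max_freq differ from the
# expected value (kHz); returns non-zero if any CPU does.
check_fixed_freq() {
    local expected="$1" root="${2:-/sys/devices/system/cpu}" status=0 dir cpu min max
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -r "$dir/scaling_min_freq" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        min=$(cat "$dir/scaling_min_freq")
        max=$(cat "$dir/scaling_max_freq")
        if [ "$min" != "$expected" ] || [ "$max" != "$expected" ]; then
            echo "$cpu: min=$min max=$max (expected $expected)"
            status=1
        fi
    done
    return $status
}

# Converts a thermal zone reading (millidegrees Celsius) to degrees with one decimal.
millideg_to_c() {
    awk -v t="$1" 'BEGIN { printf "%.1f\n", t / 1000 }'
}
```

For example, `check_fixed_freq 1412000 && echo "frequency fixed"`, and `for t in /sys/class/thermal/thermal_zone*/temp; do millideg_to_c "$(cat "$t")"; done` to print the temperatures.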

### Map interrupts to core 0

Interrupts can interfere with our benchmarks. We therefore map them to core 0 where possible and run our tests on
cores 1 to 7.

Create a file called 'map_interrupts_core_0.sh' and make it executable with 'chmod +x map_interrupts_core_0.sh':

```sh
#!/bin/bash

echo "Try to map interrupts to core 0."
echo "Some might fail because they cannot be mapped (e.g. core-specific timers)."
echo ""
echo ""

# Affinity values are hex CPU bitmasks: 1 = core 0 only
echo 1 > /proc/irq/default_smp_affinity
for dir in /proc/irq/*/
do
	echo "Mapping $dir ..."
	echo 1 > "${dir}smp_affinity"
done
```

***BEFORE TESTS***: map the interrupts to core 0 using ***`sudo ./map_interrupts_core_0.sh`***
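
The `1` written by the script is a hexadecimal CPU bitmask, where bit N stands for core N. Below is a minimal sketch of two hypothetical helpers. `cpulist_to_mask` converts a core list such as `1-7` into such a mask. `irqs_not_on_core0` lists the interrupts whose `smp_affinity_list` is not exactly `0` after the script ran, for example per-core interrupts that cannot be moved.

```sh
# Converts a CPU list like "0", "1-7" or "0,2-3" into a hex affinity mask.
cpulist_to_mask() {
    local mask=0 part lo hi cpu
    for part in $(echo "$1" | tr ',' ' '); do
        lo=${part%-*}
        hi=${part#*-}
        cpu=$lo
        while [ "$cpu" -le "$hi" ]; do
            mask=$(( mask | (1 << cpu) ))
            cpu=$(( cpu + 1 ))
        done
    done
    printf '%x\n' "$mask"
}

# Lists the IRQs whose affinity is not exactly core 0.
irqs_not_on_core0() {
    local root="${1:-/proc/irq}" dir
    for dir in "$root"/*/; do
        [ -r "${dir}smp_affinity_list" ] || continue
        if [ "$(cat "${dir}smp_affinity_list")" != "0" ]; then
            basename "$dir"
        fi
    done
}
```

`cpulist_to_mask 0` gives `1` (the value the script writes) and `cpulist_to_mask 1-7` gives `fe`. Running `irqs_not_on_core0` after the script shows which interrupts stayed on the benchmark cores.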

### Full time slices to RT scheduler

By default, the RT scheduler in Linux leaves a fraction of its scheduling time to non-RT processes
(`kernel.sched_rt_runtime_us` is 950000 out of every `kernel.sched_rt_period_us` of 1000000 microseconds, leaving 5% to non-RT tasks),
which keeps the system responsive if an RT application eats all the CPU. We do not want this, as we
try to get very predictable behavior from our RT scheduler.

Create a file called 'setup_rt.sh' and make it executable with 'chmod +x setup_rt.sh':

```sh
#!/bin/bash

# Let RT tasks use the full period (no time reserved for non-RT tasks)
sysctl -w kernel.sched_rt_runtime_us=1000000
sysctl -w kernel.sched_rt_period_us=1000000
```

***BEFORE TESTS***: give full time slices to RT tasks ***`sudo ./setup_rt.sh`***
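
The share of each period that RT tasks may use is `sched_rt_runtime_us / sched_rt_period_us`. A runtime of -1 disables the limit entirely. The following small sketch (the function name is ours) checks the current setting before a run.

```sh
# Percentage of each scheduling period that RT tasks may use.
# A runtime of -1 means no RT throttling at all (reported as 100).
rt_share_percent() {
    local runtime="$1" period="$2"
    if [ "$runtime" -lt 0 ]; then
        echo 100
    else
        echo $(( runtime * 100 / period ))
    fi
}
```

`rt_share_percent "$(sysctl -n kernel.sched_rt_runtime_us)" "$(sysctl -n kernel.sched_rt_period_us)"` prints `95` with the kernel defaults and `100` after running `setup_rt.sh`.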

## Running Tests

***Before running tests make sure to run the following scripts:***
- `sudo ./setup_cpu.sh`
- `sudo ./map_interrupts_core_0.sh`
- `sudo ./setup_rt.sh`
- `sudo ./setup_cgroups.sh`

To run the tests, use the following (or a similar command with a different RT policy):

`sudo chrt --fifo 90 cset shield -e --user=<user> <executable> -- <arguments>`

This maps the process to all cores except core 0 and runs it with the desired real-time scheduling policy and priority.

We found that interactive sessions can cause huge latency spikes even with this separation;
therefore, we advise starting the benchmarks and then leaving the system alone until they are done.
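
Since the benchmarks should run unattended, the command above can be wrapped so that it keeps running after you log out. This is a minimal sketch. `bench.sh`, `RT_POLICY`, `RT_PRIO`, `BENCH_LOG` and `DRY_RUN` are hypothetical names, and detaching via `nohup` is our own choice rather than part of the original setup.

```sh
#!/bin/bash
# bench.sh (hypothetical): runs a benchmark on the shielded cores 1-7 with an RT policy.
# Usage: sudo env [RT_POLICY=fifo|rr] [RT_PRIO=90] [BENCH_LOG=file] [DRY_RUN=1] \
#            ./bench.sh <user> <executable> [arguments...]

bench() {
    local user="$1" exe="$2"
    shift 2
    set -- chrt "--${RT_POLICY:-fifo}" "${RT_PRIO:-90}" \
        cset shield -e "--user=$user" "$exe" -- "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    elif [ -n "${BENCH_LOG:-}" ]; then
        # nohup ignores SIGHUP, so closing the session does not stop the run
        nohup "$@" > "$BENCH_LOG" 2>&1 &
        echo "Started PID $!, output in $BENCH_LOG. Log out and leave the board alone."
    else
        "$@"
    fi
}

# Only run when called with arguments, so the file can also be sourced.
if [ $# -ge 2 ]; then
    bench "$@"
fi
```

For example, `sudo env BENCH_LOG=/root/run.log ./bench.sh florian ./benchmark -n 10` starts the run detached. With `DRY_RUN=1`, the script only prints the resulting `chrt ... cset shield ...` command.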