
# Setup BananaPI for benchmarking

This document describes how to get a Linux image running on a
BananaPI board that supports well-isolated benchmarks, so that the
measurement runs show their full time distributions.

## Base Setup

The first step is to get a Linux image running on the BananaPI. We chose the
[armbian](https://www.armbian.com/) project as it is the only one with support for the
BananaPI M3. We generally
[follow the instructions given](https://docs.armbian.com/Developer-Guide_Build-Preparation/);
the notes below describe how to get the RT kernel patch into the build and how to build the image.

You can also use [our pre-built image](https://drive.google.com/open?id=1RiHymBO_XjOk5tMAL31iOSJGfncrWFQh)
and skip the build process below. Use [etcher](https://www.balena.io/etcher/) or a similar tool
to flash an SD card, and the PI should boot up. The default login is root/1234. Follow the instructions shown,
then continue with the system isolation steps below for more accurate measurements.

General Setup:
- Set up an Ubuntu Bionic 18.04 VirtualBox VM
- `# apt-get -y -qq install git`
- `$ git clone --depth 1 https://github.com/armbian/build`
- `$ cd build`
- To verify the environment, first run a 'clean build' without the patch: `# ./compile.sh`
- Select the BananaPI M3 board and a minimal console build

Apply RT Patch:
- Find the kernel version armbian currently uses, e.g. from the previous build logs
- Download and unpack the matching RT patch from https://mirrors.edge.kernel.org/pub/linux/kernel/projects/rt/
- You should have a single .patch file. Place it in the userpatches folder, e.g. as build/userpatches/kernel/sunxi-current/patch-5.4.28-rt19.patch
- Re-run the `# ./compile.sh` script
- Select BananaPI M3, Command Line Minimal and SHOW KERNEL CONFIG
- The build should pick up the patch (and show it in the logs)
- You will be asked to fill in some settings. Choose (4) fully preemptive at the first option
- Fill out the other settings to your liking. To avoid issues, leave them at their defaults.
- You will then be in the kernel config window
- Here, disable the AUFS and NFS file systems (they cause build issues and we do not need them)
- Save the settings and build the image
- If successful, the flashed image should show the preempt patch in `uname -a` and should have good latencies in cyclictest
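
The final check in the list above can be scripted. The following is a minimal sketch, assuming the patched kernel reports `PREEMPT RT` or `PREEMPT_RT` in its version string (the spelling varies between kernel versions). The helper name `is_rt_kernel` is hypothetical and not part of armbian or rt-tests.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if a kernel version string looks RT-patched.
# Assumption: RT kernels report "PREEMPT RT" or "PREEMPT_RT" in `uname -v`.
is_rt_kernel() {
    case "$1" in
        *"PREEMPT RT"*|*PREEMPT_RT*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the kernel that is currently running.
if is_rt_kernel "$(uname -v)"; then
    echo "RT kernel detected"
else
    echo "No RT kernel detected"
fi
```

For the latency check, a run such as `cyclictest -m -S -p 90 -i 200 -l 100000` is one possible starting point. The priority, interval and loop count are example values, not tuned recommendations.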

## Run project

First, set up some base dependencies for running the benchmark and tests:

- `# apt-get install rt-tests`
- `# apt-get install build-essential`
- `# apt-get install cmake`

Next, EMBB is required as a comparison in the benchmark suite. Install it using the following or similar steps
(as described on their GitHub page, https://github.com/siemens/embb):
- `$ wget https://github.com/siemens/embb/archive/v1.0.0.zip`
- `$ unzip v1.0.0.zip`
- `$ cd embb-1.0.0`
- `$ mkdir cmake-build-release`
- `$ cd cmake-build-release`
- `$ cmake ../`
- `$ cmake --build .`
- `# cmake --build . --target install`

These are all the dependencies needed to run the benchmark project and pls itself.
Follow the project-specific instructions for how to use them.

## Tweaking Scheduler, CPU and Interrupts

We want as little dispersion from system jitter as possible. We therefore recommend tweaking the
scheduler, CPU and interrupt settings before running benchmarks.

See the sub-sections below for the individual measures. ***Before running tests make sure to
run the following scripts:***
- `sudo ./setup_cpu.sh`
- `sudo ./map_interrupts_core_0.sh`
- `sudo ./setup_rt.sh`

Then start your tests, manually mapped to cores 1 to 7.

### Pin all other processes to core 0

To further reduce interference with our controlled benchmark environment, we map all unrelated
processes to core 0 of the system and run our benchmarks on cores 1 to 7.

The system uses systemd, which makes it the simplest place to change the default process affinity.
Edit the file `/etc/systemd/system.conf` and set `CPUAffinity=0`. All processes forked
from systemd will then run on core 0. Benchmarks can then be manually mapped to different cores.

***BEFORE TESTS***: ***restart your system*** so that the config takes effect
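
The edit to `/etc/systemd/system.conf` can also be scripted. This is a sketch only: the helper `set_cpu_affinity` is hypothetical, it assumes GNU `sed`, and it assumes that a line appended at the end of the file still falls inside the `[Manager]` section. It is demonstrated on a scratch file rather than the real config.

```shell
#!/bin/bash

# Hypothetical helper: set CPUAffinity=<cores> in a systemd-style config file.
# Replaces an existing (possibly commented-out) CPUAffinity line,
# or appends one if none exists.
set_cpu_affinity() {
    local conf_file="$1" cores="$2"
    if grep -qE '^#?CPUAffinity=' "$conf_file"; then
        sed -i -E "s/^#?CPUAffinity=.*/CPUAffinity=${cores}/" "$conf_file"
    else
        echo "CPUAffinity=${cores}" >> "$conf_file"
    fi
}

# Demonstration on a scratch file so nothing on the system is changed.
demo_conf="$(mktemp)"
printf '[Manager]\n#CPUAffinity=1 2\n' > "$demo_conf"
set_cpu_affinity "$demo_conf" 0
cat "$demo_conf"
rm -f "$demo_conf"
```

On the board, this would be called as `set_cpu_affinity /etc/systemd/system.conf 0` with root privileges. After the reboot, `taskset -p 1` should report an affinity mask of `1` for PID 1.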

### CPU frequency

Limiting the frequency to a fixed value (the script below uses 1412000 kHz, roughly 1.4 GHz) makes sure
that the BananaPI does not throttle during the tests.
Additionally, disabling any dynamic frequency scaling makes tests more reproducible.

Create a file called 'setup_cpu.sh' and make it executable with 'chmod +x setup_cpu.sh':
```shell script
#!/bin/bash

echo "Writing frequency utils settings file..."
echo "ENABLE=true
MIN_SPEED=1412000
MAX_SPEED=1412000
GOVERNOR=performance" > /etc/default/cpufrequtils

echo "Restarting frequency utils service..."
systemctl restart cpufrequtils

echo "Done!"
echo "Try ./watch_cpu.sh to see if everything worked."
echo "Test your cooling by stressing the cpu and watching the temperature output."
```

Create a file called 'watch_cpu.sh' and make it executable with 'chmod +x watch_cpu.sh':
````shell script
#!/bin/bash

echo "Min/Max Frequencies"
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_min_freq
echo "-----"
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq

echo "Scaling Min/Max Frequencies"
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
echo "-----"
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq

echo "Actual Frequencies"
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq

echo "Temps.."
cat /sys/class/thermal/thermal_zone*/temp
````

***BEFORE TESTS***:
To set up the CPU, run ***`sudo ./setup_cpu.sh`*** before your tests. To check that the change worked
and that the temperatures stay stable, use the `./watch_cpu.sh` script.
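
The values under `/sys/class/thermal` are reported in millidegrees Celsius, which is easy to misread. As a small optional addition to `watch_cpu.sh`, the sketch below converts them to degrees. The helper name `millideg_to_celsius` is made up for this example.

```shell
#!/bin/bash

# Hypothetical helper: convert sysfs thermal readings (millidegrees Celsius,
# one value per line on stdin) into degrees Celsius.
millideg_to_celsius() {
    awk '{ printf "%.1f\n", $1 / 1000 }'
}

# Print all thermal zones, if the board exposes any.
for zone in /sys/class/thermal/thermal_zone*/temp; do
    [ -r "$zone" ] || continue
    echo "$zone: $(millideg_to_celsius < "$zone") C"
done
```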

### Map interrupts to core 0

Interrupts can interfere with our benchmarks. We therefore map them to core 0 where possible and run our tests on
cores 1 to 7.

Create a file called 'map_interrupts_core_0.sh' and make it executable with 'chmod +x map_interrupts_core_0.sh':
```shell script
#!/bin/bash

echo "Try to map interrupts to core 0."
echo "Some might fail because they cannot be mapped (e.g. core specific timers)."
echo ""
echo ""

for dir in /proc/irq/*/
do
	echo "Mapping $dir ..."
	# $dir already ends with a slash; quote the path to be safe
	echo 1 > "${dir}smp_affinity"
done
```

***BEFORE TESTS***: map the interrupts to core 0 using ***`sudo ./map_interrupts_core_0.sh`***
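
The `1` written to `smp_affinity` is a hexadecimal bitmask in which bit N selects core N, so `1` means core 0 only. The sketch below shows how such masks are derived. `cores_to_mask` is a hypothetical helper, not an existing tool.

```shell
#!/bin/bash

# Hypothetical helper: build the hex affinity mask for a list of core numbers.
# Bit N of the mask corresponds to core N.
cores_to_mask() {
    local mask=0 core
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '%x\n' "$mask"
}

cores_to_mask 0               # core 0 only   -> 1
cores_to_mask 1 2 3 4 5 6 7   # cores 1 to 7  -> fe
```

The same mask format is accepted by `taskset`, so `fe` selects cores 1 to 7 on this eight-core board.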

### Full time slices to RT scheduler

The RT scheduler in Linux by default reserves a fraction of its scheduling time for non-RT processes,
which keeps the system responsive if an RT application eats all CPU time. We do not want this, as we
aim for very predictable behavior from the RT scheduler.

Create a file called 'setup_rt.sh' and make it executable with 'chmod +x setup_rt.sh':
```shell script
#!/bin/bash

sysctl -w kernel.sched_rt_runtime_us=1000000
sysctl -w kernel.sched_rt_period_us=1000000
```

***BEFORE TESTS***: give full time slices to RT tasks using ***`sudo ./setup_rt.sh`***
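
To illustrate what the two settings mean: RT tasks may use `sched_rt_runtime_us` microseconds out of every `sched_rt_period_us` microseconds. The sketch below computes that share. `rt_share_percent` is a hypothetical helper, and 950000 is the usual kernel default for the runtime value.

```shell
#!/bin/bash

# Hypothetical helper: percentage of each period that RT tasks may use.
# $1: runtime in microseconds, $2: period in microseconds.
rt_share_percent() {
    local runtime_us="$1" period_us="$2"
    echo $(( runtime_us * 100 / period_us ))
}

rt_share_percent 950000 1000000    # usual kernel default    -> 95
rt_share_percent 1000000 1000000   # values from setup_rt.sh -> 100
```

The current values can be read with `sysctl kernel.sched_rt_runtime_us kernel.sched_rt_period_us`.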

## Running Tests

***Before running tests make sure to run the following scripts:***
- `sudo ./setup_cpu.sh`
- `sudo ./map_interrupts_core_0.sh`
- `sudo ./setup_rt.sh`

To run the tests, use the following (or a similar command with a different RT policy):

`taskset FFFE chrt --fifo 80 sudo -u $SUDO_USER <benchmark>`

This maps the process to all cores except core 0 and runs it using the FIFO (first in, first out) real-time scheduling policy.
Replace `--fifo` with `--rr` to use the round-robin policy instead.
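
To switch between policies without retyping the command, the invocation can be generated. The sketch below only prints the command (a dry run). `build_run_command` and `./my_benchmark` are hypothetical names, the sketch uses the core-list form `taskset -c 1-7` instead of a hex mask, and it omits the `sudo -u $SUDO_USER` part of the command above.

```shell
#!/bin/bash

# Hypothetical helper: print the launch command for a benchmark without running it.
# $1: chrt policy flag (--fifo or --rr), $2: RT priority,
# remaining arguments: the benchmark command line.
build_run_command() {
    local policy="$1" priority="$2"
    shift 2
    case "$policy" in
        --fifo|--rr) ;;
        *) echo "unsupported policy: $policy" >&2; return 1 ;;
    esac
    echo "taskset -c 1-7 chrt $policy $priority $*"
}

# Dry run: show the round-robin variant for a placeholder benchmark.
build_run_command --rr 80 ./my_benchmark
```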