NOTES.md 5.79 KB
Newer Older
1 2 3 4 5 6
# Notes

A collection of stuff that we noticed during development.
Useful later on two write a project report and to go back
in time to find out why certain decisions where made.

7 8 9 10 11 12 13 14 15 16 17 18 19
## 09.02.2019 - Cache Alignment

Aligning the cache needs all parts (both data types with correct alignment
and base memory with correct alignment).

Our first tests show that the initial alignment (Commit 3535cbd8),
boostet the performance in the fft_benchmark from our library to
Intel TBB's speedup when running on up to 4 threads.
When crossing the boundary to hyper-threading this falls of.
We therefore think that contemption/cache misses are the reason for
bad performance above 4 threads, but have to investigate further to
pin down the issue.

20 21 22 23 24 25 26 27
## 08.04.2019 - Random Numbers

We decided to go for a simple linear random number generator
as [std::minstd_rand](http://www.cplusplus.com/reference/random/minstd_rand/),
as this requires less memory and is faster. The decreased quality
in random numbers is probably ok (read up if there is literature on this),
as work stealing does not rely on a mathematically perfect distribution.

28 29 30 31 32 33 34 35 36 37
## 02.04.2019 - CMake Export

We built our project using CMake to make it portable and easy to setup.
To allow others to use our library we need to make it installable on
other systems. For this we use CMake's install feature and
a [tutorial](https://pabloariasal.github.io/2018/02/19/its-time-to-do-cmake-right/)
on how to correctly configure a CMake library to be included by other
projects.

## 28.03.2019 - custom new operators
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61

When initializing sub_tasks we want to place them on our custom
'stack like' data structure per thread. We looked at TBB's API
and noticed them somehow implicitly setting parent relationships
in the new operator. After further investigation we see that the
initialization in this manner is a 'hack' to avoid passing
of references and counters.

It can be found at the bottom of the `task.h` file:

```C++
inline void *operator new( size_t bytes, const tbb::internal::allocate_child_proxy& p ) {
    return &p.allocate(bytes);
}

inline void operator delete( void* task, const tbb::internal::allocate_child_proxy& p ) {
    p.free( *static_cast<tbb::task*>(task) );
}
```

It simlpy constructs a temp 'allocator type' passed as the second
argument to new. This type then is called in new and
allocates the memory required.

62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91
## 27.03.2019 - atomics

C++ 11 offers atomics, however these require careful usage
and are not always lock free. We plan on doing more research
for these operations when we try to transform our code form using
spin locks to using more fine grained locks.

Resources can be found [here](https://www.justsoftwaresolutions.co.uk/files/ndc_oslo_2016_safety_off.pdf)
and [here](http://www.modernescpp.com/index.php/c-core-guidelines-the-remaining-rules-to-lock-free-programming).

## 27.03.2019 - variable sized lambdas

When working with lambdas one faces the problem of them having not
a fixed size because they can capture variables from the surrounding
scope.

To 'fix' this in normal C++ one would use a std::function,
wrapping the lambda by moving it onto the heap. This is of course
a problem when trying to prevent dynamic memory allocation.

When we want static allocation we have two options:
1) keep the lambda on the stack and only call into it while it is valid
2) use templating to create variable sized classes for each lambda used

Option 1) is preferable, as it does not create extra templating code
(longer compile time, can not separate code into CPP files). However
we can encounter situations where the lambda is not on the stack when
used, especially when working with sub-tasks.

## 21.03.2019 - Allocation on stack/static memory
92 93 94 95 96 97

We can use the [placement new](https://www.geeksforgeeks.org/placement-new-operator-cpp/)
operator for our tasks and other stuff to manage memory.
This can allow the pure 'stack based' approach without any memory
management suggested by mike.

98
## 20.03.2019 - Prohibit New
99 100 101 102 103 104 105 106 107 108 109 110

We want to write this library without using any runtime memory
allocation to better fit the needs of the embedded marked.
To make sure we do not do so we add a trick:
we link an new implementation into the project (when testing) that
will cause an linker error if new is used somewhere.
If the linker reports such an error we can switch to debugging
by using a new implementation with a break point in it.

That way we for example ruled out std::thread, as we found the dynamic
memory allocation used in it.

111
## 20.03.2019 - callable objects and memory allocation / why we use no std::thread
112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138

When working with any sort of functionality that can be passed
to an object or function it is usually passed as:
1. an function pointer and a list of parameter values
2. an lambda, capturing any surrounding parameters needed

When we want to pass ANY functionality (with any number of parameters
or captured variables) we can not determine the amount of memory before
the call is done, making the callable (function + parameters) dynamicly
sized.

This can be a problem when implementing e.g. a thread class,
as the callable has to be stored somewhere. The **std::thread**
implementation allocates memory at runtime using **new** when
called with any form of parameters for the started function.
Because of this (and because the implementation can differ from
system to system) we decided to not provide an **std::thread** backend
for our internal thread class (that does not use dynamic memory,
as it lives on the stack, knowing its size at compile time using
templates).

Lambdas can be used, as long as we are sure the outer scope still exists
while executing (lambda lies in the callers stack), or if we copy the
lambda manually to some memory that we know will persist during the call.
It is important to know that the lambda wont be freed while it is
used, as the captured variables used inside the body are held in the
lambda object.