<img src="./media/logo.png" height="200"/>


**P**redictable **P**arallel **P**atterns **L**ibrary for **S**calable **S**mart **S**ystems

[![pipeline status](http://lab.las3.de/gitlab/las3/development/scheduling/predictable_parallel_patterns/badges/master/pipeline.svg)](http://lab.las3.de/gitlab/las3/development/scheduling/predictable_parallel_patterns/commits/master)

## Getting Started

This section will give a brief introduction on how to get a minimal
project setup that uses the PLS library.
Further [general notes](NOTES.md) and [performance notes](PERFORMANCE-v2.md) can be found in
their respective files.

### Installation

PLS has no external dependencies. To compile and install it you
only need CMake and a recent C++17-compatible compiler.
Care might be required on systems that are not explicitly supported
(currently we support Linux on x86 and ARMv7).

Clone the repository and open a terminal session in its folder.
Create a build folder using `mkdir cmake-build-release`
and switch into it using `cd cmake-build-release`.
Set up the CMake project using `cmake ../ -DCMAKE_BUILD_TYPE=RELEASE`,
then install it as a system-wide dependency using `sudo make install.pls`.
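Put together, the steps above look like the following shell session (the repository URL is a placeholder; use the actual one):

```shell
git clone <repository-url> pls   # placeholder: substitute the real repository URL
cd pls
mkdir cmake-build-release
cd cmake-build-release
cmake ../ -DCMAKE_BUILD_TYPE=RELEASE
sudo make install.pls            # installs PLS as a system-wide dependency
```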

At this point the library is installed on your system.
To use it, simply add it to your existing CMake project using
`find_package(pls REQUIRED)` and then link it to your target
using `target_link_libraries(your_target pls::pls)`.
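A minimal consuming `CMakeLists.txt` might look like this (the project and target names are placeholders):

```cmake
cmake_minimum_required(VERSION 3.10)
project(pls_example CXX)

# PLS requires a C++17-capable compiler
set(CMAKE_CXX_STANDARD 17)

# Locate the installed PLS package
find_package(pls REQUIRED)

add_executable(your_target main.cpp)
target_link_libraries(your_target pls::pls)
```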

### Basic Usage

```c++
#include <pls/pls.h>
#include <iostream>

// Static memory allocation (see execution trees for how to configure)
static const int MAX_NUM_TASKS = 32;
static const int MAX_STACK_SIZE = 4096;
static const int NUM_THREADS = 8;

long fib(long n);

int main() {
  // Create a scheduler with the static amount of resources.
  // All memory and system resources are allocated here.
  pls::scheduler scheduler{NUM_THREADS, MAX_NUM_TASKS, MAX_STACK_SIZE};

  // Wake up the thread pool and perform work.
  scheduler.perform_work([&] {
    long result = fib(20);
    std::cout << "fib(20)=" << result << std::endl;
  });
  // At this point the thread pool sleeps.
  // This can for example be used for periodic work.

  // The scheduler is destroyed at the end of the scope
}

long fib(long n) {
  if (n <= 1) {
    return n;
  }

  // Example for the high level API.
  // Will run both functions in parallel as separate tasks.
  long a, b;
  pls::invoke(
      [&a, n] { a = fib(n - 1); },
      [&b, n] { b = fib(n - 2); }
  );
  return a + b;
}
```

### Execution Trees and Static Resource Allocation

TODO: For the static memory allocation you need to find the maximum required resources.

## Project Structure

The project uses [CMake](https://cmake.org/) as its build system;
the recommended IDE is either a simple text editor or [CLion](https://www.jetbrains.com/clion/).
We divide the project into sub-targets to separate the library
itself, testing and example code. The library itself can be found in
`lib/pls`, the context-switching implementation in `lib/context_switcher`,
testing-related code in `test`, and example and playground/benchmark apps in `app`.

### Building

To build the project, first create a folder for the build
(typically as a subfolder of the project) using `mkdir cmake-build-debug`.
Change to the new folder with `cd cmake-build-debug` and initialize the CMake
project using `cmake ../ -DCMAKE_BUILD_TYPE=DEBUG`. For release builds
do the same, only with build type `RELEASE`. Other build-time settings
can also be passed at this setup step.

After this is done you can use normal `make` commands, like
`make` to build everything, `make <target>` to build a specific target,
or `make install` to install the library globally.
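Put together, a debug build from a fresh checkout might look like this (run from the repository root):

```shell
mkdir cmake-build-debug
cd cmake-build-debug
cmake ../ -DCMAKE_BUILD_TYPE=DEBUG   # use RELEASE for release builds
make                                 # build everything
```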

Available Settings:
- `-DPLS_PROFILER=ON/OFF`
    - default OFF
    - Enabling it will record execution DAGs with memory and runtime stats
    - Enabling has a BIG performance hit (use only for development)
- `-DSLEEP_WORKERS=ON/OFF`
    - default OFF
    - Enabling it will make workers keep a central 'all workers empty flag'
    - Workers try to sleep if there is no work in the system
    - Has performance impact on isolated runs, but can benefit multiprogrammed systems
- `-DEASY_PROFILER=ON/OFF`
    - default OFF
    - Enabling will link the easy profiler library and enable its macros
    - Enabling has a performance hit (do not use in releases)
- `-DADDRESS_SANITIZER=ON/OFF`
    - default OFF
    - Enables address sanitizer to be linked to the executable
    - Only one sanitizer can be active at once
    - Enabling has a performance hit (do not use in releases)
- `-DTHREAD_SANITIZER=ON/OFF`
    - default OFF
    - Enables thread/datarace sanitizer to be linked to the executable
    - Only one sanitizer can be active at once
    - Enabling has a performance hit (do not use in releases)
- `-DDEBUG_SYMBOLS=ON/OFF`
    - default OFF
    - Enables the build with debug symbols
    - Use for e.g. profiling the release build

Note that these settings are persistent for one CMake build folder.
If you, for example, set a flag in the debug build, it will not influence
the release build, but it will persist in the debug build folder
until you explicitly change it back.
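For example, enabling and later disabling the thread sanitizer in an existing debug build folder (the flag stays cached in that folder between the two calls):

```shell
cd cmake-build-debug
cmake ../ -DTHREAD_SANITIZER=ON    # flag is now cached in this build folder
make
cmake ../ -DTHREAD_SANITIZER=OFF   # explicitly reset it when done
```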

### Testing

Testing is done using [Catch2](https://github.com/catchorg/Catch2/)
in the `test` subfolder. Tests are built into a target called `tests`
and can be executed simply by building this executable and running it.
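For example, from a configured build folder (the output path of the `tests` binary is an assumption; locate it in your build tree):

```shell
make tests
./test/tests   # path may differ depending on your CMake layout
```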

### PLS Profiler

The PLS profiler records the DAG for each scheduler invocation.
Stats can be queried from it, and it can be printed in `.dot` format,
which can later be rendered with the Graphviz `dot` tool to inspect the
actually executed graph.

The most useful applications are analyzing the maximum memory required per
coroutine stack, the computational depth, T_1 and T_inf.
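Once a DAG has been written out (the filename here is just an example), the standard Graphviz `dot` tool renders it:

```shell
dot -Tpng scheduler_dag.dot -o scheduler_dag.png   # filename is an example
```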

### Data Race Detection

WARNING: the latest build of clang/ThreadSanitizer is required for this to work,
as it contains a recent bug-fix regarding user-level threads!

As this project contains a lot of concurrent code, we use
[Thread Sanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)
in our CI process and optionally in other builds. To set up CMake builds
with the sanitizer enabled, add the CMake option `-DTHREAD_SANITIZER=ON`.
Please test regularly with thread sanitizer enabled and make sure not to
keep the repository in a state where the sanitizer reports errors.

Consider reading [the section on common data races](https://github.com/google/sanitizers/wiki/ThreadSanitizerPopularDataRaces)
to get an idea of what we try to avoid in our code.

### Profiling EasyProfiler

To make profiling portable and to allow us to later analyze the logs
programmatically, we use [easy_profiler](https://github.com/yse/easy_profiler)
for capturing data. To enable profiling, install the library on your system
(best by building it and then running `make install`) and set the
CMake option `-DEASY_PROFILER=ON`.
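Installing easy_profiler from source might look like this (build folder layout and generator defaults may vary on your system):

```shell
git clone https://github.com/yse/easy_profiler.git
cd easy_profiler
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
sudo make install
```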

After that, see the `invoke_parallel` example app for how to activate the
profiler. This will generate a trace file that can be viewed with
the `profiler_gui <output.prof>` command.

Please note that the profiler adds overhead when looking at sub-millisecond
method invocations, as we do, and it cannot replace a separate
profiler like `gperf`, `valgrind` or `vtune amplifier` for detailed analysis.
We still think it makes sense to include it as an optional feature,
as the customizable colors and fine-grained events (including the collection
of variables) can be used to visualize the 'big picture' of
program execution. Also, we hope to use it to log events like
successful and failed steals in the future, as the general idea of
efficiently logging information per thread might be helpful for further
analysis.


### Profiling VTune Amplifier

For detailed profiling of small performance hotspots we prefer
to use [Intel's VTune Amplifier](https://software.intel.com/en-us/vtune).
It gives insight into detailed microarchitecture usage and performance
hotspots. Follow Intel's instructions for using it.
Make sure to enable debug symbols (`-DDEBUG_SYMBOLS=ON`) in the
analyzed build and that all optimizations are turned on
(by choosing the release build).
