- 24 Mar, 2020 1 commit
-
-
It is now possible to use a memory-mapped stack that raises a SIGSEGV if the coroutine stacks are exhausted.
FritzFlorian committed
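One common way to get a SIGSEGV on stack exhaustion is to map the stack with `mmap` and revoke access to its lowest page with `mprotect`. The sketch below illustrates that idea only; the names `guarded_stack`, `allocate_guarded_stack` and `free_guarded_stack` are hypothetical and not taken from the project, and it assumes a POSIX system where stacks grow downwards.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical helper type: a stack region with one inaccessible guard page
// at its low end (assumes stacks grow downwards).
struct guarded_stack {
  void *mapping;             // start of the whole mapping (the guard page)
  char *usable;              // first usable byte, right above the guard page
  std::size_t usable_size;   // usable bytes, rounded up to whole pages
  std::size_t mapping_size;  // usable bytes plus the guard page
};

// Map `usable_size` bytes (rounded up to whole pages) plus one guard page.
// Returns a struct with mapping == nullptr on failure.
guarded_stack allocate_guarded_stack(std::size_t usable_size) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  const std::size_t rounded = (usable_size + page - 1) / page * page;
  const std::size_t total = rounded + page;

  void *mem = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return {nullptr, nullptr, 0, 0};

  // Revoke all access to the lowest page; touching it raises SIGSEGV.
  if (mprotect(mem, page, PROT_NONE) != 0) {
    munmap(mem, total);
    return {nullptr, nullptr, 0, 0};
  }
  return {mem, static_cast<char *>(mem) + page, rounded, total};
}

// Unmap the stack together with its guard page.
void free_guarded_stack(const guarded_stack &stack) {
  if (stack.mapping != nullptr) munmap(stack.mapping, stack.mapping_size);
}
```

A coroutine that runs past `usable` then faults immediately instead of silently overwriting neighbouring memory, at the cost of one extra page per stack.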
-
- 18 Mar, 2020 1 commit
-
-
Remove the strict static memory allocation scheme in favour of placing objects on the heap at startup. This still meets the requirements of modern, high-performance embedded systems, but makes the APIs a lot cleaner.
FritzFlorian committed
-
- 13 Mar, 2020 2 commits
-
-
The cas size could exceed an unsigned long, so we use the correct cas_integer type for the traded_cas_field representations.
FritzFlorian committed
-
We fixed the bug in tsan that caused it to crash after creating/deleting many fibers, so there is no longer any need for this cache mechanism (you have to use the most recent clang build with the patch for it to work, though).
FritzFlorian committed
-
- 23 Feb, 2020 1 commit
-
-
We yield after num_thread failed steals in a row. This parameter can be tuned for better performance, but we stick to a sensible default just to prevent massive spinning.
FritzFlorian committed
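The yield rule can be sketched as a small loop that counts consecutive failed steals. This is an illustration under assumptions, not the scheduler's actual code: `steal_loop`, `try_steal` and `keep_running` are hypothetical names, and the threshold is simply passed in as `num_threads`.

```cpp
#include <cstddef>
#include <functional>
#include <thread>

// Hypothetical steal loop: `try_steal` returns true when a task was stolen,
// `keep_running` returns false once the worker should stop.
// Returns how often the worker yielded, which is handy for tuning.
std::size_t steal_loop(std::size_t num_threads,
                       const std::function<bool()> &try_steal,
                       const std::function<bool()> &keep_running) {
  std::size_t failed_in_a_row = 0;
  std::size_t yields = 0;
  while (keep_running()) {
    if (try_steal()) {
      failed_in_a_row = 0;  // a successful steal resets the streak
      continue;
    }
    if (++failed_in_a_row >= num_threads) {
      std::this_thread::yield();  // back off instead of spinning
      ++yields;
      failed_in_a_row = 0;
    }
  }
  return yields;
}
```

Making the threshold a parameter keeps the default tied to the thread count while still allowing it to be tuned per workload.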
-
- 09 Feb, 2020 2 commits
-
-
FritzFlorian committed
-
FritzFlorian committed
-
- 05 Feb, 2020 2 commits
-
-
FritzFlorian committed
-
Tsan does not cope well with rapidly destroyed/created fibers. As it is currently too much effort to fully investigate the tsan issue, we work around it by caching the short-lived fibers based on their stack base address. This allows us to use thread sanitizer for now.
FritzFlorian committed
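A cache keyed on the stack base address could look roughly like the sketch below. It is an assumption-laden illustration: `fiber_handle_cache` and its `create` callback are hypothetical, the handle is an opaque `void *`, and in a real tsan build the callback would wrap whatever the sanitizer provides for creating a fiber context. The sketch is not thread-safe and assumes one cache per worker thread.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical cache: one sanitizer fiber handle per stack base address.
// Handles are reused instead of being destroyed and re-created.
class fiber_handle_cache {
 public:
  // `create` is an assumed factory that produces a new opaque handle.
  explicit fiber_handle_cache(std::function<void *()> create)
      : create_{std::move(create)} {}

  // Return the cached handle for this stack, creating it on first use.
  void *get_or_create(const void *stack_base) {
    auto it = handles_.find(stack_base);
    if (it != handles_.end()) return it->second;
    void *handle = create_();
    handles_.emplace(stack_base, handle);
    return handle;
  }

  std::size_t size() const { return handles_.size(); }

 private:
  std::function<void *()> create_;
  std::unordered_map<const void *, void *> handles_;
};
```

Because short-lived fibers tend to reuse the same stack memory, the number of distinct handles stays bounded by the number of stacks rather than the number of fibers ever created.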
-
- 03 Feb, 2020 1 commit
-
-
FritzFlorian committed
-
- 30 Jan, 2020 3 commits
-
-
Older CMake versions won't work with export targets in different directories. For now we simply add the context_switcher manually to the export target of pls.
FritzFlorian committed
-
FritzFlorian committed
-
We still see very sporadic crashes, however the current version is at least a starting point for refactoring and debugging. Next steps have to be to re-enable tooling support (i.e. add code to let sanitizers do their work).
FritzFlorian committed
-
- 29 Jan, 2020 1 commit
-
-
The current version has race conditions and is hard to debug (especially because of the fibers: if the wrong thread executes on a fiber, we get segfaults very quickly). To combat this mess we now refactor the code bit by bit, while also adding tests where it can be done with reasonable effort.
FritzFlorian committed
-
- 27 Jan, 2020 2 commits
-
-
The project is currently really messy and there are sporadic SIGSEGVs. This indicates that we still have a race in our code. Thread Sanitizer does not work with our current implementation, as it needs annotations for fibers. The next step is to clean up the project and maybe add thread sanitizer support to our fiber implementation. This should help in finding the remaining bugs.
FritzFlorian committed
-
FritzFlorian committed
-
- 26 Jan, 2020 2 commits
-
-
FritzFlorian committed
-
FritzFlorian committed
-
- 24 Jan, 2020 2 commits
-
-
The deque trades tasks when stealing. Right now only the fast local path is tested and implemented. For the next step to work we also need to add the resource stack and resource trading to the system.
FritzFlorian committed
-
The current state shows the minimum actions taken to execute a parallel call: get the thread local, find the active frame, execute on the next frame and return to the active frame.
FritzFlorian committed
-
- 23 Jan, 2020 2 commits
-
-
FritzFlorian committed
-
The rationale for doing a custom implementation is that the existing solutions are quite a bit slower and/or require more memory.
FritzFlorian committed
-
- 20 Dec, 2019 1 commit
-
-
FritzFlorian committed
-
- 05 Dec, 2019 1 commit
-
-
The idea is to exclude as many sources as possible that could lead to issues with contention and cache misses. After some experimentation, we think that hyperthreading is simply not working very well with our kind of workload. In the future we might simply test on other hardware.
FritzFlorian committed
-
- 04 Dec, 2019 1 commit
-
-
FritzFlorian committed
-
- 02 Dec, 2019 1 commit
-
-
FritzFlorian committed
-
- 29 Nov, 2019 3 commits
-
-
This version runs through our initial fft and fib tests. However, it is not tested further in any way. Additionally, we added a locking deque, potentially hurting performance and moving away from our initial goal.
FritzFlorian committed
-
The main issue still seems to be that our lock-free protocol allows a steal to be pending. We plan to eliminate this next by introducing a protocol that works on a single atomic update.
FritzFlorian committed
-
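To illustrate what "a single atomic update" can mean, the sketch below packs a stamp and an index into one 64-bit word and decides a steal with one compare-and-swap, so no steal is ever left in a pending state. The log does not spell out the actual protocol; `stamped_top`, the bit layout and the `bottom` parameter are assumptions made for this example.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical layout: the upper 32 bits hold a stamp (ABA counter), the
// lower 32 bits hold the index of the next task that may be stolen.
class stamped_top {
 public:
  // Try to claim the task at the current index, as long as it lies below
  // `bottom`. On success the claimed index is written to `claimed`.
  bool try_steal(std::uint32_t bottom, std::uint32_t &claimed) {
    std::uint64_t expected = word_.load(std::memory_order_acquire);
    const std::uint32_t stamp = static_cast<std::uint32_t>(expected >> 32);
    const std::uint32_t index = static_cast<std::uint32_t>(expected);
    if (index >= bottom) return false;  // nothing left to steal

    const std::uint64_t desired =
        (static_cast<std::uint64_t>(stamp + 1) << 32) | (index + 1);
    // One compare-and-swap decides the steal; there is no intermediate state.
    if (!word_.compare_exchange_strong(expected, desired,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire)) {
      return false;  // another thief changed the word first
    }
    claimed = index;
    return true;
  }

 private:
  std::atomic<std::uint64_t> word_{0};
};
```

A thief either wins the compare-and-swap and owns the task, or loses and retries later; other threads never observe a half-finished steal.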
The start_chain property does not make sense, as chains are purely 'virtual', i.e. they only fully exist when walking through the computation (by patching them on important events). We initially added the property as a helper for better runtime and simpler implementation, but we think without it we will not get as much inconsistency in the runtime state. Performance can be 're-added' later on.
FritzFlorian committed
-
- 27 Nov, 2019 2 commits
-
-
FritzFlorian committed
-
It is still not working; however, we no longer have any redundant code, which makes debugging simpler.
FritzFlorian committed
-
- 25 Nov, 2019 1 commit
-
-
We changed up some of the memory constraints in the lock free deque and will need to see if this is ok. If so, the single threaded performance looks very good.
FritzFlorian committed
-
- 19 Nov, 2019 1 commit
-
-
Everything so far is untested. We only made sure the fast path still seems to function correctly. Next up is writing tests for both the fast and slow path, and then introducing the slow path. After that we can look at performance optimizations.
FritzFlorian committed
-
- 07 Nov, 2019 1 commit
-
-
This showcases the expected performance when a task executes a sub-tree without interference from other threads. We aim to stay at about 6x slower than a normal function call.
FritzFlorian committed
-
- 06 Nov, 2019 3 commits
-
-
FritzFlorian committed
-
FritzFlorian committed
-
This first sketch of the classes captures what we think is needed in terms of the general interface and is very much WIP.
FritzFlorian committed
-
- 05 Nov, 2019 1 commit
-
-
We changed how memory is allocated: instead of passing char* buffers into which objects are then stored, we now create 'fat objects' that hold all scheduler state. This eases development for us, as we can make changes to data structures without too much effort (e.g. add a second array to manage tasks if required).
FritzFlorian committed
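The 'fat object' idea can be pictured as a single type whose members already contain all the storage the scheduler needs, sized by template parameters. The sketch below is only an illustration; `scheduler_state`, `thread_state`, `task_slot` and their fields are hypothetical names, not the project's types.

```cpp
#include <array>
#include <cstddef>

// Hypothetical slot for one task; the real layout is not shown in the log.
struct task_slot {
  void *task = nullptr;
};

// One object that owns all scheduler state. Adding another array later is a
// one-line change here, with no manual buffer arithmetic at the call sites.
template <std::size_t MaxThreads, std::size_t MaxTasks>
struct scheduler_state {
  struct thread_state {
    std::array<task_slot, MaxTasks> tasks{};
    std::size_t depth = 0;
  };
  std::array<thread_state, MaxThreads> threads{};
};
```

Compared with handing out raw char* buffers, the compiler now computes sizes and alignment, while the memory footprint is still fixed at compile time.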
-
- 02 Oct, 2019 1 commit
-
-
Our stack does not call the destructors of its elements. This is problematic for e.g. the graph implementation, where reference-counted images are held in tasks. To solve this for now, we manually call the destructor after each task. We do so because a generic, virtual destructor adds runtime cost to primitive tasks, which would require us to re-run all benchmarks; this change avoids that, and as we are reworking the scheduler anyway, we postpone a clean implementation until then.
FritzFlorian committed
-
- 01 Oct, 2019 1 commit
-
-
FritzFlorian committed
-