- 29 Jan, 2020 1 commit
-
-
The current version has race conditions and is hard to debug (especially because of the fibers, if a wrong thread executes on a fiber we get segfalts very fast). To combat this mess we now refactor the code bit by bit while also adding tests where it can be done with reasonably effort).
FritzFlorian committed
-
- 27 Jan, 2020 2 commits
-
-
The project is currently really messy and there are sporadic sigsevs. This indicates that we still have a race in our code. Thread Sanitizer does not work with our current implementation, as it needs annotations for fibers. The next step is to clean up the project and maybe add thread sanitizer support to our fiber implementation. This should help finding the remaining bugs.
FritzFlorian committed -
FritzFlorian committed
-
- 26 Jan, 2020 2 commits
-
-
FritzFlorian committed
-
FritzFlorian committed
-
- 24 Jan, 2020 2 commits
-
-
The deque trades tasks when stealing. Right now only the fast local path is tested and implemented. For the next step to work we also need to add the resource stack and resource tarding to the system.
FritzFlorian committed -
The current state shows the minimum actions taken to execute a parallel call: get the thread local, find the active frame, execute on the next frame and return to the active frame.
FritzFlorian committed
-
- 23 Jan, 2020 4 commits
-
-
FritzFlorian committed
-
FritzFlorian committed
-
The rationale to do an custom implementation is that the existing solutions are quite a bit slower and/or require more memory.
FritzFlorian committed -
FritzFlorian committed
-
- 22 Jan, 2020 1 commit
-
-
The basic calling works, next we measure on both x86 and arm and then decide on how we implement our fiber/'staggered stack' abstraction.
FritzFlorian committed
-
- 21 Jan, 2020 1 commit
-
-
We now cover all implementations that have a chance of being fast. ARM implementations for our 'fast fiber call' are still missing. After we add them we decide on how to proceed.
FritzFlorian committed
-
- 20 Jan, 2020 1 commit
-
-
FritzFlorian committed
-
- 13 Jan, 2020 1 commit
-
-
FritzFlorian committed
-
- 10 Jan, 2020 1 commit
-
-
We implement a minimal concepts of user level threads. This shows the minimum requirements for our 'staggered' stack implementation: we need to be able to switch to a new stack and allow someone else to continue the calling function right before the switch.
FritzFlorian committed
-
- 04 Jan, 2020 1 commit
-
-
FritzFlorian committed
-
- 20 Dec, 2019 1 commit
-
-
FritzFlorian committed
-
- 05 Dec, 2019 1 commit
-
-
The idea is to exclude as many sources as possible that could lead to issues with contention and cache misses. After some experimentation, we think that hyperthreading is simply not working very well with our kind of workload. In the future we might simply test on other hardware.
FritzFlorian committed
-
- 04 Dec, 2019 1 commit
-
-
FritzFlorian committed
-
- 02 Dec, 2019 1 commit
-
-
FritzFlorian committed
-
- 29 Nov, 2019 3 commits
-
-
This version runs through our initial fft and fib tests. However, it is not tested further in any way. Additionally, we added a locking deque, potentially hurting performance and moving away from our initial goal.
FritzFlorian committed -
The main issue seems to still be the fact that we have a lock free protocol where a steal can be pending. We plan to remove this fact next by introducing a protocol that works on a single atomic update.
FritzFlorian committed -
The start_chain property does not make sense, as chains are purely 'virtual', i.e. they only fully exist when walking through the computation (by patching them on important events). We initially added the property as a helper for better runtime and simpler implementation, but we think without it we will not get as much inconsistency in the runtime state. Performance can be 're-added' later on.
FritzFlorian committed
-
- 27 Nov, 2019 2 commits
-
-
FritzFlorian committed
-
It is still not working, however we now have no more redundant code, making debugging it simpler.
FritzFlorian committed
-
- 25 Nov, 2019 1 commit
-
-
We changed up some of the memory constraints in the lock free deque and will need to see if this is ok. If so, the single threaded performance looks very good.
FritzFlorian committed
-
- 19 Nov, 2019 1 commit
-
-
Everything so far is untested. We only made sure tha fast path still seems to function correctly. Next up is writing tests for both the fast and slow path to then introduce the slow path. After that we can look at performance optimizations.
FritzFlorian committed
-
- 07 Nov, 2019 1 commit
-
-
This showcases the expected performance when a task executes a sub-tree without inference from other threads. We target to stay about 6x slower than a normal function call.
FritzFlorian committed
-
- 06 Nov, 2019 3 commits
-
-
FritzFlorian committed
-
FritzFlorian committed
-
This first sketch of the classes captures what we think is needed in terms of general interface and very mich WIP.
FritzFlorian committed
-
- 05 Nov, 2019 1 commit
-
-
We changed how the memory is allocated from passing char* buffers to then store objects into to creating 'fat objects' for all scheduler state. This eases development for us, as we can make changes to data structures without too much effort (e.g. add a second array to manage tasks if required).
FritzFlorian committed
-
- 02 Oct, 2019 1 commit
-
-
Our stack is not calling deconstructors of its elements. This is problematic for e.g. the graph implementation where reference counted images are hold in tasks. To solve this for now we manually call the deconstructor after each tasks (we do so, because a generic, virtual deconstructor adds runtime costs to primitive tasks, requiring us to re-run all benchmarks; with this change we do not need to do this and as we re-work the scheduler anyways we postpone a clean implementation for then).
FritzFlorian committed
-
- 01 Oct, 2019 1 commit
-
-
FritzFlorian committed
-
- 30 Sep, 2019 1 commit
-
-
FritzFlorian committed
-
- 16 Sep, 2019 2 commits
-
-
FritzFlorian committed
-
This allows us to more easily handle them and makes their interface closer to std::thread.
FritzFlorian committed
-
- 02 Sep, 2019 1 commit
-
-
The scheduler yields if it failed to steal any work due to the task lists being empty. This should improve performance on multiprogrammed systems, as it potentially makes room for other worker threads which still have work to perform.
FritzFlorian committed
-
- 30 Aug, 2019 1 commit
-
-
FritzFlorian committed
-