- 30 Jan, 2020 3 commits
-
-
FritzFlorian committed
-
FritzFlorian committed
-
We still see very sporadic crashes; however, the current version is at least a starting point for refactoring and debugging. The next step is to re-enable tooling support (i.e. add code that lets sanitizers do their work).
FritzFlorian committed
-
- 29 Jan, 2020 1 commit
-
-
The current version has race conditions and is hard to debug (especially because of the fibers: if the wrong thread executes on a fiber, we get segfaults very fast). To combat this mess, we now refactor the code bit by bit while also adding tests where it can be done with reasonable effort.
FritzFlorian committed
-
- 27 Jan, 2020 2 commits
-
-
The project is currently really messy and there are sporadic SIGSEGVs. This indicates that we still have a race in our code. ThreadSanitizer does not work with our current implementation, as it needs annotations for fibers. The next step is to clean up the project and maybe add ThreadSanitizer support to our fiber implementation. This should help to find the remaining bugs.
FritzFlorian committed -
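The fiber annotations that ThreadSanitizer needs could be wrapped roughly as follows. This is a minimal sketch, assuming the `__tsan_*_fiber` functions from `<sanitizer/tsan_interface.h>` are available in the toolchain; the wrapper names (`current_fiber`, `create_fiber`, `destroy_fiber`, `before_switch_to`) and the `SKETCH_TSAN` macro are hypothetical and not taken from the project. Without `-fsanitize=thread` the wrappers compile to no-ops.

```cpp
// Sketch only: thin wrappers around ThreadSanitizer's fiber annotations.
// Detect TSan on clang (__has_feature) and GCC (__SANITIZE_THREAD__).
#if defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define SKETCH_TSAN 1
#endif
#endif
#if defined(__SANITIZE_THREAD__) && !defined(SKETCH_TSAN)
#define SKETCH_TSAN 1
#endif

#ifdef SKETCH_TSAN
#include <sanitizer/tsan_interface.h>
#endif

namespace sketch {

// Handle of the fiber TSan currently associates with this thread.
inline void* current_fiber() {
#ifdef SKETCH_TSAN
  return __tsan_get_current_fiber();
#else
  return nullptr;
#endif
}

// Register a new fiber with TSan; call once per fiber stack.
inline void* create_fiber() {
#ifdef SKETCH_TSAN
  return __tsan_create_fiber(0);
#else
  return nullptr;
#endif
}

// Release TSan's bookkeeping for a fiber that will never run again.
inline void destroy_fiber(void* fiber) {
#ifdef SKETCH_TSAN
  __tsan_destroy_fiber(fiber);
#else
  (void)fiber;
#endif
}

// Must be called immediately before the actual context switch.
inline void before_switch_to(void* fiber) {
#ifdef SKETCH_TSAN
  __tsan_switch_to_fiber(fiber, 0);
#else
  (void)fiber;
#endif
}

}  // namespace sketch
```

A fiber implementation would call `before_switch_to(target)` right before every stack switch, so the sanitizer knows which logical execution context the following memory accesses belong to.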
FritzFlorian committed
-
- 26 Jan, 2020 1 commit
-
-
FritzFlorian committed
-
- 24 Jan, 2020 2 commits
-
-
The deque trades tasks when stealing. Right now only the fast local path is tested and implemented. For the next step to work we also need to add the resource stack and resource trading to the system.
FritzFlorian committed -
The current state shows the minimum actions taken to execute a parallel call: get the thread local, find the active frame, execute on the next frame and return to the active frame.
FritzFlorian committed
-
- 23 Jan, 2020 4 commits
-
-
FritzFlorian committed
-
FritzFlorian committed
-
The rationale for a custom implementation is that the existing solutions are quite a bit slower and/or require more memory.
FritzFlorian committed -
FritzFlorian committed
-
- 22 Jan, 2020 1 commit
-
-
The basic calling works. Next we measure on both x86 and ARM and then decide how to implement our fiber/'staggered stack' abstraction.
FritzFlorian committed
-
- 21 Jan, 2020 1 commit
-
-
We now cover all implementations that have a chance of being fast. ARM implementations for our 'fast fiber call' are still missing. After we add them we decide on how to proceed.
FritzFlorian committed
-
- 20 Jan, 2020 1 commit
-
-
FritzFlorian committed
-
- 13 Jan, 2020 1 commit
-
-
FritzFlorian committed
-
- 10 Jan, 2020 1 commit
-
-
We implement a minimal concept of user level threads. This shows the minimum requirements for our 'staggered' stack implementation: we need to be able to switch to a new stack and allow someone else to continue the calling function right before the switch.
FritzFlorian committed
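The 'switch to a new stack' requirement described above can be illustrated with the POSIX `ucontext` functions. This is only a sketch under that assumption: it is not the project's own fiber call (which later notes describe as a custom implementation), and all names in the `sketch` namespace are hypothetical.

```cpp
#include <ucontext.h>

#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Saved contexts: where the caller stopped, and where the fiber starts.
static ucontext_t caller_ctx;
static ucontext_t fiber_ctx;
static int result = 0;

// Runs entirely on the separately allocated stack.
static void fiber_body() {
  int local_on_new_stack = 21;
  result = local_on_new_stack * 2;
  // Returning resumes the context stored in uc_link (the caller).
}

// Switches to a fresh stack, runs fiber_body there, and comes back.
inline int run_on_new_stack(std::size_t stack_size = 64 * 1024) {
  std::vector<char> stack(stack_size);

  int rc = getcontext(&fiber_ctx);
  assert(rc == 0);
  (void)rc;

  fiber_ctx.uc_stack.ss_sp = stack.data();
  fiber_ctx.uc_stack.ss_size = stack.size();
  fiber_ctx.uc_link = &caller_ctx;  // where to continue after fiber_body
  makecontext(&fiber_ctx, fiber_body, 0);

  // Saves the caller's state right before the switch. Anyone holding
  // caller_ctx could continue the calling function from this point.
  swapcontext(&caller_ctx, &fiber_ctx);
  return result;
}

}  // namespace sketch
```

The important detail is that `swapcontext` captures the caller's state in `caller_ctx` before switching, which is the hook that would let another worker continue the calling function.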
-
- 04 Jan, 2020 1 commit
-
-
FritzFlorian committed
-
- 20 Dec, 2019 1 commit
-
-
FritzFlorian committed
-
- 05 Dec, 2019 1 commit
-
-
The idea is to exclude as many sources as possible that could lead to issues with contention and cache misses. After some experimentation, we think that hyperthreading is simply not working very well with our kind of workload. In the future we might simply test on other hardware.
FritzFlorian committed
-
- 04 Dec, 2019 1 commit
-
-
FritzFlorian committed
-
- 29 Nov, 2019 3 commits
-
-
This version runs through our initial fft and fib tests. However, it is not tested further in any way. Additionally, we added a locking deque, potentially hurting performance and moving away from our initial goal.
FritzFlorian committed -
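For illustration, a locking deque of the kind mentioned above could look roughly like this. It is a minimal sketch, assuming a mutex-protected `std::deque` where the owner works at the tail and thieves take from the head; the class and method names are hypothetical and not taken from the project.

```cpp
#include <deque>
#include <mutex>
#include <utility>

namespace sketch {

// Every operation takes the same lock, which is simple and correct but
// serializes the owner and all thieves.
template <typename T>
class locking_deque {
 public:
  // Owner: add new work at the tail.
  void push_tail(T item) {
    std::lock_guard<std::mutex> guard(lock_);
    items_.push_back(std::move(item));
  }

  // Owner: take the most recently pushed item (LIFO).
  bool pop_tail(T& out) {
    std::lock_guard<std::mutex> guard(lock_);
    if (items_.empty()) return false;
    out = std::move(items_.back());
    items_.pop_back();
    return true;
  }

  // Thief: take the oldest item (FIFO) from the other end.
  bool pop_head(T& out) {
    std::lock_guard<std::mutex> guard(lock_);
    if (items_.empty()) return false;
    out = std::move(items_.front());
    items_.pop_front();
    return true;
  }

 private:
  std::mutex lock_;
  std::deque<T> items_;
};

}  // namespace sketch
```

Even the owner's local push and pop pay for the lock here, which is why such a deque can hurt performance compared to a lock-free fast path.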
The main issue still seems to be that we have a lock free protocol in which a steal can be pending. We plan to remove this pending state next by introducing a protocol that works with a single atomic update.
FritzFlorian committed -
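The idea of a steal that completes in a single atomic update can be sketched as below. This is not the project's protocol: it is a deliberately reduced example, assuming a steal-only range whose items are fixed at construction, so that the only shared mutable state is one packed 64-bit word (stamp in the upper half, index in the lower half). The name `stealable_range` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

namespace sketch {

class stealable_range {
 public:
  explicit stealable_range(std::vector<int> items)
      : items_(std::move(items)), head_(pack(0, 0)) {
    // The index must fit into the lower 32 bits of the packed word.
    assert(items_.size() < (std::uint64_t{1} << 32));
  }

  // Claims the next item with one compare-and-swap. A steal is either
  // done or not started; no other thread can observe it as 'pending'.
  bool try_steal(int& out) {
    std::uint64_t seen = head_.load(std::memory_order_acquire);
    for (;;) {
      const std::uint32_t index = static_cast<std::uint32_t>(seen);
      const std::uint32_t stamp = static_cast<std::uint32_t>(seen >> 32);
      if (index >= items_.size()) return false;  // nothing left to steal

      const int candidate = items_[index];  // items never change, safe to read
      if (head_.compare_exchange_weak(seen, pack(stamp + 1, index + 1),
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire)) {
        out = candidate;
        return true;
      }
      // The failed CAS stored the current value in 'seen'; retry with it.
    }
  }

 private:
  static std::uint64_t pack(std::uint32_t stamp, std::uint32_t index) {
    return (static_cast<std::uint64_t>(stamp) << 32) | index;
  }

  std::vector<int> items_;
  std::atomic<std::uint64_t> head_;
};

}  // namespace sketch
```

The stamp is not strictly required in this reduced form because the index only grows; it is included to show how a version counter can share the same word, which matters once slots are reused.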
The start_chain property does not make sense, as chains are purely 'virtual', i.e. they only fully exist when walking through the computation (by patching them on important events). We initially added the property as a helper for better runtime and simpler implementation, but we think without it we will not get as much inconsistency in the runtime state. Performance can be 're-added' later on.
FritzFlorian committed
-
- 27 Nov, 2019 2 commits
-
-
FritzFlorian committed
-
It is still not working; however, we no longer have redundant code, which makes debugging simpler.
FritzFlorian committed
-
- 25 Nov, 2019 1 commit
-
-
We changed up some of the memory constraints in the lock free deque and will need to see if this is ok. If so, the single threaded performance looks very good.
FritzFlorian committed
-
- 19 Nov, 2019 1 commit
-
-
Everything so far is untested. We only made sure that the fast path still seems to function correctly. Next up is writing tests for both the fast and slow path to then introduce the slow path. After that we can look at performance optimizations.
FritzFlorian committed
-
- 07 Nov, 2019 1 commit
-
-
This showcases the expected performance when a task executes a sub-tree without interference from other threads. Our target is to stay at about 6x slower than a normal function call.
FritzFlorian committed
-
- 01 Oct, 2019 1 commit
-
-
FritzFlorian committed
-
- 16 Sep, 2019 1 commit
-
-
FritzFlorian committed
-
- 01 Aug, 2019 1 commit
-
-
This allows the stack and deque classes to use the same offset, making them work better with each other.
FritzFlorian committed
-
- 31 Jul, 2019 1 commit
-
-
FritzFlorian committed
-
- 30 Jul, 2019 1 commit
-
-
FritzFlorian committed
-
- 29 Jul, 2019 2 commits
-
-
This makes the programming model a full dataflow implementation, as it allows for branching and recursion.
FritzFlorian committed -
Recursion works by using a function node that calls the graph again. We separated a graph invocation from a function invocation within a graph, so the graph handles only one concern.
FritzFlorian committed
-
- 24 Jul, 2019 1 commit
-
-
FritzFlorian committed
-
- 22 Jul, 2019 1 commit
-
-
FritzFlorian committed
-
- 19 Jul, 2019 1 commit
-
-
We separated the structure (input-output flow) from the rest of the architecture and reworked some template programming to have better access to the types required at compile time.
FritzFlorian committed
-