- 24 Jan, 2020 2 commits
-
-
The deque trades tasks when stealing. Right now only the fast local path is implemented and tested. For the next step to work, we also need to add the resource stack and resource trading to the system.
FritzFlorian committed
-
The current state shows the minimum actions taken to execute a parallel call: get the thread local, find the active frame, execute on the next frame and return to the active frame.
FritzFlorian committed
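As a rough illustration of the four steps named in this message (not the project's actual code), the sequence could look like the following C++ sketch. All names (`Frame`, `ThreadState`, `tls_state`, `parallel_call`, `nested_depth_demo`) are hypothetical, and the sketch assumes frames live in a fixed per-thread pool that is never nested deeper than its size.

```cpp
#include <array>
#include <cstddef>

// Hypothetical per-call bookkeeping slot; the real frame layout is not shown in this log.
struct Frame {
    int calls_run = 0;
};

// Hypothetical per-thread state: a fixed pool of frames plus the index of the active one.
// Assumption: nesting never exceeds the pool size (no bounds check in this sketch).
struct ThreadState {
    std::array<Frame, 64> frames{};
    std::size_t active = 0;
};

thread_local ThreadState tls_state;

template <typename F>
void parallel_call(F&& body) {
    ThreadState& state = tls_state;           // 1. get the thread local
    const std::size_t active = state.active;  // 2. find the active frame
    state.active = active + 1;                // 3. execute on the next frame
    state.frames[state.active].calls_run++;
    body();
    state.active = active;                    // 4. return to the active frame
}

// Demonstration helper: reports the frame index seen inside two nested calls.
std::size_t nested_depth_demo() {
    std::size_t seen = 0;
    parallel_call([&] {
        parallel_call([&] { seen = tls_state.active; });
    });
    return seen;
}
```

In this sketch the whole call costs one thread-local lookup and two index updates, which is the kind of lower bound the message describes.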
-
- 23 Jan, 2020 4 commits
-
-
FritzFlorian committed
-
FritzFlorian committed
-
The rationale for a custom implementation is that the existing solutions are quite a bit slower and/or require more memory.
FritzFlorian committed
-
FritzFlorian committed
-
- 22 Jan, 2020 1 commit
-
-
The basic calling works. Next we measure on both x86 and ARM, then decide how to implement our fiber/'staggered stack' abstraction.
FritzFlorian committed
-
- 21 Jan, 2020 1 commit
-
-
We now cover all implementations that have a chance of being fast. ARM implementations for our 'fast fiber call' are still missing. Once they are added, we will decide how to proceed.
FritzFlorian committed
-
- 20 Jan, 2020 1 commit
-
-
FritzFlorian committed
-
- 13 Jan, 2020 1 commit
-
-
FritzFlorian committed
-
- 10 Jan, 2020 1 commit
-
-
We implement a minimal concept of user-level threads. This shows the minimum requirements for our 'staggered' stack implementation: we need to be able to switch to a new stack and allow someone else to continue the calling function right before the switch.
FritzFlorian committed
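The two requirements in this message (switch to a new stack, and keep the caller resumable from the point of the switch) can be illustrated with the POSIX `ucontext` API. This is a stand-in chosen for brevity, not the implementation used in these commits; the names `on_new_stack`, `run_switch_demo`, and `trace` are hypothetical. Here the code running on the new stack resumes the caller itself. In a work-stealing setting, whoever holds the saved context would be the one to resume it.

```cpp
#include <ucontext.h>
#include <vector>

static ucontext_t caller_ctx;  // saved continuation of the calling function
static ucontext_t callee_ctx;  // context that runs on the freshly allocated stack
static int trace = 0;          // records the order in which the steps ran

// Runs on the new stack, then resumes the caller's saved continuation.
static void on_new_stack() {
    trace = trace * 10 + 2;
    swapcontext(&callee_ctx, &caller_ctx);
}

int run_switch_demo() {
    static std::vector<char> stack(64 * 1024);  // separate stack for the callee

    trace = 1;
    getcontext(&callee_ctx);
    callee_ctx.uc_stack.ss_sp = stack.data();
    callee_ctx.uc_stack.ss_size = stack.size();
    callee_ctx.uc_link = &caller_ctx;  // fallback if the callee ever returns
    makecontext(&callee_ctx, on_new_stack, 0);

    // Save the caller right before the switch and jump to the new stack.
    swapcontext(&caller_ctx, &callee_ctx);

    // Execution continues here once the saved continuation is resumed.
    trace = trace * 10 + 3;
    return trace;
}
```

`run_switch_demo()` returns 123 when the steps run in the expected order: caller, new stack, caller continuation.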
-
- 04 Jan, 2020 1 commit
-
-
FritzFlorian committed
-
- 20 Dec, 2019 1 commit
-
-
FritzFlorian committed
-
- 05 Dec, 2019 1 commit
-
-
The idea is to rule out as many sources of contention and cache misses as possible. After some experimentation, we think that hyperthreading simply does not work very well with our kind of workload. In the future we might test on other hardware.
FritzFlorian committed
-
- 04 Dec, 2019 1 commit
-
-
FritzFlorian committed
-
- 29 Nov, 2019 3 commits
-
-
This version runs through our initial fft and fib tests. However, it is not tested further in any way. Additionally, we added a locking deque, potentially hurting performance and moving away from our initial goal.
FritzFlorian committed
-
The main issue still seems to be that we have a lock-free protocol in which a steal can be pending. We plan to eliminate this next by introducing a protocol that works on a single atomic update.
FritzFlorian committed
-
The start_chain property does not make sense, as chains are purely 'virtual', i.e. they only fully exist when walking through the computation (by patching them on important events). We initially added the property as a helper for better runtime and a simpler implementation, but we think removing it will reduce inconsistencies in the runtime state. Performance can be 're-added' later on.
FritzFlorian committed
-
- 27 Nov, 2019 2 commits
-
-
FritzFlorian committed
-
It is still not working; however, we now have no more redundant code, which makes debugging simpler.
FritzFlorian committed
-
- 25 Nov, 2019 1 commit
-
-
We changed some of the memory constraints in the lock-free deque and will need to see if this is OK. If so, the single-threaded performance looks very good.
FritzFlorian committed
-
- 19 Nov, 2019 1 commit
-
-
Everything so far is untested. We only made sure the fast path still seems to function correctly. Next up is writing tests for both the fast and the slow path, then introducing the slow path. After that we can look at performance optimizations.
FritzFlorian committed
-
- 07 Nov, 2019 1 commit
-
-
This showcases the expected performance when a task executes a sub-tree without interference from other threads. We aim to stay at about 6x slower than a normal function call.
FritzFlorian committed
-
- 01 Oct, 2019 1 commit
-
-
FritzFlorian committed
-
- 16 Sep, 2019 1 commit
-
-
FritzFlorian committed
-
- 01 Aug, 2019 1 commit
-
-
This allows the stack and deque classes to use the same offset, so that they work better with each other.
FritzFlorian committed
-
- 31 Jul, 2019 1 commit
-
-
FritzFlorian committed
-
- 30 Jul, 2019 1 commit
-
-
FritzFlorian committed
-
- 29 Jul, 2019 2 commits
-
-
This makes the programming model a full dataflow implementation, as it allows for branching and recursion.
FritzFlorian committed
-
Recursion works by using a function node that calls the graph again. We separated a graph invocation from a function invocation within a graph, so that the graph handles only one concern.
FritzFlorian committed
-
- 24 Jul, 2019 1 commit
-
-
FritzFlorian committed
-
- 22 Jul, 2019 1 commit
-
-
FritzFlorian committed
-
- 19 Jul, 2019 1 commit
-
-
We separated the structure (input-output flow) from the rest of the architecture and reworked some template programming to have better access to the types required at compile time.
FritzFlorian committed
-
- 11 Jul, 2019 1 commit
-
-
The data can now flow into a graph, follow its path along inputs/outputs and be fetched from the graph after execution. Currently graphs are executed synchronously.
FritzFlorian committed
-
- 10 Jul, 2019 1 commit
-
-
FritzFlorian committed
-
- 09 Jul, 2019 2 commits
-
-
We need some template programming tricks to offer a clean user-facing API while internally using our classes with more capabilities.
FritzFlorian committed
-
This is the base building block for creating the actual dataflow functions and networks, as their main functionality is having data flow through connected nodes.
FritzFlorian committed
-
- 20 Jun, 2019 2 commits
-
-
Florian Fritz committed
-
FritzFlorian committed
-
- 18 Jun, 2019 1 commit
-
-
FritzFlorian committed
-