Commit a4b03ffe by FritzFlorian

Found general problem for FFT performance.

parent 5044f0a1
Pipeline #1253 failed with stages
in 3 minutes 38 seconds
......@@ -318,3 +318,40 @@ threads (threads without any actual work) and the threads actually
performing work. Most likely there is a resource on the same cache
line used that hinders the working threads, but we can not really
figure out which one it is.
### Commit be2cdbfe - Locking Deque
Switching to a locking deque has not improved (or even slightly hurt)
performance, we therefore think that the deque itself is not the
portion slowing down our execution.
### Commit 5044f0a1 - Performance Bottelneck in FFT FIXED
By moving from directly calling one of the parallel invocations
```c++
scheduler::spawn_child(sub_task_2);
function1(); // Execute first function 'inline' without spawning a sub_task object
```
to spawning two tasks
```c++
scheduler::spawn_child(sub_task_2);
scheduler::spawn_child(sub_task_1);
```
we where able to fix the bad performance of our framework in the
FFT benchmark (where there is a lot spinning/idling of some
worker threads).
We think this is due to some sort of cache misses/bus contemption
on the finishing counters. This would make sense, as the drop
at the hyperthreading mark indicates problems with this part of the
CPU pipeline (althought it did not show clearly in our profiling runs).
We will now try to find the exact spot where the problem originates and
fix the source rather then 'circumventing' it with these extra tasks.
(This then aigain, should hopefully even boost all other workloads
performance, as contemption on the bus/cache is always bad)
......@@ -13,10 +13,13 @@ template<typename Function1, typename Function2>
void invoke_parallel(const Function1 &function1, const Function2 &function2) {
using namespace ::pls::internal::scheduling;
auto sub_task_1 = lambda_task_by_reference<Function1>(function1);
auto sub_task_2 = lambda_task_by_reference<Function2>(function2);
scheduler::spawn_child(sub_task_2);
function1(); // Execute first function 'inline' without spawning a sub_task object
scheduler::spawn_child(sub_task_1);
// TODO: Research the exact cause of this being faster
// function1(); // Execute first function 'inline' without spawning a sub_task object
scheduler::wait_for_all();
}
......@@ -24,12 +27,15 @@ template<typename Function1, typename Function2, typename Function3>
void invoke_parallel(const Function1 &function1, const Function2 &function2, const Function3 &function3) {
using namespace ::pls::internal::scheduling;
auto sub_task_1 = lambda_task_by_reference<Function1>(function1);
auto sub_task_2 = lambda_task_by_reference<Function2>(function2);
auto sub_task_3 = lambda_task_by_reference<Function3>(function3);
scheduler::spawn_child(sub_task_2);
scheduler::spawn_child(sub_task_3);
function1(); // Execute first function 'inline' without spawning a sub_task object
scheduler::spawn_child(sub_task_1);
// TODO: Research the exact cause of this being faster
// function1(); // Execute first function 'inline' without spawning a sub_task object
scheduler::wait_for_all();
}
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment