Commit a78ff0d1 by FritzFlorian

Add further notes on CPI performance.

parent 5b73b30a
Pipeline #1230 passed with stages
in 3 minutes 53 seconds
@@ -155,3 +155,21 @@ Best case Native:
Average case Native:
<img src="media/afd0331b_matrix_average_case_native.png" width="400"/>
What we find very interesting is that the best-case times of our
pls library are very fast (as good as TBB), but the average-case times
are considerably worse. We currently do not know why this is the case.
### Commit afd0331b - Intel VTune Amplifier
We took several measurements with Intel's VTune Amplifier profiling
tool. The main thing we notice is that the cycles per instruction (CPI)
for our useful work blocks increase, so the same useful work
requires more CPU time.
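
To make the link between CPI and CPU time concrete, here is a minimal sketch of the arithmetic: for a fixed number of instructions, CPU time scales linearly with CPI. The function names are hypothetical helpers for illustration only; they are not part of pls, TBB, or VTune, and no measured values from our runs are used.

```cpp
#include <cassert>
#include <cstdint>

// Cycles per instruction: core cycles spent divided by instructions retired.
// (Hypothetical helper, shown only to illustrate the metric.)
double cycles_per_instruction(std::uint64_t cycles, std::uint64_t instructions) {
    assert(instructions > 0);
    return static_cast<double>(cycles) / static_cast<double>(instructions);
}

// CPU time in seconds needed to retire `instructions` at a given CPI
// on a core clocked at `frequency_hz`.
double cpu_time_seconds(std::uint64_t instructions, double cpi, double frequency_hz) {
    assert(frequency_hz > 0.0);
    return static_cast<double>(instructions) * cpi / frequency_hz;
}
```

With made-up numbers, a work block of 10^9 instructions on a 2 GHz core needs 0.5 s at a CPI of 1.0 and 0.75 s at a CPI of 1.5, even though the amount of useful work is unchanged.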
We also measured an implementation using TBB and found no significant
difference, i.e. TBB also has a higher CPI with 8 threads.
Our conclusion after this long hunt for performance is that we
might simply be limited by general performance issues in our code.
The next step will therefore be to read the other frameworks' code and
our own carefully, trying to find potential issues.