diff --git a/PERFORMANCE.md b/PERFORMANCE.md
index dfe7d0c..343c880 100644
--- a/PERFORMANCE.md
+++ b/PERFORMANCE.md
@@ -155,3 +155,21 @@ Best case Native:
 
 Average case Native:
 
+
+What we find very interesting is that the best-case times of our
+pls library are very fast (as good as TBB), but the average times
+are much worse. We currently do not know why this is the case.
+
+### Commit afd0331b - Intel VTune Amplifier
+
+We did several measurements with Intel's VTune Amplifier profiling
+tool. The main thing we notice is that the cycles per instruction
+(CPI) for our useful work blocks increase, thus requiring more CPU
+time for the actual useful work.
+
+We also measured an implementation using TBB and found no significant
+difference, e.g. TBB also has a higher CPI with 8 threads.
+Our conclusion after this long hunt for performance is that we
+might just be bound by some general performance issues in our code.
+The next step will therefore be to read the other frameworks and our
+code carefully, trying to find potential issues.