diff --git a/PERFORMANCE.md b/PERFORMANCE.md
index dfe7d0c..343c880 100644
--- a/PERFORMANCE.md
+++ b/PERFORMANCE.md
@@ -155,3 +155,21 @@ Best case Native:
 Average case Native:
 
 <img src="media/afd0331b_matrix_average_case_native.png" width="400"/>
+
+What we find very interesting is that the best-case times of our
+pls library are very fast (on par with TBB), but the average-case times
+degrade badly. We currently do not know why this is the case.
+
+### Commit afd0331b - Intel VTune Amplifier
+
+We did several measurements with Intel's VTune Amplifier profiling
+tool. The main thing we noticed is that the cycles per instruction (CPI)
+of our useful-work blocks increase with the thread count, so the actual
+useful work requires more CPU time.
+
+We also measured an implementation using TBB and found no significant
+difference, i.e. TBB also shows a higher CPI with 8 threads.
+Our conclusion after this long hunt for performance is that we
+might simply be limited by general performance issues in our code.
+The next step will therefore be to carefully read the code of the other
+frameworks and our own, trying to find potential issues.