diff --git a/PERFORMANCE.md b/PERFORMANCE.md
index a88b3ef..f9cc49e 100644
--- a/PERFORMANCE.md
+++ b/PERFORMANCE.md
@@ -66,3 +66,23 @@ Additionaly, the first one uses our high level API (parallel invoke),
 while the second one uses our low level API.
 It is worth investigating if either or high level API or the structure
 of the memory access in FFT are the problem.
+
+### Commit cf056856 - Remove two-level scheduler
+
+In this test we replace the two-level scheduler with ONLY fork_join
+tasks. This removes the top-level steal overhead, so only internal
+stealing is performed. To do this, we made the fork_join task the only
+possible task type and removed both the top-level rw-lock and the
+digging down to our level, leaving internal stealing as the only path.
+
+Average results FFT:
+
+<img src="media/cf056856_fft_average.png" width="600"/>
+
+Average results Unbalanced:
+
+<img src="media/cf056856_unbalanced_average.png" width="600"/>
+
+There seems to be only a minor performance difference between the two,
+suggesting that our two-level approach is not the part causing our
+weaker performance.
diff --git a/media/cf056856_fft_average.png b/media/cf056856_fft_average.png
new file mode 100644
index 0000000..ec55027
Binary files /dev/null and b/media/cf056856_fft_average.png differ
diff --git a/media/cf056856_unbalanced_average.png b/media/cf056856_unbalanced_average.png
new file mode 100644
index 0000000..75d2829
Binary files /dev/null and b/media/cf056856_unbalanced_average.png differ