Commit 60f179ba by FritzFlorian

Notes on performance after switching to pure fork join.

parent edae8f27
Pipeline #1246 failed with stages
in 29 seconds
......@@ -281,3 +281,40 @@ parallel_for, 512 heat array size):
We observe solid performance from our implementation.
(Again, not very scientific test environment, but good enough for
our general direction)
### Commit 3bdaba42 - Move to pure fork-join tasks (remove two level)
We moved away from our two-level scheduler approach towards a
pure fork-join task model (in order to remove any lock's in the
code more easily and to make further tests simpler/more focused
on one specific aspecs.
These are the measurements made after the change
(without any performance optimizations done):
FFT Average:
<img src="media/3bdaba42_fft_average.png" width="400"/>
Heat Diffusion Average:
<img src="media/3bdaba42_heat_average.png" width="400"/>
Matrix Multiplication Average:
<img src="media/3bdaba42_matrix_average.png" width="400"/>
Unbalanced Tree Search Average:
<img src="media/3bdaba42_unbalanced_average.png" width="400"/>
We note that in heat diffusion, matrix multiplication and unbalanced
tree search - all three benchmarks with mostly enough work avaliable at
all time - our implementation performs head on head with intel's
TBB. Only the FFT benchmark is a major problem four our library.
We notice a MAJOR drop in performance exactly at the hyperthreading
mark, indicating problems with limited resources due to the spinning
threads (threads without any actual work) and the threads actually
performing work. Most likely there is a resource on the same cache
line used that hinders the working threads, but we can not really
figure out which one it is.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment