Commit a7a3dc9b by FritzFlorian

Add further notes on our findings on performance issues.

parent 3ff10baa
Pipeline #1155 passed with stages
in 3 minutes 32 seconds
@@ -3,6 +3,10 @@
#### Commit 52fcb51f - Add basic random stealing
Slight improvement, needs further measurement after removing more important bottlenecks.
Below are three individual measurements of the difference.
Overall the trend (sum of all numbers/last number)
goes down (98.7%, 96.9% and 100.6%), but with one measurement
above 100% we think the improvements are minor.
| | | | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
@@ -27,3 +31,35 @@ change | 98.96 %| 100.93 %| 96.13 %| 104.21 %| 103.86 %|
Big improvements of about 6% in our test. This seems like little,
but 6% from the scheduler is a lot, as the 'main work' is the tasks
themselves, not the scheduler.
This change unsurprisingly yields the biggest improvement yet.
#### Commit b9bb90a4 - Try to figure out the 'high thread bottleneck'
We are currently seeing good performance on low core counts
(up to 1/2 of the machine's cores), but after that performance
plummets:
Banana Pi Best-Case:
<img src="./media/b9bb90a4-banana-pi-best-case.png" width="400"/>
Banana Pi Average-Case:
<img src="./media/b9bb90a4-banana-pi-average-case.png" width="400"/>
Laptop Best-Case:
<img src="./media/b9bb90a4-laptop-best-case.png" width="400"/>
Laptop Average-Case:
<img src="./media/b9bb90a4-laptop-average-case.png" width="400"/>
As we can see, on average the performance of PLS starts getting
way worse than TBB and EMBB after 4 cores. We suspect this is due
to contention, but could not resolve it with any combination
of `tas_spinlock` vs `ttas_spinlock` and `lock` vs `try_lock`.
This issue clearly needs further investigation.
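
For reference, the difference between the two lock variants can be sketched as follows. This is a minimal illustration, not the PLS implementation: only the class names `tas_spinlock` and `ttas_spinlock` and the `lock`/`try_lock` operations come from the notes above, while the bodies, memory orderings and the `count_with_lock` helper are assumptions made for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sketch, NOT the PLS code.
// Test-and-set: every spin iteration performs an atomic write (exchange),
// so all waiting threads keep invalidating the same cache line.
class tas_spinlock {
  std::atomic<bool> locked_{false};

 public:
  bool try_lock() { return !locked_.exchange(true, std::memory_order_acquire); }
  void lock() {
    while (!try_lock()) {
      std::this_thread::yield();
    }
  }
  void unlock() { locked_.store(false, std::memory_order_release); }
};

// Test-and-test-and-set: spin on a plain load first and only attempt the
// exchange once the lock looks free, which reduces write traffic while waiting.
class ttas_spinlock {
  std::atomic<bool> locked_{false};

 public:
  bool try_lock() {
    return !locked_.load(std::memory_order_relaxed) &&
           !locked_.exchange(true, std::memory_order_acquire);
  }
  void lock() {
    while (true) {
      while (locked_.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
      }
      if (!locked_.exchange(true, std::memory_order_acquire)) {
        return;
      }
    }
  }
  void unlock() { locked_.store(false, std::memory_order_release); }
};

// Hypothetical helper: several threads increment a shared counter under the
// given lock type; the result is only correct if the lock excludes properly.
template <typename Lock>
long count_with_lock(int num_threads, int increments) {
  Lock lock;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < increments; ++i) {
        lock.lock();
        ++counter;
        lock.unlock();
      }
    });
  }
  for (auto& thread : threads) {
    thread.join();
  }
  return counter;
}
```

Calling `count_with_lock<tas_spinlock>(4, 10000)` and `count_with_lock<ttas_spinlock>(4, 10000)` and timing both would be one way to compare the variants in isolation. Both still serialize on a single cache line, so under heavy contention neither is expected to scale well, which would be consistent with swapping them not resolving the issue.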
@@ -8,6 +8,10 @@
This section will give a brief introduction on how to get a minimal
project setup that uses the PLS library.
Further notes on [performance](PERFORMANCE.md) and general
[notes](NOTES.md) on the development progress can be found in
the linked documents.
### Installation
Clone the repository and open a terminal session in its folder.
......