PERFORMANCE.md 2.96 KB
Newer Older
1 2
# Notes on performance measures during development

3 4 5
#### Commit 52fcb51f - Add basic random stealing

Slight improvement, needs further measurement after removing more important bottlenecks.
6 7 8 9
Below are three individual measurements of the difference.
Overall the trend (sum of all numbers/last number),
go down (98.7%, 96.9% and 100.6%), but with the one measurement
above 100% we think the improvements are minor.
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

| | | | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
old     |    1659.01 us|    967.19 us|    830.08 us|    682.69 us|    737.71 us|    747.92 us|    749.37 us|    829.75 us|   7203.73 us
new     |    1676.06 us|    981.56 us|    814.71 us|    698.72 us|    680.87 us|    737.68 us|    756.91 us|    764.71 us|   7111.22 us
change  |    101.03  %|    101.49  %|     98.15  %|    102.35  %|     92.30  %|     98.63  %|    101.01  %|     92.16  %|     98.72  %

| | | | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
old     |    1648.65 us|    973.33 us|    820.18 us|    678.80 us|    746.21 us|    767.63 us|    747.17 us|   1025.35 us|   7407.32 us
new     |    1655.09 us|    964.99 us|    807.57 us|    731.34 us|    747.47 us|    714.71 us|    794.35 us|    760.28 us|   7175.80 us
change  |    100.39  %|     99.14  %|     98.46  %|    107.74  %|    100.17  %|     93.11  %|    106.31  %|     74.15  %|     96.87  %

| | | | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
old     |    1654.26 us|    969.12 us|    832.13 us|    680.69 us|    718.70 us|    750.80 us|    744.12 us|    775.24 us|   7125.07 us
new     |    1637.04 us|    978.09 us|    799.93 us|    709.33 us|    746.42 us|    684.87 us|    822.30 us|    787.61 us|   7165.59 us
change  |     98.96  %|    100.93  %|     96.13  %|    104.21  %|    103.86  %|     91.22  %|    110.51  %|    101.60  %|    100.57  %
28 29 30 31 32 33

#### Commit 3535cbd8  - Cache Align scheduler_memory

Big improvements of about 6% in our test. This seems like a little,
but 6% from the scheduler is a lot, as the 'main work' is the tasks
itself, not the scheduler.
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65

This change unsurprisingly yields the biggest improvement yet.

#### Commit b9bb90a4  - Try to figure out the 'high thread bottleneck'

We are currently seeing good performance on low core counts
(up to 1/2 of the machines cores), but after that performance
plumishes:

Bana-Pi Best-Case:

<img src="./media/b9bb90a4-banana-pi-best-case.png" width="400"/>

Bana-Pi Average-Case:

<img src="./media/b9bb90a4-banana-pi-average-case.png" width="400"/>

Laptop Best-Case:

<img src="./media/b9bb90a4-laptop-best-case.png" width="400"/>

Laptop Average-Case:

<img src="./media/b9bb90a4-laptop-average-case.png" width="400"/>


As we can see, in average the performance of PLS starts getting
way worse than TBB and EMBB after 4 cores. We suspect this is due
to contemption, but could not resolve it with any combination
of `tas_spinlock` vs `ttas_spinlock` and `lock` vs `try_lock`.

This issue clearly needs further investigation.