Skip to content
Toggle navigation
P
Projects
G
Groups
S
Snippets
Help
las3_pub
/
predictable_parallel_patterns
This project
Loading...
Sign in
Toggle navigation
Go to a project
Project
Repository
Issues
0
Merge Requests
0
Pipelines
Wiki
Members
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Commit
a78ff0d1
authored
May 28, 2019
by
FritzFlorian
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
Add further notes on CPI performance.
parent
5b73b30a
Pipeline
#1230
passed with stages
in 3 minutes 53 seconds
Changes
1
Pipelines
1
Hide whitespace changes
Inline
Side-by-side
Showing
1 changed file
with
18 additions
and
0 deletions
+18
-0
PERFORMANCE.md
+18
-0
No files found.
PERFORMANCE.md
View file @
a78ff0d1
...
...
@@ -155,3 +155,21 @@ Best case Native:
Average case Native:
<img
src=
"media/afd0331b_matrix_average_case_native.png"
width=
"400"
/>
What we find very interesting is, that the best case times of our
pls library are very fast (as good as TBB), but the average times
drop badly. We currently do not know why this is the case.
### Commit afd0331b - Intel VTune Amplifier
We did serval measurements with intel's VTune Amplifier profiling
tool. The main thing that we notice is, that the cycles per instruction
for our useful work blocks increase, thus requiring more CPU time
for the acutal useful work.
We also measured an implementation using TBB and found no significante
difference, e.g. TBB also has a higher CPI with 8 threads.
Our conclusion after this long hunting for performance is, that we
might just be bound by some general performance issues with our code.
The next step will therefore be to read the other frameworks and our
code carefully, trying to find potential issues.
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment