A collection of stuff that we noticed during development.
Useful later on to write a project report and to go back
in time to find out why certain decisions were made.
## 26.06.2019 - Notes on Dataflow Implementation
### Dataflow in general
Dataflow-based programming is not unique to task-based
programming; rather, it is a programming paradigm of its own,
and using it inside a task scheduler is simply an adoption of the
general idea of organizing programs by the flow of available data.
Therefore, we first look at the general domain of dataflow
programming to understand the basic concepts.
The work in \[1] gives a good overview of dataflow programming
before 2000. It presents the basic execution concept
and the idea of data driving program execution.
Two main ways of execution can be distinguished: static and dynamic
execution. Static execution allows only one token per arc in the
execution graph, making it simple to implement, but it does not
allow much parallel execution. Dynamic execution allows an unbounded
number of tokens per arc, making it harder to implement, but allowing
unlimited parallelism. There are also models allowing a maximum
of k tokens per arc, which is probably where we are going, as
we will try to keep memory allocation static once again.
Normally, dataflow programming is seen as a pure execution model,
meaning that everything is dataflow: even expressions like
`x = y + z` are executed via dataflow, and the idea is to create
special hardware to execute such dependency graphs.
We are interested in tasks coupled by dataflow dependencies.
\[1] mentions this under names like threaded, hybrid or large-grain
dataflow.
Looking further into these coarse-grained dataflow models, we
discovered \[2] as a good overview of this area.
The paper focuses mainly on implementations that also rely on
special hardware, but the general principles hold for our implementation.
Our kind of dataflow falls under the flow/instruction category:
high-level coordination is achieved using dataflow, while
individual work items are tasks written in an imperative language.
We still need to see whether ideas from these languages are helpful
to us; individual papers may give further clues if we need them later.
For now we can conclude that we will probably implement some sort
of k-limited, dynamic system (using tokens with different IDs).
\[1] W. M. Johnston, J. R. P. Hanna, and R. J. Millar, “Advances in dataflow programming languages,” ACM Computing Surveys, vol. 36, no. 1, pp. 1–34, Mar. 2004.
\[2] F. Yazdanpanah, C. Alvarez-Martinez, D. Jimenez-Gonzalez, and Y. Etsion, “Hybrid Dataflow/von-Neumann Architectures,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 6, pp. 1489–1509, Jun. 2014.
### Dataflow in TBB and EMBB
TBB's dataflow implementation follows (as far as we can see) no
general dataflow theory and implements its own method.
It relies on explicit concepts of buffering, multiple arcs
ending in the same node, pulling and pushing modes for
arcs, and so on.
We think this is overly complicated and too far removed from
the classic model.
EMBB seems to follow a token-based model with IDs distinguishing
tokens belonging to different parallel executions.
It uses arcs that can buffer a limited number of data items.
Overall this seems rather close to what we have in mind, and we
will look into it further.
## 24.06.2019 - Further Benchmark Settings and First Results
As a further option to reduce potential interference of our benchmark