From 938be84f1f080a7a6781ce1f8f873282c9ee2dbf Mon Sep 17 00:00:00 2001
From: FritzFlorian
Date: Fri, 28 Jun 2019 10:32:42 +0200
Subject: [PATCH] Add notes on Dataflow

---
 NOTES.md | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/NOTES.md b/NOTES.md
index ef4845c..1a70ff5 100644
--- a/NOTES.md
+++ b/NOTES.md
@@ -4,6 +4,72 @@ A collection of stuff that we noticed during development.
 Useful later on two write a project report and to go
 back in time to find out why certain decisions where made.
 
+## 26.06.2019 - Notes on Dataflow Implementation
+
+### Dataflow in general
+
+Dataflow-based programming is not unique to task-based
+programming; it is rather a programming paradigm of its own, and using
+it inside a task scheduler is simply an adaptation of the general
+idea of organizing programs by the flow of available data.
+
+Therefore, we first look at the general domain of dataflow
+programming to understand the basic concepts.
+
+The work in \[1] gives a good overview of dataflow programming
+before 2000. It presents the basic execution concept
+and the idea of data driving program execution.
+Two main ways of execution can be distinguished: static and dynamic
+execution. Static execution allows only one token per arc in the
+execution graph, which makes it simple, but it does not allow much
+parallel execution. Dynamic execution allows an unbounded number
+of tokens per arc, which makes it harder to implement, but allows
+unlimited parallelism. There are also models allowing a maximum
+of k tokens per arc, which is probably where we are heading, as
+we will again try to keep memory allocation static.
+
+Normally, dataflow programming is seen as a pure execution model,
+meaning that everything is dataflow: even expressions like
+`x = y + z` are executed as dataflow, and the idea is to build
+special hardware that executes such dependency graphs.
+We are instead interested in tasks coupled by dataflow dependencies.
+\[1] mentions this under names like threaded, hybrid or large-grain
+dataflow.
+
+Looking further into these coarse-grained dataflow models, we
+found \[2] to be a good overview of this area.
+The paper focuses mainly on implementations that also rely on
+special hardware, but the general principles hold for our implementation.
+Our kind of dataflow falls into the flow/instruction category:
+high-level coordination is achieved using dataflow, while
+individual work items are tasks written in an imperative language.
+We still need to check whether ideas from these languages are helpful to us;
+individual papers may give some clues if we need them later.
+
+For now, we conclude that we will probably implement some sort
+of k-limited, dynamic system (using tokens with different IDs).
+
+
+\[1] W. M. Johnston, J. R. P. Hanna, and R. J. Millar, “Advances in dataflow programming languages,” ACM Computing Surveys, vol. 36, no. 1, pp. 1–34, Mar. 2004.
+
+\[2] F. Yazdanpanah, C. Alvarez-Martinez, D. Jimenez-Gonzalez, and Y. Etsion, “Hybrid Dataflow/von-Neumann Architectures,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 6, pp. 1489–1509, Jun. 2014.
+
+### Dataflow in TBB and EMBB
+
+As far as we can tell, TBB's dataflow implementation follows no
+general dataflow theory and instead implements its own method.
+It relies on explicit concepts of buffering, multiple arcs
+ending in the same node, pull and push modes for
+arcs, and so on.
+We think this is overly complicated and too far removed from
+the classic model.
+
+EMBB seems to follow a token-based model with IDs distinguishing
+tokens that belong to different parallel executions.
+It uses arcs that can buffer a limited number of data items.
+Overall, this seems rather close to what we have in mind, and we will
+look into it further.
+
 ## 24.06.2019 - Further Benchmark Settings and First Results
 
 As a further option to reduce potential inference of our benchmark
-- 
libgit2 0.26.0