Commit 198e8868 by Christian Kern

Merge branch 'embb483_merge_examples' into development

parents 98fd607b 399e6daf
@@ -190,7 +190,7 @@ add_subdirectory(algorithms_cpp)
add_subdirectory(dataflow_cpp)
if (BUILD_EXAMPLES STREQUAL ON)
  message("-- Building examples enabled")
-  add_subdirectory(doc/examples)
+  add_subdirectory(doc/examples_raw)
else()
  message("-- Building examples disabled (default)")
endif()
...
@@ -13,21 +13,21 @@ In the following, we look at parallel function invocation (Section~\ref{sec:algo
%In the previous section, we have considered data parallelism, that is, parallel execution of the same operation on a range of elements. Next, we consider parallel execution of several work packages encapsulated in functions.
Let us start with the parallel execution of several work packages encapsulated in functions. Suppose that the following functions operate on different data sets and are thus independent of each other:
%
-\\\inputlisting{../examples/algorithms/invoke/packages-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/invoke/packages-snippet.h}
%
The functions can be executed in parallel using the \lstinline|ParallelInvoke| construct provided by {\embb}:
%
-\\\inputlisting{../examples/algorithms/invoke/invocation-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/invoke/invocation-snippet.h}
%
Note that \lstinline|ParallelInvoke| waits until all its arguments have finished execution.
Next, let us consider a more elaborate example. The following piece of code shows a serial quick sort algorithm which we want to parallelize (ignore the details of the \lstinline|Partition| function for the moment):
%
-\\\inputlisting{../examples/algorithms/invoke/quick_sort-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/invoke/quick_sort-snippet.h}
%
A straightforward approach to parallelizing this algorithm is to execute the recursive calls to \lstinline|Quicksort| in parallel. With \lstinline|ParallelInvoke| and lambdas, it is as simple as this:
%
-\\\inputlisting{../examples/algorithms/invoke/parallel_quick_sort-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/invoke/parallel_quick_sort-snippet.h}
%
The lambdas capture the \lstinline|first|, \lstinline|mid|, and \lstinline|last| pointers to the range to be sorted and forward them to the recursive calls of quick sort. These are executed in parallel, and \lstinline|Invoke| does not return before both have finished execution. The above implementation of parallel quick sort is of course not yet optimal. In particular, the creation of new tasks should be stopped when a certain lower bound on the size of the subranges has been reached. The subranges can then be sorted sequentially in order to reduce the overhead of task creation and management. Fortunately, {\embb} already provides solutions for parallel sorting, which will be covered in the following section.
@@ -37,19 +37,19 @@ The lambdas capture the \lstinline|first|, \lstinline|mid|, and \lstinline|last|
%As sorting is a prominent problem that can benefit from multicore processors, {\embb} provides ready-to-use algorithms for parallel sorting.
For systems with constraints on memory consumption, the quick sort algorithm provided by \embb is usually the best choice, since it works in place, i.e., it does not require additional memory. Considering real-time systems, however, its worst-case runtime of $O(N^2)$, where $N$ is the number of elements to be sorted, can be a problem. For this reason, {\embb} also provides a parallel merge sort algorithm. Merge sort does not work in place, but has a predictable runtime complexity of $\Theta(N \log N)$. Assume we want to sort a vector of integers:
%
-\\\inputlisting{../examples/algorithms/sorting/range_define-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/sorting/range_define-snippet.h}
%
Using quick sort, we simply write:
%
-\\\inputlisting{../examples/algorithms/sorting/quick_sort-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/sorting/quick_sort-snippet.h}
%
The default invocation of \lstinline|QuickSort| uses \lstinline|std::less| with the iterators' \lstinline|value_type| as comparison operation. As a result, the range is sorted in ascending order. It is possible to provide a custom comparison operation, for example \lstinline|std::greater|, by passing it as a function object to the algorithm. Sorting the elements in descending order can be accomplished as follows:
%
-\\\inputlisting{../examples/algorithms/sorting/quick_sort_custom_compare-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/sorting/quick_sort_custom_compare-snippet.h}
The merge sort algorithm comes in two versions. The first version automatically allocates dynamic memory for temporary values when the algorithm is called. Its name is \lstinline|MergeSortAllocate| and it has the same parameters as \lstinline|QuickSort|. To enable the use of merge sort in environments that forbid dynamic memory allocation after initialization, the second version can be called with a pre-allocated temporary range of values:
%
-\\\inputlisting{../examples/algorithms/sorting/merge_sort_preallocated-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/sorting/merge_sort_preallocated-snippet.h}
%
The temporary range can be allocated at any time, e.g., during the initialization phase of the system.
@@ -59,33 +59,33 @@ The temporary range can be allocated at any time, e.g., during the initializatio
%Related to the above described summation reductions are the so-called counting operations.
\embb also provides functions for counting the number of elements in a range. Consider a range of integers from 0 to 3:
%
-\\\inputlisting{../examples/algorithms/counting/setup-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/counting/setup-snippet.h}
%
To determine how often a specific value appears within the range, we could simply iterate over it and compare each element with the specified one. The \lstinline|Count| function does this in parallel, where the first two arguments specify the range and the third one the element to be counted:
%have to go through each of them, perform a comparison, and count the elements that compare equal. As in the reduction, the problem here is that a global counter is involved. The counting with equal comparison can be realized using the \lstinline|Count| function as
%
-\\\inputlisting{../examples/algorithms/counting/count-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/counting/count-snippet.h}
%
For the range given above, we have \lstinline|count == 2|.
In case the comparison operation is not equality, we can employ the \lstinline|CountIf| function. Here, the third argument is a unary predicate which evaluates to \lstinline|true| for each element to be counted. The following example shows how to count the number of values greater than 0:
%
-\\\inputlisting{../examples/algorithms/counting/count_if-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/counting/count_if-snippet.h}
\section{Foreach Loops}
\label{sec:algorithms_foreach}
A frequently encountered task in parallel programming is to apply some operation to a range of values, as illustrated in the example of Section~\ref{sec:introduction_function_objects}. In principle, one could apply the operation to all elements in parallel provided that there are no data dependencies. However, this results in unnecessary overhead if the number of elements is greater than the number of available processor cores $p$. A better solution is to partition the range into $p$ blocks and to process the elements of a block sequentially. With the \lstinline|ForEach| construct provided by \embb, users do not have to care about the partitioning, since this is done automatically. Similar to the Standard Library's \lstinline|for_each| function, it is sufficient to pass the operation in the form of a function object. The following piece of code shows how to implement the example of Section~\ref{sec:introduction_function_objects} using \embb:
%
-\\\inputlisting{../examples/algorithms/for_each/doubling-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/for_each/doubling-snippet.h}
In the above code snippet, the results of the computation overwrite the input. If the input has to be left unchanged, the results must be written to a separate output range. Thus, the operation requires two ranges. {\embb} supports such scenarios with the \lstinline|ZipIterator|, which wraps two iterators into one. Consider the following revised example for doubling the elements of a vector:
%
-\\\inputlisting{../examples/algorithms/for_each/setup_zip-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/for_each/setup_zip-snippet.h}
%
Using the \lstinline|Zip| function as a convenient way to create a zip iterator, the doubling can be performed as follows:
%
-\\\inputlisting{../examples/algorithms/for_each/doubling_zip-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/for_each/doubling_zip-snippet.h}
%
The argument to the lambda function is a \lstinline|ZipPair| with the iterators' reference values as template parameters. The elements pointed to by the zip iterator can be accessed via \lstinline|First()| and \lstinline|Second()|, similar to \lstinline|std::pair|.
@@ -94,25 +94,25 @@ The argument to the lambda function is a \lstinline|ZipPair| with the iterators'
As mentioned in the previous section, the \lstinline|ForEach| construct requires the loop iterations to be independent of each other. However, this is not always the case. Imagine we want to sum up the values of a range, e.g., a vector of integers:
%
-\\\inputlisting{../examples/algorithms/reduce/range_init-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/reduce/range_init-snippet.h}
%
Sequentially, this can be done with a simple loop:
%
-\\\inputlisting{../examples/algorithms/reduce/sequential-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/reduce/sequential-snippet.h}
%
One might be tempted to sum up the elements in parallel using a foreach loop. The problem is that parallel accesses to \lstinline|sum| must be synchronized to avoid race conditions, which in effect sequentializes the loop. A more efficient approach is to compute intermediate sums for each block of the range and to sum them up at the end. For such purposes, {\embb} provides the function \lstinline|Reduce|:
%
-\\\inputlisting{../examples/algorithms/reduce/parallel-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/reduce/parallel-snippet.h}
%
The third argument to \lstinline|Reduce| is the neutral element of the reduction operation, i.e., the element that does not change the result. In the case of addition (\lstinline|std::plus|), the neutral element is 0. If we wanted to compute the product of the vector elements, the neutral element would be 1.
Next, let us consider the parallel computation of a dot product. Given two input ranges, we want to multiply each pair of input elements and sum up the products. The second input range is given as follows:
%
-\\\inputlisting{../examples/algorithms/reduce/second_range_init-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/reduce/second_range_init-snippet.h}
%
The reduction consists of two steps: first, the input ranges are transformed, and then the reduction is performed on the transformed range. For that purpose, the \lstinline|Reduce| function makes it possible to specify a transformation function object. By default, this is the identity functor, which does not modify the input range. To implement the dot product, we can use the \lstinline|Zip| function (see Section~\ref{sec:algorithms_foreach}) and a lambda function for computing the transformed range:
%
-\\\inputlisting{../examples/algorithms/reduce/dot_product-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/reduce/dot_product-snippet.h}
\section{Prefix Computations}
\label{sec:algorithms_prefix}
@@ -128,14 +128,14 @@ y_n &=& y_{n-1} \cdot x_n
\end{eqnarray*}
where $id$ is the identity (neutral element) with respect to the operation $(\cdot): X \times X \rightarrow Y$. As an example, consider the following range:
%
-\\\inputlisting{../examples/algorithms/scan/setup-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/scan/setup-snippet.h}
%
Computing the prefix sums of \lstinline|input_range| sequentially is easy:
%
-\\\inputlisting{../examples/algorithms/scan/sequential_prefix_sum-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/scan/sequential_prefix_sum-snippet.h}
%
Note the dependency on loop iteration $i-1$ to compute the result in iteration $i$. A special two-pass algorithm is used in the {\embb} function \lstinline|Scan| to perform prefix computations in parallel. Using \lstinline|Scan| to compute the prefix sums, we get:
%
-\\\inputlisting{../examples/algorithms/scan/prefix_sum-snippet.h}
+\\\inputlisting{../examples_raw/algorithms/scan/prefix_sum-snippet.h}
%
As in the case of reductions, the neutral element has to be given explicitly. Also, a transformation function can be passed as an additional argument to \lstinline|Scan|. The elements of the input range are then transformed before being passed to the prefix operation.
\ No newline at end of file
@@ -15,12 +15,12 @@ An object pool allocates a fixed number of objects at construction. Objects can
Listing~\ref{lst:object_pool_lst1} shows an example, where we create in Line~\ref{lst:object_pool_lst1:line_create} an object pool with five objects of type \lstinline|int|. If nothing else is specified, the object pool uses a wait-free implementation. Then, we allocate five objects from the object pool and store the obtained pointers in a temporary array. The actual allocation takes place in Line~\ref{lst:object_pool_lst1:line_allocate}. After that, we deallocate them in the second loop by calling \lstinline|FreeObject| on each pointer (see Line~\ref{lst:object_pool_lst1:line_free}).
\lstinputlisting[caption={Object pool -- initialization, allocation and
-deallocation},label={lst:object_pool_lst1}]{../examples/containers/object_pool-snippet.h}
+deallocation},label={lst:object_pool_lst1}]{../examples_raw/containers/object_pool-snippet.h}
For actually allocating and deallocating objects, the object pool's implementation relies on a value pool which keeps track of the objects in use. If the value pool is implemented in a lock-free manner, the object pool is lock-free as well (analogously for wait-free pools). Currently, \embb provides two value pools: \lstinline|WaitFreeArrayValuePool| and \lstinline|LockFreeTreeValuePool|. Normally (if nothing is specified), the wait-free pool is used. To obtain a lock-free object pool instead, one has to specify the corresponding value pool as an additional template parameter. If we replace Line~\ref{lst:object_pool_lst1:line_create} of the previous example with the following lines, the object pool is no longer wait-free but lock-free (the values are of type
\lstinline|int| and initialized to \lstinline|0|):
%
-\lstinputlisting{../examples/containers/object_pool_2-snippet.h}
+\lstinputlisting{../examples_raw/containers/object_pool_2-snippet.h}
%
This will result in a speed-up for most applications, but progress guarantees are weaker.
@@ -30,7 +30,7 @@ This will result in a speed-up for most applications, but progress guarantees ar
As the name indicates, the class template \lstinline|LockFreeStack| implements a lock-free stack which stores elements according to the LIFO (Last-In, First-Out) principle. Listing~\ref{lst:stack_lst1} shows a simple example. In Line~\ref{lst:stack_lst1:line_create}, we create a stack of integers with a capacity of 10 elements.\footnote{Due to the necessary over-provisioning of memory in thread-safe memory management, the stack might be able to hold more than 10 elements, but is guaranteed to be able to hold at least 10 elements.} The stack provides two methods, \lstinline|TryPush| and \lstinline|TryPop|, both returning a Boolean value indicating success of the operation: \lstinline|TryPop| returns \lstinline|false| if the stack is empty, and \lstinline|TryPush| returns \lstinline|false| if the stack is full. \lstinline|TryPop| returns the element removed from the stack via reference.
\lstinputlisting[caption={Stack -- initialization, push and
-pop},label={lst:stack_lst1}]{../examples/containers/stack-snippet.h}
+pop},label={lst:stack_lst1}]{../examples_raw/containers/stack-snippet.h}
In Line~\ref{lst:stack_lst1:fail_pop} of Listing~\ref{lst:stack_lst1}, we try to pop an element from the empty stack, which has to fail. In the for-loop in Line~\ref{lst:stack_lst1:loop1}, we fill the stack with \lstinline|int| values 0 $\ldots$ 4. Afterwards, in the loop in Line~\ref{lst:stack_lst1:loop2}, we pop five values (Line~\ref{lst:stack_lst1:pop}) from the stack into variable \lstinline|j|. According to the LIFO semantics, the values are popped in reverse order, i.e., we get the sequence 4 $\ldots$ 0. This is checked by the assertion in Line~\ref{lst:stack_lst1:assert}.
@@ -41,6 +41,6 @@ There are two FIFO (First-In, First-Out) queue implementations in \embb, \lstinl
Listing~\ref{lst:queue_lst1} shows an example for the \lstinline|LockFreeMPMCQueue|. In Line~\ref{lst:queue_lst1:line_create}, we create a queue with element type \lstinline|int| and a capacity of 10 elements.\footnote{As in the case of stacks, the queue may actually hold more than 10 elements.} The Boolean return value of the methods \lstinline|TryEnqueue| and \lstinline|TryDequeue| indicates success (\lstinline|false| if the queue is full or empty, respectively).
-\lstinputlisting[caption={Queue -- initialization, enqueue and dequeue},label={lst:queue_lst1}]{../examples/containers/queues-snippet.h}
+\lstinputlisting[caption={Queue -- initialization, enqueue and dequeue},label={lst:queue_lst1}]{../examples_raw/containers/queues-snippet.h}
In Line~\ref{lst:queue_lst1:fail_pop} of Listing~\ref{lst:queue_lst1}, we try to dequeue an element from the empty queue, which has to fail. In the for-loop in Line~\ref{lst:queue_lst1:loop1}, we fill the queue with \lstinline|int| values 0 $\ldots$ 4. Afterwards, in the loop in Line~\ref{lst:queue_lst1:loop2}, we dequeue five values (Line~\ref{lst:queue_lst1:pop}) from the queue into variable \lstinline|j|. According to the FIFO semantics, the values are dequeued in the same order as they were enqueued, i.e., we get the sequence 0 $\ldots$ 4. This is checked by the assertion in Line~\ref{lst:queue_lst1:assert}.
@@ -100,21 +100,21 @@ To run this program on a multicore processor, we may execute the above steps in
This pipeline can be easily implemented using the Dataflow building block. As the first step, we have to include the \lstinline|dataflow.h| header file:
%
\\\inputlisting{../examples_raw/dataflow/dataflow_include-snippet.h}
%
Then, we have to construct a \emph{network}. A network consists of a set of processes that are connected by communication channels.
%\footnote{Pipelines belong to the most simple networks, where the processes are connected in string-like (linear) fashion.}
\embb provides a class template \lstinline|Network| that can be customized to your needs. For the moment, we are using 2 as a template argument:
%
\\\inputlisting{../examples_raw/dataflow/dataflow_network-snippet.h}
%
As the next step, we have to construct the processes shown in Figure~\ref{fig:replace_par}. The easiest way to construct a process is to wrap the user-defined code in a lambda function and to pass it to the network. The network constructs an object for that process and executes the lambda function whenever new data is available. There are several methods for constructing processes depending on their type. The process \textbf{read} is a \emph{source} process, since it produces data (by reading it from the specified file) but does not consume any data. Source processes are constructed from a function object
%
\\\inputlisting{../examples_raw/dataflow/dataflow_source_function-snippet.h}
%
like this:
%
\\\inputlisting{../examples_raw/dataflow/dataflow_declare_source-snippet.h}
%
%There are a couple of things to mention here. Firstly,
Note the template argument \lstinline|string| to \lstinline|Source|. This tells \embb that the process has exactly one \emph{port} of type \lstinline|string| and that this port is used to transmit data to other processes. The user-defined code can access the ports via the parameters of the function. Thus, each parameter corresponds to exactly one port. In our example, the result of the process is stored in a variable \lstinline|str|, which is passed by reference.
@@ -122,21 +122,21 @@ Note the template argument \lstinline|string| to \lstinline|Source|. This tells
The replacement of the strings can be done by a \emph{parallel} process, which means that multiple invocations of the process may be executed simultaneously. In general, processes that neither have any side effects nor maintain a state can safely be executed in parallel. This helps to avoid bottlenecks that arise when some processes are faster than others. Suppose, for example, that \textbf{replace} requires up to 50 ms to execute, whereas \textbf{read} and \textbf{write} each require 10 ms to execute.
If only one invocation of \textbf{replace} could be executed at a time, the throughput would be at most 20 elements per second. Since \textbf{replace} is a parallel process, however, the network may start a new invocation every 10 ms. Hence, up to five invocations may be executed in parallel, yielding a throughput of 100 elements per second. To compensate for variations in the runtime of parallel stages, \embb may execute them \emph{out-of-order}. As a result, the order in which the elements of a stream enter and leave parallel stages is not necessarily preserved. In our example, the runtime of \textbf{replace} may vary significantly because not all lines have the same length and the number of replacements depends on the content. Let us now return to our example. The \textbf{replace} process is constructed from the function
%
\\\inputlisting{../examples_raw/dataflow/dataflow_replace_function-snippet.h}
%
like this:
%
\\\inputlisting{../examples_raw/dataflow/dataflow_declare_replace-snippet.h}
%
The template parameter \lstinline|Network::Inputs<string>| specifies that the process has one port serving as input. Analogously, \lstinline|Network::Outputs<string>| specifies that there is one port serving as output.
Since the last process (\textbf{write}) does not have any outputs, we make it a \emph{Sink}. Unlike parallel processes, sinks are always executed \emph{in-order}. \embb takes care that the elements are automatically reordered according to their original order in the stream. In this way, the externally visible behavior is preserved even if some parallel stages may be executed out-of-order. The function
%
\\\inputlisting{../examples_raw/dataflow/dataflow_sink_function-snippet.h}
%
is used to construct the sink:
%
\\\inputlisting{../examples_raw/dataflow/dataflow_declare_sink-snippet.h}
%
%In order to avoid that the received string is overwritten accidentally, the parameter \lstinline|str| corresponding to the input port of \lstinline|write| must be constant.\\
@@ -144,16 +144,16 @@ is used to construct the sink:
The network needs to know about the source declared above, so we add it to our network:
%
\\\inputlisting{../examples_raw/dataflow/dataflow_add-snippet.h}
%
As the last step, we have to connect the processes (ports). This is straightforward using the C++ stream operator:
%
\\\inputlisting{../examples_raw/dataflow/dataflow_connect-snippet.h}
%
Once all connections have been established, we can start the network:
%
\\\inputlisting{../examples_raw/dataflow/dataflow_run-snippet.h}
%
The integer given as a template parameter to the network specifies the maximum number of elements that can be in the network at a time. The number of elements is limited to prevent the network from being flooded with new elements before the previous ones have been processed. In a linear pipeline, for example, this may happen if the source is faster than the sink. In our example, at most four elements may be processed simultaneously: one in the source, one in the sink, and two in the middle stage (see above). Finding an optimal value depends on the application and usually requires some experimentation. In general, large values boost the throughput but also increase the latency. Conversely, small values reduce the latency but may lead to a drop in throughput.
@@ -295,7 +295,7 @@ Let us now consider the implementation of the sorting network using \embb. As in
The following listing shows the implementation of the source processes using classes instead of functions.\footnote{For the sake of brevity, we omit the functionality. A complete implementation can be found in the examples directory.}
%
\\\inputlisting{../examples_raw/dataflow/dataflow_producer-snippet.h}
%
%In order to use an instance of a class as a process, it must be derived from one of the predefined base classes.
The class-based approach has several advantages besides the use of templates: Firstly, the creation of multiple processes is straightforward. Secondly, one can derive other processes from a given base class such as \lstinline|Producer|. Thirdly, it eases migration of existing code. For example, if you want to use an object of an existing class \lstinline|foo| as a process, you might derive a class \lstinline|bar| from \lstinline|foo|. The function operator of \lstinline|bar| then has access to the members provided by \lstinline|foo|.
@@ -305,13 +305,13 @@ Each instance of the class \lstinline|Network| maintains a list of source proces
% When you create a source process using \lstinline|MakeSource|, it is automatically added to this list. Otherwise, you must explicitly add it by a call to \lstinline|Add|. For example, if we want to feed our sorting network \lstinline|nw| with streams of integer values, we may write:
You must explicitly add all sources to the network by a call to \lstinline|AddSource|. For example, if we want to feed our sorting network \lstinline|nw| with four streams of integer values, we may write:
%
\\\inputlisting{../examples_raw/dataflow/dataflow_declare_add_sources-snippet.h}
%
This is only necessary for source processes. All other processes are automatically found via a depth-first search starting from the source processes.
The code for the comparators looks like this:
%
\\\inputlisting{../examples_raw/dataflow/dataflow_comparator-snippet.h}
%
Since the comparators neither have any side effects nor maintain a state, we allow multiple invocations to be executed in parallel.
% by deriving the class \lstinline|comparator| from the base class \lstinline|network<>::parallel|.
@@ -331,6 +331,6 @@ Since the comparators neither have any side effects nor maintain a state, we all
To check whether the resulting values are sorted, we use a single sink with four inputs:
%
\\\inputlisting{../examples_raw/dataflow/dataflow_consumer-snippet.h}
%
In general, however, we could also have a sink for each output of the sorting network. There is no restriction on the number of sources and sinks a network may have.
@@ -32,30 +32,30 @@ Throughout this tutorial, we will encounter C++ types which model the C++ concep
Consider, for example, the transformation of an iterable range of data values. Specifically, consider a vector of integers initialized as follows:
%
\\\inputlisting{../examples_raw/stl_for_each/setup-snippet.h}
%
The range consists of the values (\lstinline|1, 2, 3, 4, 5|) and we now want to double each of them. We could simply get an iterator from the container holding the range, iterate over every element, and multiply it by two:
%
\\\inputlisting{../examples_raw/stl_for_each/manual-snippet.h}
%
The range then contains the values (\lstinline|2, 4, 6, 8, 10|). In order to demonstrate the concept of function objects, we are now going to use the \lstinline|std::for_each| function defined in the \lstinline|algorithm| header of the C++ Standard Library. This function accepts as argument a \lstinline|UnaryFunction|, that is, a function object which takes only one argument. In the case of \lstinline|std::for_each|, the argument has to have the same type as the elements in the range, as these are passed to the unary function. In our example, the unary function's task is to double the incoming value. We could define a function for that purpose:
%
\\\inputlisting{../examples_raw/stl_for_each/function_define-snippet.h}
%
Since a function pointer models the concept of function objects, we can simply pass \lstinline|&DoubleFunction| to \lstinline|std::for_each|:
%
\\\inputlisting{../examples_raw/stl_for_each/function-snippet.h}
%
Another possibility is to define a functor
%
\\\inputlisting{../examples_raw/stl_for_each/functor_define-snippet.h}
%
and to pass an instance of this class to \lstinline|std::for_each|:
%
\\\inputlisting{../examples_raw/stl_for_each/functor-snippet.h}
%
Functors as well as function pointers separate the actual implementation from its place of usage, which can be useful if the functionality is needed at different places. In many cases, however, it is advantageous to have the implementation of the function object at the same place as it is used. C++11 provides lambda expressions for that purpose, which make our example more concise:
%
\\\inputlisting{../examples_raw/stl_for_each/lambda-snippet.h}
%
Of course, this example is too simple to really benefit from function objects and the algorithms contained in the C++ Standard Library. However, in combination with the parallelization features provided by \embb, function objects are a helpful mechanism to boost productivity. Within this document, whenever a function object or one of its subtypes is required, one can use a function pointer, a functor, or a lambda. For simplicity, we will restrict ourselves to lambdas in subsequent examples, as they are most suitable for this kind of tutorial.
@@ -100,53 +100,53 @@ void main(void) {
This algorithm can be parallelized by spawning a task for one of the recursive calls (\lstinline|fib(n - 1)|, for example). When doing this with MTAPI, an action function that represents \lstinline|fib(int n)| is needed. It has the following signature:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_action_signature-snippet.h}
%
Within the action function, the arguments should be checked, since the user might supply a buffer that is too small:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_validate_arguments-snippet.h}
%
Here, \lstinline|mtapi_context_status_set()| is used to report errors. The error code will be returned by \lstinline|mtapi_task_wait()|. Also, care has to be taken when using the result buffer. The user might not want to use the result and supply a NULL pointer, or might accidentally supply a buffer that is too small:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_validate_result_buffer-snippet.h}
%
At this point, calculation of the result can commence. First, the terminating condition of the recursion is checked:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_terminating_condition-snippet.h}
%
After that, the first part of the computation is launched as a task using \lstinline|mtapi_task_start()| (the action function is registered with the job \lstinline|FIBONACCI_JOB| in the \lstinline|fibonacci()| function and the resulting handle is stored in the global variable \lstinline|mtapi_job_hndl_t fibonacciJob|):
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_calc_task-snippet.h}
%
The second part can be executed directly:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_calc_direct-snippet.h}
%
Then, completion of the MTAPI task has to be waited for by calling \lstinline|mtapi_task_wait()|:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_wait_task-snippet.h}
%
Finally, the results can be added and written into the result buffer:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_write_back-snippet.h}
%
The \lstinline|fibonacci()| function gets a bit more complicated now. The MTAPI runtime has to be initialized first by (optionally) initializing node attributes and then calling \lstinline|mtapi_initialize()|:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_initialize-snippet.h}
%
Then, the action function needs to be associated with a job. By calling \lstinline|mtapi_action_create()|, the action function is registered with the job \lstinline|FIBONACCI_JOB|. The job handle of this job is stored in the global variable \lstinline|mtapi_job_hndl_t fibonacciJob| so that it can be accessed by the action function later on:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_register_action-snippet.h}
%
Now that the action is registered with a job, the root task can be started with \lstinline|mtapi_task_start()|:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_start_task-snippet.h}
%
%The started task has to be waited for before the result can be returned.
After everything is done, the action is deleted (\lstinline|mtapi_action_delete()|) and the runtime is shut down (\lstinline|mtapi_finalize()|):
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_finalize-snippet.h}
%
\section{C++ Interface}
@@ -154,54 +154,54 @@ \embb provides C++ wrappers for the MTAPI C interface. The signature of the acti
\embb provides C++ wrappers for the MTAPI C interface. The signature of the action function for the C++ interface is the same as in the C interface: \embb provides C++ wrappers for the MTAPI C interface. The signature of the action function for the C++ interface is the same as in the C interface:
% %
\\\inputlisting{../examples_raw/mtapi/mtapi_c_action_signature-snippet.h}
%
Checking argument and result buffer sizes is the same as in the C example. Also, the terminating condition of the recursion still needs to be checked:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_terminating_condition-snippet.h}
%
After that, the first part of the computation is launched as an MTAPI task using \lstinline|embb::mtapi::Node::Start()| (the action function is registered with the job \lstinline|FIBONACCI_JOB| in the \lstinline|fibonacci()| function and the resulting handle is stored in the global variable \lstinline|embb::mtapi::Job fibonacciJob|):
%
\\\inputlisting{../examples_raw/mtapi/mtapi_cpp_calc_task-snippet.h}
%
The second part can be executed directly:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_calc_direct-snippet.h}
%
Then, completion of the MTAPI task has to be waited for using \lstinline|embb::mtapi::Task::Wait()|:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_cpp_wait_task-snippet.h}
%
Finally, the two parts can be added and written into the result buffer:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_write_back-snippet.h}
%
Note that there is no need to do error checking everywhere, since errors are reported as exceptions. In this example, there is only a single try/catch block in the main function:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_cpp_main-snippet.h}
%
The \lstinline|fibonacci()| function is about the same as in the C version. The MTAPI runtime needs to be initialized first:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_cpp_initialize-snippet.h}
%
Then the node instance can be fetched:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_cpp_get_node-snippet.h}
%
After that, the action function needs to be associated with a job. By instantiating an \lstinline|embb::mtapi::Action| object, the action function is registered with the job \lstinline|FIBONACCI_JOB|. The job is stored in the global variable \lstinline|embb::mtapi::Job fibonacciJob| so that it can be accessed by the action function later on:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_cpp_register_action-snippet.h}
%
Now that the action is registered and the job is initialized, the root task can be started:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_cpp_start_task-snippet.h}
%
Again, the started task has to be waited for (using \lstinline|embb::mtapi::Task::Wait()|) before the result can be returned.
The registered action will be unregistered when it goes out of scope.
The runtime needs to be shut down by calling:
\\\inputlisting{../examples_raw/mtapi/mtapi_cpp_finalize-snippet.h}
\section{Plugins}
The plugin action is implemented through three callbacks: task start, task cancel, and action finalize.
For illustration, our example plugin will provide a no-op action. The task start callback in that case looks like this:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_plugin_task_start_cb-snippet.h}
%
The scheduling operation is responsible for bringing the task to execution; this might involve instructing some hardware to execute the task or pushing the task into a queue for execution by a separate worker thread. Here, however, the task is executed directly:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_plugin_task_schedule-snippet.h}
%
Since the task gets executed right away, it cannot be canceled, and the task cancel callback implementation is empty:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_plugin_task_cancel_cb-snippet.h}
%
The plugin action did not acquire any resources, so the action finalize callback is empty as well:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_plugin_action_finalize_cb-snippet.h}
%
Now that the callbacks are in place, the action can be registered with a job after the node has been initialized using \lstinline|mtapi_initialize()|:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_plugin_action_create-snippet.h}
%
The job handle can now be obtained in the normal MTAPI way. The fact that there is a plugin working behind the scenes is completely transparent at this point:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_plugin_get_job-snippet.h}
%
Using the job handle, tasks can be started like normal MTAPI tasks:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_c_plugin_task_start-snippet.h}
%
This call will lead to the invocation of the \lstinline|plugin_task_start| callback function, where the plugin implementor is responsible for bringing the task to execution.
The MTAPI network plugin provides a means to distribute tasks over a TCP/IP network. As an example, the following vector addition action is used:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_network_c_action_function-snippet.h}
%
It adds two float vectors and a float from node local data and writes the result into the result float vector. In the example code, the vectors will hold \lstinline|kElements| floats each.
To use the network plugin, its header file needs to be included first:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_network_c_header-snippet.h}
%
After initializing the node using \lstinline|mtapi_initialize()|, the plugin itself needs to be initialized:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_network_c_plugin_initialize-snippet.h}
%
This will set up a listening socket on the localhost interface (127.0.0.1) at port 12345. The socket will allow a maximum of 5 connections and have a maximum transfer buffer size of \lstinline|kElements * 4 * 3 + 32|. This buffer size needs to be big enough to fit at least the argument and result buffers at once. The example uses 3 vectors of \lstinline|kElements| floats, occupying \lstinline|kElements * sizeof(float) * 3| bytes.
Since the example connects to itself on localhost, the ``remote'' action needs to be registered with the \lstinline|NETWORK_REMOTE_JOB|:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_network_c_remote_action_create-snippet.h}
%
After that, the local network action is created, which maps \lstinline|NETWORK_LOCAL_JOB| to \lstinline|NETWORK_REMOTE_JOB| through the network:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_network_c_local_action_create-snippet.h}
%
Now, \lstinline|NETWORK_LOCAL_JOB| can be used to execute tasks by simply calling \lstinline|mtapi_task_start()|. The task parameters will be transmitted through a socket connection and consumed by the network plugin worker thread, which starts a task using the \lstinline|NETWORK_REMOTE_JOB|. When this task is finished, the results will be collected and sent back through the network. Again, the network plugin thread will receive the results, provide them to the \lstinline|NETWORK_LOCAL_JOB| task, and mark that task as finished.
When all work is done, the plugin needs to be finalized. This will stop the plugin worker thread and close the sockets:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_network_c_plugin_finalize-snippet.h}
%
Then the node may be finalized by calling \lstinline|mtapi_finalize()|.
The MTAPI OpenCL plugin allows the user to incorporate the computational power of OpenCL accelerators.
The vector addition example from the network plugin is used again. However, the action function is an OpenCL kernel now:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_opencl_c_kernel-snippet.h}
%
The OpenCL plugin header file needs to be included first:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_opencl_c_header-snippet.h}
%
As with the network plugin, the OpenCL plugin needs to be initialized after the node has been initialized:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_opencl_c_plugin_initialize-snippet.h}
%
Then the plugin action can be registered with the \lstinline|OPENCL_JOB|:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_opencl_c_action_create-snippet.h}
%
The kernel source and the name of the kernel to use (\lstinline|AddVector|) need to be specified while creating the action. The kernel will be compiled using the OpenCL runtime, and the provided node local data is transferred to accelerator memory. The local work size is the number of threads that will share OpenCL local memory, in this case 32. The element size instructs the OpenCL plugin how many bytes a single element in the result buffer consumes, in this case 4, as a single result is a single float. The OpenCL plugin will launch \lstinline|result_buffer_size/element_size| OpenCL threads to calculate the result.
Now the \lstinline|OPENCL_JOB| can be used like a normal MTAPI job to start tasks.
After all work is done, the plugin needs to be finalized. This will free all memory on the accelerator and delete the corresponding OpenCL context:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_opencl_c_plugin_finalize-snippet.h}
%
\embb provides a simple task management wrapper for the MTAPI interface. Using the example from the previous section, the signature of the action function for the tasks interface looks like this:
%
\\\inputlisting{../examples_raw/tasks/tasks_cpp_action_signature-snippet.h}
%
First, the node instance needs to be obtained. If the node is not initialized yet, this function will do it.
%
\\\inputlisting{../examples_raw/tasks/tasks_cpp_get_node-snippet.h}
%
\emph{\textbf{Note:} Automatic initialization allows for easy usage of the \emph{Algorithms} and \emph{Dataflow} building blocks. For performance measurements, however, explicit initialization by calling \lstinline|embb::tasks::Node::Initialize| is imperative, since the measurements will otherwise include the initialization time of MTAPI.}
Checking the arguments and the result buffer is not necessary, since everything is safely typed. However, the terminating condition of the recursion still needs to be checked:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_terminating_condition-snippet.h}
%
After that, the first part of the computation is launched as an MTAPI task using \lstinline|embb::tasks::Node::Spawn()| (registering an action function with a job is done automatically):
%
\\\inputlisting{../examples_raw/tasks/tasks_cpp_calc_task-snippet.h}
%
The second part can be executed directly:
%
\\\inputlisting{../examples_raw/tasks/tasks_cpp_calc_direct-snippet.h}
%
Then, completion of the MTAPI task has to be waited for using \lstinline|embb::tasks::Task::Wait()|:
%
\\\inputlisting{../examples_raw/tasks/tasks_cpp_wait_task-snippet.h}
%
Finally, the two parts can be added and written into the result buffer:
%
\\\inputlisting{../examples_raw/mtapi/mtapi_write_back-snippet.h}
%
The \lstinline|fibonacci()| function also gets simpler compared to the C version. The MTAPI runtime is initialized automatically; only the node instance has to be fetched:
%
\\\inputlisting{../examples_raw/tasks/tasks_cpp_get_node-snippet.h}
%
The root task can be started using \lstinline|embb::tasks::Node::Spawn()| directly; registering with a job is done automatically:
%
\\\inputlisting{../examples_raw/tasks/tasks_cpp_start_task-snippet.h}
%
Again, the started task has to be waited for (using \lstinline|embb::tasks::Task::Wait()|) before the result can be returned. The runtime is shut down automatically in an \lstinline|atexit()| handler.
redirect_cmd rsync \
--exclude "scripts/license.*" \
--exclude "scripts/license_*" \
--exclude "scripts/remove_license.sh" \
--exclude "scripts/merge_examples.sh" \
--exclude "mtapi/MTAPI.mm" \
--exclude ".cproject" \
--exclude ".gitattributes" \
REFMAN_SOURCE="$MYTMPDIR_DOXY_BUILD/latex/refman.pdf"
echo "--> Integrating Example Snippets"
REMEMBER_CUR_DIR=$(pwd)
EXAMPLES_DIR="$MYTMPDIR_BUILD/doc/examples_raw"
INTEGRATE_SNIPPETS_SCRIPT="insert_snippets.py"
EXAMPLES_TARGET_DIR="$MYTMPDIR/${n}/doc/examples"
if [ -f $EXAMPLES_DIR/$INTEGRATE_SNIPPETS_SCRIPT ]; then
cd "$EXAMPLES_DIR"
echo "---> Calling integrate script"
redirect_cmd python insert_snippets.py
if [ ! -d $EXAMPLES_TARGET_DIR ]; then
echo "---> Examples target dir does not exist. Creating..."
redirect_cmd mkdir $EXAMPLES_TARGET_DIR
fi
if [ -d $EXAMPLES_TARGET_DIR ]; then
echo "---> Copy integrated examples back"
#The examples have been integrated. Copy the integrated source files.
redirect_cmd rsync --archive --recursive "$EXAMPLES_DIR/" "$EXAMPLES_TARGET_DIR/" \
--exclude=*snippet.h \
--exclude=*fragmented.h \
--exclude=*snippet.cc \
#!/usr/bin/env bash
# Copyright (c) 2014-2015, Siemens AG. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#function for printing usage
usage() {
echo "This script is used by Jenkins to merge examples from examples_raw";
echo "into examples. It should not be called manually.";
echo ""
echo "Example call (from the scripts directory as working directory):";
echo "$0 -d ../";
echo "";
echo "Usage: $0 [-d <root project dir>]" 1>&2; exit 1;
}
#check if all dependencies are fulfilled
for DEPENDENCY in rsync mktemp cd grep cmake echo python realpath mkdir git
do
command -v $DEPENDENCY >/dev/null 2>&1 || { echo >&2 "This script requires $DEPENDENCY but it's not installed. Exiting."; exit 1; }
done
#get command line options
while getopts "d:vq" o; do
case "${o}" in
d)
d=${OPTARG}
;;
v)
v=1
;;
*)
usage
;;
esac
done
shift $((OPTIND-1))
#used as wrapper, for switching between verbose and normal mode
redirect_cmd() {
if [ -z "${v}" ]; then
"$@" > /dev/null 2>&1
else
"$@"
fi
}
#user has to specify directory
if [ -z "${d}" ]; then
usage
fi
#the specified directory has to exist
if [ ! -d "$d" ]; then
echo "--> ! Error, directory $d does not exist or is not a directory!"
echo ""
usage
fi
CMAKEFILE="$d/CMakeLists.txt"
#sanity check, the user specified directory should contain a CMakeLists.txt file.
if [ ! -f "$CMAKEFILE" ]; then
echo "--> ! Error, could not locate CMakeLists.txt. Perhaps you specified a wrong directory?"
echo ""
usage
fi
#temporary directory for building other things (e.g. Latex or integrating snippets into examples)
MYTMPDIR_BUILD=`mktemp -d`
echo "--> Creating temporary directory $MYTMPDIR_BUILD"
#install traps, deleting the temporary directories when exiting
function finish {
rm -rf $MYTMPDIR_BUILD
}
trap finish EXIT
PROJECT_DIR_FULLPATH=`realpath ${d}`
echo "--> Calling rsync to temporary folder ($MYTMPDIR_BUILD)"
#doing a rsync to another temporary folder, which will be used to build things, like e.g. the tutorial pdf.
redirect_cmd rsync \
--archive --recursive ${d} $MYTMPDIR_BUILD
echo "--> Integrating Example Snippets"
REMEMBER_CUR_DIR=$(pwd)
EXAMPLES_DIR="$MYTMPDIR_BUILD/doc/examples_raw"
INTEGRATE_SNIPPETS_SCRIPT="insert_snippets.py"
EXAMPLES_TARGET_DIR="$PROJECT_DIR_FULLPATH/doc/examples"
if [ -f $EXAMPLES_DIR/$INTEGRATE_SNIPPETS_SCRIPT ]; then
cd "$EXAMPLES_DIR"
echo "---> Calling integrate script"
python insert_snippets.py
if [[ $? = 0 ]]; then
echo "success"
else
echo "failure: $?"
exit 1
fi
if [ ! -d $EXAMPLES_TARGET_DIR ]; then
echo "---> Examples target dir does not exist. Creating..."
redirect_cmd mkdir $EXAMPLES_TARGET_DIR
fi
if [ -d $EXAMPLES_TARGET_DIR ]; then
echo "---> Copy integrated examples back"
#The examples have been integrated. Copy the integrated source files back.
redirect_cmd rsync --delete --archive --recursive "$EXAMPLES_DIR/" "$EXAMPLES_TARGET_DIR/" \
--exclude=*snippet.h \
--exclude=*fragmented.h \
--exclude=*snippet.cc \
--exclude=*fragmented.cc \
--exclude=*$INTEGRATE_SNIPPETS_SCRIPT
# for committing, we must be in the project dir
cd "$PROJECT_DIR_FULLPATH"
redirect_cmd git add -u $EXAMPLES_TARGET_DIR
redirect_cmd git add $EXAMPLES_TARGET_DIR
redirect_cmd git commit -m 'Integrating examples_raw to examples using merge_examples.sh script.'
fi
fi
cd "$REMEMBER_CUR_DIR"
echo "--> Done."