sycl::private_memory#
template <typename T, int Dimensions = 1>
class private_memory;
Warning
Based on developer and implementation feedback, the hierarchical data parallel kernel feature is undergoing improvements to better align with the frameworks and patterns prevalent in modern programming.
As this is a key part of the SYCL API and we expect to make changes to it, we temporarily recommend that new code refrain from using this feature until the new API is finished in a near-future version of the SYCL specification, at which point full use of the updated feature will be recommended for new code.
Existing code using this feature will still be supported by conformant implementations of this specification.
The sycl::private_memory class constructs private data for every work-item of a given group. The identifier of the current work-item (a sycl::h_item) is passed to every access to select that work-item's copy of the data.
Wrapping data in sycl::private_memory guarantees that it is placed in private per-work-item memory in a hierarchical parallel_for invocation.
Private memory is allocated per underlying work-item, not per iteration of the parallel_for_work_item loop.
The number of instances of a private memory object is only under direct control if a work-group size is passed to the parallel_for_work_group call.
If the underlying work-group size is chosen by the runtime, the number of private memory instances is opaque to the program.
Explicit private memory declarations should therefore be used with care and with a full understanding of which instances of a parallel_for_work_item loop will share the same underlying variable.
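The sharing described above can be illustrated with a small host-side model in standard C++. This is only a sketch, not SYCL code: `PrivateMemoryModel` and `countIterationsPerSlot` are hypothetical names, and the round-robin mapping from logical loop iterations to physical work-items is an assumption made for illustration (a real implementation may map them differently).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for sycl::private_memory<T>:
// it holds one slot per *physical* work-item of a work-group.
template <typename T>
class PrivateMemoryModel {
public:
    explicit PrivateMemoryModel(std::size_t physicalSize)
        : slots_(physicalSize) {}  // elements are value-initialized

    T& at(std::size_t physicalId) { return slots_[physicalId]; }
    std::size_t instances() const { return slots_.size(); }

private:
    std::vector<T> slots_;
};

// Runs a loop of `logicalSize` iterations over `physicalSize` work-items
// and returns how many iterations touched each private slot.
std::vector<int> countIterationsPerSlot(std::size_t physicalSize,
                                        std::size_t logicalSize) {
    PrivateMemoryModel<int> priv(physicalSize);
    for (std::size_t logical = 0; logical < logicalSize; ++logical) {
        // Assumed mapping (round-robin); not mandated by SYCL.
        std::size_t physical = logical % physicalSize;
        priv.at(physical) += 1;
    }
    std::vector<int> counts;
    for (std::size_t i = 0; i < priv.instances(); ++i) {
        counts.push_back(priv.at(i));
    }
    return counts;
}
```

In this model, `countIterationsPerSlot(4, 8)` reports two iterations per slot: eight loop iterations share only four underlying variables. When the loop has exactly as many iterations as there are work-items, each slot is touched once.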
(constructors)#
private_memory(const sycl::group<Dimensions>&);
Place an object of type T in the underlying private memory of each work-item. The type T must be default constructible.
The default constructor of T is called for each work-item.
Member functions#
T& operator()(const sycl::h_item<Dimensions>& id);
Retrieve a reference to the object belonging to the work-item identified by id.
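As a rough illustration of the constructor and accessor semantics, the following host-side sketch in standard C++ (not SYCL) counts default constructions and checks that each identifier selects its own object. `Tracked`, `HostPrivateMemory`, `constructionsForGroup` and `slotsAreIndependent` are hypothetical names, and a plain linear index stands in for sycl::h_item.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts its default constructions.
struct Tracked {
    static int constructed;
    int value = 0;
    Tracked() { ++constructed; }
};
int Tracked::constructed = 0;

// Hypothetical model: one default-constructed T per work-item,
// looked up by a linear work-item index instead of a sycl::h_item.
template <typename T>
class HostPrivateMemory {
public:
    explicit HostPrivateMemory(std::size_t workItems) : data_(workItems) {}
    T& operator()(std::size_t localLinearId) { return data_[localLinearId]; }

private:
    std::vector<T> data_;
};

// Returns the number of default constructions for one work-group.
int constructionsForGroup(std::size_t workItems) {
    Tracked::constructed = 0;
    HostPrivateMemory<Tracked> priv(workItems);
    (void)priv;
    return Tracked::constructed;
}

// Writing through one work-item's reference leaves the others untouched.
bool slotsAreIndependent() {
    HostPrivateMemory<Tracked> priv(2);
    priv(0).value = 7;
    return priv(0).value == 7 && priv(1).value == 0;
}
```

For a work-group of 8 work-items the model performs 8 default constructions, which is why T must be default constructible.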
Example 1#
Example of using sycl::private_memory:
#include <sycl/sycl.hpp>

#include <iostream>

int main() {
  sycl::queue myQueue;

  myQueue.submit([&](sycl::handler &cgh) {
    sycl::stream out(256, 256, cgh);

    // Issue 8 work-groups of 8 work-items each
    cgh.parallel_for_work_group(
        sycl::range<3>(2, 2, 2), sycl::range<3>(2, 2, 2),
        [=](sycl::group<3> myGroup) {
          //[workgroup code]

          // this variable is shared between workitems
          int myLocal;

          // this variable will be instantiated for each work-item separately
          sycl::private_memory<int, 3> myPrivate(myGroup);

          // Issue parallel work-items. The number issued per work-group is
          // determined by the work-group size range of parallel_for_work_group.
          // In this case, 8 work-items will execute the parallel_for_work_item
          // body for each of the 8 work-groups, resulting in 64 executions
          // globally/total.
          myGroup.parallel_for_work_item([&](sycl::h_item<3> myItem) {
            //[work-item code]
            sycl::id<3> myItemId = myItem.get_local_id();
            myPrivate(myItem) = myItemId[0] + myItemId[1] + myItemId[2];
          });

          // Implicit work-group barrier

          // Carry private value across loops
          myGroup.parallel_for_work_item([&](sycl::h_item<3> myItem) {
            //[work-item code]
            out << myPrivate(myItem) << ' ';
          });
          out << sycl::stream_manipulator::endl;

          //[workgroup code]
        });
  });
}
Output example:
0 1 1 2 1 2 2 3
0 1 1 2 1 2 2 3
0 1 1 2 1 2 2 3
0 1 1 2 1 2 2 3
0 1 1 2 1 2 2 3
0 1 1 2 1 2 2 3
0 1 1 2 1 2 2 3
0 1 1 2 1 2 2 3
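Each row above corresponds to one work-group, and each value is the sum of a work-item's three local id components. The host-side sketch below (standard C++, not SYCL) reproduces one row under the assumption that work-items are listed in SYCL's linear id order, with dimension 2 varying fastest. `expectedRow` is a hypothetical helper, and the order in which a real device emits stream output is not guaranteed.

```cpp
#include <vector>

// Hypothetical helper: for a local range (r0, r1, r2), list the value
// id[0] + id[1] + id[2] for every work-item in linear id order
// (dimension 2 varies fastest).
std::vector<int> expectedRow(int r0, int r1, int r2) {
    std::vector<int> row;
    for (int i = 0; i < r0; ++i) {
        for (int j = 0; j < r1; ++j) {
            for (int k = 0; k < r2; ++k) {
                row.push_back(i + j + k);
            }
        }
    }
    return row;
}
```

For the 2 x 2 x 2 work-group size used in the example, `expectedRow(2, 2, 2)` yields 0 1 1 2 1 2 2 3, matching each line of the sample output.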