sycl::group#

template <int Dimensions = 1>
class group;

The sycl::group encapsulates all functionality required to represent a particular work-group within a parallel execution. It is not user-constructible.

The local range stored in the group class is provided either by the programmer, when it is passed as an optional parameter to parallel_for_work_group, or by the runtime system when it selects the optimal work-group size. This allows the developer to always know how many work-items are in each executing work-group, even through the abstracted iteration range of the parallel_for_work_item.

Warning

Based on developer and implementation feedback, the hierarchical data parallel kernel feature is undergoing improvements to better align with the frameworks and patterns prevalent in modern programming.

As this is a key part of the SYCL API and we expect to make changes to it, we temporarily recommend that new codes refrain from using this feature until the new API is finished in a near-future version of the SYCL specification, when full use of the updated feature will be recommended for use in new code.

Existing codes using this feature will of course be supported by conformant implementations of this specification.

See also

SYCL Specification Section 4.9.1.7

Member functions#

get_group_id#

sycl::id<Dimensions> get_group_id() const;

Return a sycl::id representing the index of the work-group within the global sycl::nd_range for every dimension. Since the work-items in a work-group have a defined position within the global sycl::nd_range, the returned group id can be used along with the local id to uniquely identify the work-item in the global sycl::nd_range.

size_t get_group_id(int dimension) const;

Return the same value as get_group_id()[dimension].

get_local_id#

sycl::id<Dimensions> get_local_id() const;

Return a sycl::id representing the calling work-item’s position within the work-group.

It is undefined behavior for this member function to be invoked from within a parallel_for_work_item context.

size_t get_local_id(int dimension) const;

Return the same value as get_local_id()[dimension].

It is undefined behavior for this member function to be invoked from within a parallel_for_work_item context.

get_local_range#

sycl::range<Dimensions> get_local_range() const;

Return a sycl::range representing all dimensions of the local range. This local range may have been provided by the programmer, or chosen by the SYCL runtime.

size_t get_local_range(int dimension) const;

Return the same value as get_local_range()[dimension].

get_group_range#

sycl::range<Dimensions> get_group_range() const;

Return a sycl::range representing the number of work-groups in the sycl::nd_range.

size_t get_group_range(int dimension) const;

Return the same value as get_group_range()[dimension].

operator[]#

size_t operator[](int dimension) const;

Return the same value as get_group_id(dimension).

get_max_local_range#

sycl::range<Dimensions> get_max_local_range() const;

Return a sycl::range representing the maximum number of work-items in any work-group in the sycl::nd_range.

get_group_linear_id#

size_t get_group_linear_id() const;

Get a linearized version of the work-group id. Calculating a linear work-group id from a multi-dimensional index follows SYCL Specification Section 3.11.1.

get_group_linear_range#

size_t get_group_linear_range() const;

Return the total number of work-groups in the sycl::nd_range.

get_local_linear_id#

size_t get_local_linear_id() const;

Get a linearized version of the calling work-item’s local id. Calculating a linear local id from a multi-dimensional index follows SYCL Specification Section 3.11.1.

It is undefined behavior for this member function to be invoked from within a parallel_for_work_item context.

get_local_linear_range#

size_t get_local_linear_range() const;

Return the total number of work-items in the work-group.

leader#

bool leader() const;

Return true for exactly one work-item in the work-group, if the calling work-item is the leader of the work-group, and false for all other work-items in the work-group.

The leader of the work-group is determined during construction of the work-group, and is invariant for the lifetime of the work-group. The leader of the work-group is guaranteed to be the work-item with a local id of 0.

parallel_for_work_item#

template <typename WorkItemFunctionT>
void parallel_for_work_item(const WorkItemFunctionT& func) const;

Launch the work-items for this work-group.

func is a function object type with a public member function void F::operator()(sycl::h_item<Dimensions>) representing the work-item computation.

This member function can only be invoked within a parallel_for_work_group context. It is undefined behavior for this member function to be invoked from within the parallel_for_work_group form that does not define work-group size, because then the number of work-items that should execute the code is not defined. It is expected that this form of parallel_for_work_item is invoked within the parallel_for_work_group form that specifies the size of a work-group.

template <typename WorkItemFunctionT>
void parallel_for_work_item(sycl::range<Dimensions> logicalRange,
                            const WorkItemFunctionT& func) const;

Launch the work-items for this work-group using a logical local range. The function object func is executed as if the kernel were invoked with logicalRange as the local range. This new local range is emulated and may not map one-to-one with the physical range.

logicalRange is the new local range to be used. This range can be smaller or larger than the one used to invoke the kernel. func is a function object type with a public member function void F::operator()(sycl::h_item<Dimensions>) representing the work-item computation.

Note that the logical range does not need to be uniform across all work-groups in a kernel. For example the logical range may depend on a work-group varying query (e.g. sycl::group::get_linear_id), such that different work-groups in the same kernel invocation execute different logical range sizes.

This member function can only be invoked within a parallel_for_work_group context.

async_work_group_copy#

template <typename DataT>
sycl::device_event async_work_group_copy(sycl::local_ptr<DataT> dest,
                                         sycl::global_ptr<DataT> src,
                                         size_t numElements) const;

template <typename DataT>
sycl::device_event async_work_group_copy(sycl::global_ptr<DataT> dest,
                                         sycl::local_ptr<DataT> src,
                                         size_t numElements) const;

template <typename DataT>
sycl::device_event async_work_group_copy(sycl::local_ptr<DataT> dest,
                                         sycl::global_ptr<DataT> src,
                                         size_t numElements, size_t srcStride) const;

template <typename DataT>
sycl::device_event async_work_group_copy(sycl::global_ptr<DataT> dest,
                                         sycl::local_ptr<DataT> src,
                                         size_t numElements, size_t destStride) const;

All four async_work_group_copy functions listed above are deprecated in SYCL 2020.

Has the same effect as the overload taking sycl::decorated_local_ptr and sycl::decorated_global_ptr except that the dest and src parameters are sycl::multi_ptr with sycl::access::decorated::legacy.

template <typename DestDataT, typename SrcDataT>
sycl::device_event async_work_group_copy(sycl::decorated_global_ptr<DestDataT> dest,
                                         sycl::decorated_local_ptr<SrcDataT> src,
                                         size_t numElements) const;

template <typename DestDataT, typename SrcDataT>
sycl::device_event async_work_group_copy(sycl::decorated_local_ptr<DestDataT> dest,
                                         sycl::decorated_global_ptr<SrcDataT> src,
                                         size_t numElements, size_t srcStride) const;

template <typename DestDataT, SrcDataT>
sycl::device_event async_work_group_copy(sycl::decorated_global_ptr<DestDataT> dest,
                                         sycl::decorated_local_ptr<SrcDataT> src,
                                         size_t numElements, size_t destStride) const;

All three functions above are available only when: (std::is_same_v<DestDataT, std::remove_const_t<SrcDataT>> == true)

Permitted types for DataT are all scalar and vector types. Asynchronously copies a number of elements specified by numElements from the source pointer src to destination pointer dest with a destination stride specified by destStride and returns a sycl::device_event which can be used to wait on the completion of the copy.

wait_for#

template <typename... EventTN> void wait_for(EventTN... events) const;

Permitted type for EventTN is sycl::device_event. Waits for the asynchronous operations associated with each sycl::device_event to complete.