Group functions#
SYCL provides a number of functions that expose functionality tied to groups of work-items (such as group barriers and collective operations). These group functions act as synchronization points and must be encountered in converged control flow by all work-items in the group. If one work-item in a group calls a group function, then all work-items in that group must call exactly the same function under the same set of conditions --- calling the same function under different conditions (e.g. in different iterations of a loop, or different branches of a conditional statement) results in undefined behavior. Additionally, restrictions may be placed on the arguments passed to each function in order to ensure that all work-items in the group agree on the operation that is being performed. Any such restrictions on the arguments passed to a function are defined within the descriptions of those functions. Violating these restrictions results in undefined behavior.
Using a group function inside of a kernel may introduce additional
limits on the resources available to user code inside the same kernel.
The behavior of these limits is implementation-defined, but must be
reflected by calls to kernel querying functions
(such as sycl::kernel::get_info
) as described
in SYCL Specification Section 4.11.13.1.
It is undefined behavior for any group function to be invoked within
a sycl::parallel_for_work_group
or sycl::parallel_for_work_item
context.
Group type trait
#
namespace sycl {
template <class T> struct is_group;
template <class T> inline constexpr bool is_group_v = is_group<T>::value;
} // namespace sycl
The sycl::is_group
type trait is used to determine which types
of groups are supported by group functions, and to control when group
functions participate in overload resolution.
sycl::is_group<T>
inherits from std::true_type
if T is the type of
a standard SYCL group (sycl::group
or sycl::sub_group
) and
it inherits from std::false_type
otherwise. A SYCL implementation may
introduce additional specializations of sycl::is_group<T>
for
implementation-defined group types, if the interface of those types supports
all member functions and static members common to the sycl::group
and sycl::sub_group
classes.
sycl::group_broadcast
#
The sycl::group_broadcast
function communicates a value held by
one work-item to all other work-items in the group.
template <typename Group, typename T>
T sycl::group_broadcast(Group g, T x); // (1)
template <typename Group, typename T>
T sycl::group_broadcast(Group g, T x, Group::linear_id_type local_linear_id); // (2)
template <typename Group, typename T>
T group_broadcast(Group g, T x, Group::id_type local_id); // (3)
1.Constraints: Available only if sycl::is_group_v<std::decay_t<Group>>
is true and T
is a trivially copyable type.
Returns: The value of x
from the work-item with the smallest
linear id within group g
.
2.Constraints: Available only if sycl::is_group_v<std::decay_t<Group>>
is true and T
is a trivially copyable type.
Preconditions: local_linear_id
must be the same for all work-items in
the group and must be in the range [0, sycl::get_local_linear_range())
.
Returns: The value of x
from the work-item with the specified
linear id within group g
.
3.Constraints: Available only if sycl::is_group_v<std::decay_t<Group>>
is true and T
is a trivially copyable type.
Preconditions: local_id
must be the same for all work-items in the group,
and its dimensionality must match the dimensionality of the group. The value
of local_id
in each dimension must be greater than or equal to 0 and less
than the value of sycl::get_local_range()
in the same dimension.
Returns: The value of x
from the work-item with the
specified id within group g
.
sycl::group_barrier
#
The sycl::group_barrier
function synchronizes all work-items in a
group, using a group barrier.
template <typename Group>
void sycl::group_barrier(Group g,
sycl::memory_scope fence_scope = Group::fence_scope); // (1)
1.Constraints: Available only if
sycl::is_group_v<std::decay_t<Group>>
is true.
Effects: Synchronizes all work-items in group g
. The current work-item
will wait at the barrier until all work-items in group g
have reached
the barrier. In addition, the barrier performs mem-fence
operations
ensuring that memory accesses issued before the barrier are not re-ordered
with those issued after the barrier: all work-items in group g
execute
a release fence prior to synchronizing at the barrier, all work-items in
group g
execute an acquire fence afterwards, and there is an implicit
synchronization of these fences as if provided by an explicit atomic
operation on an atomic object.
By default, the scope of these fences is set to the narrowest scope
including all work-items in group g
(as reported by
Group::fence_scope
). This scope may be optionally overridden with
a wider scope, specified by the fence_scope
argument.