USM allocations#
The USM offers a range of allocation functions, each designed to
accommodate potential future extensions by accepting a sycl::property_list
parameter. It's important to note that the core SYCL specification
currently lacks any specific definitions for USM allocation properties.
Some of the allocation functions support explicit alignment parameters,
and they behave similarly to std::aligned_alloc
by returning nullptr
when the alignment exceeds the implementation's capabilities. Some allocation
functions are templated on the allocated type T
, while others are not.
See also
SYCL Specification Section 4.8.3
C++ allocator interface#
SYCL defines an allocator class named sycl::usm_allocator
that
satisfies the C++ named requirement Allocator
. The AllocKind
template parameter can be either sycl::usm::alloc::host
or
sycl::usm::alloc::shared
, causing the allocator to make
either host USM allocations or shared USM allocations.
The sycl::usm_allocator
class has a template argument Alignment
,
which specifies the minimum alignment for memory that it allocates.
This alignment is used even if the allocator is rebound to a different
type. Memory allocated by this allocator is suitably aligned for objects
of its underlying value_type
or at the alignment specified by
Alignment
, whichever is greater.
sycl::usm_allocator
#
template <typename T, sycl::usm::alloc AllocKind, size_t Alignment = 0>
class usm_allocator;
Template parameters
|
Type of allocated element. |
|
Type of allocation. |
|
Alignment of the allocation. |
There is no specialization for sycl::usm::alloc::device
because an
Allocator
is required to allocate memory that is accessible on the host.
(constructors)#
usm_allocator(const sycl::context& syclContext,
const sycl::device& syclDevice,
const sycl::property_list& propList = {});
usm_allocator(const sycl::queue& syclQueue,
const sycl::property_list& propList = {});
Constructs a sycl::usm_allocator
instance that allocates USM for
the provided context and device presented with first constructor.
The second constructor present a simplified form where syclQueue
provides the sycl::device and sycl::context.
Exceptions
sycl::errc::invalid
If
syclContext
does not encapsulate thesycl::device
returned bydeviceSelector
.sycl::errc::feature_not_supported
If
AllocKind
issycl::usm::alloc::host
, the constructors throws a synchronoussycl::exception
with this error code if no device insyclContext
hassycl::aspect::usm_host_allocations
. ThesyclDevice
is ignored for this allocation kind.If
AllocKind
issycl::usm::alloc::shared
, the constructors throws a synchronoussycl::exception
with this error code if thesyclDevice
does not havesycl::aspect::usm_shared_allocations
.
See also
Allocation functions#
sycl::malloc_device
#
void* sycl::malloc_device(size_t numBytes,
const sycl::device& syclDevice,
const sycl::context& syclContext,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::malloc_device(size_t count,
const sycl::device& syclDevice,
const sycl::context& syclContext,
const sycl::property_list& propList = {});
void* sycl::malloc_device(size_t numBytes,
const sycl::queue& syclQueue,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::malloc_device(size_t count, const sycl::queue& syclQueue,
const sycl::property_list& propList = {});
void* sycl::aligned_alloc_device(size_t alignment, size_t numBytes,
const sycl::device& syclDevice,
const sycl::context& syclContext,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::aligned_alloc_device(size_t alignment, size_t count,
const sycl::device& syclDevice,
const sycl::context& syclContext,
const sycl::property_list& propList = {});
void* sycl::aligned_alloc_device(size_t alignment, size_t numBytes,
const sycl::queue& syclQueue,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::aligned_alloc_device(size_t alignment, size_t count,
const sycl::queue& syclQueue,
const sycl::property_list& propList = {})
Parameters
|
Alignment of allocated data. |
|
Allocation size in bytes. |
|
Number of elements. |
|
See sycl::device. |
|
See sycl::queue. |
|
See sycl::context. |
|
Optional property list. |
On successful USM device allocation, these functions return a pointer to
the newly allocated memory, which must eventually be deallocated with
sycl::free
in order to avoid a memory leak. If there are not enough
resources to allocate the requested memory, these functions
return nullptr
.
When the allocation size is zero bytes (numBytes
or count
is zero),
these functions behave in a manner consistent with C++ std::malloc
.
The value returned is unspecified in this case, and the returned pointer
may not be used to access storage. If this pointer is not null, it must be
passed to sycl::free
to avoid a memory leak.
See Events example 1 for usage.
sycl::malloc_host
#
void* sycl::malloc_host(size_t numBytes,
const sycl::context& syclContext,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::malloc_host(size_t count,
const sycl::context& syclContext,
const sycl::property_list& propList = {});
void* sycl::malloc_host(size_t numBytes,
const sycl::queue& syclQueue,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::malloc_host(size_t count,
const sycl::queue& syclQueue,
const sycl::property_list& propList = {});
void* sycl::aligned_alloc_host(size_t alignment, size_t numBytes,
const sycl::context& syclContext,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::aligned_alloc_host(size_t alignment, size_t count,
const sycl::context& syclContext,
const sycl::property_list& propList = {});
void* sycl::aligned_alloc_host(size_t alignment, size_t numBytes,
const sycl::queue& syclQueue,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::aligned_alloc_host(size_t alignment, size_t count,
const sycl::queue& syclQueue,
const sycl::property_list& propList = {});
Parameters
|
Alignment of allocated data. |
|
Allocation size in bytes. |
|
Number of elements. |
|
See sycl::device. |
|
See sycl::queue. |
|
See sycl::context. |
|
Optional property list. |
On successful USM host allocation, these functions return a pointer
to the newly allocated memory, which must eventually be deallocated
with sycl::free
in order to avoid a memory leak. If there are
not enough resources to allocate the requested memory, these functions
return nullptr
.
When the allocation size is zero bytes (numBytes
or count
is zero),
these functions behave in a manner consistent with C++ std::malloc
.
The value returned is unspecified in this case, and the returned pointer
may not be used to access storage. If this pointer is not null, it must be
passed to sycl::free
to avoid a memory leak.
See Example 2 for usage.
Parameterized allocation functions#
The functions below take a kind
parameter that specifies the type of
USM to allocate. When kind
is sycl::usm::alloc::device
, then the
allocation device must have sycl::aspect::usm_device_allocations
.
When kind is sycl::usm::alloc::host
, at least one device in the allocation
context must have sycl::aspect::usm_host_allocations
. When kind
is
sycl::usm::alloc::shared
, the allocation device must have
sycl::aspect::usm_shared_allocations
. If these requirements are violated,
the allocation function throws a synchronous sycl::exception
with the
sycl::errc::feature_not_supported
error code.
sycl::malloc
#
void* sycl::malloc(size_t numBytes,
const sycl::device& syclDevice,
const sycl::context& syclContext,
sycl::usm::alloc kind,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::malloc(size_t count,
const sycl::device& syclDevice,
const sycl::context& syclContext,
sycl::usm::alloc kind,
const sycl::property_list& propList = {});
void* sycl::malloc(size_t numBytes,
const sycl::queue& syclQueue,
sycl::usm::alloc kind,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::malloc(size_t count,
const sycl::queue& syclQueue,
sycl::usm::alloc kind,
const sycl::property_list& propList = {});
void* sycl::aligned_alloc(size_t alignment, size_t numBytes,
const sycl::device& syclDevice,
const sycl::context& syclContext,
sycl::usm::alloc kind,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::aligned_alloc(size_t alignment, size_t count,
const sycl::device& syclDevice,
const sycl::context& syclContext,
sycl::usm::alloc kind,
const sycl::property_list& propList = {});
void* sycl::aligned_alloc(size_t alignment, size_t numBytes,
const sycl::queue& syclQueue,
sycl::usm::alloc kind,
const sycl::property_list& propList = {});
template <typename T>
T* sycl::aligned_alloc(size_t alignment, size_t count,
const sycl::queue& syclQueue,
sycl::usm::alloc kind,
const sycl::property_list& propList = {});
On successful USM shared allocation, these functions return a pointer to the
newly allocated memory, which must eventually be deallocated with
sycl::free
in order to avoid a memory leak. If there are not enough
resources to allocate the requested memory, these functions return nullptr
.
When the allocation size is zero bytes (numBytes
or count
is zero),
these functions behave in a manner consistent with C++ std::malloc
.
The value returned is unspecified in this case, and the returned pointer
may not be used to access storage. If this pointer is not null, it must be
passed to sycl::free
to avoid a memory leak.
Parameters
|
Alignment of allocated data. |
|
Allocation size in bytes. |
|
Number of elements. |
|
See sycl::device. |
|
See sycl::queue. |
|
See sycl::context. |
|
Kind of USM allocation. |
|
Optional property list. |
Memory deallocation functions#
sycl::free
#
void sycl::free(void* ptr, const sycl::context& syclContext);
void sycl::free(void* ptr, const sycl::queue& syclQueue);
Frees an allocation. The memory pointed to by ptr
must have been
allocated using one of the USM allocation routines. syclContext
must be the same sycl::context
that was used to allocate the memory.
The memory is freed without waiting for commands operating on
it to be completed. If commands that use this memory are
in-progress or are enqueued the behavior is undefined.
In the second form the context is automatically extracted from the
syclQueue
that is passed in, as a convenience.
See Example 2 for usage.
Example 1
1#include <vector>
2
3#include <sycl/sycl.hpp>
4
5constexpr int size = 10;
6
7int main() {
8 sycl::queue q;
9
10 // USM allocator for data of type int in shared memory
11 typedef sycl::usm_allocator<int, sycl::usm::alloc::shared> vec_alloc;
12 // Create allocator for device associated with q
13 vec_alloc myAlloc(q);
14 // Create std vectors with the allocator
15 std::vector<int, vec_alloc> a(size, myAlloc), b(size, myAlloc),
16 c(size, myAlloc);
17
18 // Get pointer to vector data for access in kernel
19 auto A = a.data();
20 auto B = b.data();
21 auto C = c.data();
22
23 for (int i = 0; i < size; i++) {
24 a[i] = i;
25 b[i] = i;
26 c[i] = i;
27 }
28
29 q.submit([&](sycl::handler &h) {
30 h.parallel_for(sycl::range<1>(size),
31 [=](sycl::id<1> idx) { C[idx] = A[idx] + B[idx]; });
32 }).wait();
33
34 for (int i = 0; i < size; i++)
35 std::cout << c[i] << std::endl;
36 return 0;
37}
Output:
0
2
4
6
8
10
12
14
16
18
Example 2
In this example two arrays are created, hostArray
and sharedArray
,
that are host and shared allocations, respectively. While both host
and shared allocations are directly accessible in host code, we only
initialize hostArray
here. Similarly, it can be directly accessed
inside the kernel, performing remote reads of the data. The runtime
ensures that sharedArray
is available on the device before the
kernel accesses it and that it is later available
by the host code, all without programmer intervention.
1#include <sycl/sycl.hpp>
2
3constexpr int N = 42;
4
5int main() {
6 sycl::property_list properties{sycl::property::queue::enable_profiling()};
7 auto q = sycl::queue(sycl::default_selector_v, properties);
8
9 std::cout
10 << " Platform: "
11 << q.get_device().get_platform().get_info<sycl::info::platform::name>()
12 << std::endl;
13
14 int *host_array = sycl::malloc_host<int>(N, q);
15 int *shared_array = sycl::malloc_shared<int>(N, q);
16
17 for (int i = 0; i < N; i++) {
18 // Initialize hostArray on host
19 host_array[i] = i;
20 }
21
22 q.submit([&](sycl::handler &h) {
23 h.parallel_for(N, [=](sycl::id<1> i) {
24 // access sharedArray and hostArray on device
25 shared_array[i] = host_array[i] + 1;
26 });
27 });
28 q.wait();
29
30 for (int i = 0; i < N; i++) {
31 // access sharedArray on host
32 host_array[i] = shared_array[i];
33 }
34
35 sycl::free(shared_array, q);
36 sycl::free(host_array, q);
37
38 return 0;
39}
Output:
Platform: Intel(R) Level-Zero