USM Basic Concept#
Unified Shared Memory (USM) provides a pointer-based alternative to the buffer programming model.
USM enables:
Easier integration into existing code bases by representing allocations as pointers rather than buffers, with full support for pointer arithmetic into allocations.
Fine-grain control over ownership and accessibility of allocations, to optimally choose between performance and programmer convenience.
A simpler programming model, by automatically migrating some allocations between SYCL devices and the host.
See also
SYCL Specification Section 4.8
Unified addressing#
Unified Addressing guarantees that all devices will use a unified address space.
Pointer values in the unified address space will always refer to the same location in memory. The unified address space encompasses the host and one or more devices.
Note that this does not require addresses in the unified address space to be accessible on all devices, just that pointer values will be consistent.
Kinds of USM#
USM builds upon Unified Addressing to define a shared
address space where pointer values in this space always
refer to the same location in memory. USM defines three
types of memory allocations: host
, device
, and shared
.
The following enum
is used to refer to the different
types of allocations inside of a SYCL program:
namespace sycl::usm {
enum class alloc : /* unspecified */ {
host,
device,
shared,
unknown
};
} // namespace sycl::usm
USM allocation type |
Description |
---|---|
|
Allocations in host memory that are accessible by a device (in addition to the host). |
|
Allocations in device memory that are not accessible by the host. |
|
Allocations in shared memory that are accessible by both host and device. |
USM is an optional feature which may not be supported by all devices, and devices that support USM may not support all types of USM allocation.
A SYCL application can use the sycl::device::has()
function to determine the level of USM support for a device
(See Device aspects).
See also
SYCL Specification Section 4.8.2
USM accesses must be within the sycl::context
used for allocation#
Each USM allocation has an associated SYCL context, and any access to that memory must use the same context. Specifically, any SYCL kernel function that dereferences a pointer to a USM allocation must be submitted to a sycl::queue that was constructed with the same context that was used to allocate that memory. The explicit memory operation commands that take USM pointers have a similar restriction.
There are no similar restrictions for dereferencing a USM pointer in a host task. This is legal regardless of which queue the host task was submitted to so long as the USM pointer is accessible on the host.
Warning
Each type of USM allocation has different rules for where that memory is accessible. Attempting to dereference a USM pointer on the host or on a device in violation of these rules results in undefined behavior.
Passing a USM pointer to one of the explicit memory functions where the pointer is not accessible to the device generally results in undefined behavior.
Host allocations#
Host allocations allow devices to directly read and write host memory inside of a kernel.
Host allocations must also be obtained using SYCL routines instead of system allocation routines. While a device may remotely read and write a host allocation, the allocation does not migrate to the device - it remains in host memory.
Warning
Users should take care to properly synchronize access to host allocations between host execution and kernels.
The total size of host allocations will be limited by the amount of pinnable-memory on the host on most systems.
Support for host allocations on a specific device can be queried
through sycl::aspect::usm_host_allocations
.
Support for atomic modification of host allocations on a specific
device can be queried through sycl::aspect::usm_atomic_host_allocations
.
Device allocations#
Device allocations are used for explicitly managing device memory.
Device allocations are obtained through SYCL device USM allocation
routines instead of system allocation routines like std::malloc
or C++ new
.
Device allocations are not accessible on the host, but the pointer values remain consistent on account of Unified addressing.
With device allocations data is directly allocated in the device memory and it must be explicitly copied between the host and a device.
The size of device allocations will be limited by the amount of memory in a device.
Support for device allocations on a specific
device can be queried through sycl::aspect::usm_device_allocations
.
The member functions to copy and initialize data are found in
sycl::queue shortcut functions and
sycl::handler explicit memory operations,
and these functions may be used on device allocations if a device
supports sycl::aspect::usm_device_allocations
.
Example
See usm-example-2.
System allocations#
In the most capable systems, users do not need to use SYCL USM
allocation functions to create shared allocations. The system
allocator (malloc
/new
) may instead be used.
Likewise, std::free
and delete
are used instead of sycl::free
.
Users may query the device to determine if
system allocations are supported for use on the device,
through sycl::aspect::usm_system_allocations
.
Note
Host and device allocations are unaffected by this change and must still be allocated using their respective USM functions in order to guarantee their behavior.
Example 1#
Example of how shared memory can be used between host and device:
1#include <sycl/sycl.hpp>
2
3#include <iostream>
4
5int main() {
6 // Create a default queue to enqueue work to the default device
7 sycl::queue myQueue;
8
9 // Allocate shared memory bound to the device and context associated to the
10 // queue Replacing malloc_shared with malloc_host would yield a correct
11 // program that allocated device-visible memory on the host.
12 int *data = sycl::malloc_shared<int>(1024, myQueue);
13
14 myQueue.parallel_for(1024, [=](sycl::id<1> idx) {
15 // Initialize each buffer element with its own rank number starting at 0
16 data[idx] = idx;
17 }); // End of the kernel function
18
19 // Explicitly wait for kernel execution since there is no accessor involved
20 myQueue.wait();
21
22 // Print result
23 for (int i = 0; i < 1024; i++)
24 std::cout << "data[" << i << "] = " << data[i] << std::endl;
25
26 return 0;
27}
Example 2#
Example of using less capable device memory, which requires an explicit copy between the device and the host:
1#include <sycl/sycl.hpp>
2
3#include <iostream>
4
5int main() {
6 // Create a default queue to enqueue work to the default device
7 sycl::queue myQueue;
8
9 // Allocate shared memory bound to the device and context associated to the
10 // queue
11 int *data = sycl::malloc_device<int>(1024, myQueue);
12
13 myQueue.parallel_for(1024, [=](sycl::id<1> idx) {
14 // Initialize each buffer element with its own rank number starting at 0
15 data[idx] = idx;
16 }); // End of the kernel function
17
18 // Explicitly wait for kernel execution since there is no accessor involved
19 myQueue.wait();
20
21 // Create an array to receive the device content
22 int hostData[1024];
23 // Receive the content from the device
24 myQueue.memcpy(hostData, data, 1024 * sizeof(int));
25 // Wait for the copy to complete
26 myQueue.wait();
27
28 // Print result
29 for (int i = 0; i < 1024; i++)
30 std::cout << "hostData[" << i << "] = " << hostData[i] << std::endl;
31
32 return 0;
33}