Name Strings
SPV_INTEL_subgroup_buffer_prefetch
Contact
To report problems with this extension, please open a new issue at:
Contributors
-
Ben Ashbaugh, Intel
-
Greg Lueck, Intel
-
Andrzej Ratajewski, Intel
-
Grzegorz Wawiorko, Intel
Notice
Copyright (c) 2024 Intel Corporation. All rights reserved.
Status
-
Complete
Version
Last Modified Date |
2024-05-30 |
Revision |
1 |
Dependencies
This extension is written against the SPIR-V Specification, Version 1.6 Revision 3.
This extension requires SPIR-V 1.0.
This extension extends the SPV_INTEL_subgroups extension and interacts with the SPV_INTEL_cache_controls and SPV_KHR_untyped_pointers extensions.
Overview
This extension extends the SPV_INTEL_subgroups extension by adding support for prefetching data from buffers. The functionality added by this extension can improve the performance of some kernels by prefetching data into a cache, so future reads of the data are from a fast cache rather than slower memory.
Extension Name
To use this extension within a SPIR-V module, the appropriate OpExtension must be present in the module:
OpExtension "SPV_INTEL_subgroup_buffer_prefetch"
Modifications to the SPIR-V Specification, Version 1.6
Capabilities
Modify Section 3.31, Capability, adding rows to the Capability table:
Capability | Implicitly Declares | |
---|---|---|
6220 |
SubgroupBufferPrefetchINTEL |
Instructions
Modify Section 3.49.21, Group and Subgroup Instructions, adding to the end of the list of instructions:
Validation Rules
None.
Interactions with Other Extensions
If the SPV_INTEL_cache_controls extension is supported, the CacheControlLoadINTEL decoration may be used to control which cache levels the data will be prefetched into.
If the SPV_KHR_untyped_pointers extension is supported, the Ptr operand to OpSubgroupBlockPrefetchINTEL may be an OpTypeUntypedPointerKHR pointer.
Issues
-
Do we also need to support prefetching data from images?
RESOLVED: We do not currently have a use-case for prefetching data from images, so this extension will only support prefetching data from buffers. The extension is written so support for prefetching data from images could be added by a future extension, if desired.
-
Should the prefetch specify the number of elements to prefetch or the number of bytes to prefetch?
RESOLVED: The prefetch instruction will specify the number of bytes to prefetch, per invocation. Specifying the number of bytes rather than the number of components works best for opaque (also known as un-typed) pointers, where the type of data that the pointer points to is not necessarily known.
For completeness, note that the LLVM prefetch intrinsic only specifies the address to prefetch and does not specify the number of elements or bytes to prefetch, but this probably is not what we want to do.
-
Which storage classes (address spaces) should we support for block prefetches?
RESOLVED: The OpenCL C
prefetch
function and theprefetch
instruction in the OpenCL Extended Instruction Set only supports prefetching from theglobal
address space, or equivalently, from the CrossWorkgroup storage class.The same is also true for the subgroup block reads added by
cl_intel_subgroups
andcl_intel_spirv_subgroups
.Therefore, we will follow this precedent and only support prefetching from the CrossWorkgroup storage class, or equivalently, from the
global
address space. -
What type should be used for the amount of data to prefetch?
RESOLVED: Because we only expect to see a small set of prefetch sizes we can use a 32-bit integer to specify the amount of data to prefetch. This is different than the OpenCL C
prefetch
function and theprefetch
instruction in the OpenCL Extended Instruction Set, which use asize_t
to describe the amount of data to prefetch, though it is sufficient for our use-cases and it is a simpler specification to use a 32-bit integer type unconditionally.We will document this requirement in this SPIR-V specification and not in a client API environment specification.
-
Should the amount of data to prefetch be an <id> and hence have the ability to be specialized, or should it be a compile-time Literal instead?
RESOLVED: We will specify the amount of data to prefetch as an <id>. Although there is no known use-case that requires specializing the amount of data to prefetch, specifying the amount of data to prefetch as an <id> allows this functionality, if necessary. This is also consistent with the number of elements to prefetch for the
prefetch
instruction in the OpenCL Extended Instruction Set. -
What should the behavior be if the amount of data to prefetch is excessively large or some other unexpected value?
RESOLVED: If the amount of data to prefetch is unexpected or otherwise unsupported, it will silently be ignored. The expected amounts of data to prefetch will be: 1, 2, 4, 8, 16, 32, or 64 bytes per invocation. We do not expect to prefetch three-component vectors. We also do not expect to prefetch 16-component vectors, except for very small data types, so we do not expect to prefetch 128 bytes per invocation.
-
Should we require Ptr to point to any specific type?
RESOLVED: Yes, the pointer Ptr must point to an integer-type scalar. Passing a pointer to a concrete type provides alignment information that would not be present for a pointer to OpTypeVoid.
Revision History
Rev | Date | Author | Changes |
---|---|---|---|
1 |
2024-05-30 |
Ben Ashbaugh |
Initial version |