Name Strings

SPV_INTEL_subgroup_buffer_prefetch

Contact

To report problems with this extension, please open a new issue at:

Contributors

  • Ben Ashbaugh, Intel

  • Greg Lueck, Intel

  • Andrzej Ratajewski, Intel

  • Grzegorz Wawiorko, Intel

Notice

Copyright (c) 2024 Intel Corporation. All rights reserved.

Status

  • Complete

Version

Last Modified Date    2024-05-30
Revision              1

Dependencies

This extension is written against the SPIR-V Specification, Version 1.6 Revision 3.

This extension requires SPIR-V 1.0.

This extension extends the SPV_INTEL_subgroups extension and interacts with the SPV_INTEL_cache_controls and SPV_KHR_untyped_pointers extensions.

Overview

This extension extends the SPV_INTEL_subgroups extension by adding support for prefetching data from buffers. The functionality added by this extension can improve the performance of some kernels by prefetching data into a cache, so future reads of the data are from a fast cache rather than slower memory.

Extension Name

To use this extension within a SPIR-V module, the appropriate OpExtension must be present in the module:

OpExtension "SPV_INTEL_subgroup_buffer_prefetch"

Modifications to the SPIR-V Specification, Version 1.6

Capabilities

Modify Section 3.31, Capability, adding rows to the Capability table:

Capability                          Implicitly Declares
6220  SubgroupBufferPrefetchINTEL

Instructions

Modify Section 3.49.21, Group and Subgroup Instructions, adding to the end of the list of instructions:

OpSubgroupBlockPrefetchINTEL

Prefetches one or more bytes from Ptr for each invocation in the subgroup as a block operation, where the number of bytes to prefetch per invocation is specified by NumBytes. The total number of bytes that is collectively prefetched is therefore NumBytes times SubgroupSize. Prefetching does not affect the functionality of a module but may change its performance characteristics.
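As a small illustration of the arithmetic above, the collective prefetch size could be computed as follows. This is only a sketch: the function name `total_prefetch_bytes` is hypothetical and is not part of this extension or of any API.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Returns the number of bytes collectively prefetched by one subgroup:
 * NumBytes (per invocation) times SubgroupSize.
 * The product is formed in 64 bits so it cannot overflow. */
static uint64_t total_prefetch_bytes(uint32_t num_bytes, uint32_t subgroup_size)
{
    return (uint64_t)num_bytes * (uint64_t)subgroup_size;
}
```

For example, a NumBytes of 4 with a subgroup size of 16 gives 64 bytes prefetched collectively.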

Ptr must be a pointer into the CrossWorkgroup Storage Class. If it is an OpTypePointer pointer, it must point to a scalar integer type.

NumBytes must be a 32-bit integer type scalar whose Signedness operand is 0, and must come from a constant instruction. The prefetch operation may be silently ignored unless NumBytes is a power of two between one and 64 bytes, inclusive.
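The NumBytes rule above can be expressed as a simple predicate. The sketch below is illustrative only: the function name `num_bytes_is_expected` is hypothetical, and an implementation may handle other values however it chooses, including ignoring the prefetch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Returns true when n is a power of two between 1 and 64, inclusive,
 * i.e. one of 1, 2, 4, 8, 16, 32, or 64. For any other value the
 * prefetch may be silently ignored. */
static bool num_bytes_is_expected(uint32_t n)
{
    /* (n & (n - 1)) == 0 holds exactly for powers of two when n > 0. */
    return n >= 1 && n <= 64 && (n & (n - 1)) == 0;
}
```

A tool that generates SPIR-V might use such a check to warn when a requested prefetch size is unlikely to have any effect.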

If present, any Memory Operands must begin with a memory operand literal. If no Memory Operands are present, the behavior is the same as specifying the memory operand None.

Behavior is undefined unless Ptr and NumBytes are dynamically uniform for all invocations in the subgroup.

Capability: SubgroupBufferPrefetchINTEL

3 + variable    6221    <id> Ptr    <id> NumBytes    Optional Memory Operands
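To make the operand layout concrete, the following sketch packs the instruction into SPIR-V words. It relies on the general SPIR-V binary form, where the first word of an instruction holds the word count in its high 16 bits and the opcode in its low 16 bits. The function name, its parameters, and the <id> values in the usage note are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode assigned to OpSubgroupBlockPrefetchINTEL by this extension. */
enum { OP_SUBGROUP_BLOCK_PREFETCH_INTEL = 6221 };

/* Hypothetical helper, for illustration only.
 * Writes one OpSubgroupBlockPrefetchINTEL instruction into out.
 * memory_operands may be NULL when num_memory_operand_words is 0.
 * Returns the number of words written, or 0 if the instruction does
 * not fit in capacity words or in the 16-bit word count field. */
static size_t encode_block_prefetch(uint32_t *out, size_t capacity,
                                    uint32_t ptr_id, uint32_t num_bytes_id,
                                    const uint32_t *memory_operands,
                                    size_t num_memory_operand_words)
{
    /* Word count is "3 + variable": first word, Ptr, NumBytes, operands. */
    size_t word_count = 3 + num_memory_operand_words;

    if (out == NULL || capacity < word_count || word_count > 0xFFFFu) {
        return 0;
    }

    out[0] = ((uint32_t)word_count << 16) | OP_SUBGROUP_BLOCK_PREFETCH_INTEL;
    out[1] = ptr_id;        /* <id> Ptr */
    out[2] = num_bytes_id;  /* <id> NumBytes */
    for (size_t i = 0; i < num_memory_operand_words; ++i) {
        out[3 + i] = memory_operands[i];  /* Optional Memory Operands */
    }
    return word_count;
}
```

With no Memory Operands the instruction occupies three words and its first word is 0x0003184D. Passing a single word containing 0 (the memory operand None) yields a four-word instruction.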

Validation Rules

None.

Interactions with Other Extensions

If the SPV_INTEL_cache_controls extension is supported, the CacheControlLoadINTEL decoration may be used to control which cache levels the data will be prefetched into.

If the SPV_KHR_untyped_pointers extension is supported, the Ptr operand to OpSubgroupBlockPrefetchINTEL may be an OpTypeUntypedPointerKHR pointer.
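The constraints on Ptr, including the untyped pointer interaction, can be summarized as a small check. This is a minimal sketch over a deliberately simplified, hypothetical model of a pointer type. None of the type or function names below come from this extension or from any real SPIR-V library.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of the type of Ptr. */
enum pointer_kind { POINTER_TYPED, POINTER_UNTYPED };     /* OpTypePointer / OpTypeUntypedPointerKHR */
enum pointee_kind { POINTEE_INT_SCALAR, POINTEE_OTHER };  /* only meaningful for typed pointers */

struct ptr_info {
    enum pointer_kind kind;
    bool is_cross_workgroup;   /* storage class is CrossWorkgroup */
    enum pointee_kind pointee;
};

/* Hypothetical helper, for illustration only.
 * Returns true when Ptr satisfies the rules in this extension:
 * it must be in the CrossWorkgroup storage class, and either
 *  - it is an OpTypePointer that points to an integer type scalar, or
 *  - it is an OpTypeUntypedPointerKHR and SPV_KHR_untyped_pointers
 *    is supported. */
static bool ptr_operand_ok(const struct ptr_info *p, bool untyped_pointers_supported)
{
    if (!p->is_cross_workgroup) {
        return false;
    }
    if (p->kind == POINTER_UNTYPED) {
        return untyped_pointers_supported;
    }
    return p->pointee == POINTEE_INT_SCALAR;
}
```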

Issues

  1. Do we also need to support prefetching data from images?

    RESOLVED: We do not currently have a use-case for prefetching data from images, so this extension will only support prefetching data from buffers. The extension is written so that support for prefetching data from images could be added by a future extension, if desired.

  2. Should the prefetch specify the number of elements to prefetch or the number of bytes to prefetch?

    RESOLVED: The prefetch instruction will specify the number of bytes to prefetch, per invocation. Specifying the number of bytes rather than the number of elements works best for opaque (also known as untyped) pointers, where the type of data that the pointer points to is not necessarily known.

    For completeness, note that the LLVM prefetch intrinsic only specifies the address to prefetch and does not specify the number of elements or bytes to prefetch, but this probably is not what we want to do.

  3. Which storage classes (address spaces) should we support for block prefetches?

    RESOLVED: The OpenCL C prefetch function and the prefetch instruction in the OpenCL Extended Instruction Set only support prefetching from the global address space, or equivalently, from the CrossWorkgroup storage class.

    The same is also true for the subgroup block reads added by cl_intel_subgroups and cl_intel_spirv_subgroups.

    Therefore, we will follow this precedent and only support prefetching from the CrossWorkgroup storage class, or equivalently, from the global address space.

  4. What type should be used for the amount of data to prefetch?

    RESOLVED: Because we only expect to see a small set of prefetch sizes, we can use a 32-bit integer to specify the amount of data to prefetch. This differs from the OpenCL C prefetch function and the prefetch instruction in the OpenCL Extended Instruction Set, which use a size_t to describe the amount of data to prefetch. A 32-bit integer is sufficient for our use-cases, and using a 32-bit integer type unconditionally makes for a simpler specification.

    We will document this requirement in this SPIR-V specification and not in a client API environment specification.

  5. Should the amount of data to prefetch be an <id> and hence have the ability to be specialized, or should it be a compile-time Literal instead?

    RESOLVED: We will specify the amount of data to prefetch as an <id>. Although there is no known use-case that requires specializing the amount of data to prefetch, specifying the amount of data to prefetch as an <id> allows this functionality, if necessary. This is also consistent with the number of elements to prefetch for the prefetch instruction in the OpenCL Extended Instruction Set.

  6. What should the behavior be if the amount of data to prefetch is excessively large or some other unexpected value?

    RESOLVED: If the amount of data to prefetch is unexpected or otherwise unsupported, it will silently be ignored. The expected amounts of data to prefetch will be: 1, 2, 4, 8, 16, 32, or 64 bytes per invocation. We do not expect to prefetch three-component vectors. We also do not expect to prefetch 16-component vectors, except for very small data types, so we do not expect to prefetch 128 bytes per invocation.

  7. Should we require Ptr to point to any specific type?

    RESOLVED: Yes, the pointer Ptr must point to an integer-type scalar. Passing a pointer to a concrete type provides alignment information that would not be present for a pointer to OpTypeVoid.

Revision History

Rev  Date        Author        Changes
1    2024-05-30  Ben Ashbaugh  Initial version