Name Strings

SPV_NV_tensor_addressing

Contact

To report problems with this extension, please open a new issue at:

Contributors

  • Jeff Bolz, NVIDIA

  • Karthik Vaidyanathan, NVIDIA

Notice

Copyright (c) 2024 NVIDIA Corp.

Status

  • Complete

Version

Last Modified Date

2024-09-18

Revision

1

Dependencies

This extension is written against the SPIR-V Specification, Version 1.6, Revision 3, Unified.

This extension requires SPIR-V 1.6.

Overview

This extension adds tensor layout and view types which initially can be be used with SPV_NV_cooperative_matrix2. It is written as a separate extension to allow it to potentially be used with other extensions in the future.

Extension Name

To use this extension within a SPIR-V module, the following OpExtension must be present in the module:

OpExtension "SPV_NV_tensor_addressing"

Modifications to the SPIR-V Specification, Version 1.6

2.2 Terms

Add new terms to section 2.2.2 Types:

Tensor Layout: An opaque collection of values manipulated by OpTensorLayout instructions, and used for tensor addressing calculations when loading and storing cooperative matrices.

Tensor View: An opaque collection of values manipulated by OpTensorView instructions, and used for tensor addressing calculations when loading and storing cooperative matrices.

Add Tensor Layout and Tensor View to the list of Opaque Types.

3.31 Capabilities

Modify Section 3.31, "Capability", adding these rows to the Capability table:

Capability Enabling Capabilities

5439

TensorAddressingNV
Enables tensor layout and view instructions.

3.X Tensor Layout and View

Tensor layout and tensor view types are representations of the mapping between matrix coordinates and tensor memory layout. They each have a number of dimensions in the range [1,5], with dimension 0 being the outermost dimension and the last dimension being the innermost. These types have the following logical state:

    struct tensorLayoutNV<uint32_t Dim,
                          TensorClampMode Mode = TensorClampModeUndefined>
    {
      static constexpr uint32_t LDim = Dim;
      static constexpr TensorClampMode clampMode = Mode;

      uint32_t blockSize[LDim];
      uint32_t layoutDimension[LDim];
      uint32_t stride[LDim];
      int32_t offset[LDim];
      uint32_t span[LDim];
      uint32_t clampValue;
    };

    struct tensorViewNV<uint Dim, bool hasDimensions, uint32_t p0, ..., uint32_t p<Dim-1>>
    {
      static constexpr uint32_t VDim = Dim;
      static constexpr bool hasDim = hasDimensions;
      static constexpr uint32_t permutation[VDim] = {p0, ..., p<Dim-1>};

      uint32_t viewDimension[VDim];
      uint32_t viewStride[VDim];
      uint32_t clipRowOffset, clipRowSpan, clipColOffset, clipColSpan;
    };

A tensor layout represents the layout of values in memory (number of dimensions and size), along with a region being accessed (offset and span).

    ---------------------------------------------------------------------------
    |                           layoutDimension1                              |
    |                                                                         |
    |                                                                         |
    |                                                                         |
    |                                                                         |
    |                                                                         |
    |                                                                         |
    |                                                                         |
    |                        span1                                            |
    |                  -----------------                                      |
    |                  |               |                                      |
    |                  |               |                                      |
    |                  |     slice     | span0                                |
    |                  |               |                      layoutDimension0|
    |                  |               |                                      |
    |      offset1     |               |                                      |
    | ---------------> -----------------                                      |
    |                                                                         |
    |                  ^                                                      |
    |                  |                                                      |
    |                  |                                                      |
    |                  | offset0                                              |
    |                  |                                                      |
    |                  |                                                      |
    |                  |                                                      |
    |                  |                                                      |
    ---------------------------------------------------------------------------
    Figure: A 2D tensor layout, and a slice selecting a region within it.

A tensor view allows reinterpreting the dimensions of the region being accessed, including changing the number of dimensions, reordering the dimensions as they are loaded or stored, and clipping the region of the matrix that is loaded or stored. Often the span will have the same number of elements as the matrix, but in some more advanced uses that may not be the case.

How the addressing calculations are performed is left to other extensions to define.

Unlike some other ML APIs, tensor layouts and views only describe addressing calculations and never involve making copies of tensors. For this reason, the functionality is slightly more limited (e.g. there’s no way to slice, then permute, then slice again).

OpTensorLayout and OpTensorView instructions operate by copying existing object state and updating the requested state and returning that as a new result. Some of these instructions initialize multiple related pieces of state, setting some to common default values, so the order of the operations matters.

3.X Tensor Clamp Mode

New section in 3 "Binary Form".

Tensor Clamp Mode Enabling Capabilities

0

Undefined
Out of bounds accesses have undefined behavior.

1

Constant
Out of bounds loads return a constant value. Out of bounds stores are discarded.

2

ClampToEdge
Out of bounds load coordinates are clamped to the closest in-bounds coordinate. Out of bounds stores are discarded.

3

Repeat
Out of bounds load coordinates wrap.
c = c % dim;
Out of bounds stores are discarded.

4

RepeatMirrored
Out of bounds load coordinates wrap with mirroring.
c = c % (2*dim-2);
c = (c >= dim) ? (2*dim-2-c) : c;
Out of bounds stores are discarded.

3.49.6 Type-Declaration Instructions

OpTypeTensorLayoutNV

Dim is the number of dimensions in the tensor layout, and must be a constant instruction with scalar 32-bit integer type. The value must be greater than zero and less than or equal to 5.

ClampMode is a Tensor Clamp Mode which controls how out of bounds coordinates are treated, and must be a constant instruction with scalar 32-bit integer type.

Capability:
TensorAddressingNV

4

5370

Result <id>

<id>
Dim

<id>
ClampMode

OpTypeTensorViewNV

Dim is the number of dimensions in the tensor view, and must be a constant instruction with scalar 32-bit integer type. The value must be greater than zero and less than or equal to 5.

HasDimensions is a boolean indicating whether the view has its own dimensions (reinterpreting those from the tensor layout) or if the tensor layout’s dimensions are used. It must be a constant instruction with scalar boolean type.

p0 …​ p<Dim-1> are integer values indicating how the tensor’s coordinates are permuted. They each must be a constant instruction with scalar 32-bit integer type, and they must form a valid permutation of the range [0,Dim).

Capability:
TensorAddressingNV

5+variable

5371

Result <id>

<id>
Dim

<id>
HasDimensions

<id>, <id>, …​
p0,
…​
p<Dim-1>

3.X Tensor Layout and View Instructions

New section in 3 "Binary Form".

OpCreateTensorLayoutNV

Create a Tensor Layout of the requested type. The layoutDimension, stride, span, and offset elements are initialized to zero. The blockSize elements are initialized to one. clampValue is initialized to zero.

Result Type must be OpTypeTensorLayoutNV.

Capability:
TensorAddressingNV

3

5372

<id>
Result Type

Result <id>

OpTensorLayoutSetBlockSizeNV

Create a copy of TensorLayout, setting the blockSize elements to BlockSize<i>. When the blockSize is not 1, the strides are considered to be in blocks rather than in elements.

The number of BlockSize operands must match the dimension of Result Type.

The BlockSize operands must each be a scalar 32-bit integer type.

The type of TensorLayout must be Result Type.

Capability:
TensorAddressingNV

5+variable

5384

<id>
Result Type

Result <id>

<id>
TensorLayout

<id>, <id>, …​
BlockSize0,
…​
BlockSize<LDim-1>

OpTensorLayoutSetDimensionNV

Create a copy of TensorLayout, setting the layoutDimension and span elements to Dim<i>. Sets offset elements to zero. Sets stride[LDim-1] to 1 and sets stride[i] to stride[i+1] * ceiling(Dim<i+1> / blockSize[i+1]).

The number of Dim operands must match the dimension of Result Type.

The Dim operands must each be a scalar 32-bit integer type.

The type of TensorLayout must be Result Type.

Capability:
TensorAddressingNV

5+variable

5373

<id>
Result Type

Result <id>

<id>
TensorLayout

<id>, <id>, …​
Dim0,
…​
Dim<LDim-1>

OpTensorLayoutSetStrideNV

Create a copy of TensorLayout, setting the stride elements to Stride<i>.

Stride<i> must be at least Stride<i+1> * ceiling(layoutDimension[i+1] / blockSize[i+1]).

The Stride operands must each be a scalar 32-bit integer type.

The number of Stride operands must match the dimension of Result Type.

The type of TensorLayout must be Result Type.

Capability:
TensorAddressingNV

5+variable

5374

<id>
Result Type

Result <id>

<id>
TensorLayout

<id>, <id>, …​
Stride0,
…​
Stride<LDim-1>

OpTensorLayoutSliceNV

Create a copy of TensorLayout, adding Offset<i> to offset[i], and span[i] is set to Span<i>.

Stride<i> must be at least Stride<i+1> times layoutDimension[i+1].

The Offset and Span operands must each be a scalar 32-bit integer type.

The number of Offset and Span operands must each match the dimension of Result Type.

The type of TensorLayout must be Result Type.

Capability:
TensorAddressingNV

6+variable

5375

<id>
Result Type

Result <id>

<id>
TensorLayout

<id>, <id>, …​
Offset0, Span0,
…​
Offset<LDim-1>, Span<LDim-1>

OpTensorLayoutSetClampValueNV

Create a copy of TensorLayout, setting the clampValue to Value.

Value must be a scalar 32-bit integer type.

The type of TensorLayout must be Result Type.

Capability:
TensorAddressingNV

5

5376

<id>
Result Type

Result <id>

<id>
TensorLayout

<id>
Value

OpCreateTensorViewNV

Create a Tensor View of the requested type. The viewDimension and viewStride elements are initialized to zero. The clip values are initialized to offsets of 0, spans of 0xFFFFFFFF.

Result Type must be OpTypeTensorViewNV.

Capability:
TensorAddressingNV

3

5377

<id>
Result Type

Result <id>

OpTensorViewSetDimensionNV

Create a copy of TensorView, setting the viewDimension to Dim<i>. Sets viewStride[LDim-1] to 1 and sets viewStride[i] to the product of Dim<i+1> to Dim<LDim-1>.

The number of Dim operands must match the dimension of Result Type.

The Dim operands must each be a scalar 32-bit integer type.

The type of TensorView must be Result Type.

Capability:
TensorAddressingNV

5+variable

5378

<id>
Result Type

Result <id>

<id>
TensorView

<id>, <id>, …​
Dim0,
…​
Dim<N-1>

OpTensorViewSetStrideNV

Create a copy of TensorView, setting the viewStride to Stride<i>.

The number of Stride operands must match the dimension of Result Type.

The Stride operands must each be a scalar 32-bit integer type.

The type of TensorView must be Result Type.

Capability:
TensorAddressingNV

5+variable

5379

<id>
Result Type

Result <id>

<id>
TensorView

<id>, <id>, …​
Stride0,
…​
Stride<N-1>

OpTensorViewSetClipNV

Create a copy of TensorView, setting the clip elements to the corresponding parameters.

The Clip operands must each be a scalar 32-bit integer type.

The type of TensorView must be Result Type.

Capability:
TensorAddressingNV

8

5382

<id>
Result Type

Result <id>

<id>
TensorView

<id>
ClipRowOffset

<id>
ClipRowSpan

<id>
ClipColOffset

<id>
ClipColSpan

Issues

Revision History

Rev Date Author Changes

1

2024-09-18

Jeff Bolz

Initial revision of SPV_NV_tensor_addressing