Name Strings

SPV_KHR_integer_dot_product

Contact

To report problems with this extension, please open a new issue at:

Contributors

  • Kévin Petit, Arm Ltd.

  • Ben Ashbaugh, Intel

  • Graeme Leese, Broadcom

  • Robert Quill, Imagination Technologies

  • Jeff Bolz, Nvidia

  • Raun Krisch, Samsung

  • Simon Waters, Samsung

  • John Kessenich, Google

  • David Neto, Google

  • Nicolai Hähnle, AMD

  • Ruihao Zhang, Qualcomm

  • Stuart Brady, Arm Ltd.

  • Alan Baker, Google

Notice

Copyright (c) 2019 The Khronos Group Inc. Copyright terms at http://www.khronos.org/registry/speccopyright.html

Status

  • Approved by the SPIR-V Working Group: 2020-05-20

  • Approved by the Khronos Board of Promoters: 2020-07-17

Version

Last Modified Date

2021-09-08

Revision

3

Dependencies

This extension is written against the SPIR-V Specification, Version 1.5 Revision 2.

This extension requires SPIR-V 1.0.

Overview

This extension introduces support for dot product operations on integer vectors with optional accumulation. The specific types accepted as inputs are constrained by capabilities of which this extension introduces three:

  • Generic support for all input vector types

  • Support 4-component vectors of 8 bit integers (several implementers only want to support this case)

  • Support 4-component vectors of 8 bit integers packed into 32-bit integers (for devices that don’t have generic Int8 support)

This extension introduces two groups of three instructions each. Instructions of one of the groups perform simple dot product operations on input vectors with signed, unsigned or mixed-signedness (one signed, one unsigned) components. Instructions of the other group also perform a saturating addition of the dot product result with an accumulator they accept as an operand.

Extension Name

To use this extension within a SPIR-V module, the following OpExtension must be present in the module:

OpExtension "SPV_KHR_integer_dot_product"

Modifications to the SPIR-V Specification, Version 1.5

Capabilities

Modify Section 3.31, "Capability", adding these rows to the Capability table (these capabilities enable specific input types):

Capability Implicitly declares

6019

DotProductKHR
Uses dot product instructions

6016

DotProductInputAllKHR
Uses vector of any integer type as input to the dot product instructions

6017

DotProductInput4x8BitKHR
Uses vectors of four components of 8-bit integer type as inputs to the dot product instructions

Int8

6018

DotProductInput4x8BitPackedKHR
Uses 32-bit integer scalars packing 4-component vectors of 8-bit integers as inputs to the dot product instructions

Packed Vector Format

New section under 3. Binary Form.

Specify how to interpret scalar integers as vectors.

Packed Vector Format Enabling Capabilities

0x0

PackedVectorFormat4x8BitKHR
Interpret 32-bit scalar integer operands as vectors of four 8-bit components. Vector components follow byte significance order with the lowest-numbered component stored in the least significant byte.

Instructions

Add the following new instructions:

OpSDotKHR

Signed integer dot product of Vector 1 and Vector 2.

Result Type must be an integer type whose Width must be greater than or equal to that of the components of Vector 1 and Vector 2.

Vector 1 and Vector 2 must have the same type.

Vector 1 and Vector 2 must be either 32-bit integers (enabled by DotProductInput4x8BitPackedKHR) or vectors of integer type (enabled by DotProductInput4x8BitKHR or DotProductInputAllKHR).

When Vector 1 and Vector 2 are scalar integer types, Packed Vector Format must be specified to select how the integers are to be interpreted as vectors.

All components of the input vectors are sign-extended to the bit width of the result’s type. The sign-extended input vectors are then multiplied component-wise and all components of the vector resulting from the component-wise multiplication are added together. The resulting value will equal the low-order N bits of the correct result R, where N is the result width and R is computed with enough precision to avoid overflow and underflow.

Capability:
DotProductKHR

5+

4450

<id> Result Type

Result <id>

<id> Vector 1

<id> Vector 2

Optional
Packed Vector Format

OpUDotKHR

Unsigned integer dot product of Vector 1 and Vector 2.

Result Type must be an integer type with Signedness of 0 whose Width must be greater than or equal to that of the components of Vector 1 and Vector 2.

Vector 1 and Vector 2 must have the same type.

Vector 1 and Vector 2 must be either 32-bit integers (enabled by DotProductInput4x8BitPackedKHR) or vectors of integer type with Signedness of 0 (enabled by DotProductInput4x8BitKHR or DotProductInputAllKHR).

When Vector 1 and Vector 2 are scalar integer types, Packed Vector Format must be specified to select how the integers are to be interpreted as vectors.

All components of the input vectors are zero-extended to the bit width of the result’s type. The zero-extended input vectors are then multiplied component-wise and all components of the vector resulting from the component-wise multiplication are added together. The resulting value will equal the low-order N bits of the correct result R, where N is the result width and R is computed with enough precision to avoid overflow and underflow.

Capability:
DotProductKHR

5+

4451

<id> Result Type

Result <id>

<id> Vector 1

<id> Vector 2

Optional
Packed Vector Format

OpSUDotKHR

Mixed-signedness integer dot product of Vector 1 and Vector 2. Components of Vector 1 are treated as signed, components of Vector 2 are treated as unsigned.

Result Type must be an integer type whose Width must be greater than or equal to that of the components of Vector 1 and Vector 2.

Vector 1 and Vector 2 must be either 32-bit integers (enabled by DotProductInput4x8BitPackedKHR) or vectors of integer type with the same number of components and same component Width (enabled by DotProductInput4x8BitKHR or DotProductInputAllKHR). When Vector 1 and Vector 2 are vectors, the components of Vector 2 must have a Signedness of 0.

When Vector 1 and Vector 2 are scalar integer types, Packed Vector Format must be specified to select how the integers are to be interpreted as vectors.

All components of Vector 1 are sign-extended to the bit width of the result’s type. All components of Vector 2 are zero-extended to the bit width of the result’s type. The sign- or zero-extended input vectors are then multiplied component-wise and all components of the vector resulting from the component-wise multiplication are added together. The resulting value will equal the low-order N bits of the correct result R, where N is the result width and R is computed with enough precision to avoid overflow and underflow.

Capability:
DotProductKHR

5+

4452

<id> Result Type

Result <id>

<id> Vector 1

<id> Vector 2

Optional
Packed Vector Format

OpSDotAccSatKHR

Signed integer dot product of Vector 1 and Vector 2 and signed saturating addition of the result with Accumulator.

Result Type must be an integer type whose Width must be greater than or equal to that of the components of Vector 1 and Vector 2.

Vector 1 and Vector 2 must have the same type.

Vector 1 and Vector 2 must be either 32-bit integers (enabled by DotProductInput4x8BitPackedKHR) or vectors of integer type (enabled by DotProductInput4x8BitKHR or DotProductInputAllKHR).

The type of Accumulator must be the same as Result Type.

When Vector 1 and Vector 2 are scalar integer types, Packed Vector Format must be specified to select how the integers are to be interpreted as vectors.

All components of the input vectors are sign-extended to the bit width of the result’s type. The sign-extended input vectors are then multiplied component-wise and all components of the vector resulting from the component-wise multiplication are added together. Finally, the resulting sum is added to the input accumulator. This final addition is saturating.

If any of the multiplications or additions, with the exception of the final accumulation, overflow or underflow, the result of the instruction is undefined.

Capability:
DotProductKHR

6+

4453

<id> Result Type

Result <id>

<id> Vector 1

<id> Vector 2

<id> Accumulator

Optional
Packed Vector Format

OpUDotAccSatKHR

Unsigned integer dot product of Vector 1 and Vector 2 and unsigned saturating addition of the result with Accumulator.

Result Type must be an integer type with Signedness of 0 whose Width must be greater than or equal to that of the components of Vector 1 and Vector 2.

Vector 1 and Vector 2 must have the same type.

Vector 1 and Vector 2 must be either 32-bit integers (enabled by DotProductInput4x8BitPackedKHR) or vectors of integer type with Signedness of 0 (enabled by DotProductInput4x8BitKHR or DotProductInputAllKHR).

The type of Accumulator must be the same as Result Type.

When Vector 1 and Vector 2 are scalar integer types, Packed Vector Format must be specified to select how the integers are to be interpreted as vectors.

All components of the input vectors are zero-extended to the bit width of the result’s type. The zero-extended input vectors are then multiplied component-wise and all components of the vector resulting from the component-wise multiplication are added together. Finally, the resulting sum is added to the input accumulator. This final addition is saturating.

If any of the multiplications or additions, with the exception of the final accumulation, overflow or underflow, the result of the instruction is undefined.

Capability:
DotProductKHR

6+

4454

<id> Result Type

Result <id>

<id> Vector 1

<id> Vector 2

<id> Accumulator

Optional
Packed Vector Format

OpSUDotAccSatKHR

Mixed-signedness integer dot product of Vector 1 and Vector 2 and signed saturating addition of the result with Accumulator. Components of Vector 1 are treated as signed, components of Vector 2 are treated as unsigned.

Result Type must be an integer type whose Width must be greater than or equal to that of the components of Vector 1 and Vector 2.

Vector 1 and Vector 2 must be either 32-bit integers (enabled by DotProductInput4x8BitPackedKHR) or vectors of integer type with the same number of components and same component Width (enabled by DotProductInput4x8BitKHR or DotProductInputAllKHR). When Vector 1 and Vector 2 are vectors, the components of Vector 2 must have a Signedness of 0.

The type of Accumulator must be the same as Result Type.

When Vector 1 and Vector 2 are scalar integer types, Packed Vector Format must be specified to select how the integers are to be interpreted as vectors.

All components of Vector 1 are sign-extended to the bit width of the result’s type. All components of Vector 2 are zero-extended to the bit width of the result’s type. The sign- or zero-extended input vectors are then multiplied component-wise and all components of the vector resulting from the component-wise multiplication are added together. Finally, the resulting sum is added to the input accumulator. This final addition is saturating.

If any of the multiplications or additions, with the exception of the final accumulation, overflow or underflow, the result of the instruction is undefined.

Capability:
DotProductKHR

6+

4455

<id> Result Type

Result <id>

<id> Vector 1

<id> Vector 2

<id> Accumulator

Optional
Packed Vector Format

Interactions with type capabilities

Support for specific input types is enabled by various capabilities as follows.

Vectors of 4 8-bit integer components packed into a 32-bit integer are enabled by DotProductInput4x8BitPackedKHR.

Vectors of 4 8-bit integer components are enabled by DotProductInput4x8BitKHR.

Vectors of any other type are enabled by DotProductInputAllKHR along with other capabilities:

  • 2-, 3- or 4-component vectors require no additional capabilities

  • 8- or 16-component vectors require Vector16

  • 8-bit components require Int8

  • 16-bit components require Int16

  • 32-bit components require no additional capabilities

  • 64-bit components require Int64

Issues

  1. How should the signedness of operations be determined?

    RESOLVED: In line with existing instructions, the signedness of operations is carried by instructions (OpS*, OpU\* and OpSU*). Using the signedness of operands couldn’t work at all for OpenCL where signedness isn’t part of the types. Having three separate instructions for that purpose was deemed acceptable. The signedness of operands is contrained to be 0 for instructions that treat their inputs as unsigned to help with validation (as a non-zero value is very likely to be incorrect).

  2. Should there be non-saturating accumulating instructions?

    RESOLVED: No. It is simple enough to spot the dot product followed by an addition pattern and lower it to specific instructions in consumers that have them. There are multiple benefits to this approach:

    • Consumers that have these instructions are forced to optimise the pattern which removes the possibility that a user might use a non-accumulating instruction followed by an addition instead of an accumulating instruction.

    • Keeping the addition and dot product separate may expose additional optimisation opportunities.

    • Most high-level languages already have operators for addition. This reduces the number of new built-in functions to introduce.

  3. Shouldn’t the width of the result type always be large enough to accomodate all possible values of the input vectors?

    RESOLVED: No. This prevents implementing the instructions with lower precision arithmetic in some cases and is not consistent with other arithmetic instructions. Programs that need the result type to be large enough to represent the dot product of the input vectors for all possible values of the input vectors should choose a result type that satisfies the following constraint:

    result_width >= input_component_width * 2 + ceil(log2(input_num_components))

Revision History

Rev Date Author Changes

3

2021-09-08

Kévin Petit

Clarify how vectors are packed into 32-bit integers

2

2021-06-09

Kévin Petit

Use a single capability to enable all instructions

1

2020-05-20

Kévin Petit

Initial revision