Name Strings

SPV_AMD_shader_ballot

Contact

See Issues list in the Khronos SPIRV-Registry repository: https://github.com/KhronosGroup/SPIRV-Registry

Contributors

  • Qun Lin, AMD

  • Graham Sellers, AMD

  • Daniel Rakos, AMD

  • Rex Xu, AMD

  • Dominik Witczak, AMD

  • Matthäus G. Chajdas, AMD

Notice

Copyright (c) 2016 The Khronos Group Inc. Copyright terms at http://www.khronos.org/registry/speccopyright.html

Status

Released.

Version

Modified Date: March 28, 2018 Revision: 6

Dependencies

This extension is written against Revision 1 of the version 1.10 of the SPIR-V Specification.

The extension is written against Revision 1 of the OpenGL extension AMD_shader_ballot.

Overview

This extension is written to provide the functionality of the AMD_shader_ballot, OpenGL Shading Language Specification extension, for SPIR-V.

This extension introduces eight core instructions and four new extended instructions to SPIR-V that enable additional subgroup operations in shaders.

This extension adds 16-bit result type support to a number of core group operations.

Extension Name

To enable SPV_AMD_shader_ballot extension in SPIR-V, use

OpExtension "SPV_AMD_shader_ballot"

New Instructions

This extension adds the following core instructions

OpGroupIAddNonUniformAMD = 5000
OpGroupFAddNonUniformAMD = 5001
OpGroupFMinNonUniformAMD = 5002
OpGroupUMinNonUniformAMD = 5003
OpGroupSMinNonUniformAMD = 5004
OpGroupFMaxNonUniformAMD = 5005
OpGroupUMaxNonUniformAMD = 5006
OpGroupSMaxNonUniformAMD = 5007

This extension adds the following extended instructions

SwizzleInvocationsAMD = 1
SwizzleInvocationsMaskedAMD = 2
WriteInvocationAMD = 3
MbcntAMD = 4

To use the new core instructions, declare:

OpCapability Groups
OpExtension "SPV_AMD_shader_ballot"

To use the new extended instructions, declare:

OpExtension "SPV_AMD_shader_ballot"
OpExtInstImport %ext "SPV_AMD_shader_ballot"

Modifications to the SPIR-V Specification, Version 1.1

Modify Section 3.32.21, Group Instructions

OpGroupIAdd

(Replace the following sentence):

<Result Type> must be a 32-bit or 64-bit <integer type> scalar.

(with):

<Result Type> must be an <integer type> scalar.

OpGroupUMin

(Replace the following sentence):

<Result Type> must be a 32-bit or 64-bit <integer type> scalar.

(with):

<Result Type> must be an <integer type> scalar.

OpGroupSMin

(Replace the following sentence):

<Result Type> must be a 32-bit or 64-bit <integer type> scalar.

(with):

<Result Type> must be an <integer type> scalar.

OpGroupUMax

(Replace the following sentence):

<Result Type> must be a 32-bit or 64-bit <integer type> scalar.

(with):

<Result Type> must be an <integer type> scalar.

OpGroupSMax

(Replace the following sentence):

<Result Type> must be a 32-bit or 64-bit <integer type> scalar.

(with):

<Result Type> must be an <integer type> scalar.

(Add to the end of the section)

OpGroupIAddNonUniformAMD

An integer add group operation specified for all values of <X> specified by invocations in the group.

The identity <I> is 0.

All invocations of this module within <Execution> must reach this point of execution.

This instruction is able to work correctly if placed within non-uniform control flow within <Execution>.

<Result Type> must be an <integer type> scalar.

<Execution> must be Workgroup or Subgroup Scope.

The type of <X> must be the same as <Result Type>.

6 | 5000  | <id> Result Type | <id> Result  | Scope <id> Execution | Group Operation | <id> X

OpGroupFAddNonUniformAMD

A floating-point add group operation specified for all values of <X> specified by invocations in the group.

The identity <I> is 0.

All invocations of this module within <Execution> must reach this point of execution.

This instruction is able to work correctly if placed within non-uniform control flow within <Execution>.

<Result Type> must be an <integer type> scalar.

<Execution> must be Workgroup or Subgroup Scope.

The type of <X> must be the same as <Result Type>.

6 | 5001 | <id> Result Type | <id> Result | <id> Scope Execution | Group Operation | <id> X

OpGroupFMinNonUniformAMD

A floating-point minimum group operation specified for all values of <X> specified by invocations in the group.

The identity <I> is +INF.

All invocations of this module within <Execution> must reach this point of execution.

This instruction is able to work correctly if placed within non-uniform control flow within <Execution>.

<Result Type> must be an <integer type> scalar.

<Execution> must be Workgroup or Subgroup Scope.

The type of <X> must be the same as <Result Type>.

6 | 5002 | <id> Result Type | <id> Result | <id> Scope Execution | Group Operation | <id> X

OpGroupUMinNonUniformAMD

An unsigned integer minimum group operation specified for all values of <X> specified by invocations in the group.

The identity <I> is UINT_MAX when X is 32 bits wide and ULONG_MAX when <X> is 64 bits wide.

All invocations of this module within <Execution> must reach this point of execution.

This instruction is able to work correctly if placed within non-uniform control flow within <Execution>.

<Result Type> must be an <integer type> scalar.

<Execution> must be Workgroup or Subgroup Scope.

The type of <X> must be the same as <Result Type>.

6 | 5003 | <id> Result Type | <id> Result | <id> Scope Execution | Group Operation | <id> X

OpGroupSMinNonUniformAMD

A signed integer minimum group operation specified for all values of <X> specified by invocations in the group.

The identity <I> is INT_MAX when X is 32 bits wide and LONG_MAX when <X> is 64 bits wide.

All invocations of this module within <Execution> must reach this point of execution.

This instruction is able to work correctly if placed within non-uniform control flow within <Execution>.

<Result Type> must be an <integer type> scalar.

<Execution> must be Workgroup or Subgroup Scope.

The type of <X> must be the same as <Result Type>.

6 | 5004 | <id> Result Type | <id> Result | <id> Scope Execution | Group Operation | <id> X

OpGroupFMaxNonUniformAMD

A floating-point maximum group operation specified for all values of <X> specified by invocations in the group.

The identity <I> is -INF.

All invocations of this module within <Execution> must reach this point of execution.

This instruction is able to work correctly if placed within non-uniform control flow within <Execution>.

<Result Type> must be an <integer type> scalar.

<Execution> must be Workgroup or Subgroup Scope.

The type of <X> must be the same as <Result Type>.

6 | 5005 | <id> Result Type | <id> Result | <id> Scope Execution | Group Operation | <id> X

OpGroupUMaxNonUniformAMD

An unsigned integer maximum group operation specified for all values of <X> specified by invocations in the group.

The identity <I> is 0.

All invocations of this module within <Execution> must reach this point of execution.

This instruction is able to work correctly if placed within non-uniform control flow within <Execution>.

<Result Type> must be an <integer type> scalar.

<Execution> must be Workgroup or Subgroup Scope.

The type of <X> must be the same as <Result Type>.

6 | 5006 | <id> Result Type | <id> Result | <id> Scope Execution> | Group Operation | <id> X

OpGroupSMaxNonUniformAMD

A signed integer maximum group operation specified for all values of <X> specified by invocations in the group.

The identity <I> is INT_MIN when X is 32 bits wide and LONG_MIN when <X> is 64 bits wide.

All invocations of this module within <Execution> must reach this point of execution.

This instruction is able to work correctly if placed within non-uniform control flow within <Execution>.

<Result Type> must be an <integer type> scalar.

<Execution> must be Workgroup or Subgroup Scope.

The type of <X> must be the same as <Result Type>.

6 | 5007 | <id> Result Type | <id> Result | <id> Scope Execution | Group Operation | <id> X

SwizzleInvocationsAMD

Swizzles data within a group of 4 consecutive invocations of the subgroup based on <offset> as described below:

for (i = 0; i < SubgroupSize; i+=4) {
    dataOut[i+0] = isActive[i+offset.x] ? dataIn[i+offset.x] : 0;
    dataOut[i+1] = isActive[i+offset.y] ? dataIn[i+offset.y] : 0;
    dataOut[i+2] = isActive[i+offset.z] ? dataIn[i+offset.z] : 0;
    dataOut[i+3] = isActive[i+offset.w] ? dataIn[i+offset.w] : 0;
}

Where:

  • isActive[i] tells whether the invocation with the index <i> is currently active within the subgroup.

  • dataIn[i] is the value of <data> for invocation index <i>.

  • dataOut[i] is the return value of the function for invocation index <i>.

The operand data can be any scalar or vector type.

The operand offset must be a unsigned integer vector with 4 components, and each component is constant integer with a value in the range [0, 3].

Result Type and the type of operand <data> must be the same type.

3 | 1 | <id> data | <id> offset

SwizzleInvocationsMaskedAMD

Swizzles data within a group of 32 consecutive invocations with a limited mask as described below:

for (i = 0; i < SubgroupSize; i++) {
   j = (((i & 0x1f) & mask.x) | mask.y) ^ mask.z;
   j |= (i & 0x20); // which group of 32
   dataOut[i] = isActive[j] ? dataIn[j] : 0;
}

Where:

  • isActive[i] tells whether the invocation with the index <i> is currently active within the subgroup.

  • dataIn[i] is the value of <data> for invocation index <i>.

  • dataOut[i] is the return value of the function for invocation index <i>.

The operand data can be any scalar or vector type.

The operand mask must be a unsigned integer vector with 3 components, and each component is constant integer with a value in the range [0, 31].

Result Type and the type of operand <data> must be the same type.

3 | 2 | <id> data | <id> mask

WriteInvocationAMD

Returns <inputValue> for all active invocations in the subgroup except for the invocation whose invocation index within the subgroup is <invocationIndex>. Within a subgroup, the outputs are defined as described below:

for (i = 0; i < SubgroupSize; i++) {
   out[i] = (i == invocationIndex) ? writeValue : inputValue;
}

Where out[i] is the return value of the function for invocation index <i>.

Result Type must be a scalar or vector type.

The type of inputValue and writeValue must be the same as Result Type.

invocationIndex must be a 32-bit unsigned integer with a value in the range [0, SubgroupSize - 1].

writeValue and invocationIndex must be dynamically uniform within the subgroup, otherwise the result of the operation is undefined.

4 | 3  | <id> inputValue | <id> writeValue | <id> invocationIndex

MbcntAMD

Returns the bit count of SubgroupLtMaskARB with <mask> as described below:

%X = OpBitwiseAnd u32 %SubgroupLtMaskARB %mask
<Result> = OpBitCount u32 %X

Result Type and mask must be 32-bit unsigned integers.

4 | <id> mask

Validation Rules

None.

Issues

1.

Supported <result types> for group operation instructions depend on capabilities which are defined elsewhere in the SPIR-V code. In specific, these capabilities may come from other SPIR-V extensions, which are out of scope of this extension specification.

Due to the above, we have decided to relax the language restricting allowed result types for group operation instructions so that it now mentions general integer type, instead of specialized integer types.

Revision History

Rev Date Author Changes

1

April 21, 2016

Quentin Lin

Initial revision based on AMD_shader_ballot.

2

May 20, 2016

Dominik Witczak

Document refactoring

3

May 20, 2016

Matthäus G. Chajdas

Document refactoring

4

August 11, 2016

Rex Xu

Add new core instructions to handle group operations placed with non-uniform control flow.

5

October 13, 2016

Dominik Witczak

Added missing numerical value assignments, removed extension number

6

March 28, 2018

Dominik Witczak

Generalized type restrictions for result types of group operation instructions to integer types. Added issue#1.

7

May 16, 2019

Dominik Witczak

Fixed an issue in the section describing how to use the new functionality. Fixed MbcntAMD’s return type.