Name Strings
SPV_KHR_cooperative_matrix
Contact
To report problems with this extension, please open a new issue at:
Contributors
-
Jeff Bolz, NVIDIA
-
Daniel Koch, NVIDIA
-
Markus Tavenrath, NVIDIA
-
Kevin Petit, Arm Ltd.
-
Lei Zhang, Google
-
Ben Ashbaugh, Intel
-
Ruihao Zhang, Qualcomm
-
David Neto, Google
-
Tobias Hector, AMD
-
John Kessenich, Google
-
Nicolai Hahnle, AMD
-
Mariusz Merecki, Intel
-
Pedro Olsen Ferreira, Arm Ltd.
-
Ni Hui, Tencent
-
Dmitry Sidorov, Intel
-
Dong Wang, AMD
-
Ruimin Zhao, AMD
-
Alan Baker, Google
-
Lin Qun, AMD
-
Wooyoung Kim, Qualcomm
-
Krystian Andrzejewski, Intel
Notice
Copyright (c) 2020-2024 The Khronos Group Inc. Copyright terms at http://www.khronos.org/registry/speccopyright.html
Status
-
Approved by the SPIR-V Working Group: 2023-05-03
-
Approved by the Khronos Board of Promoters: 2023-06-16
Version
Last Modified Date |
2024-08-07 |
Revision |
6 |
Dependencies
This extension is written against the SPIR-V Specification, Version 1.6, Revision 2, Unified.
This extension requires SPIR-V 1.6.
This extension interacts with SPV_EXT_physical_storage_buffer.
Overview
This extension adds a new set of types known as "cooperative matrix" types, where the storage for and computations performed on the matrix are spread across a set of invocations such as a subgroup. These types give the implementation freedom in how to optimize matrix multiplies.
This extension introduces the types and instructions, but does not specify rules about what sizes/combinations are valid. This is left to the client API specs, and it is expected that different implementations may support different sizes. To help accommodate this, the dimensions of the cooperative types can be specialized via specialization constants. Since the scope parameter is also something that could potentially be specialized, this extension allows all scope ids to be specialization constants.
Extension Name
To use this extension within a SPIR-V module, the following OpExtension must be present in the module:
OpExtension "SPV_KHR_cooperative_matrix"
Modifications to the SPIR-V Specification, Version 1.6
2.2 Terms
Add new terms to section 2.2.2 Types:
Cooperative Matrix: A two-dimensional ordered collection of scalars, whose storage is spread across multiple shader invocations.
Add Cooperative Matrix to the definition of Abstract Type.
Add Cooperative Matrix to the definition of Composite. A cooperative matrix is a composite with an implementation-dependent number of components (which can be queried with OpCooperativeMatrixLengthKHR). It can be used as a composite for all operations that act on composite types. The mapping of components to invocations and indexes is implementation-dependent.
2.16 Validation Rules
Modify section 2.16.1. Universal Validation Rules:
-
Add OpCooperativeMatrixLoadKHR and OpCooperativeMatrixStoreKHR to the list of instructions under "It is invalid for a pointer to be an operand to any instruction other than:", when the Logical addressing model is selected and neither the VariablePointers nor VariablePointersStorageBuffer capability are declared.
-
Cooperative matrix types (or types containing them) can only be allocated in Function or Private storage classes.
-
The Matrix{A,B,C,Result}SignedComponentsKHR Cooperative Matrix Operand can only be used when the type of the corresponding matrix is an integer type.
Modify section 2.16.2. Validation Rules for Shader Capabilities:
Replace:
-
All <id> used for Scope <id> and Memory Semantics <id> must be of an OpConstant.
with:
-
All <id> used for Scope <id> must be the result of a constant instruction.
-
All <id> used for Memory Semantics <id> must be of an OpConstant.
Add:
-
If the CooperativeMatrixKHR capability is declared then the VulkanMemoryModel capability must be declared as well.
3.26 Memory Operands
Modify Section 3.26, "Memory Operands":
In the description of MakePointerAvailable, change "Not valid with OpLoad" to "Not valid with OpLoad or OpCooperativeMatrixLoadKHR".
In the description of MakePointerVisible, change "Not valid with OpStore" to "Not valid with OpStore or OpCooperativeMatrixStoreKHR".
3.31 Capabilities
Modify Section 3.31, "Capability", adding these rows to the Capability table:
Capability | Enabling Capabilities | |
---|---|---|
6022 |
CooperativeMatrixKHR |
3.X Cooperative Matrix Operands
New section in 3 "Binary Form".
Cooperative Matrix Operands | Enabling Capabilities | |
---|---|---|
0x0 |
NoneKHR |
|
0x1 |
MatrixASignedComponentsKHR |
|
0x2 |
MatrixBSignedComponentsKHR |
|
0x4 |
MatrixCSignedComponentsKHR |
|
0x8 |
MatrixResultSignedComponentsKHR |
|
0x10 |
SaturatingAccumulationKHR |
3.X Cooperative Matrix Layout
New section in 3 "Binary Form".
Cooperative Matrix Layout | Enabling Capabilities | |
---|---|---|
0x0 |
RowMajorKHR |
|
0x1 |
ColumnMajorKHR |
3.X Cooperative Matrix Use
New section in 3 "Binary Form".
Cooperative Matrix Use | Enabling Capabilities | |
---|---|---|
0 |
MatrixAKHR |
|
1 |
MatrixBKHR |
|
2 |
MatrixAccumulatorKHR |
3.42.6 Type-Declaration Instructions
3.42.7 Constant-Creation Instructions
Modify OpConstantComposite to make an exception for cooperative matrix types: "If the Result Type is a cooperative matrix type, then there must be only one Constituent and it is used to initialize all members."
3.42.8 Memory Instructions
3.42.11 Conversion Instructions
Allow values of cooperative matrix type for the following conversion instructions (if the component types are appropriate): OpConvertFToU, OpConvertFToS, OpConvertSToF, OpConvertUToF, OpUConvert, OpSConvert, OpFConvert. Allow the use of OpBitcast on objects of cooperative matrix type whose Component Type are integer types with the same Width. The result type and value type must have the same Scope, number of Rows, number of Columns, and Use.
All the operands to this instruction must be dynamically uniform within every instance of the Scope of the cooperative matrix.
3.42.12 Composite Instructions
Modify OpCompositeConstruct to make an exception for cooperative matrix types: "If the Result Type is a cooperative matrix type, then there must be only one Constituent and it is used to initialize all members. The Constituent must be dynamically uniform within the Scope of the cooperative matrix type.
3.42.13 Arithmetic Instructions
Allow cooperative matrix types for the following arithmetic instructions:
-
OpSNegate and OpFNegate
-
OpIAdd, OpFAdd, OpISub, OpFSub, OpFMul, OpIMul, OpFDiv, OpSDiv, and OpUDiv.
if their Component Type is appropriate:
-
OpF instructions can be used with cooperative matrix types whose Component Type is a floating-point type.
-
OpI, OpS, and OpU instructions can be used with cooperative matrix types whose Component Type is an integer type.
Unary arithmetic instructions operate on the individual elements of the cooperative matrices.
Binary arithmetic instructions operate on the individual elements of a pair of cooperative matrices whose type must match.
Allow cooperative matrix types for OpMatrixTimesScalar.
All the operands to this instruction must be dynamically uniform within every instance of the Scope of the cooperative matrix.
Issues
-
Should cooperative operations imply a fixed scope (e.g. Subgroup) or be more flexible?
Discussion: Some hardware (e.g. NVidia Volta) use a smaller scope than the typical Subgroup size, and it is plausible that other implementations could also want a different scope.
RESOLVED: Allow a specialization constant scope.
-
Should we have capabilities for each MxNxK matrix multiply "size" that is supported?
Discussion: It’s nice for validation if the shader instructions can be validated solely based on the OpCapability instructions. But that already breaks down for spec-constant-defined cooperative matrix types.
RESOLVED: Just one capability for the overall feature.
-
Should strides be in bytes or elements?
Discussion: Using elements helps avoid the unsupportable (or more difficult to support) cases.
RESOLVED: Stride is in elements of the pointee type (which can be different than the matrix component type).
-
Should we allow matrices to be stored in an opaque layout in shared memory?
Discussion: Some implementation need opaque layouts for optimal performance.
RESOLVED: Load/store instructions accept a layout operand that vendors can use to select custom layouts.
-
Should the MemoryLayout operand be a literal constant, or a constant instruction?
Discussion: Constant instructions are more general, and easier for code generation.
RESOLVED: Constant instruction.
-
Should we allow OpTranspose on cooperative matrix types?
Discussion: Most implementations are expected to support a restricted set of sizes where the transpose of a matrix will sometimes not be a valid type; it’s unclear if this is useful.
RESOLVED: Not supported in this extension.
-
What should the Pointer operand to a cooperative Load/Store be?
Discussion: The spec currently chooses to have the Pointer parameter point at the first element of the matrix in memory, and this pointer is assumed to be in the middle of an array. Another option would be to have the Pointer parameter be a pointer to the whole array, and have an additional "Element" parameter to the instructions, which indicates where the matrix starts in the array.
The alternative option’s main benefit is that you don’t end up with a pointer parameter being used to access something it does not point to. However, it effectively splits out the last element of the access chain into the load/store instruction, which is kind of weird. And in the first option, the pointer to the array is still there implicitly in the access chain.
RESOLVED: Pointer points to the first element of the array.
-
Should we allow the Pointer type and matrix component type to mismatch?
RESOLVED: Yes, this makes it easier to efficiently load matrix data into shared memory, which can be declared to use a larger type (e.g. uvec4). The Stride parameter is interpreted in units of the pointed-to type, not in units of the matrix’s component type.
-
Should we make it possible to use OpMatrixTimesScalar with OpenCL?
RESOLVED: No, this instruction is not generally supported in OpenCL environments and the same can be achieved either via an elementwise multiplication with a cooperative matrix object created from the scalar using OpConstantComposite or by iterating over the elements of the cooperative matrix to multiply each element by the scalar.
-
Both the Stride and Memory Operand operands to OpCooperativeMatrixLoadKHR and OpCooperativeMatrixStoreKHR are optional. Can Memory Operand be provided alone?
RESOLVED: No, in line with core SPIR-V rules, all optional operands that appear before a given optional operand must be provided for it to be possible to provide that given optional operand. Only the following combinations are valid:
-
None of the optional operands are present.
-
Stride alone is present.
-
Stride and Memory Operand are both present.
-
-
Can elements of cooperative matrix objects treated as composites be accessed in non-uniform control flow?
RESOLVED: Yes, control flow uniformity requirements apply to instructions whose operands are cooperative matrix objects but not pointers to cooperative matrix objects. Dereferencing a pointer to an element of a cooperative matrix object can be done in non-uniform control flow.
Revision History
Rev | Date | Author | Changes |
---|---|---|---|
6 |
2024-08-07 |
Jeff Bolz |
Clarify sign/zero-extension behavior |
5 |
2023-12-06 |
Kevin Petit |
Clarifications, mostly of uniformity rules |
4 |
2023-07-26 |
Kevin Petit |
Add KHR suffixes to Cooperative Matrix Operands |
3 |
2023-05-03 |
Kevin Petit |
Initial revision of SPV_KHR_cooperative_matrix |
2 |
2019-07-12 |
Jeff Bolz |
Added details for integer operations |
1 |
2019-01-30 |
Jeff Bolz |
Initial revision of SPV_NV_cooperative_matrix |