Name Strings
SPV_NV_cooperative_matrix
Contact
To report problems with this extension, please open a new issue at:
Contributors
- Jeff Bolz, NVIDIA
- Daniel Koch, NVIDIA
- Markus Tavenrath, NVIDIA
Notice
Copyright (c) 2018-2019 NVIDIA Corp.
Status
- Complete
Version
Last Modified Date | 2019-07-12 |
Revision | 2 |
Dependencies
This extension is written against the SPIR-V Specification, Version 1.3, Revision 5, Unified.
This extension requires SPIR-V 1.3.
This extension requires SPV_KHR_vulkan_memory_model.
This extension interacts with SPV_EXT_physical_storage_buffer.
Overview
This extension adds a new set of types known as "cooperative matrix" types, where the storage for and computations performed on the matrix are spread across a set of invocations such as a subgroup. These types give the implementation freedom in how to optimize matrix multiplies.
This extension introduces the types and instructions, but does not specify rules about what sizes/combinations are valid. This is left to the client API specs, and it is expected that different implementations may support different sizes. To help accommodate this, the dimensions of the cooperative types can be specialized via specialization constants. Since the scope parameter is also something that could potentially be specialized, this extension allows all scope ids to be specialization constants.
Extension Name
To use this extension within a SPIR-V module, the following OpExtension must be present in the module:
OpExtension "SPV_NV_cooperative_matrix"
Modifications to the SPIR-V Specification, Version 1.3
2.2 Terms
Add new terms to section 2.2.2 Types:
Cooperative Matrix: A two-dimensional ordered collection of scalars, whose storage is spread across multiple shader invocations.
Add Cooperative Matrix to the definition of Concrete Type.
Add Cooperative Matrix to the definition of Composite. As a composite, a cooperative matrix acts like a vector with an implementation-dependent number of components (which can be queried with OpCooperativeMatrixLengthNV). It can be used as a composite for all operations that act on composite types, including OpCompositeExtract, OpCompositeInsert, OpAccessChain, etc. The mapping of components to invocations and indexes is implementation-dependent.
2.16 Validation Rules
Modify section 2.16.1. Universal Validation Rules:
Add OpCooperativeMatrixLoadNV and OpCooperativeMatrixStoreNV to the list of instructions under "A pointer can only be an operand to the following instructions:", when the Logical addressing model is selected and the VariablePointers capability is not declared.
Cooperative matrix types (or types containing them) can only be allocated in Function or Private storage classes.
Modify section 2.16.2. Shader Validation Rules, to relax the rules for scopes to allow specialization constants:
All <id> used for Scope and Memory Semantics must be the result of a constant instruction.
3.26 Memory Access
Modify Section 3.26, "Memory Access":
In the description of MakePointerAvailableKHR added by SPV_KHR_vulkan_memory_model, change "Not valid with OpLoad" to "Not valid with OpLoad or OpCooperativeMatrixLoadNV".
In the description of MakePointerVisibleKHR added by SPV_KHR_vulkan_memory_model, change "Not valid with OpStore" to "Not valid with OpStore or OpCooperativeMatrixStoreNV".
3.31 Capabilities
Modify Section 3.31, "Capability", adding these rows to the Capability table:
Value | Capability | Enabling Capabilities |
---|---|---|
5357 | CooperativeMatrixNV | Shader |
3.32.6 Type-Declaration Instructions
3.32.7 Constant-Creation Instructions
Modify OpConstantComposite to make an exception for cooperative matrix types: "If the Result Type is a cooperative matrix type, then there must be only one Constituent and it is used to initialize all members."
Modify OpSpecConstantOp to add: If the CooperativeMatrixNV capability was declared, the following opcode is also valid: OpCooperativeMatrixLengthNV. Relax the limitation on operands for this instruction to allow the operand to be a cooperative matrix type.
3.32.8 Memory Instructions
3.32.11 Conversion Instructions
Allow cooperative matrix types for the following conversion instructions (if the component types are appropriate): OpConvertFToU, OpConvertFToS, OpConvertSToF, OpConvertUToF, OpUConvert, OpSConvert, OpFConvert. The result type and value type must have the same scope, number of rows, and number of columns.
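As an illustration of the shape rule above, the following Python sketch checks whether two cooperative matrix types could serve as the result type and value type of a conversion. The `CoopMatrixType` class, its field names, and the string labels used for component types and scopes are hypothetical and are not part of this extension; only the rule that scope, rows, and columns must match comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoopMatrixType:
    """Hypothetical model of a cooperative matrix type (illustration only)."""
    component: str  # e.g. "f16" or "f32"; labels are assumptions
    scope: str      # e.g. "Subgroup"
    rows: int
    cols: int


def conversion_shape_ok(result: CoopMatrixType, value: CoopMatrixType) -> bool:
    """Return True if scope, rows, and columns match.

    The component types are allowed to differ; whether a given pair of
    component types is appropriate for a given conversion opcode is not
    checked here.
    """
    return (result.scope, result.rows, result.cols) == (
        value.scope, value.rows, value.cols)
```

Under this model, converting a 16x16 Subgroup matrix of "f16" to a 16x16 Subgroup matrix of "f32" passes the check, while changing the row count or the scope fails it.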
3.32.12 Composite Instructions
Modify OpCompositeConstruct to make an exception for cooperative matrix types: "If the Result Type is a cooperative matrix type, then there must be only one Constituent and it is used to initialize all members."
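The single-constituent rule shared by OpConstantComposite and OpCompositeConstruct can be sketched in Python by modeling one invocation's view of a cooperative matrix as a list of components. This is a minimal sketch, not a description of any implementation: the function name is hypothetical, and the `length` parameter merely stands in for the implementation-dependent component count that OpCooperativeMatrixLengthNV reports.

```python
def construct_coop_matrix(constituents, length):
    """Model the cooperative-matrix exception for composite construction.

    `constituents` is the list of constituent values given to the
    instruction; `length` is an assumed per-invocation component count.
    """
    if len(constituents) != 1:
        # The rule requires exactly one constituent for these types.
        raise ValueError("cooperative matrix construction takes one constituent")
    # The single constituent initializes every member.
    return [constituents[0]] * length
```

Because the one value is replicated to all members, the shader does not need to know how components map to invocations in order to initialize a matrix.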
3.32.13 Arithmetic Instructions
Allow cooperative matrix types for the following instructions (if the component type is appropriate): OpSNegate, OpFNegate, OpIAdd, OpFAdd, OpISub, OpFSub, OpFDiv, OpSDiv, and OpUDiv. Allow cooperative matrix types for OpMatrixTimesScalar.
Issues
1) Should this functionality hardwire a scope (subgroup) or be more flexible?
Discussion: Volta hardware used a smaller scope (8 threads), and it is plausible that other implementations could also want a smaller scope.
Resolution: Allow a specialization constant scope.
2) Should we have capabilities for each MxNxK matrix multiply "size" that is supported?
Discussion: It’s nice for validation if the shader instructions can be validated solely based on the OpCapability instructions. But that already breaks down for spec-constant-defined cooperative matrix types.
Resolution: Just one capability for the overall feature.
3) Should stride be in bytes or elements?
Discussion: Using elements helps avoid the unsupportable (or more difficult to support) cases.
Resolution: Stride is in elements of the pointee type (which can be different than the matrix component type).
4) Should we allow matrices to be stored in an opaque layout in shared memory?
Discussion: Currently matrices can be stored to shared memory as an array of floats, in row- or column-major order. It might be beneficial to let shaders spill matrices to shared memory in an opaque, implementation-dependent layout. There are a few possible ways to handle this: (1) Reuse the existing OpCooperativeMatrixLoad/Store opcodes with a flag or a trick like Stride=0, (2) new instructions without the stride parameter, (3) let the cooperative matrix types be placed in shared memory and use OpLoad/OpStore.
Resolution: Not supported in the current extension.
5) Should the "column major" operand be a literal constant, or a constant instruction?
Discussion: Constant instructions are more general, and easier for code generation.
Resolution: Constant instruction.
6) Should we allow OpTranspose on cooperative matrix types?
Discussion: In NVIDIA’s initial implementation, we’ll support a pretty restricted set of sizes where the transpose of a matrix will sometimes not be a valid type. So it’s unclear if this is useful.
Resolution: Not supported in this extension.
7) What should the Pointer operand to a cooperative Load/Store be?
Discussion: The spec currently chooses to have the Pointer parameter point at the first element of the matrix in memory, and this pointer is assumed to be in the middle of an array. Another option would be to have the Pointer parameter be a pointer to the whole array, and have an additional "Element" parameter to the instructions, which indicates where the matrix starts in the array.
The alternative option’s main benefit is that you don’t end up with a pointer parameter being used to access something it does not point to. However, it effectively splits out the last element of the access chain into the load/store instruction, which is kind of weird. And in the first option, the pointer to the array is still there implicitly in the access chain.
Resolution: Pointer points to the first element of the matrix in memory.
8) Should we allow the Pointer type and matrix component type to mismatch?
Resolution: Yes, this makes it easier to efficiently load matrix data into shared memory, which can be declared to use a larger type (e.g. uvec4). The Stride parameter is interpreted in units of the pointed-to type, not in units of the matrix’s component type.
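The stride convention from issues 3 and 8 can be sketched in Python as a byte-offset calculation. This is a sketch under stated assumptions, not the normative addressing rule: the function and parameter names are hypothetical, and it assumes that each row (or each column, when column-major) begins at Pointer[major * Stride] in units of the pointee type and that the elements within it are tightly packed at the matrix component size.

```python
def element_byte_offset(row, col, stride, column_major,
                        pointee_size, component_size):
    """Byte offset, relative to Pointer, of matrix element (row, col).

    stride         -- Stride operand, in elements of the pointee type
    column_major   -- True if columns are contiguous in memory
    pointee_size   -- size in bytes of the pointed-to type (assumed known)
    component_size -- size in bytes of the matrix component type
    """
    if column_major:
        major, minor = col, row
    else:
        major, minor = row, col
    # Stride is scaled by the pointee size, not by the component size.
    return major * stride * pointee_size + minor * component_size
```

For example, with a 16-byte pointee (such as uvec4), 2-byte components, and Stride = 2, element (3, 5) of a row-major matrix lands at byte offset 3*2*16 + 5*2 = 106. When the pointee type and the component type are the same size, the calculation reduces to the familiar (major * Stride + minor) * size.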
Revision History
Rev | Date | Author | Changes |
---|---|---|---|
1 | 2019-01-30 | Jeff Bolz | Initial revision |
2 | 2019-07-12 | Jeff Bolz | Added details for integer operations |