Name Strings
SPV_INTEL_task_sequence
Contact
To report problems with this extension, please open a new issue at:
Contributors
- Jessica Davies, Intel
- Joe Garvey, Intel
- Robert Ho, Intel
- Michael Kinsner, Intel
- Abhishek Tiwari, Intel
- Bowen Xue, Intel
Notice
Copyright (c) 2022-2024 Intel Corporation. All rights reserved.
Status
Complete
Version
Last Modified Date | 2023-03-06
Revision | 1
Dependencies
This extension is written against the SPIR-V Specification, Version 1.6 Revision 2.
This extension requires SPIR-V 1.0.
Overview
A task sequence is an abstraction of a sequence of calls to a function that can execute asynchronously from the caller and each other. This extension introduces four new instructions that support task sequence execution.
The OpTaskSequenceCreateINTEL instruction creates a task sequence to which asynchronous function calls can be submitted through the OpTaskSequenceAsyncINTEL instruction. The results of those function calls can be queried with the OpTaskSequenceGetINTEL instruction.
Task Sequence and Task Threads
A task sequence object can be created by calling OpTaskSequenceCreateINTEL. The OpTaskSequenceAsyncINTEL, OpTaskSequenceGetINTEL, and OpTaskSequenceReleaseINTEL instructions take a task sequence object as an argument. The OpTaskSequenceAsyncINTEL instruction creates an invocation, which is referred to as a task thread in this document. This task thread is said to belong to the task sequence specified to the OpTaskSequenceAsyncINTEL instruction. The OpTaskSequenceGetINTEL instruction returns the result of a task thread in the specified task sequence. Results are returned from the task sequence in the same order as the OpTaskSequenceAsyncINTEL calls are made to the task sequence.
An OpFunction f is passed as an argument to the OpTaskSequenceCreateINTEL instruction. The task threads belonging to a task sequence asynchronously execute f, and they may run in parallel with the caller and with any other task threads. The implementation is not required to run these task threads in parallel except insofar as is necessary to meet the forward progress guarantees outlined in the section below.
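The ordering rule described above (results come back in the order the asynchronous calls were made, regardless of which task thread finishes first) can be illustrated with a host-side analogy. This is only a sketch: the class TaskSequenceModel, its method names, and the use of a Python thread pool are assumptions made for illustration and are not part of this extension.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class TaskSequenceModel:
    """Hypothetical host-side analogy of a task sequence (not SPIR-V semantics)."""

    def __init__(self, function, max_workers=4):
        self._function = function          # plays the role of the OpFunction f
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = deque()            # futures kept in submission order

    def async_call(self, *args):
        # Analogous to OpTaskSequenceAsyncINTEL: start one "task thread".
        self._pending.append(self._pool.submit(self._function, *args))

    def get(self):
        # Analogous to OpTaskSequenceGetINTEL: wait for the oldest task thread,
        # so results are returned in the order the async calls were made.
        return self._pending.popleft().result()

    def shutdown(self):
        # Host-side cleanup of the thread pool only.
        self._pool.shutdown(wait=True)


def square(x):
    return x * x


seq = TaskSequenceModel(square)
for value in (3, 1, 2):
    seq.async_call(value)
results = [seq.get() for _ in range(3)]
seq.shutdown()
print(results)  # [9, 1, 4]
```

The deque of futures is what enforces the ordering here; the pool is free to run the calls in parallel or one after another, which mirrors the statement that parallel execution is permitted but not required.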
Forward Progress Guarantees and Execution Model
A task thread is a new Invocation which has a LocalInvocationId, GlobalInvocationId, and WorkgroupId of 0, WorkgroupSize and GlobalSize of 1 and LocalSize of 1, 1, 1. It does not share Workgroup storage class memory or Function storage class memory with the caller or with other task threads. It can access memory from CrossWorkgroup storage class. A task thread cannot synchronize with the caller or with other task threads using a barrier.
Calling OpTaskSequenceAsyncINTEL is analogous to enqueuing an OpenCL kernel with global_work_offset, global_work_size, and local_work_size set to 0, 1, 1 respectively, i.e., a task kernel. This extension does not support any instruction that would be analogous to enqueuing a kernel with a different geometry.
An OpTaskSequenceAsyncINTEL call is guaranteed to not block the caller as long as the number of task threads in the task sequence is strictly less than the AsyncCapacity of the sequence.
A task thread executes f and then writes its completion status and results to an output data structure D associated with the sequence. The task thread can only write into D if there is space available in it, and the task thread ceases to exist after writing its results. The implementation must ensure that at least GetCapacity task threads can store their outputs to D.
Results are removed from D when they are retrieved by OpTaskSequenceGetINTEL calls. An OpTaskSequenceGetINTEL call is guaranteed to block the caller if there are no results stored in D.
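The capacity rules above amount to simple bookkeeping over two counts: the task threads that still exist, and the results held in D. The following sketch models that bookkeeping. CapacityModel and its methods are hypothetical names; the sketch assumes that D holds exactly GetCapacity results (an implementation may provide more space) and that task threads finish in submission order.

```python
from collections import deque


class CapacityModel:
    """Bookkeeping sketch for AsyncCapacity and GetCapacity (hypothetical helper)."""

    def __init__(self, get_capacity, async_capacity):
        self.get_capacity = get_capacity
        self.async_capacity = async_capacity
        self.live = deque()   # task threads that have not yet written to D
        self.d = deque()      # results currently stored in D
        self._next_id = 1

    def async_is_guaranteed_nonblocking(self):
        # The guarantee holds only while the count is strictly below AsyncCapacity.
        return len(self.live) < self.async_capacity

    def async_call(self):
        # Creates a task thread; outside the guarantee a real call may block.
        task = self._next_id
        self._next_id += 1
        self.live.append(task)
        return task

    def write_oldest(self):
        # Assumption: the oldest task thread is the one that finishes next.
        if not self.live or len(self.d) >= self.get_capacity:
            return False                     # nothing to write, or D is full
        self.d.append(self.live.popleft())   # the task thread ceases to exist
        return True

    def get_would_block(self):
        return not self.d

    def get(self):
        return self.d.popleft()


model = CapacityModel(get_capacity=1, async_capacity=2)
model.async_call()
print(model.async_is_guaranteed_nonblocking())  # True: 1 < 2
model.async_call()
print(model.async_is_guaranteed_nonblocking())  # False: 2 is not < 2
print(model.get_would_block())                  # True: D is empty
print(model.write_oldest())                     # True: task 1 writes to D
print(model.write_oldest())                     # False: D is full, task 2 waits
print(model.get())                              # 1
print(model.write_oldest())                     # True: the get freed space in D
```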
C++ defines a framework for describing the forward progress of individual threads of execution in a multi-threaded program. The following terms and definitions from the C++ specification are used to define progress guarantees for task threads:
- Weakly parallel forward progress guarantee: the implementation does not ensure that the thread will eventually make progress.
- Concurrent forward progress guarantee: the implementation ensures that the thread will eventually make progress for as long as it has not terminated.
- Blocking with forward progress guarantee delegation: When a thread of execution A is specified to block with forward progress guarantee delegation on the completion of a set M of threads of execution, then throughout the whole time of A being blocked on M, the implementation shall ensure that the forward progress guarantees provided by at least one thread of execution in M is at least as strong as A's forward progress guarantees. It is unspecified which thread or threads of execution in M are chosen and for which number of execution steps. The strengthening is not necessarily in place for the rest of the lifetime of the affected thread of execution.
Using the above definitions, the progress guarantees for task threads are defined as follows:
- When a task sequence object O is created by OpTaskSequenceCreateINTEL, a task sequence object thread is also created.
- At any point in time, the progress guarantee of all task sequence object threads created by a work item WI matches that of WI. For example, if WI is strengthened to have a stronger progress guarantee than its initial guarantee, all of the task sequence object threads created by WI are also strengthened.
- A call to OpTaskSequenceAsyncINTEL(O, …) will result in the creation of a task thread. OpTaskSequenceAsyncINTEL(O, …) can be called multiple times to create multiple task threads for O. A task thread has weakly parallel forward progress guarantee.
- Upon creation, a task sequence object thread P immediately blocks on the set S of task threads that belong to O with forward progress guarantee delegation.
- If a task thread with concurrent forward progress guarantee has finished executing f and if it can write its results to the output data structure D, then it does so and some other task thread in S is strengthened to have concurrent forward progress guarantee. If a task thread cannot write its results to D, the task thread blocks until space is available.
The two examples below, respectively, show the following:
- How strengthening of a work item strengthens the task threads.
- How a task thread delegates its progress guarantee to other task threads in the same task sequence object.
Example 1 uses the following pseudo-code program:
// A work item WI
{
  ...
  TaskSeqObject1 = OpTaskSequenceCreateINTEL(SomeFunction, ...); // Object_1_Thread
  OpTaskSequenceAsyncINTEL(TaskSeqObject1, ...); // Task_1_1
  OpTaskSequenceAsyncINTEL(TaskSeqObject1, ...); // Task_1_2
  ...
  TaskSeqObject2 = OpTaskSequenceCreateINTEL(SomeFunction, ...); // Object_2_Thread
  OpTaskSequenceAsyncINTEL(TaskSeqObject2, ...); // Task_2_1
  OpTaskSequenceAsyncINTEL(TaskSeqObject2, ...); // Task_2_2
}
The OpTaskSequenceCreateINTEL calls create task sequence object threads Object_1_Thread and Object_2_Thread. The first two OpTaskSequenceAsyncINTEL calls create task threads Task_1_1 and Task_1_2. Similarly, the next two calls create Task_2_1 and Task_2_2.
The table below provides a view of the hierarchy of task threads that will be generated.
Work Item | WI | | |
Task Sequence Object Thread | Object_1_Thread | | Object_2_Thread |
Task Thread | Task_1_1 | Task_1_2 | Task_2_1 | Task_2_2
At some initial stage, all task threads have weakly parallel forward progress guarantee. If WI is strengthened to have concurrent forward progress guarantee, then all of the task sequence object threads are also strengthened. Next, in this example, one task thread for each task sequence is also strengthened. This is depicted in the table below (the progress guarantee for each thread is in parentheses):
Work Item | WI (concurrent) | | |
Task Sequence Object Thread | Object_1_Thread (concurrent) | | Object_2_Thread (concurrent) |
Task Thread | Task_1_1 (weakly parallel) | Task_1_2 (concurrent) | Task_2_1 (concurrent) | Task_2_2 (weakly parallel)
The next example shows how a task thread delegates its progress guarantee to another task thread:
Assume that we have a task sequence TS with GetCapacity of 1 and AsyncCapacity of 5. Four OpTaskSequenceAsyncINTEL calls create the following task threads for TS: T1, T2, T3 and T4. T1 has concurrent forward progress guarantee after getting strengthened, while T2, T3 and T4 have weakly parallel forward progress guarantees. The task threads go through the following execution flow:
- T1 finishes executing the function f associated with TS.
- For TS, the output data structure D can store the output of only one task thread since GetCapacity is one. T1 writes its output.
- Any task thread can now be picked to be strengthened to have concurrent forward progress guarantee. Let's say T2 is picked.
- At some point T2 finishes executing f. T1's results are still in the output data structure.
- T2 cannot write its results until space is available in D. Hence, none of the other task threads can be picked to be strengthened to the stronger progress guarantee.
- OpTaskSequenceGetINTEL is invoked. T1's results get removed from D.
- T2 can write its results and some other task thread can be picked to be strengthened.
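The execution flow above can be replayed with a small state machine. This is a sketch only: DelegationModel and its methods are hypothetical, and the model always strengthens the oldest remaining task thread, whereas the choice is unspecified in this extension.

```python
from collections import deque


class DelegationModel:
    """Replays progress-guarantee delegation within one task sequence (hypothetical)."""

    def __init__(self, names, get_capacity):
        self.weak = deque(names)   # task threads with weakly parallel guarantee
        self.strong = None         # task thread holding the concurrent guarantee
        self.blocked = None        # finished task thread waiting for space in D
        self.d = deque()           # output data structure D
        self.get_capacity = get_capacity

    def strengthen_next(self):
        # Assumption: pick the oldest weak thread; the real choice is unspecified.
        self.strong = self.weak.popleft() if self.weak else None

    def finish_strong(self):
        """The strengthened task thread finishes f and tries to write to D."""
        if len(self.d) < self.get_capacity:
            self.d.append(self.strong)
            self.strengthen_next()     # the guarantee passes to another thread
            return "wrote"
        self.blocked = self.strong     # D is full: nobody else is strengthened
        return "blocked"

    def get(self):
        """Models OpTaskSequenceGetINTEL; assumes D is not empty."""
        oldest = self.d.popleft()
        if self.blocked is not None:
            self.d.append(self.blocked)   # the waiting thread can now write
            self.blocked = None
            self.strengthen_next()
        return oldest


ts = DelegationModel(["T1", "T2", "T3", "T4"], get_capacity=1)
ts.strengthen_next()           # T1 is strengthened
print(ts.finish_strong())      # wrote: T1 writes to D, T2 is strengthened
print(ts.finish_strong())      # blocked: T2 finished but D still holds T1
print(ts.strong)               # T2: no other thread has been strengthened
print(ts.get())                # T1: removed from D, so T2 writes its results
print(ts.strong, list(ts.d))   # T3 ['T2']
```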
Memory Order Semantics
- OpTaskSequenceAsyncINTEL is a Release operation scoped to include the work item that called it and the task thread that the OpTaskSequenceAsyncINTEL call creates.
- The beginning of a task thread T is an Acquire operation scoped to include the work item that called OpTaskSequenceAsyncINTEL to create T and the task thread T.
- The end of a task thread T is a Release operation scoped to include T and the work item that called OpTaskSequenceAsyncINTEL to create T.
- OpTaskSequenceGetINTEL is an Acquire operation scoped to include the task thread that is being retrieved by OpTaskSequenceGetINTEL and the work item that is calling OpTaskSequenceGetINTEL.
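In practice, the four rules above form two Release/Acquire pairs: writes made by the caller before the asynchronous call are visible inside the task thread, and writes made by the task thread are visible to the caller after the matching get. The sketch below shows that pattern as a host-side analogy. Python has no explicit Acquire or Release operations, so the hand-off through a Future stands in for them here; the names shared and task are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Memory visible to both the caller and the task (stand-in for CrossWorkgroup memory).
shared = {"input": None, "output": None}


def task():
    # Beginning of the task thread (Acquire side of the first pair):
    # the caller's write to shared["input"] is visible here.
    value = shared["input"]
    shared["output"] = value * 2
    # End of the task thread (Release side of the second pair).


with ThreadPoolExecutor(max_workers=1) as pool:
    shared["input"] = 21          # written before the asynchronous call
    future = pool.submit(task)    # stands in for OpTaskSequenceAsyncINTEL (Release)
    future.result()               # stands in for OpTaskSequenceGetINTEL (Acquire)
    observed = shared["output"]   # the task's write is visible after the get

print(observed)  # 42
```

Reading shared["output"] before calling future.result() would have no such ordering guarantee, which is the situation the Acquire on OpTaskSequenceGetINTEL is there to rule out.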
Extension Name
To use this extension within a SPIR-V module, the following OpExtension must be present in the module:
OpExtension "SPV_INTEL_task_sequence"
New Capabilities
This extension introduces a new capability:
TaskSequenceINTEL
New Instructions
Instructions added under the TaskSequenceINTEL capability:
- OpTaskSequenceCreateINTEL
- OpTaskSequenceAsyncINTEL
- OpTaskSequenceGetINTEL
- OpTaskSequenceReleaseINTEL
Token Number Assignments
TaskSequenceINTEL | 6162
OpTaskSequenceCreateINTEL | 6163
OpTaskSequenceAsyncINTEL | 6164
OpTaskSequenceGetINTEL | 6165
OpTaskSequenceReleaseINTEL | 6166
OpTypeTaskSequenceINTEL | 6199
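These token numbers are the values that appear in a binary SPIR-V module. As a sketch of how they are used, the code below packs the OpExtension and OpCapability declarations for this extension, plus one OpTaskSequenceReleaseINTEL instruction, into 32-bit words. It relies on the core SPIR-V binary rules (the first word of an instruction holds the word count in the high 16 bits and the opcode in the low 16 bits; literal strings are nul-terminated UTF-8 packed four bytes per word, first character in the lowest-order bits). The helper names and the id value 7 are hypothetical.

```python
import struct

OP_EXTENSION = 10                        # core SPIR-V opcode
OP_CAPABILITY = 17                       # core SPIR-V opcode
CAP_TASK_SEQUENCE_INTEL = 6162           # from the table above
OP_TASK_SEQUENCE_RELEASE_INTEL = 6166    # from the table above


def first_word(word_count, opcode):
    # High 16 bits: total word count of the instruction; low 16 bits: opcode.
    return (word_count << 16) | opcode


def literal_string(text):
    # Nul-terminate, pad to a multiple of 4 bytes, pack little-endian per word.
    data = text.encode("utf-8") + b"\x00"
    data += b"\x00" * (-len(data) % 4)
    return list(struct.unpack("<%dI" % (len(data) // 4), data))


def instruction(opcode, operands):
    operands = list(operands)
    return [first_word(1 + len(operands), opcode)] + operands


ext = instruction(OP_EXTENSION, literal_string("SPV_INTEL_task_sequence"))
cap = instruction(OP_CAPABILITY, [CAP_TASK_SEQUENCE_INTEL])
rel = instruction(OP_TASK_SEQUENCE_RELEASE_INTEL, [7])  # 7 is a hypothetical <id>

print([hex(w) for w in cap])   # ['0x20011', '0x1812']
print(hex(rel[0]))             # 0x21816
print(len(ext))                # 7
```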
Modifications to the SPIR-V Specification, Version 1.6, Revision 2
Capability
Modify Section 3.31, Capability, adding a row to the Capability table:
Capability | | Implicitly Declares
---|---|---
6162 | TaskSequenceINTEL |
Type Declaration Instruction
Add a new subsection, 3.42.26, Task Sequence Type Declaration Instruction, and add one new instruction in this subsection as follows:
OpTypeTaskSequenceINTEL
Declare a task sequence type.
Capability: TaskSequenceINTEL
2 | 6199 | Result <id>
Instructions
Add a new subsection, 3.42.27, Task Sequence Instructions, and add four new instructions in this subsection as follows:
OpTaskSequenceCreateINTEL
Create and return an instance of a task sequence with type OpTypeTaskSequenceINTEL. All calls to OpTaskSequenceAsyncINTEL with Result passed in as an argument will execute the function Function.
Result Type must be OpTypeTaskSequenceINTEL.
Function is an OpFunction.
Pipelined is a literal 32-bit signed integer. Its value has the following meaning:
- 0: Do not pipeline the task sequence data path.
- N (N > 0): Pipeline the data path such that a new invocation of the task sequence can be launched every N cycles (also known as the Initiation Interval).
- -1: Pipeline the task sequence with a compiler-determined Initiation Interval.
This argument is only meaningful on FPGA devices.
ClusterMode is a literal 32-bit signed integer. It is a request for the method that statically-scheduled clusters should use to handle stalls: using an exit FIFO to drain computations from the cluster, or using a stall-enable signal to freeze computations within the cluster. The valid values are:
- 0: Direct the compiler to use stall-free clusters.
- 1: Direct the compiler to use stall-enable clusters.
- -1: Let the compiler decide which type of cluster to use.
This argument is only meaningful on FPGA devices.
GetCapacity is a literal 32-bit unsigned integer. A task thread that has finished executing Function is guaranteed to write its results to the results data structure of the task sequence as long as there is space to do so. The implementation must ensure that at least the oldest GetCapacity task threads can write their results and completion status. Only task threads that have written their results are counted against this limit.
AsyncCapacity is a literal 32-bit unsigned integer. OpTaskSequenceAsyncINTEL calls for Result are guaranteed to not block as long as the number of task threads in Result is strictly less than this limit.
Capability: TaskSequenceINTEL
8 | 6163 | <id> Result Type | Result <id> | <id> Function | Literal Pipelined | Literal ClusterMode | Literal GetCapacity | Literal AsyncCapacity
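As an illustration of the operand layout above, the following sketch validates the four literals and packs one OpTaskSequenceCreateINTEL instruction into eight 32-bit words. The function name encode_create and the id values are hypothetical. The sketch assumes the usual SPIR-V binary layout (word count in the high 16 bits of the first word, opcode in the low 16 bits) and stores the signed literals as 32-bit two's complement.

```python
OP_TASK_SEQUENCE_CREATE_INTEL = 6163
UINT32_MAX = 0xFFFFFFFF


def encode_create(result_type, result, function,
                  pipelined, cluster_mode, get_capacity, async_capacity):
    """Pack OpTaskSequenceCreateINTEL into a list of 8 words (hypothetical helper)."""
    if pipelined < -1 or pipelined > 0x7FFFFFFF:
        raise ValueError("Pipelined must be -1, 0, or a positive 32-bit value")
    if cluster_mode not in (-1, 0, 1):
        raise ValueError("ClusterMode must be -1, 0, or 1")
    for name, value in (("GetCapacity", get_capacity),
                        ("AsyncCapacity", async_capacity)):
        if not 0 <= value <= UINT32_MAX:
            raise ValueError(name + " must fit in an unsigned 32-bit literal")
    return [
        (8 << 16) | OP_TASK_SEQUENCE_CREATE_INTEL,  # word count 8, opcode 6163
        result_type,                                # <id> Result Type
        result,                                     # Result <id>
        function,                                   # <id> Function
        pipelined & UINT32_MAX,                     # -1 becomes 0xFFFFFFFF
        cluster_mode & UINT32_MAX,
        get_capacity,
        async_capacity,
    ]


# Ids 1, 2 and 3 are placeholders for ids defined elsewhere in a module.
words = encode_create(result_type=1, result=2, function=3,
                      pipelined=-1, cluster_mode=0,
                      get_capacity=1, async_capacity=5)
print([hex(w) for w in words])
# ['0x81813', '0x1', '0x2', '0x3', '0xffffffff', '0x0', '0x1', '0x5']
```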
OpTaskSequenceAsyncINTEL
Asynchronously invoke the OpFunction associated with Sequence.
Sequence must have type OpTypeTaskSequenceINTEL.
This instruction is guaranteed to not block as long as the number of task threads in Sequence is strictly less than the AsyncCapacity of Sequence.
The call may return before the asynchronous call to the function completes.
Argument N is the object to pass as the Nth parameter of the function.
Capability: TaskSequenceINTEL
2+variable | 6164 | <id> Sequence | <id>, <id>, … Argument 0, Argument 1, …
OpTaskSequenceGetINTEL
Capability: TaskSequenceINTEL
4 | 6165 | <id> Result Type | Result <id> | <id>
OpTaskSequenceReleaseINTEL
Capability: TaskSequenceINTEL
2 | 6166 | <id>
SPIR-V Representation in LLVM IR
This is a non-normative section. OpTypeTaskSequenceINTEL can be mapped to the LLVM opaque type spirv.TaskSequenceINTEL and mangled as __spirv_TaskSequenceINTEL__.
Issues
None.
Revision History
Rev | Date | Author | Changes
---|---|---|---
1 | 2023-03-06 | Abhishek Tiwari | Initial public release