Name Strings

SPV_INTEL_task_sequence

Contact

To report problems with this extension, please open a new issue at:

Contributors

  • Jessica Davies, Intel

  • Joe Garvey, Intel

  • Robert Ho, Intel

  • Michael Kinsner, Intel

  • Abhishek Tiwari, Intel

  • Bowen Xue, Intel

Notice

Copyright (c) 2022-2024 Intel Corporation. All rights reserved.

Status

Complete

Version

Last Modified Date | 2023-03-06
Revision           | 1

Dependencies

This extension is written against the SPIR-V Specification, Version 1.6 Revision 2.

This extension requires SPIR-V 1.0.

Overview

A task sequence is an abstraction of a sequence of calls to a function that can execute asynchronously from the caller and each other. This extension introduces four new instructions that support task sequence execution.

The OpTaskSequenceCreateINTEL instruction creates a task sequence to which asynchronous function calls can be submitted through the OpTaskSequenceAsyncINTEL instruction. The results of those function calls can be queried with the OpTaskSequenceGetINTEL instruction, and the OpTaskSequenceReleaseINTEL instruction releases the memory allocated for a task sequence.

Task Sequence and Task Threads

A task sequence object can be created by calling OpTaskSequenceCreateINTEL. The OpTaskSequenceAsyncINTEL, OpTaskSequenceGetINTEL, and OpTaskSequenceReleaseINTEL instructions take a task sequence object as an argument. The OpTaskSequenceAsyncINTEL instruction creates an invocation, referred to as a task thread in this document. This task thread is said to belong to the task sequence specified to the OpTaskSequenceAsyncINTEL instruction. The OpTaskSequenceGetINTEL instruction returns the result of a task thread in the specified task sequence. Results are returned from the task sequence in the same order as the OpTaskSequenceAsyncINTEL calls are made to the task sequence.

An OpFunction f is passed as an argument to the OpTaskSequenceCreateINTEL instruction. The task threads belonging to a task sequence asynchronously execute f and they may run in parallel with the caller and with any other task threads. The implementation is not required to run these task threads in parallel except in so far as is necessary to meet the forward progress guarantees outlined in the section below.

Forward Progress Guarantees and Execution Model

A task thread is a new Invocation which has a LocalInvocationId, GlobalInvocationId, and WorkgroupId of 0, WorkgroupSize and GlobalSize of 1 and LocalSize of 1, 1, 1. It does not share Workgroup storage class memory or Function storage class memory with the caller or with other task threads. It can access memory from CrossWorkgroup storage class. A task thread cannot synchronize with the caller or with other task threads using a barrier.

Calling OpTaskSequenceAsyncINTEL is analogous to enqueuing an OpenCL kernel with global_work_offset, global_work_size, local_work_size set to 0, 1, 1, i.e., a task kernel. This extension does not support any instruction which would be analogous to enqueuing a kernel with a different geometry.

An OpTaskSequenceAsyncINTEL call is guaranteed to not block the caller as long as the number of task threads in the task sequence is strictly less than the AsyncCapacity of the sequence.

A task thread executes f and then writes its completion status and results to an output data structure D associated with the sequence. The task thread can only write into D if there is space available in it and the task thread ceases to exist after writing its results. The implementation must ensure that at least GetCapacity task threads can store their outputs to D. Results are removed from D when they are retrieved by OpTaskSequenceGetINTEL calls. An OpTaskSequenceGetINTEL call is guaranteed to block the caller if there are no results stored in D.
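The blocking rules above can be approximated on a host with ordinary threads. The Python sketch below is only an illustrative model, not how an implementation of this extension works. The class name TaskSequenceModel and its methods are hypothetical, a semaphore stands in for AsyncCapacity, and a bounded queue with room for exactly GetCapacity results stands in for D. The model chooses to block an async call once AsyncCapacity task threads exist, although the extension only guarantees non-blocking behavior below that limit, and it assumes a single caller.

```python
import queue
import threading


class TaskSequenceModel:
    """Hypothetical host-side model of one task sequence (not a real API)."""

    def __init__(self, fn, get_capacity, async_capacity):
        self._fn = fn
        # One permit per task thread that may exist at the same time.
        self._slots = threading.Semaphore(async_capacity)
        # Bounded queue playing the role of the output data structure D.
        self._results = queue.Queue(maxsize=get_capacity)
        # Event set once the most recently created task thread has written to D.
        self._last_written = None

    def async_(self, *args):
        # Does not block while fewer than async_capacity task threads exist.
        self._slots.acquire()
        prev, done = self._last_written, threading.Event()
        self._last_written = done

        def task_thread():
            value = self._fn(*args)
            if prev is not None:
                prev.wait()           # keep results in submission order
            self._results.put(value)  # blocks while D is full
            self._slots.release()     # the task thread ceases to exist
            done.set()

        threading.Thread(target=task_thread, daemon=True).start()

    def get(self):
        # Blocks while D holds no results.
        return self._results.get()


# GetCapacity of 1 forces later task threads to wait for each get() call.
seq = TaskSequenceModel(lambda x: x * x, get_capacity=1, async_capacity=4)
for n in (1, 2, 3):
    seq.async_(n)
results = [seq.get() for _ in range(3)]
```

In this model the results come back in submission order even though D holds only one result at a time, because each task thread waits for its predecessor to write before writing itself.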

C++ defines a framework for describing the forward progress of individual threads of execution in a multi-threaded program. Here are the terms and definitions from the C++ specification that we will use to define progress guarantees for task threads:

  1. Weakly parallel forward progress guarantee: the implementation does not ensure that the thread will eventually make progress.

  2. Concurrent forward progress guarantee: the implementation ensures that the thread will eventually make progress for as long as it has not terminated.

  3. Blocking with forward progress guarantee delegation: When a thread of execution A is specified to block with forward progress guarantee delegation on the completion of a set M of threads of execution, then throughout the whole time of A being blocked on M, the implementation shall ensure that the forward progress guarantees provided by at least one thread of execution in M is at least as strong as A's forward progress guarantees. It is unspecified which thread or threads of execution in M are chosen and for which number of execution steps. The strengthening is not necessarily in place for the rest of the lifetime of the affected thread of execution.

Using the above definitions, the progress guarantees for task threads are defined as follows:

    • When a task sequence object O is created by OpTaskSequenceCreateINTEL, a task sequence object thread is also created.

    • At any point in time, the progress guarantee of all task sequence object threads created by a work item WI matches that of WI. For example, if WI is strengthened to have a stronger progress guarantee than its initial guarantee, all of the task sequence object threads created by WI are also strengthened.

    • A call to OpTaskSequenceAsyncINTEL(O, …​) will result in creation of a task thread. OpTaskSequenceAsyncINTEL(O, …​) can be called multiple times to create multiple task threads for O. A task thread has weakly parallel forward progress guarantee.

    • Upon creation, a task sequence object thread P immediately blocks on the set S of task threads that belong to O with forward progress guarantee delegation.

    • If a task thread with concurrent forward progress guarantee has finished executing f and if it can write its results to the output data structure D, then it does so and some other task thread in S is strengthened to have concurrent forward progress guarantee. If a task thread cannot write its results to D, the task thread blocks until space is available.

The two examples below, respectively, show the following:

  1. How strengthening of a work item strengthens the task threads.

  2. How a task thread delegates its progress guarantee to other task threads in the same task sequence object.

Example 1 uses the following pseudo-code program:

// A work item WI
{
  ...
  TaskSeqObject1 = OpTaskSequenceCreateINTEL(SomeFunction, ...); // Object_1_Thread
  OpTaskSequenceAsyncINTEL(TaskSeqObject1, ...); // Task_1_1
  OpTaskSequenceAsyncINTEL(TaskSeqObject1, ...); // Task_1_2
  ...
  TaskSeqObject2 = OpTaskSequenceCreateINTEL(SomeFunction, ...); // Object_2_Thread
  OpTaskSequenceAsyncINTEL(TaskSeqObject2, ...); // Task_2_1
  OpTaskSequenceAsyncINTEL(TaskSeqObject2, ...); // Task_2_2
}

The OpTaskSequenceCreateINTEL calls create task sequence object threads Object_1_Thread and Object_2_Thread. The first two OpTaskSequenceAsyncINTEL calls create task threads Task_1_1 and Task_1_2. Similarly, the next two calls create Task_2_1 and Task_2_2.

The table below provides a view of the hierarchy of task threads that will be generated.

Table 1. Hierarchy of task threads.

Work Item                   | WI
Task Sequence Object Thread | Object_1_Thread     | Object_2_Thread
Task Thread                 | Task_1_1 | Task_1_2 | Task_2_1 | Task_2_2

At some initial stage, all task threads have weakly parallel forward progress guarantee. If WI is strengthened to have concurrent forward progress guarantee, then all of the object threads are also strengthened. Next, in this example one task thread for each task sequence is also strengthened. This is depicted in the table below (progress guarantee for each thread is in parentheses):

Table 2. Possible Progress Guarantees at some time after WI is strengthened.

Work Item                   | WI (concurrent)
Task Sequence Object Thread | Object_1_Thread (concurrent)                     | Object_2_Thread (concurrent)
Task Thread                 | Task_1_1 (weakly parallel) | Task_1_2 (concurrent) | Task_2_1 (concurrent) | Task_2_2 (weakly parallel)
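The propagation in Example 1 can be sketched as a small tree walk. This Python model is illustrative only: the ThreadNode class, the strengthen function, and the picks argument are hypothetical names, and the model assumes WI itself starts out weakly parallel. Which task thread gets strengthened is unspecified by the extension, so the caller supplies that choice explicitly.

```python
from dataclasses import dataclass, field

WEAKLY_PARALLEL = "weakly parallel"
CONCURRENT = "concurrent"


@dataclass
class ThreadNode:
    """A work item, task sequence object thread, or task thread."""
    name: str
    guarantee: str = WEAKLY_PARALLEL
    children: list = field(default_factory=list)


def strengthen(work_item, picks):
    """Strengthen a work item and propagate the change downwards.

    picks maps an object thread name to the task thread chosen for
    strengthening; the choice itself is an input to this model.
    """
    work_item.guarantee = CONCURRENT
    for obj in work_item.children:
        # Object threads always match the guarantee of their work item.
        obj.guarantee = work_item.guarantee
        for task in obj.children:
            if task.name == picks[obj.name]:
                # Delegation: one task thread in the blocked-on set is raised.
                task.guarantee = obj.guarantee


wi = ThreadNode("WI", children=[
    ThreadNode("Object_1_Thread",
               children=[ThreadNode("Task_1_1"), ThreadNode("Task_1_2")]),
    ThreadNode("Object_2_Thread",
               children=[ThreadNode("Task_2_1"), ThreadNode("Task_2_2")]),
])
strengthen(wi, {"Object_1_Thread": "Task_1_2", "Object_2_Thread": "Task_2_1"})
state = {t.name: t.guarantee for o in wi.children for t in o.children}
```

With these picks, the resulting state matches the guarantees listed for the task threads in Table 2.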

The next example shows how a task thread delegates its progress guarantee to another task thread:

Assume that we have a task sequence TS with GetCapacity of 1 and AsyncCapacity of 5. Four OpTaskSequenceAsyncINTEL calls create the following task threads: T1, T2, T3 and T4, for TS. T1 has concurrent forward progress guarantee after getting strengthened, while T2, T3 and T4 have weakly parallel forward progress guarantees. The task threads go through the following execution flow:

  • T1 finishes executing the function f associated with TS.

  • For TS, the output data structure D can store the output of only one task thread since GetCapacity is one. T1 writes its output.

  • Any task thread can now be picked to be strengthened to have concurrent forward progress guarantee. Let’s say T2 is picked.

  • At some point T2 finishes executing f. T1's results are still in the output data structure.

  • T2 cannot write its results until space is available in D. Hence, none of the other task threads can be picked to be strengthened to the stronger progress guarantee.

  • OpTaskSequenceGetINTEL is invoked. T1's results get removed from D.

  • T2 can write its results and some other task thread can be picked to be strengthened.
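The execution flow above can be traced with a short deterministic simulation. The Python sketch below is an illustration under stated assumptions rather than a description of any implementation: the helper names try_write and strengthen_next are hypothetical, and the model always strengthens the oldest remaining task thread, whereas the extension leaves that choice unspecified.

```python
from collections import deque

GET_CAPACITY = 1
d = deque()                       # output data structure D of TS
weak = deque(["T2", "T3", "T4"])  # task threads that are still weakly parallel
trace = []


def try_write(thread):
    """A finished task thread writes to D only when D has space."""
    if len(d) < GET_CAPACITY:
        d.append(thread)
        trace.append(f"{thread} writes")
        return True
    trace.append(f"{thread} blocked")
    return False


def strengthen_next():
    """Pick a weakly parallel thread; this model simply takes the oldest."""
    chosen = weak.popleft()
    trace.append(f"{chosen} strengthened")
    return chosen


# T1 is concurrent, finishes f, and finds space in D.
if try_write("T1"):
    running = strengthen_next()   # T2 in this model
# T2 finishes f while T1's result is still in D, so nothing else is strengthened.
wrote = try_write(running)
# OpTaskSequenceGetINTEL removes T1's result and frees the slot.
retrieved = d.popleft()
trace.append(f"get returns {retrieved}")
# T2 can now write, and another task thread is strengthened.
if try_write(running):
    running = strengthen_next()   # T3 in this model
```

The trace records T2 as blocked between finishing f and the get call, which is the window in which no other task thread can be strengthened.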

Memory Order Semantics

  • OpTaskSequenceAsyncINTEL is a Release operation scoped to include the work item that called it and the task thread that the OpTaskSequenceAsyncINTEL call creates.

  • The beginning of a task thread T is an Acquire operation scoped to include the work item that called OpTaskSequenceAsyncINTEL to create T and the task thread T.

  • The end of a task thread T is a Release operation scoped to include T and the work item that called OpTaskSequenceAsyncINTEL to create T.

  • OpTaskSequenceGetINTEL is an Acquire operation scoped to include the task thread that is being retrieved by OpTaskSequenceGetINTEL and the work item that is calling OpTaskSequenceGetINTEL.
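As a loose host-side analogy for the four rules above, the following Python sketch uses a standard-library thread pool in place of a task sequence. It is an assumption-laden illustration: pool.submit stands in for OpTaskSequenceAsyncINTEL and future.result stands in for OpTaskSequenceGetINTEL, and the shared dictionary is a hypothetical stand-in for memory visible to both the caller and the task thread. It only shows which reads the Release and Acquire pairs make safe.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical memory visible to both the caller and the task thread.
shared = {"input": None, "output": None}


def task():
    # Start of the task thread (Acquire): the caller's earlier write is visible.
    value = shared["input"]
    shared["output"] = value * 2
    # End of the task thread (Release): the write above is published.
    return "done"


with ThreadPoolExecutor(max_workers=1) as pool:
    shared["input"] = 21         # written before the async call (Release)
    future = pool.submit(task)   # stands in for OpTaskSequenceAsyncINTEL
    status = future.result()     # stands in for OpTaskSequenceGetINTEL (Acquire)
    observed = shared["output"]  # read only after the get has returned
```

Reading shared["output"] before the result has been retrieved would not be covered by these rules, which is why the read is placed after the get.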

Extension Name

To use this extension within a SPIR-V module, the following OpExtension must be present in the module:

OpExtension "SPV_INTEL_task_sequence"

New Capabilities

This extension introduces a new capability:

TaskSequenceINTEL

New Instructions

Instructions added under the TaskSequenceINTEL capability:

OpTaskSequenceCreateINTEL
OpTaskSequenceAsyncINTEL
OpTaskSequenceGetINTEL
OpTaskSequenceReleaseINTEL

Token Number Assignments

TaskSequenceINTEL          | 6162
OpTaskSequenceCreateINTEL  | 6163
OpTaskSequenceAsyncINTEL   | 6164
OpTaskSequenceGetINTEL     | 6165
OpTaskSequenceReleaseINTEL | 6166
OpTypeTaskSequenceINTEL    | 6199
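To show how these token numbers appear in a module, the sketch below decodes the first word of an instruction. It relies on the SPIR-V physical layout, in which the high 16 bits of an instruction's first word hold the word count and the low 16 bits hold the opcode. The function name decode_first_word is hypothetical, and the table covers only the tokens assigned here.

```python
# Opcodes assigned by this extension.
OPCODES = {
    6163: "OpTaskSequenceCreateINTEL",
    6164: "OpTaskSequenceAsyncINTEL",
    6165: "OpTaskSequenceGetINTEL",
    6166: "OpTaskSequenceReleaseINTEL",
    6199: "OpTypeTaskSequenceINTEL",
}
# 6162 is a capability enumerant (an OpCapability operand), not an opcode.
CAPABILITY_TASK_SEQUENCE_INTEL = 6162


def decode_first_word(word):
    """Split an instruction's first word into (word_count, opcode name)."""
    word_count = word >> 16
    opcode = word & 0xFFFF
    return word_count, OPCODES.get(opcode, f"unknown({opcode})")


# 0x0002 in the high half is the word count, 0x1837 (6199) is the opcode.
decoded = decode_first_word(0x00021837)
```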

Modifications to the SPIR-V Specification, Version 1.6, Revision 2

Capability

Modify Section 3.31, Capability, adding a row to the Capability table:

Capability              | Implicitly Declares
6162  TaskSequenceINTEL |

Type Declaration Instruction

Add a new subsection, 3.42.26, Task Sequence Type Declaration Instruction, and add one new instruction in this subsection as follows:

OpTypeTaskSequenceINTEL

Declare a task sequence type.

Capability: TaskSequenceINTEL

2 | 6199 | Result <id>

Instructions

Add a new subsection, 3.42.27, Task Sequence Instructions, and add four new instructions in this subsection as follows:

OpTaskSequenceCreateINTEL

Create and return an instance of a task sequence with type OpTypeTaskSequenceINTEL. All calls to OpTaskSequenceAsyncINTEL with Result passed in as an argument will execute the function Function.

Result Type must be OpTypeTaskSequenceINTEL.

Function is an OpFunction.

Pipelined is a literal 32-bit signed integer and it represents the following based on the value:

0 - Do not pipeline the task sequence data path.

N - (N > 0), Pipeline the data path such that a new invocation of the task sequence can be launched every N cycles (also known as the Initiation Interval).

-1 - Pipeline the task sequence with a compiler determined Initiation Interval.

This argument is only meaningful on FPGA devices.

ClusterMode is a literal 32-bit signed integer. It is a request for the method by which statically-scheduled clusters should handle stalls: using an exit FIFO to drain computations from the cluster, or using a stall-enable signal to freeze computations within the cluster.

The valid values are:

0 - Direct the compiler to use stall-free clusters.

1 - Direct the compiler to use stall-enable clusters.

-1 - Let the compiler decide which type of cluster to use.

This argument is only meaningful on FPGA devices.

GetCapacity is a literal 32-bit unsigned integer. A task thread that has finished executing Function is guaranteed to write its results to the results data structure of the task sequence as long as there is space to do so. The implementation must ensure that at least the oldest GetCapacity task threads can write their results and completion status. Only task threads that have written their results are counted against this limit.

AsyncCapacity is a literal 32-bit unsigned integer. OpTaskSequenceAsyncINTEL calls for Result are guaranteed to not block as long as the number of task threads in Result is strictly less than this limit.

Capability: TaskSequenceINTEL

8 | 6163 | <id> Result Type | Result <id> | <id> Function | Literal Pipelined | Literal UseStallEnableClusters | Literal GetCapacity | Literal AsyncCapacity
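The operand layout above can be sketched as an encoder that produces the eight words of one OpTaskSequenceCreateINTEL instruction. This is an illustrative sketch, not a reference encoder: the function encode_create and the ids 10, 11, and 12 are hypothetical, rejecting values outside those listed for the two signed literals is an assumption of the sketch, and the little-endian packing is only one of the byte orders a SPIR-V module may use.

```python
import struct

OP_TASK_SEQUENCE_CREATE_INTEL = 6163
WORD_COUNT = 8


def encode_create(result_type, result, function, pipelined, cluster_mode,
                  get_capacity, async_capacity):
    """Return the eight 32-bit words of one create instruction."""
    if pipelined < -1:
        raise ValueError("Pipelined must be -1, 0, or a positive interval")
    if cluster_mode not in (-1, 0, 1):
        raise ValueError("cluster mode must be -1, 0, or 1")
    return [
        (WORD_COUNT << 16) | OP_TASK_SEQUENCE_CREATE_INTEL,
        result_type,
        result,
        function,
        pipelined & 0xFFFFFFFF,     # signed literal as 32-bit two's complement
        cluster_mode & 0xFFFFFFFF,  # signed literal as 32-bit two's complement
        get_capacity,
        async_capacity,
    ]


# Compiler-chosen initiation interval, stall-enable clusters, capacities 1 and 5.
words = encode_create(result_type=10, result=11, function=12, pipelined=-1,
                      cluster_mode=1, get_capacity=1, async_capacity=5)
binary = struct.pack("<8I", *words)
```

A Pipelined value of -1 is stored as 0xFFFFFFFF, so a consumer has to reinterpret that word as signed before comparing it against the values listed above.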

OpTaskSequenceAsyncINTEL

Asynchronously invoke the OpFunction f associated with the task sequence Sequence.

Sequence must have type OpTypeTaskSequenceINTEL.

This instruction is guaranteed to not block as long as the number of task threads in Sequence is strictly less than the AsyncCapacity of Sequence. The call may return before the asynchronous call to f completes execution, and potentially before f even begins executing.

Argument N is the object to pass as the Nth parameter of the function f. If f cannot be called with N arguments, the behavior is undefined.

Capability: TaskSequenceINTEL

2 + variable | 6164 | <id> Sequence | <id>, <id>, … Argument 0, Argument 1, …

OpTaskSequenceGetINTEL

Retrieve the result of a task thread in the task sequence Sequence. If there are multiple task threads, the results are retrieved in the same order in which the threads were created.

Sequence must have type OpTypeTaskSequenceINTEL.

This instruction will block if there are no results to return.

Result Type is the same as the return type of the OpFunction associated with Sequence.

Capability: TaskSequenceINTEL

4 | 6165 | <id> Result Type | Result <id> | <id> Sequence

OpTaskSequenceReleaseINTEL

Release the memory allocated for the task sequence uniquely identified by the id Sequence.

Sequence must have type OpTypeTaskSequenceINTEL.

Capability: TaskSequenceINTEL

2 | 6166 | <id> Sequence

SPIR-V Representation in LLVM IR

This is a non-normative section. OpTypeTaskSequenceINTEL can be mapped to LLVM opaque type spirv.TaskSequenceINTEL and mangled as __spirv_TaskSequenceINTEL__.

Issues

None.

Revision History

Rev | Date       | Author          | Changes
1   | 2023-03-06 | Abhishek Tiwari | Initial public release