Overview

This document specifies the SPIR-V extended instructions that correspond to TOSA operators for use within SPIR-V graphs. Its focus is how TOSA operators are represented as SPIR-V extended instructions; for the full semantics of each TOSA operator, consult the TOSA specification.

This instruction set can be imported as follows:

%tosa = OpExtInstImport "TOSA.001000.1"

The name of the instruction set follows this convention:

TOSA.<tosa-version-major><tosa-version-minor>.<instruction-set-revision>

where:

  • <tosa-version-major> is the major version of TOSA that the instruction set uses, encoded using 3 decimal digits

  • <tosa-version-minor> is the minor version of TOSA that the instruction set uses, encoded using 3 decimal digits

  • <instruction-set-revision> is the revision of the instruction set

For example, the name TOSA.001000.1 above designates TOSA major version 001, minor version 000 (i.e. TOSA 1.0), and revision 1 of the instruction set.

Operator mapping

Each TOSA operator maps to a single extended instruction in this instruction set.

Argument mapping

TOSA operators have arguments that fall into three categories:

  • Attribute arguments map to operands of the SPIR-V instruction. Operands that correspond to attribute arguments always come first and must come from constant instructions.

  • Input arguments map to operands of the SPIR-V instruction. Operands that correspond to input arguments come after those that correspond to attribute arguments.

  • Output arguments map to the return value of the SPIR-V instruction. Since a SPIR-V instruction can return only one value, the return type for a TOSA operator that has multiple output arguments is an OpTypeStruct that groups all the outputs.
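The attribute-then-input ordering can be sketched in SPIR-V assembly using the ARGMAX instruction defined later in this document (instruction number 0); the %uint, %in, and %out_type ids and the numeric encoding of nan_mode are assumptions for illustration:

```
; Attribute operands come first and must come from constant instructions.
%uint     = OpTypeInt 32 0
%axis     = OpConstant %uint 0      ; attribute: axis to reduce over
%nan_mode = OpConstant %uint 0      ; attribute: PROPAGATE (encoding assumed)
; Input operands follow the attribute operands.
%out = OpExtInst %out_type %tosa ARGMAX %axis %nan_mode %in
```

Assemblers that do not know the instruction names of this set may require the literal instruction number in place of ARGMAX.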

Type mapping

TOSA types are mapped to SPIR-V as follows:

  • bool_t is mapped to OpTypeBool

  • i8_t is mapped to OpTypeInt 8 0

  • i16_t is mapped to OpTypeInt 16 0

  • i32_t is mapped to OpTypeInt 32 0

  • i48_t is mapped to OpTypeInt 64 0 (intentionally: SPIR-V has no 48-bit integer type, so 48-bit values are carried in 64-bit integers)

  • fp16_t is mapped to OpTypeFloat 16

  • fp32_t is mapped to OpTypeFloat 32

  • bf16_t is mapped to OpTypeFloat 16 BFloat16KHR

  • shape_t is mapped to rank-1 tensors of 32-bit integers

  • fp8e4m3_t is mapped to OpTypeFloat 8 Float8E4M3EXT

  • fp8e5m2_t is mapped to OpTypeFloat 8 Float8E5M2EXT

Tensor types are mapped to OpTypeTensorARM.

Tensor lists are mapped to OpTypeStruct, which can accommodate heterogeneous tensor types.
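A minimal sketch of these type mappings, using the same shorthand as the rest of this document (literal rank operands; all ids are illustrative):

```
%float32    = OpTypeFloat 32
%int32      = OpTypeInt 32 0
%ts_f32_r4  = OpTypeTensorARM %float32 4         ; rank-4 fp32 tensor
%ts_i32_r1  = OpTypeTensorARM %int32 1           ; rank-1 i32 tensor (e.g. shape_t)
%tensorlist = OpTypeStruct %ts_f32_r4 %ts_i32_r1 ; heterogeneous tensor list
```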

Profiles and extensions

TOSA profiles are independent sets of operations and data type combinations. This instruction set supports the following TOSA profiles:

  • PRO-INT

  • PRO-FP

TOSA profile extensions define additional operations and data type combinations. This instruction set supports the following TOSA profile extensions:

  • EXT-INT16

  • EXT-BF16

  • EXT-FP8E4M3

  • EXT-FP8E5M2

  • EXT-FFT

  • EXT-DOUBLEROUND

  • EXT-INEXACTROUND

Type combinations

Most TOSA operators only support specific type combinations. This specification documents the valid type combinations for each instruction under a "Type support" section. These sections begin with a list of type definitions that are referred to in the tables that list all valid type combinations. Specific type combinations are enabled by one or more TOSA profiles or extensions.

Operators

Tensor operators

ARGMAX


axis: Axis in range from 0 to rank(shape1) - 1
axis must come from a constant instruction of the following type:

OpTypeInt 32 0

nan_mode: PROPAGATE or IGNORE. Set to PROPAGATE by default. This attribute selects the floating-point NaN propagation approach and is ignored by non-floating-point types.
nan_mode must come from a constant instruction of the following type:

OpTypeInt 32 0

See nan_propagation_mode_t for valid values.

input: Input tensor
input shape: shape1
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor, with rank = rank(shape1) - 1

OpTypeTensorARM %etype [rank]

Binary layout (word count, opcode, operands):

8 | 12 | <id> Result Type | Result <id> | Extended instruction set <id> | 0 | <id> axis | <id> nan_mode | <id> input

Type support

%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension | input | output
PRO-INT | %ts_int8_r[rank] | %ts_int32_r[rank]
EXT-INT16 | %ts_int16_r[rank] | %ts_int32_r[rank]
EXT-FP8E4M3 | %ts_fp8e4m3_r[rank] | %ts_int32_r[rank]
EXT-FP8E5M2 | %ts_fp8e5m2_r[rank] | %ts_int32_r[rank]
PRO-FP | %ts_float16_r[rank] | %ts_int32_r[rank]
EXT-BF16 | %ts_bfloat16_r[rank] | %ts_int32_r[rank]
PRO-FP | %ts_float32_r[rank] | %ts_int32_r[rank]

AVG_POOL2D


kernel: [kernel_y, kernel_x]
kernel shape: [2]
kernel must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_2
OpTypeTensorARM %etype %uint_1 %shape

stride: [stride_y, stride_x]
stride shape: [2]
stride must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_2
OpTypeTensorARM %etype %uint_1 %shape

pad: [pad_top, pad_bottom, pad_left, pad_right]
pad shape: [4]
pad must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_4
OpTypeTensorARM %etype %uint_1 %shape

acc_type: Enumerated type, must be one of INT32, FP16, FP32 matching the type of acc_t in the Supported Data Types table for this operation
acc_type must come from a constant instruction of the following type:

OpTypeInt 32 0

See acc_type_t for valid values.

input: Input tensor
input shape: [N,IH,IW,C]
input must come from an instruction of the following type:

OpTypeTensorARM %etype 4

input_zp: Input tensor zero point. Must be zero for non-int8 types.
input_zp shape: [1]
input_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output_zp: Output tensor zero point. Must be zero for non-int8 types.
output_zp shape: [1]
output_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output: Output tensor 4D

OpTypeTensorARM %etype 4

Binary layout (word count, opcode, operands):

12 | 12 | <id> Result Type | Result <id> | Extended instruction set <id> | 1 | <id> kernel | <id> stride | <id> pad | <id> acc_type | <id> input | <id> input_zp | <id> output_zp
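Putting the operands together, a hedged sketch of one AVG_POOL2D invocation; the ids %ts_attr2, %ts_attr4, %ts_out, %acc_type and the integer constants are assumptions, and the operand order follows the encoding above:

```
; 2x2 kernel, stride [1,1], pad [0,0,0,0]
%kernel = OpConstantComposite %ts_attr2 %uint_2 %uint_2
%stride = OpConstantComposite %ts_attr2 %uint_1 %uint_1
%pad    = OpConstantComposite %ts_attr4 %uint_0 %uint_0 %uint_0 %uint_0
%out    = OpExtInst %ts_out %tosa AVG_POOL2D %kernel %stride %pad %acc_type %input %input_zp %output_zp
```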

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_fp8e4m3_r4 = OpTypeTensorARM %fp8e4m3 4
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_fp8e5m2_r4 = OpTypeTensorARM %fp8e5m2 4
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r4 = OpTypeTensorARM %float32 4
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]

Enabling profile/extension | input | input_zp | output_zp | output
PRO-INT | %ts_int8_r4 | %ts_int8_s1 | %ts_int8_s1 | %ts_int8_r4
EXT-INT16 | %ts_int16_r4 | %ts_int16_s1 | %ts_int16_s1 | %ts_int16_r4
EXT-FP8E4M3 | %ts_fp8e4m3_r4 | %ts_fp8e4m3_s1 | %ts_fp8e4m3_s1 | %ts_fp8e4m3_r4
EXT-FP8E5M2 | %ts_fp8e5m2_r4 | %ts_fp8e5m2_s1 | %ts_fp8e5m2_s1 | %ts_fp8e5m2_r4
PRO-FP | %ts_float16_r4 | %ts_float16_s1 | %ts_float16_s1 | %ts_float16_r4
PRO-FP | %ts_float16_r4 | %ts_float16_s1 | %ts_float16_s1 | %ts_float16_r4
EXT-BF16 | %ts_bfloat16_r4 | %ts_bfloat16_s1 | %ts_bfloat16_s1 | %ts_bfloat16_r4
PRO-FP | %ts_float32_r4 | %ts_float32_s1 | %ts_float32_s1 | %ts_float32_r4

CONV2D


pad: [pad_top, pad_bottom, pad_left, pad_right]
pad shape: [4]
pad must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_4
OpTypeTensorARM %etype %uint_1 %shape

stride: [stride_y, stride_x]
stride shape: [2]
stride must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_2
OpTypeTensorARM %etype %uint_1 %shape

dilation: [dilation_y, dilation_x]
dilation shape: [2]
dilation must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_2
OpTypeTensorARM %etype %uint_1 %shape

acc_type: Enumerated type, must be one of INT32, INT48, FP16, FP32 matching the type of acc_t in the Supported Data Types table for this operation
acc_type must come from a constant instruction of the following type:

OpTypeInt 32 0

See acc_type_t for valid values.

local_bound: This optional attribute affects the floating-point compliance error bound. The default of false allows for direct and transform-based fast convolution algorithms. Set it to true only if direct dot-product calculation precision is required.
local_bound must come from a constant instruction of the following type:

OpTypeBool

input: Input tensor
input shape: [N,IH,IW,IC]
input must come from an instruction of the following type:

OpTypeTensorARM %etype 4

weight: Weight kernel size KH x KW
weight shape: [OC,KH,KW,IC]
weight must come from an instruction of the following type:

OpTypeTensorARM %etype 4

bias: Per output channel bias data.
Bias data will be broadcast if BC == 1.
bias shape: [BC]
bias must come from an instruction of the following type:

OpTypeTensorARM %etype 1

input_zp: Input tensor zero point. Must be zero for non-int8 types.
input_zp shape: [1]
input_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

weight_zp: Weight zero point. Must be zero for non-int8 types.
weight_zp shape: [1]
weight_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output: Output tensor

OpTypeTensorARM %etype 4

Binary layout (word count, opcode, operands):

15 | 12 | <id> Result Type | Result <id> | Extended instruction set <id> | 2 | <id> pad | <id> stride | <id> dilation | <id> acc_type | <id> local_bound | <id> input | <id> weight | <id> bias | <id> input_zp | <id> weight_zp
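For reference, a sketch of a single CONV2D call with all ten operands in the order given above, attributes first (every id is an assumption for illustration):

```
%out = OpExtInst %ts_out %tosa CONV2D %pad %stride %dilation %acc_type %local_bound %input %weight %bias %input_zp %weight_zp
```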

Type support

%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%float16 = OpTypeFloat 16
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int32_r1 = OpTypeTensorARM %int32 1
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int32_r4 = OpTypeTensorARM %int32 4
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_int64_r1 = OpTypeTensorARM %int64 1
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int64_r4 = OpTypeTensorARM %int64 4
%ts_fp8e4m3_r4 = OpTypeTensorARM %fp8e4m3 4
%ts_float16_r1 = OpTypeTensorARM %float16 1
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_fp8e5m2_r4 = OpTypeTensorARM %fp8e5m2 4
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_bfloat16_r1 = OpTypeTensorARM %bfloat16 1
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r4 = OpTypeTensorARM %float32 4
%ts_float32_r1 = OpTypeTensorARM %float32 1
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]

Enabling profile/extension | input | weight | bias | input_zp | weight_zp | output
PRO-INT | %ts_int8_r4 | %ts_int8_r4 | %ts_int32_r1 | %ts_int8_s1 | %ts_int8_s1 | %ts_int32_r4
EXT-INT16 | %ts_int16_r4 | %ts_int8_r4 | %ts_int64_r1 | %ts_int16_s1 | %ts_int8_s1 | %ts_int64_r4
EXT-FP8E4M3 | %ts_fp8e4m3_r4 | %ts_fp8e4m3_r4 | %ts_float16_r1 | %ts_fp8e4m3_s1 | %ts_fp8e4m3_s1 | %ts_float16_r4
EXT-FP8E5M2 | %ts_fp8e5m2_r4 | %ts_fp8e5m2_r4 | %ts_float16_r1 | %ts_fp8e5m2_s1 | %ts_fp8e5m2_s1 | %ts_float16_r4
PRO-FP | %ts_float16_r4 | %ts_float16_r4 | %ts_float16_r1 | %ts_float16_s1 | %ts_float16_s1 | %ts_float16_r4
PRO-FP | %ts_float16_r4 | %ts_float16_r4 | %ts_float16_r1 | %ts_float16_s1 | %ts_float16_s1 | %ts_float16_r4
EXT-BF16 | %ts_bfloat16_r4 | %ts_bfloat16_r4 | %ts_bfloat16_r1 | %ts_bfloat16_s1 | %ts_bfloat16_s1 | %ts_bfloat16_r4
PRO-FP | %ts_float32_r4 | %ts_float32_r4 | %ts_float32_r1 | %ts_float32_s1 | %ts_float32_s1 | %ts_float32_r4

CONV3D


pad: [pad_d0, pad_d1, pad_top, pad_bottom, pad_left, pad_right]
pad shape: [6]
pad must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_6
OpTypeTensorARM %etype %uint_1 %shape

stride: [stride_d, stride_y, stride_x]
stride shape: [3]
stride must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_3
OpTypeTensorARM %etype %uint_1 %shape

dilation: [dilation_d, dilation_y, dilation_x]
dilation shape: [3]
dilation must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_3
OpTypeTensorARM %etype %uint_1 %shape

acc_type: Enumerated type, must be one of INT32, INT48, FP16, FP32 matching the type of acc_t in the Supported Data Types table for this operation
acc_type must come from a constant instruction of the following type:

OpTypeInt 32 0

See acc_type_t for valid values.

local_bound: This optional attribute affects the floating-point compliance error bound. The default of false allows for direct and transform-based fast convolution algorithms. Set it to true only if direct dot-product calculation precision is required.
local_bound must come from a constant instruction of the following type:

OpTypeBool

input: Input tensor
input shape: [N,ID,IH,IW,IC]
input must come from an instruction of the following type:

OpTypeTensorARM %etype 5

weight: Weight kernel size KD x KH x KW
weight shape: [OC,KD,KH,KW,IC]
weight must come from an instruction of the following type:

OpTypeTensorARM %etype 5

bias: Per output channel bias data.
Bias data will be broadcast if BC == 1.
bias shape: [BC]
bias must come from an instruction of the following type:

OpTypeTensorARM %etype 1

input_zp: Input tensor zero point. Must be zero for non-int8 types.
input_zp shape: [1]
input_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

weight_zp: Weight zero point. Must be zero for non-int8 types.
weight_zp shape: [1]
weight_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output: Output tensor

OpTypeTensorARM %etype 5

Binary layout (word count, opcode, operands):

15 | 12 | <id> Result Type | Result <id> | Extended instruction set <id> | 3 | <id> pad | <id> stride | <id> dilation | <id> acc_type | <id> local_bound | <id> input | <id> weight | <id> bias | <id> input_zp | <id> weight_zp

Type support

%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%float16 = OpTypeFloat 16
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r5 = OpTypeTensorARM %int8 5
%ts_int32_r1 = OpTypeTensorARM %int32 1
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int32_r5 = OpTypeTensorARM %int32 5
%ts_int16_r5 = OpTypeTensorARM %int16 5
%ts_int64_r1 = OpTypeTensorARM %int64 1
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int64_r5 = OpTypeTensorARM %int64 5
%ts_fp8e4m3_r5 = OpTypeTensorARM %fp8e4m3 5
%ts_float16_r1 = OpTypeTensorARM %float16 1
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_float16_r5 = OpTypeTensorARM %float16 5
%ts_fp8e5m2_r5 = OpTypeTensorARM %fp8e5m2 5
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r5 = OpTypeTensorARM %bfloat16 5
%ts_bfloat16_r1 = OpTypeTensorARM %bfloat16 1
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r5 = OpTypeTensorARM %float32 5
%ts_float32_r1 = OpTypeTensorARM %float32 1
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]

Enabling profile/extension | input | weight | bias | input_zp | weight_zp | output
PRO-INT | %ts_int8_r5 | %ts_int8_r5 | %ts_int32_r1 | %ts_int8_s1 | %ts_int8_s1 | %ts_int32_r5
EXT-INT16 | %ts_int16_r5 | %ts_int8_r5 | %ts_int64_r1 | %ts_int16_s1 | %ts_int8_s1 | %ts_int64_r5
EXT-FP8E4M3 | %ts_fp8e4m3_r5 | %ts_fp8e4m3_r5 | %ts_float16_r1 | %ts_fp8e4m3_s1 | %ts_fp8e4m3_s1 | %ts_float16_r5
EXT-FP8E5M2 | %ts_fp8e5m2_r5 | %ts_fp8e5m2_r5 | %ts_float16_r1 | %ts_fp8e5m2_s1 | %ts_fp8e5m2_s1 | %ts_float16_r5
PRO-FP | %ts_float16_r5 | %ts_float16_r5 | %ts_float16_r1 | %ts_float16_s1 | %ts_float16_s1 | %ts_float16_r5
PRO-FP | %ts_float16_r5 | %ts_float16_r5 | %ts_float16_r1 | %ts_float16_s1 | %ts_float16_s1 | %ts_float16_r5
EXT-BF16 | %ts_bfloat16_r5 | %ts_bfloat16_r5 | %ts_bfloat16_r1 | %ts_bfloat16_s1 | %ts_bfloat16_s1 | %ts_bfloat16_r5
PRO-FP | %ts_float32_r5 | %ts_float32_r5 | %ts_float32_r1 | %ts_float32_s1 | %ts_float32_s1 | %ts_float32_r5

DEPTHWISE_CONV2D


pad: [pad_top, pad_bottom, pad_left, pad_right]
pad shape: [4]
pad must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_4
OpTypeTensorARM %etype %uint_1 %shape

stride: [stride_y, stride_x]
stride shape: [2]
stride must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_2
OpTypeTensorARM %etype %uint_1 %shape

dilation: [dilation_y, dilation_x]
dilation shape: [2]
dilation must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_2
OpTypeTensorARM %etype %uint_1 %shape

acc_type: Enumerated type, must be one of INT32, INT48, FP16, FP32 matching the type of acc_t in the Supported Data Types table for this operation
acc_type must come from a constant instruction of the following type:

OpTypeInt 32 0

See acc_type_t for valid values.

local_bound: This optional attribute affects the floating-point compliance error bound. The default of false allows for direct and transform-based fast convolution algorithms. Set it to true only if direct dot-product calculation precision is required.
local_bound must come from a constant instruction of the following type:

OpTypeBool

input: Input tensor
input shape: [N,IH,IW,C]
input must come from an instruction of the following type:

OpTypeTensorARM %etype 4

weight: Weight kernel size KH x KW
weight shape: [KH,KW,C,M]
weight must come from an instruction of the following type:

OpTypeTensorARM %etype 4

bias: Per output channel bias data.
Bias data will be broadcast if BC == 1.
bias shape: [BC]
bias must come from an instruction of the following type:

OpTypeTensorARM %etype 1

input_zp: Input tensor zero point. Must be zero for non-int8 types.
input_zp shape: [1]
input_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

weight_zp: Weight zero point. Must be zero for non-int8 types.
weight_zp shape: [1]
weight_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output: Output tensor

OpTypeTensorARM %etype [rank]

Binary layout (word count, opcode, operands):

15 | 12 | <id> Result Type | Result <id> | Extended instruction set <id> | 4 | <id> pad | <id> stride | <id> dilation | <id> acc_type | <id> local_bound | <id> input | <id> weight | <id> bias | <id> input_zp | <id> weight_zp

Type support

%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%float16 = OpTypeFloat 16
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int32_r1 = OpTypeTensorARM %int32 1
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_int64_r1 = OpTypeTensorARM %int64 1
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int64_r[rank] = OpTypeTensorARM %int64 [rank]
%ts_fp8e4m3_r4 = OpTypeTensorARM %fp8e4m3 4
%ts_float16_r1 = OpTypeTensorARM %float16 1
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_fp8e5m2_r4 = OpTypeTensorARM %fp8e5m2 4
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_bfloat16_r1 = OpTypeTensorARM %bfloat16 1
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r4 = OpTypeTensorARM %float32 4
%ts_float32_r1 = OpTypeTensorARM %float32 1
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension | input | weight | bias | input_zp | weight_zp | output
PRO-INT | %ts_int8_r4 | %ts_int8_r4 | %ts_int32_r1 | %ts_int8_s1 | %ts_int8_s1 | %ts_int32_r[rank]
EXT-INT16 | %ts_int16_r4 | %ts_int8_r4 | %ts_int64_r1 | %ts_int16_s1 | %ts_int8_s1 | %ts_int64_r[rank]
EXT-FP8E4M3 | %ts_fp8e4m3_r4 | %ts_fp8e4m3_r4 | %ts_float16_r1 | %ts_fp8e4m3_s1 | %ts_fp8e4m3_s1 | %ts_float16_r[rank]
EXT-FP8E5M2 | %ts_fp8e5m2_r4 | %ts_fp8e5m2_r4 | %ts_float16_r1 | %ts_fp8e5m2_s1 | %ts_fp8e5m2_s1 | %ts_float16_r[rank]
PRO-FP | %ts_float16_r4 | %ts_float16_r4 | %ts_float16_r1 | %ts_float16_s1 | %ts_float16_s1 | %ts_float16_r[rank]
PRO-FP | %ts_float16_r4 | %ts_float16_r4 | %ts_float16_r1 | %ts_float16_s1 | %ts_float16_s1 | %ts_float16_r[rank]
EXT-BF16 | %ts_bfloat16_r4 | %ts_bfloat16_r4 | %ts_bfloat16_r1 | %ts_bfloat16_s1 | %ts_bfloat16_s1 | %ts_bfloat16_r[rank]
PRO-FP | %ts_float32_r4 | %ts_float32_r4 | %ts_float32_r1 | %ts_float32_s1 | %ts_float32_s1 | %ts_float32_r[rank]

FFT2D


inverse: false for forward FFT, true for inverse FFT
inverse must come from a constant instruction of the following type:

OpTypeBool

local_bound: This optional attribute affects the floating-point compliance error bound. The default of false allows for direct and transform-based fast convolution algorithms. Set it to true only if direct dot-product calculation precision is required.
local_bound must come from a constant instruction of the following type:

OpTypeBool

input_real: Real part of the complex input. H,W must be powers of two.
input_real shape: [N,H,W]
input_real must come from an instruction of the following type:

OpTypeTensorARM %etype 3

input_imag: Imaginary part of the complex input. H,W must be powers of two.
input_imag shape: [N,H,W]
input_imag must come from an instruction of the following type:

OpTypeTensorARM %etype 3

Output (composite)
output_real: Real part of the complex output.
output_imag: Imaginary part of the complex output.

%tensor_[element_type]_r3 = OpTypeTensorARM %etype 3
OpTypeStruct %tensor_[element_type]_r3 %tensor_[element_type]_r3
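Because FFT2D returns two tensors, its result is a two-member struct whose parts can be retrieved with OpCompositeExtract. A sketch with assumed ids:

```
%res    = OpTypeStruct %ts_f32_r3 %ts_f32_r3
%fft    = OpExtInst %res %tosa FFT2D %inverse %local_bound %in_real %in_imag
%out_re = OpCompositeExtract %ts_f32_r3 %fft 0   ; output_real
%out_im = OpCompositeExtract %ts_f32_r3 %fft 1   ; output_imag
```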

Binary layout (word count, opcode, operands):

9 | 12 | <id> Result Type | Result <id> | Extended instruction set <id> | 5 | <id> inverse | <id> local_bound | <id> input_real | <id> input_imag

Type support

%float32 = OpTypeFloat 32
%ts_float32_r3 = OpTypeTensorARM %float32 3

Enabling profile/extension | input_real | input_imag | output_real | output_imag
EXT-FFT | %ts_float32_r3 | %ts_float32_r3 | %ts_float32_r3 | %ts_float32_r3

MATMUL


A: Input tensor A, N matrices of size HxC
A shape: [N,H,C]
A must come from an instruction of the following type:

OpTypeTensorARM %etype 3

B: Input tensor B, N matrices of size CxW
B shape: [N,C,W]
B must come from an instruction of the following type:

OpTypeTensorARM %etype 3

A_zp: Input tensor A zero point. Must be zero for non-int8 types.
A_zp shape: [1]
A_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

B_zp: Input tensor B zero point. Must be zero for non-int8 types.
B_zp shape: [1]
B_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output: Output tensor, N matrices of size HxW

OpTypeTensorARM %etype 3

Binary layout (word count, opcode, operands):

9 | 12 | <id> Result Type | Result <id> | Extended instruction set <id> | 6 | <id> A | <id> B | <id> A_zp | <id> B_zp
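A sketch of a MATMUL invocation with assumed ids; A_zp and B_zp must come from constant instructions and be zero for non-int8 element types:

```
; A: [N,H,C], B: [N,C,W], output: [N,H,W]
%out = OpExtInst %ts_out %tosa MATMUL %a %b %a_zp %b_zp
```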

Type support

%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%float16 = OpTypeFloat 16
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float32 = OpTypeFloat 32
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%ts_int8_r3 = OpTypeTensorARM %int8 3
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int32_r3 = OpTypeTensorARM %int32 3
%ts_int16_r3 = OpTypeTensorARM %int16 3
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int64_r3 = OpTypeTensorARM %int64 3
%ts_fp8e4m3_r3 = OpTypeTensorARM %fp8e4m3 3
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_float16_r3 = OpTypeTensorARM %float16 3
%ts_fp8e5m2_r3 = OpTypeTensorARM %fp8e5m2 3
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_float32_r3 = OpTypeTensorARM %float32 3
%ts_bfloat16_r3 = OpTypeTensorARM %bfloat16 3
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]

Enabling profile/extension | A | B | A_zp | B_zp | output
PRO-INT | %ts_int8_r3 | %ts_int8_r3 | %ts_int8_s1 | %ts_int8_s1 | %ts_int32_r3
EXT-INT16 | %ts_int16_r3 | %ts_int16_r3 | %ts_int16_s1 | %ts_int16_s1 | %ts_int64_r3
EXT-FP8E4M3 | %ts_fp8e4m3_r3 | %ts_fp8e4m3_r3 | %ts_fp8e4m3_s1 | %ts_fp8e4m3_s1 | %ts_float16_r3
EXT-FP8E5M2 | %ts_fp8e5m2_r3 | %ts_fp8e5m2_r3 | %ts_fp8e5m2_s1 | %ts_fp8e5m2_s1 | %ts_float16_r3
PRO-FP | %ts_float16_r3 | %ts_float16_r3 | %ts_float16_s1 | %ts_float16_s1 | %ts_float16_r3
PRO-FP | %ts_float16_r3 | %ts_float16_r3 | %ts_float16_s1 | %ts_float16_s1 | %ts_float32_r3
EXT-BF16 | %ts_bfloat16_r3 | %ts_bfloat16_r3 | %ts_bfloat16_s1 | %ts_bfloat16_s1 | %ts_float32_r3
PRO-FP | %ts_float32_r3 | %ts_float32_r3 | %ts_float32_s1 | %ts_float32_s1 | %ts_float32_r3

MAX_POOL2D


kernel: [kernel_y, kernel_x]
kernel shape: [2]
kernel must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_2
OpTypeTensorARM %etype %uint_1 %shape

stride: [stride_y, stride_x]
stride shape: [2]
stride must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_2
OpTypeTensorARM %etype %uint_1 %shape

pad: [pad_top, pad_bottom, pad_left, pad_right]
pad shape: [4]
pad must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_4
OpTypeTensorARM %etype %uint_1 %shape

nan_mode: PROPAGATE or IGNORE. Set to PROPAGATE by default. This attribute selects the floating-point NaN propagation approach and is ignored by non-floating-point types.
nan_mode must come from a constant instruction of the following type:

OpTypeInt 32 0

See nan_propagation_mode_t for valid values.

input: Input tensor 4D
input shape: [N,IH,IW,C]
input must come from an instruction of the following type:

OpTypeTensorARM %etype 4

output: Output tensor 4D

OpTypeTensorARM %etype 4

Binary layout (word count, opcode, operands):

10 | 12 | <id> Result Type | Result <id> | Extended instruction set <id> | 7 | <id> kernel | <id> stride | <id> pad | <id> nan_mode | <id> input

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_fp8e4m3_r4 = OpTypeTensorARM %fp8e4m3 4
%ts_fp8e5m2_r4 = OpTypeTensorARM %fp8e5m2 4
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_float32_r4 = OpTypeTensorARM %float32 4

Enabling profile/extension | input | output
PRO-INT | %ts_int8_r4 | %ts_int8_r4
EXT-INT16 | %ts_int16_r4 | %ts_int16_r4
EXT-FP8E4M3 | %ts_fp8e4m3_r4 | %ts_fp8e4m3_r4
EXT-FP8E5M2 | %ts_fp8e5m2_r4 | %ts_fp8e5m2_r4
PRO-FP | %ts_float16_r4 | %ts_float16_r4
EXT-BF16 | %ts_bfloat16_r4 | %ts_bfloat16_r4
PRO-FP | %ts_float32_r4 | %ts_float32_r4

RFFT2D


local_bound: This optional attribute affects the floating-point compliance error bound. The default of false allows for direct and transform-based fast convolution algorithms. Set it to true only if direct dot-product calculation precision is required.
local_bound must come from a constant instruction of the following type:

OpTypeBool

input_real: Real input. H,W must be powers of two.
input_real shape: [N,H,W]
input_real must come from an instruction of the following type:

OpTypeTensorARM %etype 3

Output (composite)
output_real: Real part of the complex output.
output_imag: Imaginary part of the complex output.

%tensor_[element_type]_r3 = OpTypeTensorARM %etype 3
OpTypeStruct %tensor_[element_type]_r3 %tensor_[element_type]_r3

Binary layout (word count, opcode, operands):

7 | 12 | <id> Result Type | Result <id> | Extended instruction set <id> | 8 | <id> local_bound | <id> input_real

Type support

%float32 = OpTypeFloat 32
%ts_float32_r3 = OpTypeTensorARM %float32 3
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension | input_real | output_real | output_imag
EXT-FFT | %ts_float32_r3 | %ts_float32_r[rank] | %ts_float32_r[rank]

TRANSPOSE_CONV2D


out_pad: [out_pad_top, out_pad_bottom, out_pad_left, out_pad_right]
out_pad shape: [4]
out_pad must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_4
OpTypeTensorARM %etype %uint_1 %shape

stride: [stride_y, stride_x]
stride shape: [2]
stride must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_2
OpTypeTensorARM %etype %uint_1 %shape

acc_type: Enumerated type, must be one of INT32, INT48, FP16, FP32 matching the type of acc_t in the Supported Data Types table for this operation
acc_type must come from a constant instruction of the following type:

OpTypeInt 32 0

See acc_type_t for valid values.

local_bound: This optional attribute affects the floating-point compliance error bound. The default of false allows for direct and transform-based fast convolution algorithms. Set it to true only if direct dot-product calculation precision is required.
local_bound must come from a constant instruction of the following type:

OpTypeBool

input: Input tensor
input shape: [N,IH,IW,IC]
input must come from an instruction of the following type:

OpTypeTensorARM %etype 4

weight: Weight kernel size KH x KW
weight shape: [OC,KH,KW,IC]
weight must come from an instruction of the following type:

OpTypeTensorARM %etype 4

bias: Per output channel bias data.
Bias data will be broadcast if BC == 1.
bias shape: [BC]
bias must come from an instruction of the following type:

OpTypeTensorARM %etype 1

input_zp: Input tensor zero point. Must be zero for non-int8 types.
input_zp shape: [1]
input_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

weight_zp: Weight zero point. Must be zero for non-int8 types.
weight_zp shape: [1]
weight_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output: Output tensor

OpTypeTensorARM %etype 4

Word count: 14

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 9

<id> out_pad

<id> stride

<id> acc_type

<id> local_bound

<id> input

<id> weight

<id> bias

<id> input_zp

<id> weight_zp

Type support

%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%float16 = OpTypeFloat 16
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int32_r1 = OpTypeTensorARM %int32 1
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int32_r4 = OpTypeTensorARM %int32 4
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_int64_r1 = OpTypeTensorARM %int64 1
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int64_r4 = OpTypeTensorARM %int64 4
%ts_fp8e4m3_r4 = OpTypeTensorARM %fp8e4m3 4
%ts_float16_r1 = OpTypeTensorARM %float16 1
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_fp8e5m2_r4 = OpTypeTensorARM %fp8e5m2 4
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_bfloat16_r1 = OpTypeTensorARM %bfloat16 1
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r4 = OpTypeTensorARM %float32 4
%ts_float32_r1 = OpTypeTensorARM %float32 1
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]

Enabling profile/extension

PRO-INT: input %ts_int8_r4, weight %ts_int8_r4, bias %ts_int32_r1, input_zp %ts_int8_s1, weight_zp %ts_int8_s1, output %ts_int32_r4

EXT-INT16: input %ts_int16_r4, weight %ts_int8_r4, bias %ts_int64_r1, input_zp %ts_int16_s1, weight_zp %ts_int8_s1, output %ts_int64_r4

EXT-FP8E4M3: input %ts_fp8e4m3_r4, weight %ts_fp8e4m3_r4, bias %ts_float16_r1, input_zp %ts_fp8e4m3_s1, weight_zp %ts_fp8e4m3_s1, output %ts_float16_r4

EXT-FP8E5M2: input %ts_fp8e5m2_r4, weight %ts_fp8e5m2_r4, bias %ts_float16_r1, input_zp %ts_fp8e5m2_s1, weight_zp %ts_fp8e5m2_s1, output %ts_float16_r4

PRO-FP: input %ts_float16_r4, weight %ts_float16_r4, bias %ts_float16_r1, input_zp %ts_float16_s1, weight_zp %ts_float16_s1, output %ts_float16_r4

PRO-FP: input %ts_float16_r4, weight %ts_float16_r4, bias %ts_float16_r1, input_zp %ts_float16_s1, weight_zp %ts_float16_s1, output %ts_float16_r4

EXT-BF16: input %ts_bfloat16_r4, weight %ts_bfloat16_r4, bias %ts_bfloat16_r1, input_zp %ts_bfloat16_s1, weight_zp %ts_bfloat16_s1, output %ts_bfloat16_r4

PRO-FP: input %ts_float32_r4, weight %ts_float32_r4, bias %ts_float32_r1, input_zp %ts_float32_s1, weight_zp %ts_float32_s1, output %ts_float32_r4
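Putting the pieces above together, a TRANSPOSE_CONV2D invocation could be assembled as in the following sketch. All result ids and constant values are illustrative assumptions, and the acc_type encoding shown is not normative; note that the attribute operands come first and must come from constant instructions:

%tosa        = OpExtInstImport "TOSA.001000.1"
%int32       = OpTypeInt 32 0
%bool        = OpTypeBool
%ts_out_pad  = OpTypeTensorARM %int32 %uint_1 %shape_4      ; rank-1 tensor of shape [4]
%out_pad     = OpConstantComposite %ts_out_pad %c0 %c0 %c0 %c0
%stride      = OpConstantComposite %ts_stride %c1 %c1       ; [stride_y, stride_x]
%acc_type    = OpConstant %int32 0                          ; acc_type_t value selecting INT32 (encoding assumed)
%local_bound = OpConstantFalse %bool
%result      = OpExtInst %ts_int32_r4 %tosa TRANSPOSE_CONV2D %out_pad %stride %acc_type %local_bound %input %weight %bias %input_zp %weight_zp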

Activation operators

CLAMP

CLAMP

min_val: Minimum clip value
min_val must come from a constant instruction of the following type:

See type support table

max_val: Maximum clip value
max_val must come from a constant instruction of the following type:

See type support table

nan_mode: PROPAGATE or IGNORE; defaults to PROPAGATE. This attribute selects the floating-point NaN propagation behavior and is ignored for non-floating-point types.
nan_mode must come from a constant instruction of the following type:

OpTypeInt 32 0

See nan_propagation_mode_t for valid values.

input: Input tensor
input shape: shape
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of same type and shape as input

OpTypeTensorARM %etype [rank]

Word count: 9

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 10

<id> min_val

<id> max_val

<id> nan_mode

<id> input

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input

output

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

EXT-INT16

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]
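As a concrete sketch, clamping a rank-4 fp16 tensor to the ReLU6 range [0.0, 6.0] could look as follows; all ids are illustrative and the PROPAGATE encoding is an assumption (see nan_propagation_mode_t), with %tosa imported as shown earlier:

%uint      = OpTypeInt 32 0
%float16   = OpTypeFloat 16
%ts_f16_r4 = OpTypeTensorARM %float16 4
%min_val   = OpConstant %float16 0.0
%max_val   = OpConstant %float16 6.0
%nan_mode  = OpConstant %uint 0          ; assumed encoding of PROPAGATE
%out       = OpExtInst %ts_f16_r4 %tosa CLAMP %min_val %max_val %nan_mode %input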

ERF

ERF

input: Input tensor
input shape: shape
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of same type and shape as input

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 11

<id> input

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

SIGMOID

SIGMOID

input: Input tensor
input shape: shape
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of same type and shape as input

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 12

<id> input

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

TANH

TANH

input: Input tensor
input shape: shape
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of same type and shape as input

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 13

<id> input

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

Elementwise-binary operators

ADD

ADD

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 14

<id> input1

<id> input2

Type support

%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

%ts_float32_r[rank]
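Since ADD has no attribute arguments, its invocation consists only of the two input operands, which come from other instructions in the graph; a minimal sketch with illustrative ids, assuming %tosa was imported as shown earlier:

%float32   = OpTypeFloat 32
%ts_f32_r2 = OpTypeTensorARM %float32 2
%sum       = OpExtInst %ts_f32_r2 %tosa ADD %input1 %input2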

ARITHMETIC_RIGHT_SHIFT

ARITHMETIC_RIGHT_SHIFT

round: If true then the shift is rounded
round must come from a constant instruction of the following type:

OpTypeBool

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 8

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 15

<id> round

<id> input1

<id> input2

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]
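The round attribute precedes the inputs and must come from a constant instruction; a sketch with illustrative ids, assuming %tosa was imported as shown earlier:

%bool      = OpTypeBool
%int32     = OpTypeInt 32 0
%ts_i32_r2 = OpTypeTensorARM %int32 2
%round     = OpConstantTrue %bool
%shifted   = OpExtInst %ts_i32_r2 %tosa ARITHMETIC_RIGHT_SHIFT %round %input1 %input2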

BITWISE_AND

BITWISE_AND

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 16

<id> input1

<id> input2

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

BITWISE_OR

BITWISE_OR

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 17

<id> input1

<id> input2

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

BITWISE_XOR

BITWISE_XOR

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 18

<id> input1

<id> input2

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

INTDIV

INTDIV

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 19

<id> input1

<id> input2

Type support

%int32 = OpTypeInt 32 0
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

LOGICAL_AND

LOGICAL_AND

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 20

<id> input1

<id> input2

Type support

%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]

Enabling profile/extension

input1

input2

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

%ts_bool_r[rank]

LOGICAL_LEFT_SHIFT

LOGICAL_LEFT_SHIFT

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 21

<id> input1

<id> input2

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT, PRO-FP

%ts_int8_r[rank]

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT, PRO-FP

%ts_int8_r[rank]

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT, PRO-FP

%ts_int16_r[rank]

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT, PRO-FP

%ts_int16_r[rank]

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

LOGICAL_RIGHT_SHIFT

LOGICAL_RIGHT_SHIFT

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 22

<id> input1

<id> input2

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT, PRO-FP

%ts_int8_r[rank]

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT, PRO-FP

%ts_int8_r[rank]

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT, PRO-FP

%ts_int16_r[rank]

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT, PRO-FP

%ts_int16_r[rank]

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

LOGICAL_OR

LOGICAL_OR

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 23

<id> input1

<id> input2

Type support

%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]

Enabling profile/extension

input1

input2

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

%ts_bool_r[rank]

LOGICAL_XOR

LOGICAL_XOR

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 24

<id> input1

<id> input2

Type support

%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]

Enabling profile/extension

input1

input2

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

%ts_bool_r[rank]

MAXIMUM

MAXIMUM

nan_mode: PROPAGATE or IGNORE; defaults to PROPAGATE. This attribute selects the floating-point NaN propagation behavior and is ignored for non-floating-point types.
nan_mode must come from a constant instruction of the following type:

OpTypeInt 32 0

See nan_propagation_mode_t for valid values.

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 8

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 25

<id> nan_mode

<id> input1

<id> input2

Type support

%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

%ts_float32_r[rank]

MINIMUM

MINIMUM

nan_mode: PROPAGATE or IGNORE; defaults to PROPAGATE. This attribute selects the floating-point NaN propagation behavior and is ignored for non-floating-point types.
nan_mode must come from a constant instruction of the following type:

OpTypeInt 32 0

See nan_propagation_mode_t for valid values.

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 8

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 26

<id> nan_mode

<id> input1

<id> input2

Type support

%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

%ts_float32_r[rank]

MUL

MUL

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

shift: Result right shift (used only when in_t is i32_t)
shift shape: [1]
shift must come from a constant instruction of the following type:

%etype = OpTypeInt 8 0
%shape = OpConstantComposite %shape_array_type %uint_1
OpTypeTensorARM %etype %uint_1 %shape

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 8

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 27

<id> input1

<id> input2

<id> shift

Type support

%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

%ts_int32_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

%ts_int32_r[rank]

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

%ts_float32_r[rank]
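Note that shift is encoded as a rank-1 i8 tensor constant of shape [1] even though it is only consumed for i32_t inputs; a sketch with illustrative ids, assuming %tosa was imported as shown earlier:

%int8      = OpTypeInt 8 0
%int32     = OpTypeInt 32 0
%ts_shift  = OpTypeTensorARM %int8 %uint_1 %shape_1    ; rank-1 tensor of shape [1]
%shift     = OpConstantComposite %ts_shift %int8_0     ; shift of 0
%ts_i32_r2 = OpTypeTensorARM %int32 2
%prod      = OpExtInst %ts_i32_r2 %tosa MUL %input1 %input2 %shift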

POW

POW

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 28

<id> input1

<id> input2

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

input2

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

%ts_float32_r[rank]

SUB

SUB

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 29

<id> input1

<id> input2

Type support

%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-INT, PRO-FP

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

%ts_float32_r[rank]

TABLE

TABLE

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

table: Lookup table tensor
table shape: [TABLE_SIZE]
table must come from a constant instruction of the following type:

OpTypeTensorARM %etype 1

output: Output tensor

OpTypeTensorARM %etype [rank]

Word count: 7

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 30

<id> input1

<id> table

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int8_r1 = OpTypeTensorARM %int8 1
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int16_r1 = OpTypeTensorARM %int16 1
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]

Enabling profile/extension

input1

table

output

PRO-INT

%ts_int8_r[rank]

%ts_int8_r1

%ts_int8_r[rank]

EXT-INT16

%ts_int16_r[rank]

%ts_int16_r1

%ts_int32_r[rank]
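The lookup table is passed as an operand, but it must come from a constant instruction; a sketch with illustrative ids (the %e* ids stand for the TABLE_SIZE table entries), assuming %tosa was imported as shown earlier:

%int8     = OpTypeInt 8 0
%ts_table = OpTypeTensorARM %int8 1
%table    = OpConstantComposite %ts_table %e0 %e1 ... %e255   ; 256 entries for an i8 table
%ts_i8_r2 = OpTypeTensorARM %int8 2
%out      = OpExtInst %ts_i8_r2 %tosa TABLE %input1 %table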

Elementwise-unary operators

ABS

ABS

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 31

<id> input1

Type support

%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

BITWISE_NOT

BITWISE_NOT

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 32

<id> input1

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]

Enabling profile/extension

input1

output

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

CEIL

CEIL

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 33

<id> input1

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

CLZ

CLZ

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 34

<id> input1

Type support

%int32 = OpTypeInt 32 0
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]

Enabling profile/extension

input1

output

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

COS

COS

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of same type and shape as input

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 35

<id> input1

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

EXP

EXP

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 36

<id> input1

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

FLOOR

FLOOR

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 37

<id> input1

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

LOG

LOG

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 38

<id> input1

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

LOGICAL_NOT

LOGICAL_NOT

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

Word count: 6

Opcode: 12 (OpExtInst)

<id> Result Type

Result <id>

Extended instructions set <id>

Instruction number: 39

<id> input1

Type support

%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]

Enabling profile/extension

input1

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

NEGATE

NEGATE

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input1_zp: Input 1 zero point. Must be zero for non-int8 types.
input1_zp shape: [1]
input1_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output_zp: Output zero point. Must be zero for non-int8 types.
output_zp shape: [1]
output_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

8

12

<id>
Result Type

Result <id>

Extended instructions set <id>

40

<id>
input1

<id>
input1_zp

<id>
output_zp

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int32_s1 = OpTypeTensorARM %int32 [rank] [shape]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]

Enabling profile/extension

input1

input1_zp

output_zp

output

PRO-INT

%ts_int8_r[rank]

%ts_int8_s1

%ts_int8_s1

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_s1

%ts_int16_s1

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_s1

%ts_int32_s1

%ts_int32_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_s1

%ts_float16_s1

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_s1

%ts_bfloat16_s1

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_s1

%ts_float32_s1

%ts_float32_r[rank]
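
The operand order above — input1 followed by the two zero-point tensors, the latter from constant instructions — can be sketched as follows for int8. All ids are hypothetical and the construction of the shape-[1] zero-point constants is elided.

```spirv
%tosa     = OpExtInstImport "TOSA.001000.1"
%int8     = OpTypeInt 8 0
%uint     = OpTypeInt 32 0
%uint_2   = OpConstant %uint 2
%ts_i8_r2 = OpTypeTensorARM %int8 %uint_2        ; rank-2 int8 tensor
; %in             : rank-2 int8 tensor produced earlier (assumed)
; %in_zp, %out_zp : shape-[1] int8 tensors from constant
;                   instructions (construction elided)
%out = OpExtInst %ts_i8_r2 %tosa NEGATE %in %in_zp %out_zp  ; opcode 40
```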

RECIPROCAL

RECIPROCAL

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

6

12

<id>
Result Type

Result <id>

Extended instructions set <id>

41

<id>
input1

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

RSQRT

RSQRT

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

6

12

<id>
Result Type

Result <id>

Extended instructions set <id>

42

<id>
input1

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

SIN

SIN

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and shape as the input

OpTypeTensorARM %etype [rank]

6

12

<id>
Result Type

Result <id>

Extended instructions set <id>

43

<id>
input1

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

Elementwise-ternary operators

SELECT

SELECT

input1: Input selector tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

%etype = OpTypeBool
OpTypeTensorARM %etype [rank]

input2: Input value tensor if input1 is True
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input3: Input value tensor if input1 is False
input3 shape: shape3
input3 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of same type as input2 and input3

OpTypeTensorARM %etype [rank]

8

12

<id>
Result Type

Result <id>

Extended instructions set <id>

44

<id>
input1

<id>
input2

<id>
input3

Type support

%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input2

input3

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

%ts_float32_r[rank]
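
A sketch of the three-operand encoding follows; ids are assumed. The selector is a boolean tensor, while the two value operands share the result type.

```spirv
%tosa      = OpExtInstImport "TOSA.001000.1"
%bool      = OpTypeBool
%float32   = OpTypeFloat 32
%uint      = OpTypeInt 32 0
%uint_1    = OpConstant %uint 1
%ts_b_r1   = OpTypeTensorARM %bool %uint_1
%ts_f32_r1 = OpTypeTensorARM %float32 %uint_1
; %cond : boolean selector tensor; %a, %b : value tensors (assumed)
%out = OpExtInst %ts_f32_r1 %tosa SELECT %cond %a %b  ; opcode 44
```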

Comparison operators

EQUAL

EQUAL

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

45

<id>
input1

<id>
input2

Type support

%int32 = OpTypeInt 32 0
%bool = OpTypeBool
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_bool_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

%ts_bool_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

%ts_bool_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

%ts_bool_r[rank]
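
Note that the result type is a boolean tensor even when the inputs are integer or floating-point. An illustrative fragment, with all ids assumed:

```spirv
%tosa      = OpExtInstImport "TOSA.001000.1"
%bool      = OpTypeBool
%int32     = OpTypeInt 32 0
%uint_1    = OpConstant %int32 1
%ts_i32_r1 = OpTypeTensorARM %int32 %uint_1
%ts_b_r1   = OpTypeTensorARM %bool %uint_1
; %a, %b : rank-1 int32 tensors produced earlier (assumed)
%out = OpExtInst %ts_b_r1 %tosa EQUAL %a %b   ; opcode 45; boolean result
```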

GREATER

GREATER

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

46

<id>
input1

<id>
input2

Type support

%int32 = OpTypeInt 32 0
%bool = OpTypeBool
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_bool_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

%ts_bool_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

%ts_bool_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

%ts_bool_r[rank]

GREATER_EQUAL

GREATER_EQUAL

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

input2: Input tensor with the same rank as input1
input2 shape: shape2
input2 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

47

<id>
input1

<id>
input2

Type support

%int32 = OpTypeInt 32 0
%bool = OpTypeBool
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

input2

output

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

%ts_bool_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

%ts_bool_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

%ts_bool_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

%ts_bool_r[rank]

Reduction operators

REDUCE_ALL

REDUCE_ALL

axis: Axis to reduce, in range from 0 to rank(shape1)-1
axis must come from a constant instruction of the following type:

OpTypeInt 32 0

input: Input tensor
input shape: shape1
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor. Same rank as the input tensor.

OpTypeTensorARM %etype [rank]

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

48

<id>
axis

<id>
input

Type support

%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]

Enabling profile/extension

input

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

REDUCE_ANY

REDUCE_ANY

axis: Axis to reduce, in range from 0 to rank(shape1)-1
axis must come from a constant instruction of the following type:

OpTypeInt 32 0

input: Input tensor
input shape: shape1
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor. Same rank as the input tensor.

OpTypeTensorARM %etype [rank]

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

49

<id>
axis

<id>
input

Type support

%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]

Enabling profile/extension

input

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

REDUCE_MAX

REDUCE_MAX

axis: Axis to reduce, in range from 0 to rank(shape1)-1
axis must come from a constant instruction of the following type:

OpTypeInt 32 0

nan_mode: PROPAGATE or IGNORE. Set to PROPAGATE by default. This attribute affects the floating-point NaN propagation approach. This attribute is ignored by non-floating-point types.
nan_mode must come from a constant instruction of the following type:

OpTypeInt 32 0

See nan_propagation_mode_t for valid values.

input: Input tensor
input shape: shape1
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor. Same rank as the input tensor.

OpTypeTensorARM %etype [rank]

8

12

<id>
Result Type

Result <id>

Extended instructions set <id>

50

<id>
axis

<id>
nan_mode

<id>
input

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input

output

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]
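
Because attribute arguments precede input arguments, axis and nan_mode appear first among the operands. A sketch follows; the ids, the result tensor type %ts_out, and the numeric nan_mode value are assumptions.

```spirv
%tosa     = OpExtInstImport "TOSA.001000.1"
%int32    = OpTypeInt 32 0
%axis     = OpConstant %int32 0   ; attribute: axis to reduce
%nan_mode = OpConstant %int32 0   ; attribute: nan_mode (numeric value for
                                  ; PROPAGATE assumed; see nan_propagation_mode_t)
; %in : floating-point input tensor (assumed)
%out = OpExtInst %ts_out %tosa REDUCE_MAX %axis %nan_mode %in  ; opcode 50
```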

REDUCE_MIN

REDUCE_MIN

axis: Axis to reduce, in range from 0 to rank(shape1)-1
axis must come from a constant instruction of the following type:

OpTypeInt 32 0

nan_mode: PROPAGATE or IGNORE. Set to PROPAGATE by default. This attribute affects the floating-point NaN propagation approach. This attribute is ignored by non-floating-point types.
nan_mode must come from a constant instruction of the following type:

OpTypeInt 32 0

See nan_propagation_mode_t for valid values.

input: Input tensor
input shape: shape1
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor. Same rank as the input tensor.

OpTypeTensorARM %etype [rank]

8

12

<id>
Result Type

Result <id>

Extended instructions set <id>

51

<id>
axis

<id>
nan_mode

<id>
input

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input

output

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

REDUCE_PRODUCT

REDUCE_PRODUCT

axis: Axis to reduce, in range from 0 to rank(shape1)-1
axis must come from a constant instruction of the following type:

OpTypeInt 32 0

input: Input tensor
input shape: shape1
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor. Same rank as the input tensor.

OpTypeTensorARM %etype [rank]

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

52

<id>
axis

<id>
input

Type support

%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input

output

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

REDUCE_SUM

REDUCE_SUM

axis: Axis to reduce, in range from 0 to rank(shape1)-1
axis must come from a constant instruction of the following type:

OpTypeInt 32 0

input: Input tensor
input shape: shape1
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor. Same rank as the input tensor.

OpTypeTensorARM %etype [rank]

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

53

<id>
axis

<id>
input

Type support

%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input

output

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

Data-layout operators

CONCAT

CONCAT

axis: Axis along which concatenation is to occur, in range from 0 to rank(shape)-1
axis must come from a constant instruction of the following type:

OpTypeInt 32 0

input1: List of input tensors. All inputs must have the same rank and data type
input1 shape: shapes1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

7+Variable

12

<id>
Result Type

Result <id>

Extended instructions set <id>

54

<id>
axis

<id>, <id>,…​
input1

Type support

%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

EXT-INT16

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

EXT-FP8E4M3

%ts_fp8e4m3_r[rank]

%ts_fp8e4m3_r[rank]

EXT-FP8E5M2

%ts_fp8e5m2_r[rank]

%ts_fp8e5m2_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]
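
The "7+Variable" word count reflects the variable number of input tensors. A two-input sketch, with %ts_out and all other ids assumed:

```spirv
%tosa  = OpExtInstImport "TOSA.001000.1"
%int32 = OpTypeInt 32 0
%axis  = OpConstant %int32 1    ; attribute: concatenation axis
; %in1, %in2 : tensors of the same rank and element type (assumed)
%out = OpExtInst %ts_out %tosa CONCAT %axis %in1 %in2  ; opcode 54; two inputs shown
```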

PAD

PAD

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

padding: Number of pad elements at the start and end of each dimension. The values in padding are interpreted as a start, end pair for each dimension. As an example, for rank 2 the values would be interpreted as [start_dim0, end_dim0, start_dim1, end_dim1].
padding shape: [2*rank(shape1)]
padding must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
OpTypeTensorARM %etype %uint_1 %uint_array_rank(padding)

pad_const: The value to be used as padding.
pad_const shape: [1]
pad_const must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output: Output tensor of the same type as the input tensor

OpTypeTensorARM %etype [rank]

8

12

<id>
Result Type

Result <id>

Extended instructions set <id>

55

<id>
input1

<id>
padding

<id>
pad_const

Type support

%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_bool_s1 = OpTypeTensorARM %bool [rank] [shape]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int32_s1 = OpTypeTensorARM %int32 [rank] [shape]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]

Enabling profile/extension

input1

pad_const

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_s1

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_s1

%ts_bool_r[rank]

PRO-INT

%ts_int8_r[rank]

%ts_int8_s1

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_s1

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_s1

%ts_int32_r[rank]

EXT-FP8E4M3

%ts_fp8e4m3_r[rank]

%ts_fp8e4m3_s1

%ts_fp8e4m3_r[rank]

EXT-FP8E5M2

%ts_fp8e5m2_r[rank]

%ts_fp8e5m2_s1

%ts_fp8e5m2_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_s1

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_s1

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_s1

%ts_float32_r[rank]
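
A sketch of the PAD operand order follows, with the constant padding and pad_const tensors assumed to be built elsewhere and all ids hypothetical.

```spirv
%tosa = OpExtInstImport "TOSA.001000.1"
; %in        : rank-2 input tensor (assumed)
; %padding   : constant shape-[4] int32 tensor holding [start_dim0, end_dim0,
;              start_dim1, end_dim1] (construction elided)
; %pad_const : constant shape-[1] tensor of the element type (elided)
%out = OpExtInst %ts_out %tosa PAD %in %padding %pad_const  ; opcode 55
```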

RESHAPE

RESHAPE

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

shape: shape_t giving the new shape.
shape shape: [rank(shape)]
shape must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
OpTypeTensorARM %etype %uint_1 %uint_array_rank(shape)

output: Output tensor of the same type and size as the input tensor

OpTypeTensorARM %etype [rank]

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

56

<id>
input1

<id>
shape

Type support

%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

EXT-FP8E4M3

%ts_fp8e4m3_r[rank]

%ts_fp8e4m3_r[rank]

EXT-FP8E5M2

%ts_fp8e5m2_r[rank]

%ts_fp8e5m2_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]
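
A sketch of the RESHAPE encoding; the ids and result type %ts_out are assumptions, and the constant shape tensor holds the new dimensions.

```spirv
%tosa = OpExtInstImport "TOSA.001000.1"
; %in    : input tensor (assumed)
; %shape : constant rank-1 int32 tensor holding the new dimensions (elided)
%out = OpExtInst %ts_out %tosa RESHAPE %in %shape  ; opcode 56
```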

REVERSE

REVERSE

axis: Axis to reverse, in range from 0 to rank(shape)-1
axis must come from a constant instruction of the following type:

OpTypeInt 32 0

input1: Input tensor
input1 shape: shape
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor. Same shape as input tensor

OpTypeTensorARM %etype [rank]

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

57

<id>
axis

<id>
input1

Type support

%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

EXT-FP8E4M3

%ts_fp8e4m3_r[rank]

%ts_fp8e4m3_r[rank]

EXT-FP8E5M2

%ts_fp8e5m2_r[rank]

%ts_fp8e5m2_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

SLICE

SLICE

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

start: List of integer coordinates, of length equal to the rank of input1. Start coordinate for slicing.
start shape: [rank(shape1)]
start must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
OpTypeTensorARM %etype %uint_1 %uint_array_rank(start)

size: List of integer size values, of length equal to the rank of input1. Size of the input to be used.
size shape: [rank(shape1)]
size must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
OpTypeTensorARM %etype %uint_1 %uint_array_rank(size)

output: Output tensor of the same type as the input tensor

OpTypeTensorARM %etype [rank]

8

12

<id>
Result Type

Result <id>

Extended instructions set <id>

58

<id>
input1

<id>
start

<id>
size

Type support

%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

EXT-FP8E4M3

%ts_fp8e4m3_r[rank]

%ts_fp8e4m3_r[rank]

EXT-FP8E5M2

%ts_fp8e5m2_r[rank]

%ts_fp8e5m2_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

TILE

TILE

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

multiples: Number of times to replicate input1 in each dimension
multiples shape: [rank(shape1)]
multiples must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
OpTypeTensorARM %etype %uint_1 %uint_array_rank(multiples)

output: Output tensor of the same type and rank as the input tensor

OpTypeTensorARM %etype [rank]

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

59

<id>
input1

<id>
multiples

Type support

%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

EXT-FP8E4M3

%ts_fp8e4m3_r[rank]

%ts_fp8e4m3_r[rank]

EXT-FP8E5M2

%ts_fp8e5m2_r[rank]

%ts_fp8e5m2_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]

TRANSPOSE

TRANSPOSE

perms: List of integers of length equal to the rank of input1. Values must be valid dimensions within shape1, and may not be repeated.
perms shape: [rank(shape1)]
perms must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
%shape = OpConstantComposite %shape_array_type %uint_rank(shape1)
OpTypeTensorARM %etype %uint_1 %shape

input1: Input tensor
input1 shape: shape1
input1 must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor of the same type and rank as the input tensor

OpTypeTensorARM %etype [rank]

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

60

<id>
perms

<id>
input1

Type support

%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]

Enabling profile/extension

input1

output

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT, PRO-FP

%ts_bool_r[rank]

%ts_bool_r[rank]

PRO-INT

%ts_int8_r[rank]

%ts_int8_r[rank]

PRO-INT

%ts_int16_r[rank]

%ts_int16_r[rank]

PRO-INT

%ts_int32_r[rank]

%ts_int32_r[rank]

EXT-FP8E4M3

%ts_fp8e4m3_r[rank]

%ts_fp8e4m3_r[rank]

EXT-FP8E5M2

%ts_fp8e5m2_r[rank]

%ts_fp8e5m2_r[rank]

PRO-FP

%ts_float16_r[rank]

%ts_float16_r[rank]

EXT-BF16

%ts_bfloat16_r[rank]

%ts_bfloat16_r[rank]

PRO-FP

%ts_float32_r[rank]

%ts_float32_r[rank]
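
Since perms is an attribute argument, it precedes input1 among the operands. A sketch, with all ids and the result type %ts_out assumed:

```spirv
%tosa = OpExtInstImport "TOSA.001000.1"
; %perms : constant rank-1 int32 tensor, e.g. holding [1, 0] to swap the
;          two dimensions of a rank-2 input (construction elided)
; %in    : rank-2 input tensor (assumed)
%out = OpExtInst %ts_out %tosa TRANSPOSE %perms %in  ; opcode 60
```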

Scatter-gather operators

GATHER

GATHER

values: 3D value tensor
values shape: [N,K,C]
values must come from an instruction of the following type:

OpTypeTensorARM %etype 3

indices: 2D index tensor
indices shape: [N,W]
indices must come from an instruction of the following type:

%etype = OpTypeInt 32 0
OpTypeTensorARM %etype 2

output: 3D output tensor

OpTypeTensorARM %etype 3

7

12

<id>
Result Type

Result <id>

Extended instructions set <id>

61

<id>
values

<id>
indices

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r3 = OpTypeTensorARM %int8 3
%ts_int16_r3 = OpTypeTensorARM %int16 3
%ts_int32_r3 = OpTypeTensorARM %int32 3
%ts_fp8e4m3_r3 = OpTypeTensorARM %fp8e4m3 3
%ts_fp8e5m2_r3 = OpTypeTensorARM %fp8e5m2 3
%ts_float16_r3 = OpTypeTensorARM %float16 3
%ts_bfloat16_r3 = OpTypeTensorARM %bfloat16 3
%ts_float32_r3 = OpTypeTensorARM %float32 3

Enabling profile/extension | values          | output
PRO-INT                    | %ts_int8_r3     | %ts_int8_r3
PRO-INT                    | %ts_int16_r3    | %ts_int16_r3
PRO-INT                    | %ts_int32_r3    | %ts_int32_r3
EXT-FP8E4M3                | %ts_fp8e4m3_r3  | %ts_fp8e4m3_r3
EXT-FP8E5M2                | %ts_fp8e5m2_r3  | %ts_fp8e5m2_r3
PRO-FP                     | %ts_float16_r3  | %ts_float16_r3
EXT-BF16                   | %ts_bfloat16_r3 | %ts_bfloat16_r3
PRO-FP                     | %ts_float32_r3  | %ts_float32_r3
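Putting the pieces together, a GATHER call could look like the following SPIR-V assembly sketch. The result names and element types are illustrative, and the surrounding module (capabilities, the producers of %values and %indices) is assumed rather than shown:

```
; Import the TOSA extended instruction set
%tosa      = OpExtInstImport "TOSA.001000.1"
%float32   = OpTypeFloat 32
%int32     = OpTypeInt 32 0
%ts_f32_r3 = OpTypeTensorARM %float32 3   ; values [N,K,C] and output [N,W,C]
%ts_i32_r2 = OpTypeTensorARM %int32 2     ; indices [N,W]

; %values : %ts_f32_r3 and %indices : %ts_i32_r2 are produced earlier
; Per the TOSA specification: output[n,w,c] = values[n, indices[n,w], c]
%output = OpExtInst %ts_f32_r3 %tosa GATHER %values %indices
```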

SCATTER

values_in: 3D values in tensor
values_in shape: [N,K,C]
values_in must come from an instruction of the following type:

OpTypeTensorARM %etype 3

indices: 2D index tensor
indices shape: [N,W]
indices must come from an instruction of the following type:

%etype = OpTypeInt 32 0
OpTypeTensorARM %etype 2

input: 3D input tensor
input shape: [N,W,C]
input must come from an instruction of the following type:

OpTypeTensorARM %etype 3

values_out: 3D output tensor

OpTypeTensorARM %etype 3

8 | 12 | <id> Result Type | Result <id> | Extended instructions set <id> | 62 | <id> values_in | <id> indices | <id> input

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r3 = OpTypeTensorARM %int8 3
%ts_int16_r3 = OpTypeTensorARM %int16 3
%ts_int32_r3 = OpTypeTensorARM %int32 3
%ts_fp8e4m3_r3 = OpTypeTensorARM %fp8e4m3 3
%ts_fp8e5m2_r3 = OpTypeTensorARM %fp8e5m2 3
%ts_float16_r3 = OpTypeTensorARM %float16 3
%ts_bfloat16_r3 = OpTypeTensorARM %bfloat16 3
%ts_float32_r3 = OpTypeTensorARM %float32 3

Enabling profile/extension | values_in       | input           | values_out
PRO-INT                    | %ts_int8_r3     | %ts_int8_r3     | %ts_int8_r3
PRO-INT                    | %ts_int16_r3    | %ts_int16_r3    | %ts_int16_r3
PRO-INT                    | %ts_int32_r3    | %ts_int32_r3    | %ts_int32_r3
EXT-FP8E4M3                | %ts_fp8e4m3_r3  | %ts_fp8e4m3_r3  | %ts_fp8e4m3_r3
EXT-FP8E5M2                | %ts_fp8e5m2_r3  | %ts_fp8e5m2_r3  | %ts_fp8e5m2_r3
PRO-FP                     | %ts_float16_r3  | %ts_float16_r3  | %ts_float16_r3
EXT-BF16                   | %ts_bfloat16_r3 | %ts_bfloat16_r3 | %ts_bfloat16_r3
PRO-FP                     | %ts_float32_r3  | %ts_float32_r3  | %ts_float32_r3
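For SCATTER, the extra input operand supplies the values written at the indexed positions. A hedged assembly sketch (names and element types illustrative, module context assumed):

```
%tosa      = OpExtInstImport "TOSA.001000.1"
%float32   = OpTypeFloat 32
%int32     = OpTypeInt 32 0
%ts_f32_r3 = OpTypeTensorARM %float32 3   ; values_in/values_out [N,K,C], input [N,W,C]
%ts_i32_r2 = OpTypeTensorARM %int32 2     ; indices [N,W]

; Per the TOSA specification: values_out starts as a copy of values_in,
; then values_out[n, indices[n,w], c] = input[n,w,c]
%values_out = OpExtInst %ts_f32_r3 %tosa SCATTER %values_in %indices %input
```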

Image operators

RESIZE

mode: BILINEAR or NEAREST
mode must come from a constant instruction of the following type:

OpTypeInt 32 0

See resize_mode_t for valid values.

input: Input tensor
input shape: [N,IH,IW,C]
input must come from an instruction of the following type:

OpTypeTensorARM %etype 4

scale: [scale_y_n, scale_y_d, scale_x_n, scale_x_d]
scale shape: [4]
scale must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
OpTypeTensorARM %etype %uint_1 %uint_array_rank(scale)

offset: [offset_y, offset_x]
offset shape: [2]
offset must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
OpTypeTensorARM %etype %uint_1 %uint_array_rank(offset)

border: [border_y, border_x]
border shape: [2]
border must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
OpTypeTensorARM %etype %uint_1 %uint_array_rank(border)

output: Output tensor

OpTypeTensorARM %etype 4

10 | 12 | <id> Result Type | Result <id> | Extended instructions set <id> | 63 | <id> mode | <id> input | <id> scale | <id> offset | <id> border

Type support

%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int32_r4 = OpTypeTensorARM %int32 4
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_int64_r4 = OpTypeTensorARM %int64 4
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_float32_r4 = OpTypeTensorARM %float32 4

Enabling profile/extension | input           | output
PRO-INT                    | %ts_int8_r4     | %ts_int32_r4
PRO-INT                    | %ts_int8_r4     | %ts_int8_r4
EXT-INT16                  | %ts_int16_r4    | %ts_int64_r4
EXT-INT16                  | %ts_int16_r4    | %ts_int16_r4
PRO-FP                     | %ts_float16_r4  | %ts_float16_r4
EXT-BF16                   | %ts_bfloat16_r4 | %ts_bfloat16_r4
PRO-FP                     | %ts_float32_r4  | %ts_float32_r4
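Because mode, scale, offset, and border are attribute arguments, they must come from constant instructions. A sketch of a RESIZE call (names illustrative; the construction of the constant tensors is not shown):

```
%tosa         = OpExtInstImport "TOSA.001000.1"
%uint         = OpTypeInt 32 0
%float32      = OpTypeFloat 32
%ts_f32_r4    = OpTypeTensorARM %float32 4   ; input [N,IH,IW,C], output [N,OH,OW,C]
%mode_nearest = OpConstant %uint 1           ; resize_mode_t NEAREST_NEIGHBOR

; %scale holds [scale_y_n, scale_y_d, scale_x_n, scale_x_d], %offset holds
; [offset_y, offset_x], and %border holds [border_y, border_x], all as
; constant rank-1 i32 tensors (construction not shown)
%output = OpExtInst %ts_f32_r4 %tosa RESIZE %mode_nearest %input %scale %offset %border
```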

Type-conversion operators

CAST

input: Input tensor
input shape: shape
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

output: Output tensor

OpTypeTensorARM %etype [rank]

6 | 12 | <id> Result Type | Result <id> | Extended instructions set <id> | 64 | <id> input

Type support

%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]

Enabling profile/extension | input                | output
PRO-INT                    | %ts_bool_r[rank]     | %ts_int8_r[rank]
PRO-INT                    | %ts_bool_r[rank]     | %ts_int16_r[rank]
PRO-INT                    | %ts_bool_r[rank]     | %ts_int32_r[rank]
PRO-INT                    | %ts_int8_r[rank]     | %ts_bool_r[rank]
PRO-INT                    | %ts_int8_r[rank]     | %ts_int16_r[rank]
PRO-INT                    | %ts_int8_r[rank]     | %ts_int32_r[rank]
PRO-FP                     | %ts_int8_r[rank]     | %ts_float16_r[rank]
EXT-BF16                   | %ts_int8_r[rank]     | %ts_bfloat16_r[rank]
PRO-FP                     | %ts_int8_r[rank]     | %ts_float32_r[rank]
PRO-INT                    | %ts_int16_r[rank]    | %ts_bool_r[rank]
PRO-INT                    | %ts_int16_r[rank]    | %ts_int8_r[rank]
PRO-INT                    | %ts_int16_r[rank]    | %ts_int32_r[rank]
PRO-FP                     | %ts_int16_r[rank]    | %ts_float16_r[rank]
EXT-BF16                   | %ts_int16_r[rank]    | %ts_bfloat16_r[rank]
PRO-FP                     | %ts_int16_r[rank]    | %ts_float32_r[rank]
PRO-INT                    | %ts_int32_r[rank]    | %ts_bool_r[rank]
PRO-INT                    | %ts_int32_r[rank]    | %ts_int8_r[rank]
PRO-INT                    | %ts_int32_r[rank]    | %ts_int16_r[rank]
PRO-FP                     | %ts_int32_r[rank]    | %ts_float16_r[rank]
EXT-BF16                   | %ts_int32_r[rank]    | %ts_bfloat16_r[rank]
PRO-FP                     | %ts_int32_r[rank]    | %ts_float32_r[rank]
EXT-BF16                   | %ts_bfloat16_r[rank] | %ts_int8_r[rank]
EXT-BF16                   | %ts_bfloat16_r[rank] | %ts_int16_r[rank]
EXT-BF16                   | %ts_bfloat16_r[rank] | %ts_int32_r[rank]
EXT-BF16 and EXT-FP8E4M3   | %ts_bfloat16_r[rank] | %ts_fp8e4m3_r[rank]
EXT-BF16 and EXT-FP8E5M2   | %ts_bfloat16_r[rank] | %ts_fp8e5m2_r[rank]
EXT-BF16                   | %ts_bfloat16_r[rank] | %ts_float32_r[rank]
EXT-FP8E4M3                | %ts_fp8e4m3_r[rank]  | %ts_float16_r[rank]
EXT-BF16 and EXT-FP8E4M3   | %ts_fp8e4m3_r[rank]  | %ts_bfloat16_r[rank]
EXT-FP8E4M3                | %ts_fp8e4m3_r[rank]  | %ts_float32_r[rank]
EXT-FP8E5M2                | %ts_fp8e5m2_r[rank]  | %ts_float16_r[rank]
EXT-BF16 and EXT-FP8E5M2   | %ts_fp8e5m2_r[rank]  | %ts_bfloat16_r[rank]
EXT-FP8E5M2                | %ts_fp8e5m2_r[rank]  | %ts_float32_r[rank]
PRO-FP                     | %ts_float16_r[rank]  | %ts_int8_r[rank]
PRO-FP                     | %ts_float16_r[rank]  | %ts_int16_r[rank]
PRO-FP                     | %ts_float16_r[rank]  | %ts_int32_r[rank]
EXT-FP8E4M3                | %ts_float16_r[rank]  | %ts_fp8e4m3_r[rank]
EXT-FP8E5M2                | %ts_float16_r[rank]  | %ts_fp8e5m2_r[rank]
PRO-FP                     | %ts_float16_r[rank]  | %ts_float32_r[rank]
PRO-FP                     | %ts_float32_r[rank]  | %ts_int8_r[rank]
PRO-FP                     | %ts_float32_r[rank]  | %ts_int16_r[rank]
PRO-FP                     | %ts_float32_r[rank]  | %ts_int32_r[rank]
EXT-FP8E4M3                | %ts_float32_r[rank]  | %ts_fp8e4m3_r[rank]
EXT-FP8E5M2                | %ts_float32_r[rank]  | %ts_fp8e5m2_r[rank]
EXT-BF16                   | %ts_float32_r[rank]  | %ts_bfloat16_r[rank]
PRO-FP                     | %ts_float32_r[rank]  | %ts_float16_r[rank]
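CAST performs an element-wise conversion between the element types of the input and result tensors. A short sketch (names, ranks, and the i8-to-fp16 pairing are illustrative; the input/output element types must be one of the supported combinations listed under Type support):

```
%tosa      = OpExtInstImport "TOSA.001000.1"
%int8      = OpTypeInt 8 0
%float16   = OpTypeFloat 16
%ts_i8_r2  = OpTypeTensorARM %int8 2
%ts_f16_r2 = OpTypeTensorARM %float16 2

; %i8_input : %ts_i8_r2 is produced earlier in the module;
; the result tensor has the same shape with converted elements
%converted = OpExtInst %ts_f16_r2 %tosa CAST %i8_input
```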

RESCALE

scale32: if (scale32) mul_t=i32_t else mul_t=i16_t
scale32 must come from a constant instruction of the following type:

OpTypeBool

rounding_mode: Select rounding mode
rounding_mode must come from a constant instruction of the following type:

OpTypeInt 32 0

See rounding_mode_t for valid values.

per_channel: if (per_channel) NC=shape[rank(shape)-1] else NC=1
per_channel must come from a constant instruction of the following type:

OpTypeBool

input_unsigned: If True, treat the input values as unsigned.
input_unsigned must come from a constant instruction of the following type:

OpTypeBool

output_unsigned: If True, treat the output values as unsigned.
output_unsigned must come from a constant instruction of the following type:

OpTypeBool

input: Input tensor
input shape: shape
input must come from an instruction of the following type:

OpTypeTensorARM %etype [rank]

multiplier: Scaling multiplier array
multiplier shape: [NC]
multiplier must come from a constant instruction of the following type:

%etype = OpTypeInt 32 0
OpTypeTensorARM %etype 1

shift: Scaling shift array
shift shape: [NC]
shift must come from a constant instruction of the following type:

%etype = OpTypeInt 8 0
OpTypeTensorARM %etype 1

input_zp: Input tensor zero point. int8/uint8 can have zero point within their valid range. uint16 zero point must be either 0 or 32768. All other types must have zero point equal to 0.
input_zp shape: [1]
input_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output_zp: Output tensor zero point. int8/uint8 can have zero point within their valid range. uint16 zero point must be either 0 or 32768. All other types must have zero point equal to 0.
output_zp shape: [1]
output_zp must come from a constant instruction of the following type:

OpTypeTensorARM %etype [rank] [shape]

output: Output tensor with the same shape as input

OpTypeTensorARM %etype [rank]

15 | 12 | <id> Result Type | Result <id> | Extended instructions set <id> | 65 | <id> scale32 | <id> rounding_mode | <id> per_channel | <id> input_unsigned | <id> output_unsigned | <id> input | <id> multiplier | <id> shift | <id> input_zp | <id> output_zp

Type support

%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%int64 = OpTypeInt 64 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_s1 = OpTypeTensorARM %int32 [rank] [shape]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int64_r[rank] = OpTypeTensorARM %int64 [rank]
%ts_int64_s1 = OpTypeTensorARM %int64 [rank] [shape]

Enabling profile/extension | input             | input_zp     | output_zp    | output
PRO-INT                    | %ts_int8_r[rank]  | %ts_int8_s1  | %ts_int8_s1  | %ts_int8_r[rank]
PRO-INT                    | %ts_int8_r[rank]  | %ts_int8_s1  | %ts_int16_s1 | %ts_int16_r[rank]
PRO-INT                    | %ts_int8_r[rank]  | %ts_int8_s1  | %ts_int32_s1 | %ts_int32_r[rank]
PRO-INT                    | %ts_int16_r[rank] | %ts_int16_s1 | %ts_int8_s1  | %ts_int8_r[rank]
PRO-INT                    | %ts_int16_r[rank] | %ts_int16_s1 | %ts_int16_s1 | %ts_int16_r[rank]
PRO-INT                    | %ts_int16_r[rank] | %ts_int16_s1 | %ts_int32_s1 | %ts_int32_r[rank]
PRO-INT                    | %ts_int32_r[rank] | %ts_int32_s1 | %ts_int8_s1  | %ts_int8_r[rank]
PRO-INT                    | %ts_int32_r[rank] | %ts_int32_s1 | %ts_int16_s1 | %ts_int16_r[rank]
PRO-INT                    | %ts_int32_r[rank] | %ts_int32_s1 | %ts_int32_s1 | %ts_int32_r[rank]
EXT-INT16                  | %ts_int64_r[rank] | %ts_int64_s1 | %ts_int8_s1  | %ts_int8_r[rank]
EXT-INT16                  | %ts_int64_r[rank] | %ts_int64_s1 | %ts_int16_s1 | %ts_int16_r[rank]
EXT-INT16                  | %ts_int64_r[rank] | %ts_int64_s1 | %ts_int32_s1 | %ts_int32_r[rank]
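Since the five attribute operands precede the input operands, a RESCALE call lists two booleans, the rounding mode, and two more booleans before the tensors. A hedged sketch (names illustrative; constant tensor construction not shown):

```
%tosa   = OpExtInstImport "TOSA.001000.1"
%bool   = OpTypeBool
%uint   = OpTypeInt 32 0
%true   = OpConstantTrue %bool    ; scale32: use 32-bit multipliers
%false  = OpConstantFalse %bool   ; per_channel / input_unsigned / output_unsigned
%single = OpConstant %uint 1      ; rounding_mode_t SINGLE_ROUND

; %multiplier (i32) and %shift (i8) are constant rank-1 tensors of shape [NC];
; %input_zp and %output_zp are constant shape-[1] zero-point tensors;
; %ts_i8_out is the tensor type of the rescaled output
%out = OpExtInst %ts_i8_out %tosa RESCALE %true %single %false %false %false %input %multiplier %shift %input_zp %output_zp
```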

Enumerated types

resize_mode_t

Name             | Value | Description             | Required TOSA extension
NEAREST_NEIGHBOR | 1     | Nearest neighbor resize | -
BILINEAR         | 2     | Bilinear resize         | -

acc_type_t

Name  | Value | Description           | Required TOSA extension
INT32 | 1     | 32-bit integer        | -
FP16  | 2     | 16-bit floating-point | -
FP32  | 3     | 32-bit floating-point | -
INT48 | 4     | 48-bit integer        | -

nan_propagation_mode_t

Name      | Value | Description                                                                    | Required TOSA extension
PROPAGATE | 1     | NaN is returned when any operand of the operation is NaN                       | -
IGNORE    | 2     | NaN operands are ignored; NaN is produced if and only if all operands are NaN  | -

rounding_mode_t

Name          | Value | Description                            | Required TOSA extension
SINGLE_ROUND  | 1     | Perform single rounding.               | -
INEXACT_ROUND | 2     | Allow rounding results to be inexact.  | EXT-INEXACTROUND
DOUBLE_ROUND  | 3     | Perform double rounding.               | EXT-DOUBLEROUND

Revision history

  • Revision 1 - 2025-09-12 - Kevin Petit

    • First public revision