Overview
This document specifies SPIR-V extended instructions that correspond to TOSA operators for use within SPIR-V graphs. Its focus is how TOSA operators are represented as SPIR-V extended instructions; for the full details of the TOSA operators, please consult the TOSA specification.
This instruction set can be imported as follows:
%tosa = OpExtInstImport "TOSA.001000.1"
The name of the instruction set follows this convention:
TOSA.<tosa-version-major><tosa-version-minor>.<instruction-set-revision>
where:
- <tosa-version-major> is the major version of TOSA that the instruction set uses, encoded using 3 decimal digits
- <tosa-version-minor> is the minor version of TOSA that the instruction set uses, encoded using 3 decimal digits
- <instruction-set-revision> is the revision of the instruction set.
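For example, the name used in the import above, TOSA.001000.1, designates revision 1 of the instruction set for TOSA version 1.0 (major version 001, minor version 000).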
Operator mapping
Each TOSA operator maps to a single extended instruction in this instruction set.
Argument mapping
TOSA operators have arguments that fall into three categories:
- Attribute arguments map to operands to the SPIR-V instructions. Operands that correspond to attribute arguments always come first and must come from constant instructions.
- Input arguments map to operands to the SPIR-V instructions. Operands that correspond to input arguments come after operands that correspond to attribute arguments.
- Output arguments map to the return value of the SPIR-V instructions. Since SPIR-V instructions can only return one value, the return type of SPIR-V instructions for TOSA operators that have multiple output arguments is an OpTypeStruct that groups all the outputs (see the sketch after this list).
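As a sketch of the multi-output case (all IDs are placeholders, and the operand list shown for RFFT2D is an assumption based on the argument-mapping rules above), a two-output operator such as RFFT2D returns an OpTypeStruct whose members are the output tensors, in order:

%float32 = OpTypeFloat 32
%ts_fp32_r3 = OpTypeTensorARM %float32 3
%rfft2d_out = OpTypeStruct %ts_fp32_r3 %ts_fp32_r3 ; (output_real, output_imag)
%result = OpExtInst %rfft2d_out %tosa RFFT2D %local_bound %input_real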
Type mapping
TOSA types are mapped to SPIR-V as follows:
- bool_t is mapped to OpTypeBool
- i8_t is mapped to OpTypeInt 8 0
- i16_t is mapped to OpTypeInt 16 0
- i32_t is mapped to OpTypeInt 32 0
- i48_t is mapped to OpTypeInt 64 0 (this is not a typo)
- fp16_t is mapped to OpTypeFloat 16
- fp32_t is mapped to OpTypeFloat 32
- bf16_t is mapped to OpTypeFloat 16 BFloat16KHR
- shape_t is mapped to rank-1 tensors of 32-bit integers
- fp8e4m3_t is mapped to OpTypeFloat 8 Float8E4M3EXT
- fp8e5m2_t is mapped to OpTypeFloat 8 Float8E5M2EXT
Tensor types are mapped to OpTypeTensorARM.
Tensor lists use OpTypeStruct to be able to accommodate heterogeneous tensor types, as shown in the sketch below.
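For example, a tensor list holding a rank-2 fp16_t tensor and a rank-1 i32_t tensor could be declared as follows (the names are illustrative only):

%float16 = OpTypeFloat 16
%int32 = OpTypeInt 32 0
%ts_fp16_r2 = OpTypeTensorARM %float16 2
%ts_int32_r1 = OpTypeTensorARM %int32 1
%tensor_list = OpTypeStruct %ts_fp16_r2 %ts_int32_r1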
Profiles and extensions
TOSA profiles are independent sets of operations and data type combinations. This instruction set supports the following TOSA profiles:
- PRO-INT
- PRO-FP
TOSA profile extensions define additional operations and data type combinations. This instruction set supports the following TOSA profile extensions:
- EXT-INT16
- EXT-BF16
- EXT-FP8E4M3
- EXT-FP8E5M2
- EXT-FFT
- EXT-DOUBLEROUND
- EXT-INEXACTROUND
Type combinations
Most TOSA operators only support specific type combinations. This specification documents the valid type combinations for each instruction under a "Type support" section. These sections begin with a list of type definitions that are referred to in the tables that list all valid type combinations. Specific type combinations are enabled by one or more TOSA profiles or extensions.
Operators
Tensor operators
ARGMAX
axis: Axis in range from 0 to rank(shape1) - 1
nan_mode: PROPAGATE or IGNORE. Set to PROPAGATE by default. This attribute affects the floating-point NaN propagation approach and is ignored by non-floating-point types. See nan_propagation_mode_t for valid values.
input: Input tensor
output: Output tensor, with rank = rank(shape1) - 1
8 | 12 | <id> | Result <id> | Extended instructions set <id> | 0 | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-INT | | |
EXT-INT16 | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
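As a hedged sketch of a complete invocation (all IDs other than the instruction set import are placeholders, and enumerated attributes are assumed to be passed as 32-bit integer constants), an ARGMAX over axis 1 of a rank-3 input could be written:

%tosa = OpExtInstImport "TOSA.001000.1"
%int32 = OpTypeInt 32 0
%axis = OpConstant %int32 1
%nan_mode = OpConstant %int32 1 ; nan_propagation_mode_t PROPAGATE
%ts_int32_r2 = OpTypeTensorARM %int32 2 ; rank(shape1) - 1 = 2
%output = OpExtInst %ts_int32_r2 %tosa ARGMAX %axis %nan_mode %input

An assembler that does not know the TOSA instruction names accepts the literal instruction number (0 for ARGMAX) in place of the name.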
AVG_POOL2D
kernel: [kernel_y, kernel_x]
stride: [stride_y, stride_x]
pad: [pad_top, pad_bottom, pad_left, pad_right]
acc_type: Enumerated type, must be one of INT32, FP16, FP32, matching the type of acc_t in the Supported Data Types table for this operation. See acc_type_t for valid values.
input: Input tensor
input_zp: Input tensor zero point. Must be zero for non-int8 types.
output_zp: Output tensor zero point. Must be zero for non-int8 types.
output: Output tensor 4D
12 | 12 | <id> | Result <id> | Extended instructions set <id> | 1 | <id> | <id> | <id> | <id> | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_fp8e4m3_r4 = OpTypeTensorARM %fp8e4m3 4
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_fp8e5m2_r4 = OpTypeTensorARM %fp8e5m2 4
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r4 = OpTypeTensorARM %float32 4
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]
Enabling profile/extension | input | input_zp | output_zp | output |
---|---|---|---|---|
PRO-INT | | | | |
EXT-INT16 | | | | |
EXT-FP8E4M3 | | | | |
EXT-FP8E5M2 | | | | |
PRO-FP | | | | |
PRO-FP | | | | |
EXT-BF16 | | | | |
PRO-FP | | | | |
CONV2D
pad: [pad_top, pad_bottom, pad_left, pad_right]
stride: [stride_y, stride_x]
dilation: [dilation_y, dilation_x]
acc_type: Enumerated type, must be one of INT32, INT48, FP16, FP32, matching the type of acc_t in the Supported Data Types table for this operation. See acc_type_t for valid values.
local_bound: This optional attribute affects the floating-point compliance error bound. The default of false allows for direct and transform-based, fast convolution algorithms. Only set to true if direct dot-product calculation precision is required.
input: Input tensor
weight: Weight kernel size KH x KW
bias: Per output channel bias data.
input_zp: Input tensor zero point. Must be zero for non-int8 types.
weight_zp: Weight zero point. Must be zero for non-int8 types.
output: Output tensor
15 | 12 | <id> | Result <id> | Extended instructions set <id> | 2 | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%float16 = OpTypeFloat 16
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int32_r1 = OpTypeTensorARM %int32 1
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int32_r4 = OpTypeTensorARM %int32 4
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_int64_r1 = OpTypeTensorARM %int64 1
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int64_r4 = OpTypeTensorARM %int64 4
%ts_fp8e4m3_r4 = OpTypeTensorARM %fp8e4m3 4
%ts_float16_r1 = OpTypeTensorARM %float16 1
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_fp8e5m2_r4 = OpTypeTensorARM %fp8e5m2 4
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_bfloat16_r1 = OpTypeTensorARM %bfloat16 1
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r4 = OpTypeTensorARM %float32 4
%ts_float32_r1 = OpTypeTensorARM %float32 1
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]
Enabling profile/extension | input | weight | bias | input_zp | weight_zp | output |
---|---|---|---|---|---|---|
PRO-INT | | | | | | |
EXT-INT16 | | | | | | |
EXT-FP8E4M3 | | | | | | |
EXT-FP8E5M2 | | | | | | |
PRO-FP | | | | | | |
PRO-FP | | | | | | |
EXT-BF16 | | | | | | |
PRO-FP | | | | | | |
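As a sketch (attribute construction omitted; the operand order assumes the attributes pad, stride, dilation, acc_type, and local_bound in TOSA declaration order, followed by the inputs input, weight, bias, input_zp, and weight_zp), an fp32 CONV2D could look like:

%float32 = OpTypeFloat 32
%ts_fp32_r4 = OpTypeTensorARM %float32 4
; %pad, %stride, %dilation, %acc_type and %local_bound must come from
; constant instructions, per the argument-mapping rules.
%output = OpExtInst %ts_fp32_r4 %tosa CONV2D %pad %stride %dilation %acc_type %local_bound %input %weight %bias %input_zp %weight_zp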
CONV3D
pad: [pad_d0, pad_d1, pad_top, pad_bottom, pad_left, pad_right]
stride: [stride_d, stride_y, stride_x]
dilation: [dilation_d, dilation_y, dilation_x]
acc_type: Enumerated type, must be one of INT32, INT48, FP16, FP32, matching the type of acc_t in the Supported Data Types table for this operation. See acc_type_t for valid values.
local_bound: This optional attribute affects the floating-point compliance error bound. The default of false allows for direct and transform-based, fast convolution algorithms. Only set to true if direct dot-product calculation precision is required.
input: Input tensor
weight: Weight kernel size KDxKHxKW
bias: Per output channel bias data.
input_zp: Input tensor zero point. Must be zero for non-int8 types.
weight_zp: Weight zero point. Must be zero for non-int8 types.
output: Output tensor
15 | 12 | <id> | Result <id> | Extended instructions set <id> | 3 | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%float16 = OpTypeFloat 16
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r5 = OpTypeTensorARM %int8 5
%ts_int32_r1 = OpTypeTensorARM %int32 1
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int32_r5 = OpTypeTensorARM %int32 5
%ts_int16_r5 = OpTypeTensorARM %int16 5
%ts_int64_r1 = OpTypeTensorARM %int64 1
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int64_r5 = OpTypeTensorARM %int64 5
%ts_fp8e4m3_r5 = OpTypeTensorARM %fp8e4m3 5
%ts_float16_r1 = OpTypeTensorARM %float16 1
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_float16_r5 = OpTypeTensorARM %float16 5
%ts_fp8e5m2_r5 = OpTypeTensorARM %fp8e5m2 5
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r5 = OpTypeTensorARM %bfloat16 5
%ts_bfloat16_r1 = OpTypeTensorARM %bfloat16 1
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r5 = OpTypeTensorARM %float32 5
%ts_float32_r1 = OpTypeTensorARM %float32 1
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]
Enabling profile/extension | input | weight | bias | input_zp | weight_zp | output |
---|---|---|---|---|---|---|
PRO-INT | | | | | | |
EXT-INT16 | | | | | | |
EXT-FP8E4M3 | | | | | | |
EXT-FP8E5M2 | | | | | | |
PRO-FP | | | | | | |
PRO-FP | | | | | | |
EXT-BF16 | | | | | | |
PRO-FP | | | | | | |
DEPTHWISE_CONV2D
pad: [pad_top, pad_bottom, pad_left, pad_right]
stride: [stride_y, stride_x]
dilation: [dilation_y, dilation_x]
acc_type: Enumerated type, must be one of INT32, INT48, FP16, FP32, matching the type of acc_t in the Supported Data Types table for this operation. See acc_type_t for valid values.
local_bound: This optional attribute affects the floating-point compliance error bound. The default of false allows for direct and transform-based, fast convolution algorithms. Only set to true if direct dot-product calculation precision is required.
input: Input tensor
weight: Weight kernel size KH x KW
bias: Per output channel bias data.
input_zp: Input tensor zero point. Must be zero for non-int8 types.
weight_zp: Weight zero point. Must be zero for non-int8 types.
output: Output tensor
15 | 12 | <id> | Result <id> | Extended instructions set <id> | 4 | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%float16 = OpTypeFloat 16
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int32_r1 = OpTypeTensorARM %int32 1
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_int64_r1 = OpTypeTensorARM %int64 1
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int64_r[rank] = OpTypeTensorARM %int64 [rank]
%ts_fp8e4m3_r4 = OpTypeTensorARM %fp8e4m3 4
%ts_float16_r1 = OpTypeTensorARM %float16 1
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_fp8e5m2_r4 = OpTypeTensorARM %fp8e5m2 4
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_bfloat16_r1 = OpTypeTensorARM %bfloat16 1
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r4 = OpTypeTensorARM %float32 4
%ts_float32_r1 = OpTypeTensorARM %float32 1
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input | weight | bias | input_zp | weight_zp | output |
---|---|---|---|---|---|---|
PRO-INT | | | | | | |
EXT-INT16 | | | | | | |
EXT-FP8E4M3 | | | | | | |
EXT-FP8E5M2 | | | | | | |
PRO-FP | | | | | | |
PRO-FP | | | | | | |
EXT-BF16 | | | | | | |
PRO-FP | | | | | | |
FFT2D
Type support
%float32 = OpTypeFloat 32
%ts_float32_r3 = OpTypeTensorARM %float32 3
Enabling profile/extension | input_real | input_imag | output_real | output_imag |
---|---|---|---|---|
EXT-FFT | | | | |
MATMUL
Type support
%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%float16 = OpTypeFloat 16
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float32 = OpTypeFloat 32
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%ts_int8_r3 = OpTypeTensorARM %int8 3
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int32_r3 = OpTypeTensorARM %int32 3
%ts_int16_r3 = OpTypeTensorARM %int16 3
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int64_r3 = OpTypeTensorARM %int64 3
%ts_fp8e4m3_r3 = OpTypeTensorARM %fp8e4m3 3
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_float16_r3 = OpTypeTensorARM %float16 3
%ts_fp8e5m2_r3 = OpTypeTensorARM %fp8e5m2 3
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_float32_r3 = OpTypeTensorARM %float32 3
%ts_bfloat16_r3 = OpTypeTensorARM %bfloat16 3
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]
Enabling profile/extension | A | B | A_zp | B_zp | output |
---|---|---|---|---|---|
PRO-INT | | | | | |
EXT-INT16 | | | | | |
EXT-FP8E4M3 | | | | | |
EXT-FP8E5M2 | | | | | |
PRO-FP | | | | | |
PRO-FP | | | | | |
EXT-BF16 | | | | | |
PRO-FP | | | | | |
MAX_POOL2D
kernel: [kernel_y, kernel_x]
stride: [stride_y, stride_x]
pad: [pad_top, pad_bottom, pad_left, pad_right]
nan_mode: PROPAGATE or IGNORE. Set to PROPAGATE by default. This attribute affects the floating-point NaN propagation approach and is ignored by non-floating-point types. See nan_propagation_mode_t for valid values.
input: Input tensor 4D
output: Output tensor 4D
10 | 12 | <id> | Result <id> | Extended instructions set <id> | 7 | <id> | <id> | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_fp8e4m3_r4 = OpTypeTensorARM %fp8e4m3 4
%ts_fp8e5m2_r4 = OpTypeTensorARM %fp8e5m2 4
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_float32_r4 = OpTypeTensorARM %float32 4
Enabling profile/extension | input | output |
---|---|---|
PRO-INT | | |
EXT-INT16 | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
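As a sketch (placeholder IDs; the attribute order kernel, stride, pad, nan_mode is an assumption based on the argument list above), an fp16 MAX_POOL2D could be written:

%float16 = OpTypeFloat 16
%ts_fp16_r4 = OpTypeTensorARM %float16 4
; %kernel, %stride, %pad and %nan_mode must come from constant instructions.
%output = OpExtInst %ts_fp16_r4 %tosa MAX_POOL2D %kernel %stride %pad %nan_mode %input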
RFFT2D
Type support
%float32 = OpTypeFloat 32
%ts_float32_r3 = OpTypeTensorARM %float32 3
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input_real | output_real | output_imag |
---|---|---|---|
EXT-FFT | | | |
TRANSPOSE_CONV2D
out_pad: [out_pad_top, out_pad_bottom, out_pad_left, out_pad_right]
stride: [stride_y, stride_x]
acc_type: Enumerated type, must be one of INT32, INT48, FP16, FP32, matching the type of acc_t in the Supported Data Types table for this operation. See acc_type_t for valid values.
local_bound: This optional attribute affects the floating-point compliance error bound. The default of false allows for direct and transform-based, fast convolution algorithms. Only set to true if direct dot-product calculation precision is required.
input: Input tensor
weight: Weight kernel size KH x KW
bias: Per output channel bias data.
input_zp: Input tensor zero point. Must be zero for non-int8 types.
weight_zp: Weight zero point. Must be zero for non-int8 types.
output: Output tensor
14 | 12 | <id> | Result <id> | Extended instructions set <id> | 9 | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%float16 = OpTypeFloat 16
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int32_r1 = OpTypeTensorARM %int32 1
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int32_r4 = OpTypeTensorARM %int32 4
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_int64_r1 = OpTypeTensorARM %int64 1
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int64_r4 = OpTypeTensorARM %int64 4
%ts_fp8e4m3_r4 = OpTypeTensorARM %fp8e4m3 4
%ts_float16_r1 = OpTypeTensorARM %float16 1
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_fp8e5m2_r4 = OpTypeTensorARM %fp8e5m2 4
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_bfloat16_r1 = OpTypeTensorARM %bfloat16 1
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r4 = OpTypeTensorARM %float32 4
%ts_float32_r1 = OpTypeTensorARM %float32 1
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]
Enabling profile/extension | input | weight | bias | input_zp | weight_zp | output |
---|---|---|---|---|---|---|
PRO-INT | | | | | | |
EXT-INT16 | | | | | | |
EXT-FP8E4M3 | | | | | | |
EXT-FP8E5M2 | | | | | | |
PRO-FP | | | | | | |
PRO-FP | | | | | | |
EXT-BF16 | | | | | | |
PRO-FP | | | | | | |
Activation operators
CLAMP
min_val: Minimum clip value
max_val: Maximum clip value
nan_mode: PROPAGATE or IGNORE. Set to PROPAGATE by default. This attribute affects the floating-point NaN propagation approach and is ignored by non-floating-point types. See nan_propagation_mode_t for valid values.
input: Input tensor
output: Output tensor of same type and shape as input
9 | 12 | <id> | Result <id> | Extended instructions set <id> | 10 | <id> | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-INT | | |
EXT-INT16 | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
ERF
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 11 | <id> |
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
SIGMOID
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 12 | <id> |
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
TANH
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 13 | <id> |
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
Elementwise-binary operators
ADD
Type support
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
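For instance, an elementwise ADD of two rank-2 i32 tensors is a single extended instruction with no attribute operands (IDs are placeholders):

%int32 = OpTypeInt 32 0
%ts_int32_r2 = OpTypeTensorARM %int32 2
%sum = OpExtInst %ts_int32_r2 %tosa ADD %input1 %input2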
ARITHMETIC_RIGHT_SHIFT
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT | | | |
PRO-INT | | | |
PRO-INT | | | |
BITWISE_AND
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT | | | |
PRO-INT | | | |
PRO-INT | | | |
BITWISE_OR
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT | | | |
PRO-INT | | | |
PRO-INT | | | |
BITWISE_XOR
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT | | | |
PRO-INT | | | |
PRO-INT | | | |
INTDIV
Type support
%int32 = OpTypeInt 32 0
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
LOGICAL_AND
Type support
%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
LOGICAL_LEFT_SHIFT
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
LOGICAL_RIGHT_SHIFT
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
LOGICAL_OR
Type support
%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
LOGICAL_XOR
Type support
%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
MAXIMUM
nan_mode: PROPAGATE or IGNORE. Set to PROPAGATE by default. This attribute affects the floating-point NaN propagation approach and is ignored by non-floating-point types. See nan_propagation_mode_t for valid values.
input1: Input tensor
input2: Input tensor with the same rank as input1
output: Output tensor
8 | 12 | <id> | Result <id> | Extended instructions set <id> | 25 | <id> | <id> | <id> |
Type support
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT | | | |
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
MINIMUM
nan_mode: PROPAGATE or IGNORE. Set to PROPAGATE by default. This attribute affects the floating-point NaN propagation approach and is ignored by non-floating-point types. See nan_propagation_mode_t for valid values.
input1: Input tensor
input2: Input tensor with the same rank as input1
output: Output tensor
8 | 12 | <id> | Result <id> | Extended instructions set <id> | 26 | <id> | <id> | <id> |
Type support
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT | | | |
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
MUL
Type support
%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT | | | |
PRO-INT | | | |
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
POW
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
SUB
Type support
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
TABLE
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int8_r1 = OpTypeTensorARM %int8 1
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int16_r1 = OpTypeTensorARM %int16 1
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
Enabling profile/extension | input1 | table | output |
---|---|---|---|
PRO-INT | | | |
EXT-INT16 | | | |
Elementwise-unary operators
ABS
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 31 | <id> |
Type support
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-INT | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
BITWISE_NOT
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 32 | <id> |
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
CEIL
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 33 | <id> |
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
CLZ
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 34 | <id> |
Type support
%int32 = OpTypeInt 32 0
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-INT | | |
COS
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 35 | <id> |
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
EXP
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 36 | <id> |
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
FLOOR
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 37 | <id> |
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
LOG
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 38 | <id> |
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
LOGICAL_NOT
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 39 | <id> |
Type support
%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-INT, PRO-FP | | |
PRO-INT, PRO-FP | | |
NEGATE
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int32_s1 = OpTypeTensorARM %int32 [rank] [shape]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]
Enabling profile/extension | input1 | input1_zp | output_zp | output |
---|---|---|---|---|
PRO-INT | | | | |
PRO-INT | | | | |
PRO-INT | | | | |
PRO-FP | | | | |
EXT-BF16 | | | | |
PRO-FP | | | | |
RECIPROCAL
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 41 | <id> |
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
RSQRT
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 42 | <id> |
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
SIN
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 43 | <id> |
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
Elementwise-ternary operators
SELECT
Type support
%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input2 | input3 | output |
---|---|---|---|
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-INT | | | |
PRO-INT | | | |
PRO-INT | | | |
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
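As a sketch (placeholder IDs; input1 is the boolean selector, per the TOSA SELECT definition), selecting elementwise between two fp32 tensors:

%float32 = OpTypeFloat 32
%ts_fp32_r2 = OpTypeTensorARM %float32 2
%output = OpExtInst %ts_fp32_r2 %tosa SELECT %input1 %input2 %input3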
Comparison operators
EQUAL
Type support
%int32 = OpTypeInt 32 0
%bool = OpTypeBool
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT | | | |
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
GREATER
Type support
%int32 = OpTypeInt 32 0
%bool = OpTypeBool
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT | | | |
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
GREATER_EQUAL
Type support
%int32 = OpTypeInt 32 0
%bool = OpTypeBool
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | input2 | output |
---|---|---|---|
PRO-INT | | | |
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
Reduction operators
REDUCE_ALL
Type support
%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-INT, PRO-FP | | |
PRO-INT, PRO-FP | | |
REDUCE_ANY
Type support
%bool = OpTypeBool
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-INT, PRO-FP | | |
PRO-INT, PRO-FP | | |
REDUCE_MAX
axis: Axis to reduce, in range from 0 to rank(shape1) - 1
nan_mode: PROPAGATE or IGNORE. Set to PROPAGATE by default. This attribute affects the floating-point NaN propagation approach and is ignored by non-floating-point types. See nan_propagation_mode_t for valid values.
input: Input tensor
output: Output tensor. Same rank as the input tensor.
8 | 12 | <id> | Result <id> | Extended instructions set <id> | 50 | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
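As a sketch (placeholder IDs; enumerated attributes are assumed to be passed as 32-bit integer constants), a REDUCE_MAX along axis 0 of a rank-2 i32 tensor, which keeps the reduced dimension with size 1:

%int32 = OpTypeInt 32 0
%axis = OpConstant %int32 0
%nan_mode = OpConstant %int32 1 ; PROPAGATE (ignored for integer types)
%ts_int32_r2 = OpTypeTensorARM %int32 2
%output = OpExtInst %ts_int32_r2 %tosa REDUCE_MAX %axis %nan_mode %input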
REDUCE_MIN
axis: Axis to reduce, in range from 0 to rank(shape1) - 1
nan_mode: PROPAGATE or IGNORE. Set to PROPAGATE by default. This attribute affects the floating-point NaN propagation approach and is ignored by non-floating-point types. See nan_propagation_mode_t for valid values.
input: Input tensor
output: Output tensor. Same rank as the input tensor.
8 | 12 | <id> | Result <id> | Extended instructions set <id> | 51 | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
REDUCE_PRODUCT
Type support
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
REDUCE_SUM
Type support
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-INT | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
Data-layout operators
CONCAT
Type support
%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-INT, PRO-FP | | |
PRO-INT, PRO-FP | | |
PRO-INT | | |
EXT-INT16 | | |
PRO-INT | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
PAD
Type support
%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_bool_s1 = OpTypeTensorARM %bool [rank] [shape]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int32_s1 = OpTypeTensorARM %int32 [rank] [shape]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e4m3_s1 = OpTypeTensorARM %fp8e4m3 [rank] [shape]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_fp8e5m2_s1 = OpTypeTensorARM %fp8e5m2 [rank] [shape]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_float16_s1 = OpTypeTensorARM %float16 [rank] [shape]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_bfloat16_s1 = OpTypeTensorARM %bfloat16 [rank] [shape]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
%ts_float32_s1 = OpTypeTensorARM %float32 [rank] [shape]
Enabling profile/extension | input1 | pad_const | output |
---|---|---|---|
PRO-INT, PRO-FP | | | |
PRO-INT, PRO-FP | | | |
PRO-INT | | | |
PRO-INT | | | |
PRO-INT | | | |
EXT-FP8E4M3 | | | |
EXT-FP8E5M2 | | | |
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
RESHAPE
Type support
%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-INT, PRO-FP | | |
PRO-INT, PRO-FP | | |
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
REVERSE
Type support
%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-INT, PRO-FP | | |
PRO-INT, PRO-FP | | |
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
SLICE
Type support
%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-INT, PRO-FP | | |
PRO-INT, PRO-FP | | |
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
TILE
Type support
%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-INT, PRO-FP | | |
PRO-INT, PRO-FP | | |
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
TRANSPOSE
Type support
%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
Enabling profile/extension | input1 | output |
---|---|---|
PRO-INT, PRO-FP | | |
PRO-INT, PRO-FP | | |
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
Scatter-gather operators
GATHER
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r3 = OpTypeTensorARM %int8 3
%ts_int16_r3 = OpTypeTensorARM %int16 3
%ts_int32_r3 = OpTypeTensorARM %int32 3
%ts_fp8e4m3_r3 = OpTypeTensorARM %fp8e4m3 3
%ts_fp8e5m2_r3 = OpTypeTensorARM %fp8e5m2 3
%ts_float16_r3 = OpTypeTensorARM %float16 3
%ts_bfloat16_r3 = OpTypeTensorARM %bfloat16 3
%ts_float32_r3 = OpTypeTensorARM %float32 3
Enabling profile/extension | values | output |
---|---|---|
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
SCATTER
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r3 = OpTypeTensorARM %int8 3
%ts_int16_r3 = OpTypeTensorARM %int16 3
%ts_int32_r3 = OpTypeTensorARM %int32 3
%ts_fp8e4m3_r3 = OpTypeTensorARM %fp8e4m3 3
%ts_fp8e5m2_r3 = OpTypeTensorARM %fp8e5m2 3
%ts_float16_r3 = OpTypeTensorARM %float16 3
%ts_bfloat16_r3 = OpTypeTensorARM %bfloat16 3
%ts_float32_r3 = OpTypeTensorARM %float32 3
Enabling profile/extension | values_in | input | values_out |
---|---|---|---|
PRO-INT | | | |
PRO-INT | | | |
PRO-INT | | | |
EXT-FP8E4M3 | | | |
EXT-FP8E5M2 | | | |
PRO-FP | | | |
EXT-BF16 | | | |
PRO-FP | | | |
Image operators
RESIZE
mode: Resize mode. See resize_mode_t for valid values.
input: Input tensor
scale: [scale_y_n, scale_y_d, scale_x_n, scale_x_d]
offset: [offset_y, offset_x]
border: [border_y, border_x]
output: Output tensor
10 | 12 | <id> | Result <id> | Extended instructions set <id> | 63 | <id> | <id> | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int32 = OpTypeInt 32 0
%int16 = OpTypeInt 16 0
%int64 = OpTypeInt 64 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%ts_int8_r4 = OpTypeTensorARM %int8 4
%ts_int32_r4 = OpTypeTensorARM %int32 4
%ts_int16_r4 = OpTypeTensorARM %int16 4
%ts_int64_r4 = OpTypeTensorARM %int64 4
%ts_float16_r4 = OpTypeTensorARM %float16 4
%ts_bfloat16_r4 = OpTypeTensorARM %bfloat16 4
%ts_float32_r4 = OpTypeTensorARM %float32 4
Enabling profile/extension | input | output |
---|---|---|
PRO-INT | | |
PRO-INT | | |
EXT-INT16 | | |
EXT-INT16 | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
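As a sketch (placeholder IDs; the attribute order scale, offset, border, mode is an assumption based on the argument list above, and scale, offset and border are shape_t values, i.e. rank-1 tensors of 32-bit integers):

%float32 = OpTypeFloat 32
%ts_fp32_r4 = OpTypeTensorARM %float32 4
; %scale, %offset, %border and %mode must come from constant instructions.
%output = OpExtInst %ts_fp32_r4 %tosa RESIZE %scale %offset %border %mode %input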
Type-conversion operators
CAST
6 | 12 | <id> | Result <id> | Extended instructions set <id> | 64 | <id> |
Type support
%bool = OpTypeBool
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%float16 = OpTypeFloat 16
%bfloat16 = OpTypeFloat 16 BFloat16KHR
%float32 = OpTypeFloat 32
%fp8e4m3 = OpTypeFloat 8 Float8E4M3EXT
%fp8e5m2 = OpTypeFloat 8 Float8E5M2EXT
%ts_bool_r[rank] = OpTypeTensorARM %bool [rank]
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_float16_r[rank] = OpTypeTensorARM %float16 [rank]
%ts_bfloat16_r[rank] = OpTypeTensorARM %bfloat16 [rank]
%ts_float32_r[rank] = OpTypeTensorARM %float32 [rank]
%ts_fp8e4m3_r[rank] = OpTypeTensorARM %fp8e4m3 [rank]
%ts_fp8e5m2_r[rank] = OpTypeTensorARM %fp8e5m2 [rank]
Enabling profile/extension | input | output |
---|---|---|
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
PRO-INT | | |
PRO-INT | | |
PRO-INT | | |
PRO-FP | | |
EXT-BF16 | | |
PRO-FP | | |
EXT-BF16 | | |
EXT-BF16 | | |
EXT-BF16 | | |
EXT-BF16 and EXT-FP8E4M3 | | |
EXT-BF16 and EXT-FP8E5M2 | | |
EXT-BF16 | | |
EXT-FP8E4M3 | | |
EXT-BF16 and EXT-FP8E4M3 | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
EXT-BF16 and EXT-FP8E5M2 | | |
EXT-FP8E5M2 | | |
PRO-FP | | |
PRO-FP | | |
PRO-FP | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
PRO-FP | | |
PRO-FP | | |
PRO-FP | | |
PRO-FP | | |
EXT-FP8E4M3 | | |
EXT-FP8E5M2 | | |
EXT-BF16 | | |
PRO-FP | | |
RESCALE
scale32: if (scale32) mul_t=i32_t else mul_t=i16_t
rounding_mode: Select rounding mode. See rounding_mode_t for valid values.
per_channel: if (per_channel) NC=shape[rank(shape)-1] else NC=1
input_unsigned: If True, treat the input values as unsigned.
output_unsigned: If True, treat the output values as unsigned.
input: Input tensor
multiplier: Scaling multiplier array
shift: Scaling shift array
input_zp: Input tensor zero point. int8/uint8 can have zero point within their valid range. uint16 zero point must be either 0 or 32768. All other types must have zero point equal to 0.
output_zp: Output tensor zero point. int8/uint8 can have zero point within their valid range. uint16 zero point must be either 0 or 32768. All other types must have zero point equal to 0.
output: Output tensor with the same shape as input
15 | 12 | <id> | Result <id> | Extended instructions set <id> | 65 | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> | <id> |
Type support
%int8 = OpTypeInt 8 0
%int16 = OpTypeInt 16 0
%int32 = OpTypeInt 32 0
%int64 = OpTypeInt 64 0
%ts_int8_r[rank] = OpTypeTensorARM %int8 [rank]
%ts_int8_s1 = OpTypeTensorARM %int8 [rank] [shape]
%ts_int16_s1 = OpTypeTensorARM %int16 [rank] [shape]
%ts_int16_r[rank] = OpTypeTensorARM %int16 [rank]
%ts_int32_s1 = OpTypeTensorARM %int32 [rank] [shape]
%ts_int32_r[rank] = OpTypeTensorARM %int32 [rank]
%ts_int64_r[rank] = OpTypeTensorARM %int64 [rank]
%ts_int64_s1 = OpTypeTensorARM %int64 [rank] [shape]
Enabling profile/extension | input | input_zp | output_zp | output |
---|---|---|---|---|
PRO-INT | | | | |
PRO-INT | | | | |
PRO-INT | | | | |
PRO-INT | | | | |
PRO-INT | | | | |
PRO-INT | | | | |
PRO-INT | | | | |
PRO-INT | | | | |
PRO-INT | | | | |
EXT-INT16 | | | | |
EXT-INT16 | | | | |
EXT-INT16 | | | | |
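As a sketch (placeholder IDs; the attribute order scale32, rounding_mode, per_channel, input_unsigned, output_unsigned is an assumption based on the argument list above), an i32-to-i8 RESCALE:

%int8 = OpTypeInt 8 0
%ts_int8_r2 = OpTypeTensorARM %int8 2
; The five attribute operands must come from constant instructions.
%output = OpExtInst %ts_int8_r2 %tosa RESCALE %scale32 %rounding_mode %per_channel %input_unsigned %output_unsigned %input %multiplier %shift %input_zp %output_zp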
Enumerated types
resize_mode_t
Name | Value | Description | Required TOSA extension |
---|---|---|---|
NEAREST_NEIGHBOR | 1 | Nearest neighbor resize | |
BILINEAR | 2 | Bilinear resize | |
acc_type_t
Name | Value | Description | Required TOSA extension |
---|---|---|---|
INT32 | 1 | 32-bit integer | |
FP16 | 2 | 16-bit floating-point | |
FP32 | 3 | 32-bit floating-point | |
INT48 | 4 | 48-bit integer | |
nan_propagation_mode_t
Name | Value | Description | Required TOSA extension |
---|---|---|---|
PROPAGATE | 1 | NaN is returned when the operation has a NaN | |
IGNORE | 2 | NaN is ignored when the operation has a NaN. NaN is produced if and only if all operands are NaN | |
rounding_mode_t
Name | Value | Description | Required TOSA extension |
---|---|---|---|
SINGLE_ROUND | 1 | Perform single rounding. | |
INEXACT_ROUND | 2 | Allow rounding results to be inexact. | EXT-INEXACTROUND |
DOUBLE_ROUND | 3 | Perform double rounding. | EXT-DOUBLEROUND |
Revision history
- Revision 1 - 2025-09-12 - Kevin Petit
  - First public revision