Format Version: 2.0

Document Revision: draft9

Editor: Mark Callow (Edgewise Consulting)

Abstract

The KTX file format, version 2, is a format for storing textures for OpenGL®, OpenGL ES™️, Vulkan® and WebGL™️ applications. It is distinguished by the simplicity of the loader required to instantiate texture objects from the file contents.

It extends the version 1 format with support for easier loading of Vulkan textures, easier use by non-OpenGL and non-Vulkan applications, support for streaming and supercompression.

Status of this document

Ready for trial implementation. Some changes may still be made to the DFD support but the spec. is otherwise stable.

1. Introduction

This document describes the KTX file format version 2.0, hereafter referred to as KTX2™️. KTX2 files are used for storing textures for use with 3D APIs such as OpenGL, OpenGL ES and Vulkan.

The canonical version of the specification is available in the Khronos Registry (https://www.khronos.org/registry). The source files used to generate the specification are stored in the KTX-Specification Repository (https://github.com/KhronosGroup/KTX-Specification). The source repository has a public issue tracker and allows the submission of pull requests that improve the specification.

1.1. Document Conventions

The KTX2 specification is intended for use by both creators and consumers of KTX2 files, forming a contract between these parties. Specification text may address either party; typically the intended audience can be inferred from context.

1.1.1. Normative Terminology

Within this specification, the key words must, required, should, recommended, may, and optional are to be interpreted as described in Key words for use in RFCs to Indicate Requirement Levels [RFC2119]. In text addressing creators, their use expresses requirements that apply to the files produced. In text addressing consumers, their use expresses requirements that must be followed when, e.g., uploading the textures via a 3D API.

1.1.2. Admonitions

Notes are non-normative and give further background information such as rationales.
Tips are non-normative and give helpful suggestions for implementers.
Cautions are normative, giving restrictions that must be followed.

2. File Structure

Basic Structure
Byte[12] identifier
UInt32 vkFormat
UInt32 typeSize
UInt32 pixelWidth
UInt32 pixelHeight
UInt32 pixelDepth
UInt32 numberOfArrayElements
UInt32 numberOfFaces
UInt32 numberOfMipLevels
UInt32 supercompressionScheme
UInt64 bytesOfImages
UInt64 bytesOfUncompressedImages

// Index (1)
UInt32 dataFormatDescriptorOffset
UInt32 bytesOfDataFormatDescriptor
UInt32 keyValueDataOffset
UInt32 bytesOfKeyValueData
UInt64 supercompressionGlobalDataOffset
UInt64 bytesOfSupercompressionGlobalData
struct {
    UInt64 offset
    UInt64 bytesOfImages
    UInt64 bytesOfUncompressedImages
} levels[max(1, numberOfMipLevels)]

// Data Format Descriptor (2)
UInt32 dfdTotalSize
continue
    dfDescriptorBlock dfdBlock
          ︙
until bytesOfDataFormatDescriptor read

// Key/Value Data (3)
continue
    UInt32   keyAndValueByteSize
    Byte     keyAndValue[keyAndValueByteSize]
    Byte     valuePadding[3 - (keyAndValueByteSize + 3) % 4]
                    
    UInt32   keyAndValueByteSize
    Byte     keyAndValue[keyAndValueByteSize]
until bytesOfKeyValueData read
Byte keyValuePadding[7 - (bytesOfKeyValueData + 7) % 8]

// Supercompression Global Data (4)
Byte supercompressionGlobalData[bytesOfSupercompressionGlobalData]
Byte sgdPadding[7 - (bytesOfSupercompressionGlobalData + 7) % 8]

// Mip Level Array (5)
for each mip_level in numberOfMipLevels (6)
    Byte levelImages[bytesOfLevelImages] (7)
    Byte mipPadding[7 - (bytesOfLevelImages + 7) % 8]
end
1 Required. See Section 3.11, “Index”.
2 Required. See Section 3.12, “Data Format Descriptor”.
3 Required. See Section 3.13, “Key/Value Data”.
4 Not required. See Section 3.14, “Supercompression Global Data”.
5 Required. See Section 3.15, “Mip Level Array”.
6 Replace with 1 if numberOfMipLevels is 0
7 See the levelImages structure below.
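
The fixed-size part of the listing above can be read with a few lines of code. The following Python sketch is illustrative only: it assumes the Index starts immediately after the 64 bytes of header fields, as the listing order suggests, and the names HEADER, INDEX, LEVEL and parse_header_and_index are hypothetical, not part of this specification.

```python
import struct

# Little endian, no implicit padding ("<").
# Header: 12-byte identifier, nine UInt32 fields, two UInt64 fields = 64 bytes.
HEADER = struct.Struct("<12s9I2Q")
# Index: four UInt32 fields followed by two UInt64 fields = 32 bytes.
INDEX = struct.Struct("<4I2Q")
# One entry of the levels array: three UInt64 fields = 24 bytes.
LEVEL = struct.Struct("<3Q")

HEADER_FIELDS = (
    "identifier", "vkFormat", "typeSize", "pixelWidth", "pixelHeight",
    "pixelDepth", "numberOfArrayElements", "numberOfFaces",
    "numberOfMipLevels", "supercompressionScheme", "bytesOfImages",
    "bytesOfUncompressedImages",
)
INDEX_FIELDS = (
    "dataFormatDescriptorOffset", "bytesOfDataFormatDescriptor",
    "keyValueDataOffset", "bytesOfKeyValueData",
    "supercompressionGlobalDataOffset", "bytesOfSupercompressionGlobalData",
)
LEVEL_FIELDS = ("offset", "bytesOfImages", "bytesOfUncompressedImages")


def parse_header_and_index(buf):
    """Return (header, index, levels) parsed from the start of buf."""
    header = dict(zip(HEADER_FIELDS, HEADER.unpack_from(buf, 0)))
    index = dict(zip(INDEX_FIELDS, INDEX.unpack_from(buf, HEADER.size)))
    # The levels array has max(1, numberOfMipLevels) entries.
    count = max(1, header["numberOfMipLevels"])
    first = HEADER.size + INDEX.size
    levels = [
        dict(zip(LEVEL_FIELDS, LEVEL.unpack_from(buf, first + i * LEVEL.size)))
        for i in range(count)
    ]
    return header, index, levels
```

The sketch performs no validation; the checks described in Section 3 would be applied to the returned dictionaries.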

After inflation from supercompression or when supercompressionScheme == 0, levelImages looks like this:

levelImages Structure
for each array_element in numberOfArrayElements (1)
   for each face in numberOfFaces
       for each z_slice_of_blocks in num_blocks_z (2)
           for each row_of_blocks in num_blocks_y (2)
               for each block in num_blocks_x (2)
                   Byte data[format-specific-number-of-bytes] (3)
               end
           end
       end
   end
end
1 Replace with 1 if numberOfArrayElements is 0.
2 See the definitions below.
3 Rows of uncompressed texture images must be tightly packed, equivalent to a GL_UNPACK_ALIGNMENT of 1.

In the levelImages loops above,

\[num\_blocks\_z = \max\left(1, \left\lceil{\frac{pixelDepth}{block\_depth}}\right\rceil\right)\]
\[num\_blocks\_y = \max\left(1, \left\lceil{\frac{pixelHeight}{block\_height}}\right\rceil\right)\]
\[num\_blocks\_x = \left\lceil{\frac{pixelWidth}{block\_width}}\right\rceil\]

where block_depth, block_height, and block_width are 1 for uncompressed formats and the block size in that dimension for block compressed formats as given in the format’s section of the Khronos Data Format specification [KDF13].

A block is a single pixel for uncompressed formats and \(block\_width \times block\_height \times block\_depth\) pixels for block compressed formats.
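
As a worked illustration of the block-count formulas above, the following Python sketch computes num_blocks_x, num_blocks_y and num_blocks_z and, from them, the size of an uncompressed levelImages structure. The function names are hypothetical, and bytes_per_block is an input the caller must take from the format definition; it is not derived here.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def num_blocks(pixel_width, pixel_height, pixel_depth,
               block_width=1, block_height=1, block_depth=1):
    """Return (num_blocks_x, num_blocks_y, num_blocks_z)."""
    nx = ceil_div(pixel_width, block_width)
    ny = max(1, ceil_div(pixel_height, block_height))
    nz = max(1, ceil_div(pixel_depth, block_depth))
    return nx, ny, nz


def level_images_bytes(pixel_width, pixel_height, pixel_depth,
                       number_of_array_elements, number_of_faces,
                       block=(1, 1, 1), bytes_per_block=1):
    """Size in bytes of levelImages for the given dimensions."""
    nx, ny, nz = num_blocks(pixel_width, pixel_height, pixel_depth, *block)
    elements = max(1, number_of_array_elements)  # 0 means "not an array"
    return elements * number_of_faces * nz * ny * nx * bytes_per_block
```

For example, a 10 x 6 image in a format with 4 x 4 x 1 blocks has 3 x 2 x 1 blocks, because partial blocks at the edges still occupy a whole block.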

3. Field Descriptions

3.1. identifier

The file identifier is a unique set of bytes that will differentiate the file from other types of files. It consists of 12 bytes, as follows:

Byte[12] FileIdentifier = {
  0xAB, 0x4B, 0x54, 0x58, 0x20, 0x32, 0x32, 0xBB, 0x0D, 0x0A, 0x1A, 0x0A
}

This can also be expressed using C-style character definitions as:

Byte[12] FileIdentifier = {
  '«', 'K', 'T', 'X', ' ', '2', '2', '»', '\r', '\n', '\x1A', '\n'
}

The rationale behind the choice of values in the identifier is based on the rationale for the identifier in the PNG specification. This identifier both identifies the file as a KTX file and provides for immediate detection of common file-transfer problems.

  • Byte [0] is chosen as a non-ASCII value to reduce the probability that a text file may be misrecognized as a KTX file.

  • Byte [0] also catches bad file transfers that clear bit 7.

  • Bytes [1..6] identify the format, and are the ASCII values for the string "KTX 22".

  • Byte [7] is for aesthetic balance with byte [0] (they are a matching pair of double-angle quotation marks).

  • Bytes [8..9] form a CR-LF sequence which catches bad file transfers that alter newline sequences.

  • Byte [10] is a control-Z character, which stops file display under MS-DOS, and further reduces the chance that a text file will be falsely recognised.

  • Byte [11] is a final line feed, which checks for the inverse of the CR-LF translation problem.
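
The checks described in the list above can be sketched in Python as follows. This is a minimal illustration, not a required algorithm: the function name check_identifier and the returned strings are hypothetical, and the byte values are copied from the identifier given in this section.

```python
KTX2_IDENTIFIER = bytes([
    0xAB, 0x4B, 0x54, 0x58, 0x20, 0x32, 0x32, 0xBB, 0x0D, 0x0A, 0x1A, 0x0A,
])


def check_identifier(data):
    """Classify the first 12 bytes of data."""
    head = bytes(data[:12])
    if head == KTX2_IDENTIFIER:
        return "ok"
    # A transfer that clears bit 7 damages bytes [0] and [7].
    if head == bytes(b & 0x7F for b in KTX2_IDENTIFIER):
        return "bit 7 cleared"
    # Bytes [0..7] intact but the tail damaged points at newline translation.
    if head[:8] == KTX2_IDENTIFIER[:8]:
        return "line endings altered"
    return "not KTX2"
```

A loader would normally only need the first comparison; the other branches merely show how the identifier exposes the transfer problems listed above.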

3.2. vkFormat

vkFormat specifies the image format using Vulkan VkFormat enum values. It can be any value defined in core Vulkan 1.1 [VULKAN11], future core versions or by a registered Vulkan extension. Values defined by core Vulkan 1.1 are given in section 30.3.1 Format Definition of [VULKAN11]. The list of registered extensions is provided in the Khronos Vulkan Registry. A complete list of values defined by both core Vulkan 1.1 and extensions can be found in section 35.4.1 Format Definition of [VULKAN11EXT].

The section number given for [VULKAN11EXT] is as of this writing (Vulkan 1.1.96). It is subject to change as future extensions are added to the document but the link should remain valid as it is to an internal anchor.

vkFormat can be VK_FORMAT_UNDEFINED (0) if the format of the data is not a recognized Vulkan format. The data layout is always given by the Data Format Descriptor.

Values listed in Table 1, “Prohibited Formats” must not be used, nor may any *SCALED* formats added in future. The table in Appendix A, Mapping of vkFormat values gives the mapping for all VkFormat enum values in Vulkan 1.1 core and the extensions known at the time of writing, to the equivalent OpenGL format (internal format, format and type values), DXGI_FORMAT and MTLPixelFormat. Applications must use these mappings. If Appendix A, Mapping of vkFormat values does not have an entry for the value of vkFormat, and a mapping for one or more of the other APIs exists, the KTX2 writer must provide that mapping using one or more of the metadata items described in Section 5.3, “Format Mapping”. This includes the case of VK_FORMAT_UNDEFINED.

There are not yet Vulkan extensions for the ASTC HDR and 3D formats described in OES_texture_compression_ASTC [OES_ASTC]. ASTC formats are indicated in the DFD by setting color_model to KHR_DF_MODEL_ASTC (= 162). HDR data is indicated by setting the KHR_DF_SAMPLE_DATATYPE_FLOAT bit of channel_id to 1. The block size is given by the values of texel_block_dimension_0 and texel_block_dimension_1, and an ASTC 3D texture is indicated by texel_block_dimension_2 > 0. Tools handling ASTC and OpenGL loaders must be able to recognize these formats from the DFD.

Before loading any image, Vulkan loaders should confirm via vkGetPhysicalDeviceFormatProperties that the Vulkan physical device (VkPhysicalDevice) supports the intended use of the format.

Vulkan applications using a core Vulkan format whose name has the _BLOCK suffix must ensure they enable the corresponding textureCompression* physical device feature at VkDevice creation time. Those using formats defined by extensions must ensure they enable the defining extension at VkDevice creation time.

Vulkan applications handling textures whose formats are not known at VkDevice creation time are recommended to enable all available texture compression features and format defining extensions when creating a device.

Table 1. Prohibited Formats
Format Name                             Value
VK_FORMAT_A8B8G8R8_UNORM_PACK32         51
VK_FORMAT_A8B8G8R8_SNORM_PACK32         52
VK_FORMAT_A8B8G8R8_UINT_PACK32          55
VK_FORMAT_A8B8G8R8_SINT_PACK32          56
VK_FORMAT_A8B8G8R8_SRGB_PACK32          57
VK_FORMAT_R8_USCALED                    11
VK_FORMAT_R8_SSCALED                    12
VK_FORMAT_R8G8_USCALED                  18
VK_FORMAT_R8G8_SSCALED                  19
VK_FORMAT_R8G8B8_USCALED                25
VK_FORMAT_R8G8B8_SSCALED                26
VK_FORMAT_B8G8R8_USCALED                32
VK_FORMAT_B8G8R8_SSCALED                33
VK_FORMAT_R8G8B8A8_USCALED              39
VK_FORMAT_R8G8B8A8_SSCALED              40
VK_FORMAT_B8G8R8A8_USCALED              46
VK_FORMAT_B8G8R8A8_SSCALED              47
VK_FORMAT_A8B8G8R8_USCALED_PACK32       53
VK_FORMAT_A8B8G8R8_SSCALED_PACK32       54
VK_FORMAT_A2R10G10B10_USCALED_PACK32    60
VK_FORMAT_A2R10G10B10_SSCALED_PACK32    61
VK_FORMAT_A2B10G10R10_USCALED_PACK32    66
VK_FORMAT_A2B10G10R10_SSCALED_PACK32    67
VK_FORMAT_R16_USCALED                   72
VK_FORMAT_R16_SSCALED                   73
VK_FORMAT_R16G16_USCALED                79
VK_FORMAT_R16G16_SSCALED                80
VK_FORMAT_R16G16B16_USCALED             86
VK_FORMAT_R16G16B16_SSCALED             87
VK_FORMAT_R16G16B16A16_USCALED          93
VK_FORMAT_R16G16B16A16_SSCALED          94

Rationale

The A8B8G8R8*PACK32 formats are prohibited because the end result is the same regardless of whether the data is treated as packed into 32-bits or as the equivalent R8G8B8A8 format, i.e. as an array of 4 bytes, and a Data Format Descriptor cannot distinguish between these cases.

The *SCALED* formats are prohibited because they are intended for vertex data, very few, if any, implementations support using them for texturing and a Data Format Descriptor cannot distinguish these from int values having the same bit pattern.
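
A validator might reject the prohibited values with a simple set lookup. The sketch below uses the numeric values from Table 1; the names PROHIBITED_VK_FORMATS and is_prohibited are hypothetical. It cannot detect *SCALED* formats added to Vulkan in future, which would need to be added to the set by hand.

```python
# Numeric VkFormat values copied from Table 1, "Prohibited Formats".
PROHIBITED_VK_FORMATS = frozenset({
    51, 52, 55, 56, 57,          # A8B8G8R8_*_PACK32 (non-scaled variants)
    11, 12, 18, 19, 25, 26,      # R8, R8G8, R8G8B8 *SCALED
    32, 33, 39, 40, 46, 47,      # B8G8R8, R8G8B8A8, B8G8R8A8 *SCALED
    53, 54, 60, 61, 66, 67,      # packed 32-bit *SCALED
    72, 73, 79, 80, 86, 87,      # R16, R16G16, R16G16B16 *SCALED
    93, 94,                      # R16G16B16A16 *SCALED
})


def is_prohibited(vk_format):
    """True if vk_format is one of the values listed in Table 1."""
    return vk_format in PROHIBITED_VK_FORMATS
```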

Legacy Formats

The legacy OpenGL & OpenGL ES formats specified by the following extensions do not have equivalent Vulkan formats and are not supported.

  • OES_compressed_paletted_texture

  • AMD_compressed_3DC_texture

  • AMD_compressed_ATC_texture

  • 3DFX_texture_compression_FXT1

  • EXT_texture_compression_latc

Only a few of these formats can be described without an extended Data Format Descriptor so VK_FORMAT_UNDEFINED must not be used as a workaround.

This is felt to be an acceptable trade-off for simplifying this specification as the formats are not in wide use and applications needing them can use KTX version 1.

3.2.1. Depth and Stencil Formats

Although Vulkan requires separate uploads of depth and stencil components, combined depth/stencil pixel formats can be used with KTX.

Rationale

Other GPU APIs support combined uploads and given KTX data alignment it’s trivial to upload components separately in Vulkan.

Depth or stencil formats cannot be used for 3D textures.

VK_FORMAT_D16_UNORM_S8_UINT is defined as two 16-bit words per texel. The first word contains the D16 value. The second word contains the S8 value in the eight LSBs and zeros in the eight MSBs.

VK_FORMAT_D24_UNORM_S8_UINT is defined as one 32-bit word per texel with the S8 value in the eight LSBs of the word and the D24 value in the MSBs.

VK_FORMAT_X8_D24_UNORM_PACK32 is defined as one 32-bit word per texel with the D24 value in the LSBs of the word and zeros in the eight MSBs.

VK_FORMAT_D32_SFLOAT_S8_UINT is defined as two 32-bit words per texel. The first word contains the floating-point D32 value. The second word contains the S8 value in the eight LSBs and zeros in the MSBs.

VK_FORMAT_S8_UINT, VK_FORMAT_D16_UNORM, and VK_FORMAT_D32_SFLOAT are defined as in [VULKAN11EXT].
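
To illustrate the combined depth/stencil layouts defined above, the following Python sketch splits one texel of two of the formats into its depth and stencil values. It assumes the words are stored little endian, as for all data in a KTX file; the function names are hypothetical.

```python
import struct


def unpack_d24s8(texel):
    """VK_FORMAT_D24_UNORM_S8_UINT: one 32-bit word per texel."""
    (word,) = struct.unpack("<I", texel)
    stencil = word & 0xFF   # S8 in the eight LSBs
    depth = word >> 8       # D24 in the remaining MSBs
    return depth, stencil


def unpack_d16s8(texel):
    """VK_FORMAT_D16_UNORM_S8_UINT: two 16-bit words per texel."""
    depth, second = struct.unpack("<HH", texel)
    return depth, second & 0xFF  # S8 in the eight LSBs of the second word
```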

3.3. typeSize

typeSize specifies the data type size that should be used when the texture data must be endian converted. Software on big-endian systems will need this as all data in a KTX file is little endian. For formats whose Vulkan names have the suffix _BLOCK it must equal 1. For formats with the suffix _PACKxx it must equal the value of \(xx / 8\). For unpacked formats, except combined depth/stencil formats, it must equal the number of bytes needed for a single component which can be derived from the format name. E.g., for VK_FORMAT_R16G16B16_UNORM it will be \(16 / 8\). This means it will equal 1 for any format with 8-bit components. For the combined depth/stencil formats using the layouts defined in this specification, the value will be 4.

Rationale

The type size can be calculated from the Data Format Descriptor but the calculation is not straightforward. Although big-endian machines are in the minority we have chosen to provide a useful piece of data for them instead of the 4 bytes of padding that would otherwise be needed for proper alignment of bytesOfImages.
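
The typeSize rules of this section can be sketched as a function of the Vulkan format name. This is an illustration under stated assumptions, not a normative algorithm: the function name type_size is hypothetical, the name-matching patterns are this sketch's own, and names carrying a vendor suffix after _BLOCK are deliberately not handled.

```python
import re


def type_size(format_name):
    """Derive typeSize from a VkFormat name such as VK_FORMAT_R16G16B16_UNORM."""
    name = format_name
    if name.startswith("VK_FORMAT_"):
        name = name[len("VK_FORMAT_"):]
    if name.endswith("_BLOCK"):
        return 1                              # block-compressed formats
    packed = re.search(r"_PACK(\d+)$", name)
    if packed:
        return int(packed.group(1)) // 8      # _PACKxx gives xx / 8
    if re.fullmatch(r"D\d+_[A-Z]+_S8_UINT", name):
        return 4                              # combined depth/stencil
    component = re.match(r"[A-Z](\d+)", name)
    if component is None:
        raise ValueError("cannot derive typeSize from " + format_name)
    return int(component.group(1)) // 8       # bits of the first component / 8
```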

3.4. pixelWidth, pixelHeight, pixelDepth

The size of the texture image for level 0, in pixels.

Image dimensions must adhere to format-specific requirements, including:

  • width and height being multiples of 4 for BCn and ETC1/ETC2/EAC formats;

  • width, height, and depth being multiples of the corresponding block size dimensions for ASTC formats;

  • various restrictions for PVRTC formats (see [PVRTC], [PVRTC1_OES], and [PVRTC2_OES]).

For 1D textures pixelHeight and pixelDepth must be 0. For 2D and cube textures pixelDepth must be 0.

pixelWidth cannot be 0.

pixelDepth must be 0 for depth or stencil formats.

3.5. numberOfArrayElements

numberOfArrayElements specifies the number of array elements. If the texture is not an array texture, numberOfArrayElements must equal 0.

Although current graphics APIs do not support 3D array textures, KTX files can be used to store them.

Refer to Section 4.3, “Texture Type” for more details about valid values.

3.6. numberOfFaces

numberOfFaces specifies the number of cubemap faces. For cubemaps and cubemap arrays this must be 6. For non cubemaps this must be 1. Cube map faces are stored in the order: +X, -X, +Y, -Y, +Z, -Z.

Applications wanting to store incomplete cubemaps should flatten faces into a 2D array and use the metadata described in Section 5.1, “KTXcubemapIncomplete” to signal which faces are present.

3.7. numberOfMipLevels

numberOfMipLevels specifies the number of levels in the Mip Level Array and, by extension, the number of entries in the levels array. A KTX file does not need to contain a complete mipmap pyramid. Mip level data is ordered from the level with the smallest size images, \(level_p\), to that with the largest size images, \(level_{base}\), where \(p = numberOfMipLevels - 1\) and \(base = 0\). \(p\) must not be greater than the maximum possible level, \(max\), where

\[max = \left\lfloor \log _2\left(\max\left(pixelWidth, pixelHeight, pixelDepth\right)\right) \right\rfloor\]

\(numberOfMipLevels = 1\) means that a file contains only the first level and the texture isn’t meant to have other levels. E.g., this could be a LUT rather than a natural image.

\(numberOfMipLevels = 0\) is allowed, except for block-compressed formats, and means that a file contains only the first level and consumers, particularly loaders, should generate other levels if needed.
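
The limit on the number of levels can be checked with integer arithmetic. The Python sketch below assumes the logarithm is rounded down to an integer, which is the usual convention for mip chains; the function names are hypothetical.

```python
def max_level(pixel_width, pixel_height, pixel_depth):
    """Largest valid level index: floor(log2(largest dimension))."""
    largest = max(pixel_width, pixel_height, pixel_depth)
    return largest.bit_length() - 1   # exact floor(log2) for positive ints


def level_count(number_of_mip_levels):
    """Number of levels actually stored; 0 is treated as 1."""
    return max(1, number_of_mip_levels)


def mip_levels_valid(number_of_mip_levels, pixel_width, pixel_height,
                     pixel_depth):
    """True if the smallest stored level does not exceed the maximum."""
    p = level_count(number_of_mip_levels) - 1
    return p <= max_level(pixel_width, pixel_height, pixel_depth)
```

A 256 x 256 texture can therefore carry at most 9 levels (indices 0 to 8).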

3.8. supercompressionScheme

supercompressionScheme indicates if an optional supercompression scheme has been applied to the data in the levelImages structure. It must be one of the values from Table 2, “Supercompression Schemes”. A value of 0 indicates no supercompression.

Table 2. Supercompression Schemes
Scheme Id                 Scheme Name     Level Data Format    Global Data Format
0                         None            n/a                  n/a
1                         Crunch CRN      T.B.C                T.B.C
2                         ZLIB            [ZLIB]               n/a
3                         Zstandard       [ZSTD]               n/a
4 ... 0xffff              Reserved (1)
0x10000 ... 0x1ffff       Reserved (2)
0x20000 ... 0xffffffff    Reserved (3)

  1. Reserved for KTX use.

  2. Reserved for vendor compression schemes. A registry will be established from which vendors can request assignment of values thus avoiding conflicts.

  3. Reserved. Do not use.

The supercompression scheme is applied independently to each mip level to permit streaming and random access to the levels. The format of the data in the levelImages structure for a scheme is specified in the reference given in the Level Data Format column of Table 2, “Supercompression Schemes”.

Schemes that require data global to all levels can store it as described in Section 3.14.1, “supercompressionGlobalData”. Currently only Crunch CRN uses global data. The format of the global data for a scheme is specified in the reference given in the Global Data Format column of Table 2, “Supercompression Schemes”.

When a supercompression scheme is used, the image data must be inflated from the scheme prior to GPU sampling.
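
Because each level is supercompressed independently, a loader can inflate one level at a time. The Python sketch below handles schemes 0 and 2 only and assumes that scheme 2 data is a complete zlib stream as accepted by the standard zlib module; scheme 3 (Zstandard) is omitted because it needs a decoder outside the scope of this sketch. The function name inflate_level is hypothetical.

```python
import zlib


def inflate_level(level_bytes, supercompression_scheme,
                  bytes_of_uncompressed_images):
    """Return the uncompressed levelImages bytes for one mip level."""
    if supercompression_scheme == 0:
        data = bytes(level_bytes)             # stored as is
    elif supercompression_scheme == 2:
        data = zlib.decompress(level_bytes)   # assumes a full zlib stream
    else:
        raise NotImplementedError(
            "scheme %d is not handled by this sketch" % supercompression_scheme)
    # The inflated size must match the value recorded in the levels array.
    if len(data) != bytes_of_uncompressed_images:
        raise ValueError("inflated size does not match the level index")
    return data
```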

LZ77-style lossless supercompression, e.g., schemes 2 and 3, is generally ineffective on the block-compressed data of GPU texture formats. It is best reserved for use with uncompressed texture formats or with block-compressed data that has been specially optimized for LZ77-style supercompression, such as by Crunch’s Rate Distortion Optimization mode [RDO].

Crunch CRN is specially designed for supercompression of some block-compressed texture formats.

3.8.1. Scheme Notes (Normative)

Crunch CRN
  • A file that specifies Crunch CRN with base formats other than ETC, ETC2 and BC[1-3] (S3TC_DXT[1-5]) must be considered invalid.

ZLIB
Zstandard
  • Only Zstandard frames are required. Inflators may skip Skippable frames.

  • Checksums are optional. If a checksum is present, inflators should verify it.

3.9. bytesOfImages

The total size of the image data. That is the sum of the bytesOfLevelImages within the Mip Level Array.

3.10. bytesOfUncompressedImages

The total size of the image data after expansion from supercompression. When supercompressionScheme = 0, bytesOfImages must have the same value as this.

3.11. Index

An index giving the byte offsets from the start of the file and byte sizes of the various sections of the KTX file.

3.11.1. dataFormatDescriptorOffset

The offset from the start of the file of the dfdTotalSize field of the Data Format Descriptor.

3.11.2. bytesOfDataFormatDescriptor

The total number of bytes in the Data Format Descriptor including the dfdTotalSize field. bytesOfDataFormatDescriptor must equal dfdTotalSize.

This field is not necessary. Since no padding is needed for DFDs the value is easily calculated from the offsets. However, if it is removed, we would need 4 bytes of padding instead for proper alignment of supercompressionGlobalData. Retaining it means all sections of the file can be handled uniformly.

3.11.3. keyValueDataOffset

An arbitrary number of key/value pairs may follow the Index. These can be used to encode any arbitrary data. The keyValueDataOffset field gives the offset of this data, i.e. that of the first key/value pair, from the start of the file.

3.11.4. bytesOfKeyValueData

The total number of bytes of key/value data including all keyAndValueByteSize fields, all keyAndValue fields and all valuePadding fields but not the keyValuePadding field.

3.11.5. supercompressionGlobalDataOffset

The offset from the start of the file of supercompressionGlobalData. The value must be 0 when bytesOfSupercompressionGlobalData = 0.

3.11.6. bytesOfSupercompressionGlobalData

The number of bytes of supercompressionGlobalData. It does not include sgdPadding. For most supercompression schemes the value is 0.

3.11.7. levels

An array giving the offset from the start of the file and the compressed and uncompressed byte sizes of the image data for each mip level within the Mip Level Array. The array is ordered starting with \(level_{base}\) (the level with the largest size images) at index 0. Image data for \(level_p\) will be found at index p.

levels[n].offset

The offset from the start of the file of the first byte of image data for mip level n.

levels[n].bytesOfImages

The total size of the data for supercompressed mip level n.

levels[n].bytesOfImages is the number of bytes of pixel data in LOD \(level_n\). This includes all z slices, all faces, all rows (or rows of blocks) and all pixels (or blocks) in each row for the mip level.

If

\[\sum_{i=0}^{\max\left(1, numberOfMipLevels\right) - 1} levels[i].bytesOfImages \neq bytesOfImages\]

the file is invalid.

3.11.8. levels[n].bytesOfUncompressedImages

The number of bytes of image data for mipmap level n after reflation from supercompression. When supercompressionScheme == 0, levels[n].bytesOfImages must have the same value as this.

levels[n].bytesOfUncompressedImages is the number of bytes of pixel data in LOD \(level_n\) after reflation from supercompression. This includes all z slices, all faces, all rows (or rows of blocks) and all pixels (or blocks) in each row for the mipmap level. It does not include any bytes in mipPadding.

The value of a level’s bytesOfUncompressedImages must satisfy the following condition:

bytesOfUncompressedImages % (numberOfFaces * max(1, numberOfArrayElements)) == 0

If

\[\sum_{i=0}^{\max\left(1, numberOfMipLevels\right) - 1} levels[i].bytesOfUncompressedImages \neq bytesOfUncompressedImages\]

the file is invalid.
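
The consistency conditions on the levels array can be gathered into one validation pass. The following Python sketch is illustrative only; the function name validate_level_index is hypothetical, and each level is assumed to be a dictionary keyed by the field names used in this section.

```python
def validate_level_index(levels, bytes_of_images, bytes_of_uncompressed_images,
                         number_of_faces, number_of_array_elements,
                         supercompression_scheme):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if sum(lv["bytesOfImages"] for lv in levels) != bytes_of_images:
        problems.append("sum of bytesOfImages differs from header")
    if (sum(lv["bytesOfUncompressedImages"] for lv in levels)
            != bytes_of_uncompressed_images):
        problems.append("sum of bytesOfUncompressedImages differs from header")
    divisor = number_of_faces * max(1, number_of_array_elements)
    for n, lv in enumerate(levels):
        # Every face of every array element has the same size.
        if lv["bytesOfUncompressedImages"] % divisor != 0:
            problems.append("level %d size not divisible by %d" % (n, divisor))
        # Without supercompression both sizes must agree.
        if (supercompression_scheme == 0
                and lv["bytesOfImages"] != lv["bytesOfUncompressedImages"]):
            problems.append("level %d sizes differ without supercompression" % n)
    return problems
```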

In versions of OpenGL < 4.5 and in OpenGL ES, faces of non-array cubemap textures (any texture where numberOfFaces is 6 and numberOfArrayElements is 0) must be uploaded individually. Loaders wishing to minimize the size of their intermediate buffers may want to read the faces individually rather than as a block of size levels[n].bytesOfUncompressedImages.

3.12. Data Format Descriptor

These 3 items combined form a Data Format Descriptor (dfDescriptor) describing the layout of the texel blocks in data. The full specification for this is found in the Khronos Data Format Specification version 1.3 [KDF13].

If the dfDescriptor describes an sRGB transfer function then vkFormat must be one of the SRGB formats.

The dfDescriptor is partially expanded here in order to provide sufficient information for a KTX2 file to be parsed without having to refer to [KDF13]. It consists of one or more Descriptor Blocks (dfDescriptorBlock).

The dfDescriptor describes the texel blocks as they are when supercompressionScheme == 0 or after reflation when supercompressionScheme != 0.

Rationale

A dfDescriptor is useful in the following cases:

  • precise color management using the descriptor’s color space information,

  • easier use of the images by non-OpenGL and non-Vulkan applications. There will be no need for large tables to interpret format enums.

  • easier calculation of the offsets of each level, face and layer within the data. Again there will be no need for large tables.

3.12.1. dfdTotalSize

Called total_size in [KDF13], dfdTotalSize indicates the total number of bytes in the dfDescriptor including dfdTotalSize and all dfdBlock fields. bytesOfDataFormatDescriptor must equal dfdTotalSize.

If

\[dfdTotalSize \neq keyValueDataOffset - dataFormatDescriptorOffset\]

the file is invalid.

dfdTotalSize is included so that the KTX file contains a complete descriptor as defined in [KDF13].

3.12.2. dfdBlock

A Descriptor Block as defined in [KDF13]. The high-order 16 bits of its first UInt32 are the descriptor_type and the high-order 16 bits of its second UInt32 are the descriptor_block_size. Every descriptor_block_size is mandated to be a multiple of 4, which guarantees that the following keyAndValueByteSize will be aligned to a 32-bit word.

3.13. Key/Value Data

Key/Value data consists of a set of key/value pairs. The number of pairs is such that

\[\sum_{i=0}^{n-1} \left\lceil{\frac{keyAndValueByteSize[i]}{4}}\right\rceil * 4 + keyAndValueByteSize[n] = bytesOfKeyValueData.\]

Any file that does not meet the above condition is invalid.

KTX2 editors must preserve any key/value data they do not understand or which is not modified by the user.

Key/value data must be written to the file sorted by the Unicode code points of the keys starting from a key’s first character.

3.13.1. keyAndValueByteSize

The number of bytes of combined key and value data in one key/value pair. This includes the size of the key, the required NUL byte terminating the key, and all the bytes of data in the value. If the value is a UTF-8 string it should be NUL terminated and keyAndValueByteSize should include the NUL character (but code that reads KTX files must not assume that value fields are NUL terminated). keyAndValueByteSize does not include the bytes in valuePadding.

3.13.2. keyAndValue

keyAndValue contains 2 separate sections. First it contains a key encoded in UTF-8 without a byte order mark (BOM). The key must be terminated by a NUL character (a single 0x00 byte). Keys that begin with the 3 ASCII characters 'KTX' or 'ktx' are reserved and must not be used except as described by this specification (this version of the KTX spec. defines eight keys). Immediately following the NUL character that terminates the key is the Value data.

The Value data may consist of any arbitrary data bytes. Any byte value is allowed. It is encouraged that the value be a NUL terminated UTF-8 string without a BOM, but this is not required. If the Value data is binary, it is a sequence of bytes rather than of words. It is up to the vendor defining the key to specify how those bytes are to be interpreted (including the endianness of any encoded numbers). If the Value data is a string of bytes then the NUL termination should be included in the keyAndValueByteSize byte count (but programs that read KTX files must not rely on this).

3.13.3. valuePadding

Contains between 0 and 3 bytes of value 0x00 to ensure that the byte following the last byte in valuePadding is at a file offset that is a multiple of 4. This ensures that every keyAndValueByteSize field is 4-byte aligned. This padding is included in the bytesOfKeyValueData field but not the individual keyAndValueByteSize fields.

3.13.4. keyValuePadding

Contains between 0 and 7 bytes of value 0x00 to ensure that the following supercompressionGlobalData field is at a file offset that is a multiple of 8.
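
Reading the key/value data amounts to walking the size-prefixed entries until bytesOfKeyValueData bytes have been consumed. The Python sketch below is a minimal illustration: the function name parse_key_value_data is hypothetical, values are returned as raw bytes because they need not be strings, and the loop tolerates a final pair written with or without valuePadding.

```python
import struct


def parse_key_value_data(buf, key_value_data_offset, bytes_of_key_value_data):
    """Return a list of (key, value_bytes) pairs."""
    pos = key_value_data_offset
    end = key_value_data_offset + bytes_of_key_value_data
    pairs = []
    while pos < end:
        (size,) = struct.unpack_from("<I", buf, pos)   # keyAndValueByteSize
        pos += 4
        if pos + size > end:
            raise ValueError("key/value pair runs past bytesOfKeyValueData")
        key_and_value = bytes(buf[pos:pos + size])
        key, nul, value = key_and_value.partition(b"\x00")
        if not nul:
            raise ValueError("key is not NUL terminated")
        pairs.append((key.decode("utf-8"), value))
        pos += size
        pos += 3 - (size + 3) % 4                      # skip valuePadding
    return pairs
```

Callers that expect a string value should strip a trailing NUL if one is present, without relying on it being there.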

3.14. Supercompression Global Data

3.14.1. supercompressionGlobalData

An array of data used by certain supercompression schemes that must be available before any mip level can be expanded.

3.14.2. sgdPadding

Contains between 0 and 7 bytes of value 0x00 to ensure that mip level data starts at a file offset that is a multiple of 8.

3.15. Mip Level Array

Mip levels in the array are ordered from the level with the smallest size images, \(level_p\) to that with the largest size images, \(level_{base}\).

Rationale

When streaming a KTX file, sending smaller mip levels first can be used together with, e.g., the GL_TEXTURE_MAX_LEVEL and GL_TEXTURE_BASE_LEVEL texture parameters or an appropriate region setting in vkCmdCopyBufferToImage, to display a low resolution image quickly without waiting for the entire texture data.

3.15.1. levelImages

levelImages is an array of Bytes holding all the image data for a level.

When supercompressionScheme != 0 these bytes are formatted as specified in the scheme documentation.

3.16. mipPadding

mipPadding is between 0 and 7 bytes of value 0x00 to make sure that all mip level data starts at a file offset that is a multiple of 8.
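
All padding fields in this specification use the same expression, for example 7 - (n + 7) % 8 for 8-byte alignment. The Python sketch below generalizes it; the function name padding is hypothetical. It relies on the padded section itself starting at an aligned offset, which the preceding padding fields ensure.

```python
def padding(n_bytes, alignment):
    """Bytes of 0x00 needed after n_bytes to reach the next aligned offset.

    For alignment 8 this is 7 - (n_bytes + 7) % 8, and for alignment 4 it is
    3 - (n_bytes + 3) % 4, the expressions used in the file structure listing.
    """
    return (alignment - 1) - (n_bytes + alignment - 1) % alignment
```

For instance, 13 bytes of level data are followed by 3 bytes of mipPadding, so that the next level starts 16 bytes later.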

4. General comments

4.1. Endianness

KTX 2.0 files are little endian. All header fields and the data for all uncompressed texture formats are stored in little endian order. Readers on big-endian machines must endian convert all header UInt32s and UInt64s and, when typeSize > 1, all data to big endian. The data of block compressed formats does not need endian converting. When data is being converted the Data Format Descriptor must also be rewritten as it describes the data as laid out in memory. Writers must endian convert these items to little endian on writing the file. Sample code for rewriting DFDs is given in Appendix C, Data Format Descriptor Endian Conversion.
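
Endian conversion of image data reverses the bytes within each unit of typeSize bytes. The Python sketch below shows that step only; it does not rewrite the Data Format Descriptor, and the function name swap_endianness is hypothetical.

```python
def swap_endianness(data, type_size):
    """Reverse the byte order of every type_size-byte unit in data."""
    if type_size == 1:
        return bytes(data)          # single bytes need no conversion
    if len(data) % type_size != 0:
        raise ValueError("data length is not a multiple of typeSize")
    out = bytearray(len(data))
    for i in range(type_size):
        # Byte i of every unit comes from byte (type_size - 1 - i).
        out[i::type_size] = data[type_size - 1 - i::type_size]
    return bytes(out)
```

Applying the function twice returns the original bytes, so the same routine serves both readers and writers.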

4.2. Packing

Rows of uncompressed pixel data are tightly packed. Each row in memory immediately follows the end of the preceding row. I.e., the data must be packed according to the rules described in section 8.4.4.1 Unpacking of the OpenGL 4.6 specification [OPENGL46] with GL_UNPACK_ROW_LENGTH = 0 and GL_UNPACK_ALIGNMENT = 1.

4.3. Texture Type

The type of texture can be determined from the following table. Any other combination of parameters makes the KTX file invalid.

Type             pixelWidth    pixelHeight    pixelDepth    numberOfArrayElements (Section 3.5)    numberOfFaces (Section 3.6)
1D               > 0           0              0             0                                      1
2D               > 0           > 0            0             0                                      1
3D               > 0           > 0            > 0           0                                      1
Cubemap          > 0           > 0            0             0                                      6
1D Array         > 0           0              0             > 0                                    1
2D Array         > 0           > 0            0             > 0                                    1
3D Array         > 0           > 0            > 0           > 0                                    1
Cubemap Array    > 0           > 0            0             > 0                                    6
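
The table above translates directly into a classification routine. In the Python sketch below, the function name texture_type is hypothetical and None is returned for any combination the table does not list, i.e. for an invalid file.

```python
def texture_type(pixel_width, pixel_height, pixel_depth,
                 number_of_array_elements, number_of_faces):
    """Return the texture type name from the table, or None if invalid."""
    if pixel_width == 0:
        return None
    if number_of_faces == 6:
        if pixel_height > 0 and pixel_depth == 0:
            base = "Cubemap"
        else:
            return None
    elif number_of_faces == 1:
        if pixel_height == 0 and pixel_depth == 0:
            base = "1D"
        elif pixel_height > 0 and pixel_depth == 0:
            base = "2D"
        elif pixel_height > 0 and pixel_depth > 0:
            base = "3D"
        else:
            return None   # depth without height matches no row
    else:
        return None
    return base + (" Array" if number_of_array_elements > 0 else "")
```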

5. Predefined Key/Value Pairs

5.1. KTXcubemapIncomplete

A KTX file can be used to store an incomplete cubemap or an array of incomplete cubemaps. In such a case, numberOfFaces must be 1 and numberOfArrayElements must be equal to the number of faces present (in case of a single cubemap) or to the number of faces present times the number of cubemaps (in case of a cubemap array). The faces that are present must be indicated using the metadata key

  • KTXcubemapIncomplete

The value is a one-byte bitfield defined as:

00xxxxx1 - +X is present
00xxxx1x - -X is present
00xxx1xx - +Y is present
00xx1xxx - -Y is present
00x1xxxx - +Z is present
001xxxxx - -Z is present

Any value not matching the mask above is invalid.

At least one face must be present (i.e., value cannot be 0).

Within the levelImages structure, faces must be written in the same order as with complete cubemaps: +X, -X, +Y, -Y, +Z, -Z.

When a texture is a cubemap array, missing/present faces must be the same for each element.
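
Decoding the KTXcubemapIncomplete bitfield can be sketched as follows. The function name present_faces is hypothetical; the bit assignments are those listed above, with +X in the least significant bit.

```python
FACE_NAMES = ("+X", "-X", "+Y", "-Y", "+Z", "-Z")  # bit 0 .. bit 5


def present_faces(value):
    """Return the names of the faces present, in storage order."""
    if len(value) != 1:
        raise ValueError("KTXcubemapIncomplete value must be one byte")
    bits = value[0]
    if bits & 0xC0:
        raise ValueError("the two high-order bits must be 0")
    if bits == 0:
        raise ValueError("at least one face must be present")
    return [name for i, name in enumerate(FACE_NAMES) if bits & (1 << i)]
```

For a single incomplete cubemap, the length of the returned list equals numberOfArrayElements.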

5.2. KTXorientation

Texture data in a KTX file are arranged so that the first pixel in the data stream for each face and/or array element is closest to the origin of the texture coordinate system. In OpenGL that origin is conventionally described as being at the lower left, but this convention is not shared by all image file formats and content creation tools, so there is abundant room for confusion.

The desired texture axis orientation is often predetermined by, e.g. a content creation tool’s or existing application’s use of the image. Therefore it is strongly recommended that tools for generating and manipulating KTX files clearly describe their behaviour, and provide an option to specify the texture axis origin and orientation relative to the logical orientation of the source image. At minimum they should provide a choice between top-left and bottom-left as origin for 2D source images, with the positive S axis pointing right. Where possible, the preferred default is to use the logical upper-left corner of the image as the texture origin. Note that this is contrary to the standard interpretation of GL texture coordinates. However, most other APIs and the majority of texture compression tools use this convention.

When writing the logical orientation to the KTX file’s metadata, image manipulation tools and viewers must use the key

  • KTXorientation

Note that this metadata affects only the logical interpretation of the data and has no effect on the mapping from pixels in the file byte stream to texture coordinates.

The value is a NUL-terminated string formatted depending on the texture type.

Type    Format ([REGEXP])
1D      /^[rl]$/
2D      /^[rl][du]$/
3D      /^[rl][du][oi]$/

where

  • r indicates S values increasing to the right

  • l indicates S values increasing to the left

  • d indicates T values increasing downwards

  • u indicates T values increasing upwards

  • o indicates R values increasing out from the screen (moving towards viewer)

  • i indicates R values increasing in towards the screen (moving away from viewer)

When a texture is an array, all its elements have the same orientation.

Values not matching the table above are invalid.

It is recommended that viewing and editing tools support at least the following values:

  • rd

  • ru

  • rdi

  • ruo

Although other orientations can be represented, it is recommended that tools that create KTX files use only the values listed above as other values may not be widely supported by other tools.
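
As an illustration, a reader might validate the value along the following lines. This is a minimal sketch: `parse_orientation`, `is_widely_supported` and the `dimensions` parameter are names assumed here, and the raw value is assumed to arrive as bytes that include the terminating NUL.

```python
import re

# Patterns from the format table, keyed by texture dimensionality.
# fullmatch() is used instead of ^...$ because Python's $ also matches
# just before a trailing newline.
ORIENTATION_PATTERNS = {
    1: re.compile(r"[rl]"),
    2: re.compile(r"[rl][du]"),
    3: re.compile(r"[rl][du][oi]"),
}

# Values that viewing and editing tools are recommended to support.
RECOMMENDED_VALUES = frozenset({"rd", "ru", "rdi", "ruo"})


def parse_orientation(raw: bytes, dimensions: int) -> str:
    """Validate a KTXorientation value and return it without the NUL."""
    if dimensions not in ORIENTATION_PATTERNS:
        raise ValueError("dimensions must be 1, 2 or 3")
    if not raw.endswith(b"\x00"):
        raise ValueError("value must be NUL-terminated")
    text = raw[:-1].decode("ascii")  # UnicodeDecodeError is a ValueError
    if not ORIENTATION_PATTERNS[dimensions].fullmatch(text):
        raise ValueError("invalid orientation %r for a %dD texture" % (text, dimensions))
    return text


def is_widely_supported(orientation: str) -> bool:
    """True if the value is one of the recommended orientations."""
    return orientation in RECOMMENDED_VALUES
```

A value such as `luo` is valid for a 3D texture but falls outside the recommended set, so a writer following the recommendation above would avoid it.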

5.3. Format Mapping

When Appendix A, Mapping of vkFormat values, does not have an entry for the value of vkFormat, as will happen for newly added Vulkan formats, the KTX writer must provide any known mapping via the following key-value pairs.

Note that the length of these keys, including the terminating NUL, is a multiple of 4 bytes so the values will be 4-byte aligned.
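
The arithmetic behind this note can be checked with a short sketch covering the three keys defined in the subsections that follow. The helper name is an assumption, and the conclusion about value alignment additionally assumes that each key itself starts at a 4-byte-aligned offset.

```python
FORMAT_MAPPING_KEYS = ("KTXglFormat", "KTXdxgiFormat__", "KTXmetalPixelFormat")


def key_lengths_with_nul(keys=FORMAT_MAPPING_KEYS):
    """Map each key to its byte length including the terminating NUL."""
    return {key: len(key.encode("utf-8")) + 1 for key in keys}


if __name__ == "__main__":
    for key, length in key_lengths_with_nul().items():
        # Each length is expected to be a multiple of 4.
        print(key, length, length % 4 == 0)
```

The trailing underscores in KTXdxgiFormat__ are what bring that key up to a multiple of 4 bytes.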

5.3.1. KTXglFormat

For OpenGL {,ES} the mapping is specified with the key

  • KTXglFormat

The value is 12 bytes representing 3 UInt32 values:

UInt32 glInternalformat
UInt32 glFormat
UInt32 glType

For compressed formats, glFormat and glType must be set to zero and the mapping must be provided through glInternalformat alone.
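
A minimal sketch of encoding and decoding this value follows, assuming the little-endian byte order that KTX2 uses for all fields. The function names are illustrative, and the GL constants shown are examples only.

```python
import struct

# Three little-endian UInt32 values: glInternalformat, glFormat, glType.
GL_FORMAT_STRUCT = struct.Struct("<3I")

# Example OpenGL enum values, for illustration.
GL_RGBA8 = 0x8058
GL_RGBA = 0x1908
GL_UNSIGNED_BYTE = 0x1401
GL_COMPRESSED_RGBA_S3TC_DXT5_EXT = 0x83F3


def pack_gl_format(internal_format: int, gl_format: int = 0, gl_type: int = 0) -> bytes:
    """Build the 12-byte KTXglFormat value.

    For compressed formats leave gl_format and gl_type at zero.
    """
    return GL_FORMAT_STRUCT.pack(internal_format, gl_format, gl_type)


def unpack_gl_format(value: bytes):
    """Return (glInternalformat, glFormat, glType) from a 12-byte value."""
    if len(value) != GL_FORMAT_STRUCT.size:
        raise ValueError("KTXglFormat value must be exactly 12 bytes")
    return GL_FORMAT_STRUCT.unpack(value)
```

An uncompressed mapping would be written as `pack_gl_format(GL_RGBA8, GL_RGBA, GL_UNSIGNED_BYTE)`, while a compressed one passes only the internal format, for example `pack_gl_format(GL_COMPRESSED_RGBA_S3TC_DXT5_EXT)`.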

5.3.2. KTXdxgiFormat__

For Direct3D the mapping is specified with the key

  • KTXdxgiFormat__

The value is a UInt32 (4 bytes) giving the format enum value.

5.3.3. KTXmetalPixelFormat

For Metal, the mapping is specified with the key

  • KTXmetalPixelFormat

The value is a UInt32 (4 bytes) giving the format enum value.

5.4. KTXswizzle

Desired component mapping for a texture can be indicated with the key

  • KTXswizzle

The value is a NUL-terminated string of four characters formatted as ([REGEXP]):

  • /^[rgba01]{4}$/

where each symbol names the source component (or fixed value) used for the red, green, blue, and alpha values, in that order. The default swizzle state is therefore rgba.

For example, rg01 means:

  • the red and green channels are sampled from the red and green texture components respectively;

  • the blue channel is set to zero, ignoring texture data;

  • the alpha channel is set to one (fully saturated), ignoring texture data.

When a channel is not present in the texture, a value of 0 must be used for colors (red, green, and blue) and a value of 1 (fully saturated) must be used for alpha.

This metadata has no effect on depth or stencil texture formats.
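
To illustrate the semantics for colour formats, the following sketch applies a swizzle to a single texel. It assumes the terminating NUL has already been stripped and that texel components are normalized floats held in a dictionary; `parse_swizzle` and `apply_swizzle` are hypothetical names.

```python
import re

SWIZZLE_PATTERN = re.compile(r"[rgba01]{4}")

# Values used when a channel is not present in the texture:
# 0 for colours, 1 (fully saturated) for alpha.
MISSING_CHANNEL_DEFAULTS = {"r": 0.0, "g": 0.0, "b": 0.0, "a": 1.0}


def parse_swizzle(text: str) -> str:
    """Validate a KTXswizzle string (without its NUL terminator)."""
    if not SWIZZLE_PATTERN.fullmatch(text):
        raise ValueError("invalid swizzle %r" % text)
    return text


def apply_swizzle(swizzle: str, texel: dict) -> tuple:
    """Return the (red, green, blue, alpha) values selected by the swizzle.

    texel maps the channels present in the texture to their values,
    for example {"r": 0.25, "g": 0.5}.
    """
    source = dict(MISSING_CHANNEL_DEFAULTS)
    source.update(texel)
    source["0"] = 0.0
    source["1"] = 1.0
    return tuple(source[symbol] for symbol in parse_swizzle(swizzle))
```

With the rg01 example above, a texel of `{"r": 0.25, "g": 0.5, "b": 0.75, "a": 0.1}` yields red 0.25, green 0.5, blue 0 and alpha 1.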

5.4.1. Common Mappings

Use the following formats and swizzles to map alpha-only, luminance and luminance-alpha formats.

Alpha8

vkFormat: VK_FORMAT_R8_UNORM (9)
KTXswizzle: 000r

Luminance8

vkFormat: VK_FORMAT_R8_UNORM (9)
KTXswizzle: rrr1

Luminance8Alpha8

vkFormat: VK_FORMAT_R8G8_UNORM (16)
KTXswizzle: rrrg

Loaders may opt to detect these cases and use API-provided enums when available, e.g. for the first case GL_ALPHA8 (when using compatibility profile), MTLPixelFormatA8Unorm or DXGI_FORMAT_A8_UNORM.
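
A loader that opts into this detection might use a lookup like the one sketched below. The table and function name are assumptions for illustration; only the vkFormat values and swizzles come from the mappings listed above.

```python
VK_FORMAT_R8_UNORM = 9
VK_FORMAT_R8G8_UNORM = 16

# (vkFormat, KTXswizzle) pairs from the common mappings above.
LEGACY_FORMATS = {
    (VK_FORMAT_R8_UNORM, "000r"): "Alpha8",
    (VK_FORMAT_R8_UNORM, "rrr1"): "Luminance8",
    (VK_FORMAT_R8G8_UNORM, "rrrg"): "Luminance8Alpha8",
}


def detect_legacy_format(vk_format: int, swizzle: str):
    """Return the legacy format name, or None if the pair is not a common mapping."""
    return LEGACY_FORMATS.get((vk_format, swizzle))
```

When the lookup returns None the loader simply falls back to applying the swizzle as given.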

5.5. KTXwriter

KTX file writers must identify themselves by including a value with the key

  • KTXwriter

The value can be any UTF-8 string that will uniquely identify the tool writing the file, for example:

  • AcmeCo TexTool v1.0

Only the most recent writer should be identified. Editing tools must overwrite this value when rewriting a file originally written by a different tool.

5.6. KTXastcDecodeRGB9E5

A KTX file containing ASTC HDR data that is compatible with the rgb9e5 decoding mode (as defined in [VULKAN11EXT], VK_EXT_astc_decode_mode) may indicate this with the key

  • KTXastcDecodeRGB9E5

This metadata entry has no value.

6. An example KTX file:

TBC

7. IANA Mime-Type Registration Information

TBC

8. Issues

  1. How to refer to the DF descriptor block?

    Discussion: There is no such data type as dfDescriptorBlock, but using primitive types would effectively mean repeating the definition of a descriptor block here, which we do not want to do.

    Resolved: Show that dfDescriptorBlock is used as a shorthand for [KDF13]'s Descriptor block.

  2. How to handle endianness of the DF descriptor block?

    Discussion: The DF spec says data structures are assumed to be little-endian for purposes of data transfer. This is incompatible with the net which is big-endian and incompatible with endianness. What should we do?

    Resolved: All fields and data in KTX files will be little endian as that is the endianness of the vast majority of machines.

  3. Can we guarantee the DF descriptor blocks are always a multiple of 4 bytes?

    Discussion: The Khronos Basic Data Format Descriptor Block is a multiple of 4 bytes (24 + 16 x number of samples). Is there anything to require that extensions' block sizes be a multiple of 4 bytes? Need to maintain alignment.

    Resolved: The Data Format Specification will be updated to recommend but not require padding. This spec. will require padding.

  4. Should KTX2 support level sizes > 4GB?

    Discussion: Users have reported having base levels > 4GB for 3D textures. For this the imageSize field needs to be 64-bits. Loaders on 32-bit systems will have to ensure correct handling of this and check that imageSize <= 4GB, before loading.

    Resolved: Be future proof and make all image-size related fields 64 bits.

  5. Should KTX2 provide a way to distinguish between rectangle and regular 2D textures?

    Discussion: The difference is that unnormalized texel coordinates are used for sampling via a special sampler type in GLSL and, in the case of OpenGL {,ES}, the special TEXTURE_RECTANGLE target is used. If needed this could be supported by a metadata item instructing to use unnormalized texel coordinates.

    Unresolved:

  6. Should KTX2 provide a way to distinguish between 1D textures and buffer textures?

    Discussion: The difference is how you use the data in OpenGL. With buffer textures the image data is stored in a buffer object. Note that a TextureView can be used to give a different view of the data so supporting buffer textures probably requires metadata to indicate a preferred view as well as metadata to indicate the data should be loaded in a buffer.

    Unresolved:

  7. Should KTX2 drop the gl* fields?

    Discussion: Narrowing down and enforcing the valid combinations of glFormat, glInternalFormat and glType is fraught with issues. The spec. could be simplified by dropping them and having only vkFormat. The spec can include a table showing a standard mapping from the vkFormat value to a glInternalFormat, glFormat and glType combination.

    Resolved: Drop the gl* fields. OpenGL and OpenGL ES loaders can include code to do the mapping based on a table that will be added to the spec. Such code is estimated to be about 6 kbytes.

  8. Use alphanumeric characters or binary values for component swizzles?

    Discussion: Values in the swizzle metadata could be either a character from the set [01rgba] or numeric values corresponding to the VkComponentSwizzle enum values from 0 to 6. In the latter case values could be expressed in binary or as numeric characters. The GL token values have been eliminated from this choice because they are not user friendly.

    Resolved: Use alphanumeric characters from the set [01rgba].

  9. Is anything needed to support sparse textures?

    Discussion: Sparse textures are provided by the GL_ARB_sparse_texture extension and are a standard feature of Vulkan. Are any additional KTX features needed to support them?

    Unresolved:

  10. Should KTX2 support metadata for effective use of Vulkan SCALED formats?

    Discussion: Vulkan SCALED formats convert int (or uint) values to unnormalized floating point values, equivalent to specifying a value of GL_FALSE for the normalized parameter to glVertexAttribFormat. Generally when using such data, associated scale and bias values are folded into the transformation matrix. Should KTX2 specify standard metadata for these?

    Resolved: No. These formats will not be supported. They are primarily for vertex data and several Vulkan vendors have said they can’t support them as texture formats. Also a DFD cannot distinguish these from int values having the same bit pattern.

  11. Should the supercompression scheme be applied per-mip-level?

    Discussion: Should each mip level be supercompressed independently or should the scheme, zlib, zstd, etc., be applied to all levels as a unit? The latter may result in slightly smaller size though that is unclear. However it would also mean levels could not be streamed or randomly accessed.

    Resolved: Yes. The benefits of streaming and random access outweigh what is expected to be a small increase in size.

  12. Should we remove row padding from uncompressed image data?

    Discussion: Row padding was added to KTX so that data would have the default GL_UNPACK_ALIGNMENT of 4, which was chosen to help speed up DMA of rows by the GPU. Modern architectures are apparently not sensitive to this as evidenced by Vulkan deliberately omitting any equivalent of GL_UNPACK_ALIGNMENT. Thus an annoying chunk of code is required to upload row-padded images to Vulkan.

    Resolved: Remove this and cube padding. Formats that would need padding have texel sizes that are less than 4 bytes so no benefit is obtained by starting cube faces or rows of such images at 4-byte multiples.

  13. Should we require content checksums anywhere?

    Discussion: Modern transmission mechanisms, e.g., HTTP/2, provide good robustness so checksums are less important than they used to be. Some supercompression schemes include a checksum, which may be optional.

    Resolved: No. We can rely on modern transmission mechanisms. However, if the supercompression scheme includes a checksum, readers should verify it.

9. References

Normative References

The Vulkan 1.1 references are to living documents that are updated weekly with corrections, clarifications and, in the case of [VULKAN11EXT], newly released extensions. References to the specifications do not imply that KTX header field values are limited solely to those in the referenced sections or tables. These values may be supplemented by extensions or new versions. They also do not imply that all of the texture types can be loaded in any particular version of OpenGL {,ES} or Vulkan.

Non-Normative References

Appendix A: Mapping of vkFormat values

Table 3. Mapping of vkFormat values to OpenGL, Direct3D and Metal

Appendix B: Changes compared to KTX

  • vkFormat added.

  • OpenGL format information fields removed.

  • Data format descriptor added.

  • Supercompression added.

  • Files always little endian.

  • Swizzle and writer id metadata added.

  • Row and cube padding removed.

Appendix C: Data Format Descriptor Endian Conversion

// To be written.

Revision History

Document Revision   Date         Remark
draft0              2017-12-08   First incarnation.
draft1              2018-01-02   Update issue discussions and change OpenGL references to 4.6.
draft2              2018-02-10   Clarify relation to Data Format Descriptor spec. Add global compression. Update issues.
draft3              2018-06-14   Remove glBaseInternalFormat. Add zstd global compression option and issue 11. Add copyright & license.
draft4              2018-06-26   Add acknowledgements.
draft5              2018-07-26   Change all size & offset fields to 64-bit. Change global compression to supercompression. Add supercompressionGlobalData, level index and writer id. Define interactions with paletted textures. Remove cubePadding.
draft6              2018-10-03   Remove rowPadding. Use registered trademarks. Improve supercompression section & add references. Add internal xrefs. Update issues.
draft7              2018-10-14   Answer questions re. supercompression posed in draft 6 & finish section. Fix scheme numbers after ANS removal. Alphabetize references. Improve wording and formatting. Change status.
draft8              2018-10-26   Change status back to not ready for implementation in view of issue #8.
draft9              2019-02-27   Use Khronos style sheet. Drop GL format info. Add index for direct access to data. Change to little endian. Specify padding values. Remove ambiguity and potential conflicts.

Acknowledgements

Thanks to Manmohan Bishnoi for designing the KTX file and application icons.

Thanks to Alexey Knyazev for enormous help tightening the specification and removing potential conflicts.

Thanks to David Wilkinson for chairing the effort.

License