Tensor Tiling Library
 

Directories

 c
 doc
 import_export
 opencl
 pipelines
 scripts
 tensors

Files

 TTL.h
 TTL_core.h
 TTL_create_type.h
 TTL_create_types.h
 TTL_debug.h
 TTL_import_export.h
 TTL_macros.h
 TTL_pipeline_schemes.h
 TTL_tensors.h
 TTL_tiles.h
 TTL_trace_macros.h
 TTL_types.h

Detailed Description

License Issues

The Tensor & Tiling Library is an open-source library that enables efficient tiling of, and computation with, tensors.

Please reach out to chris.gearing@mobileye.com or ayal.zaks@mobileye.com for more information.

This document outlines the purpose of this sample implementation and provides build and execution instructions.


Purpose

The purpose of this software package is to provide a simple-to-use, standardized way of tiling tensors to enable performance optimization on heterogeneous machines. Tiling is one of the pillars of performance, and the Tensor Tiling Library is intended to provide a standardized approach to it.

The library is intended to be general purpose and usable on all architectures.

Currently the Tensor Tiling Library:

Example

This is a double-buffering example in which data is moved between host and device simultaneously with the compute.

#include "TTL/TTL.h"

__kernel void TTL_double_buffering(__global uchar *restrict ext_base_in, int external_stride_in,
                                   __global uchar *restrict ext_base_out, int external_stride_out,
                                   int width, int height, int tile_width, int tile_height) {
    __local uchar l_in1[MEMSZ];
    __local uchar l_in2[MEMSZ];
    __local uchar l_out1[MEMSZ];
    __local uchar l_out2[MEMSZ];

    // Logical tiling
    const TTL_shape_t global_tensor = TTL_create_shape(width, height);
    const TTL_shape_t tile_shape = TTL_create_shape(tile_width, tile_height);
    const TTL_tiler_t tiler = TTL_create_tiler(global_tensor, tile_shape);

    // External layouts.
    const TTL_layout_t ext_layout_in = TTL_create_layout(external_stride_in);
    const TTL_layout_t ext_layout_out = TTL_create_layout(external_stride_out);
    const TTL_const_ext_tensor_t ext_input_tensor =
        TTL_create_const_ext_tensor(ext_base_in, global_tensor, ext_layout_in);
    const TTL_ext_tensor_t ext_output_tensor = TTL_create_ext_tensor(ext_base_out, global_tensor, ext_layout_out);

    // TTL_start_import_double_buffering will begin the import of the first tile
    TTL_event_t import_DB_e = TTL_get_event();
    TTL_import_double_buffering_t import_db = TTL_start_import_double_buffering(
        l_in1, l_in2, ext_input_tensor, &import_DB_e, TTL_get_tile(0, tiler));
    TTL_event_t export_DB_e = TTL_get_event();
    TTL_export_double_buffering_t export_db =
        TTL_start_export_double_buffering(l_out1, l_out2, ext_output_tensor, &export_DB_e);

    for (int i = 0; i < TTL_number_of_tiles(tiler); ++i) {
        TTL_tile_t tile_next_import = TTL_get_tile(i + 1, tiler);
        TTL_tile_t tile_current_export = TTL_get_tile(i, tiler);
        // These wait for the current transfers to complete, and begin the next
        TTL_int_sub_tensor_t imported_to = TTL_step_buffering(&import_db, tile_next_import);
        TTL_int_sub_tensor_t exported_from = TTL_step_buffering(&export_db, tile_current_export);
        // Compute whilst the transfers are taking place (on separate buffers)
        compute(imported_to, exported_from);
    }

    // These wait for the last transfers to complete.
    TTL_finish_buffering(&import_db);
    TTL_finish_buffering(&export_db);
}

Doxygen

Doxygen is supported and can be built using the scripts/generate_doxygen.sh script. It is built automatically by GitHub for main, and the latest version can be found at https://github.khronos.org/OpenCL-TTL/

Building And Executing

The sample implementation builds under POCL on x86 and demonstrates a number of buffering schemes.

It builds using CMake and has been tested on Linux.

CMake

Tested and Supported Systems

Requirements

Building the Samples

Execute the following commands:

$ cd opencl/samples
$ ./TTL_sample_runner.py *.cl
$ cd ../c/samples
$ ./TTL_sample_runner.py *.c

Installation

See INSTALL.

Preprocessed Header

TTL is very strongly typed and uses many macros to create the variants of each method. While powerful, this can make debugging trickier. For this reason, TTL can be preprocessed to stdout or to a provided file.

$ scripts/preprocess.sh [-f TTL_opencl.h] [-t opencl]
$ scripts/preprocess.sh [-f TTL_c.h] [-t c]

-f defaults to /dev/stdout -t defaults to opencl

This will output a preprocessed, clang-formatted version of the given file to stdout (or to the file provided). Replacing TTL.h with this file can make life easier. See also the TTL_PRE_GENERATE option in INSTALL.

Included Unit Tests

See the test README.

Bug Reporting

Bugs can be reported by filing an issue on GitHub.