SYCL and the SYCL logo are trademarks of the Khronos Group Inc.

## Coalesced Global Memory

## Learning Objectives

* Learn about coalesced global memory access
* Learn about the performance impact
* Learn about row-major vs column-major
* Learn about SoA vs AoS

#### Coalesced global memory

* Reading from and writing to global memory is generally very expensive.
* It often involves copying data across an off-chip bus.
* This means you generally want to avoid unnecessary accesses.
* Memory access operations are done in chunks.
* This means accessing data that is physically close together in memory is more efficient.
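The chunking point above can be illustrated with a small host-side sketch. It is not a model of any particular device: the 64-byte chunk size, the 4-byte element size, and the helper name `chunksTouched` are all assumptions chosen for illustration. It counts how many chunks a group of work-items touches when each one reads a single element at a given stride.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Assumed values for illustration only; real chunk sizes are device-specific.
constexpr std::size_t kChunkBytes = 64;   // hypothetical size of one memory transaction
constexpr std::size_t kElementBytes = 4;  // one 32-bit float

// Counts the distinct chunks touched when each of `numWorkItems` work-items
// reads one element at index (workItemId * stride).
inline std::size_t chunksTouched(std::size_t numWorkItems, std::size_t stride) {
  assert(stride > 0 && "a stride of 0 makes every work-item read the same element");
  std::set<std::size_t> chunks;
  for (std::size_t id = 0; id < numWorkItems; ++id) {
    const std::size_t byteOffset = id * stride * kElementBytes;
    chunks.insert(byteOffset / kChunkBytes);
  }
  return chunks.size();
}
```

Under these assumed sizes, `chunksTouched(16, 1)` is 1 (all 16 reads fall in one chunk), while `chunksTouched(16, 16)` is 16 (every read needs its own chunk).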

#### Coalesced global memory

![SYCL](../common-revealjs/images/coalesced_global_memory_1.png "SYCL")

#### Coalesced global memory

![SYCL](../common-revealjs/images/coalesced_global_memory_2.png "SYCL")

#### Coalesced global memory

![SYCL](../common-revealjs/images/coalesced_global_memory_3.png "SYCL")

#### Coalesced global memory

![SYCL](../common-revealjs/images/coalesced_global_memory_4.png "SYCL")

#### Coalesced global memory

![SYCL](../common-revealjs/images/coalesced_global_memory_5.png "SYCL")

#### Coalesced global memory

![SYCL](../common-revealjs/images/coalesced_global_memory_6.png "SYCL")

#### Row-major vs Column-major

* Coalescing global memory access is particularly important when working in multiple dimensions.
* This is because you have to convert from a position in 2D space to a position in linear memory.
* There are two ways to do this, generally referred to as row-major and column-major.
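As a minimal sketch, the two conversions from a 2D position to a linear index can be written as follows. The function names `rowMajorIndex` and `colMajorIndex` are hypothetical helpers, not part of SYCL.

```cpp
#include <cassert>
#include <cstddef>

// Row-major: elements of the same row are adjacent in memory.
inline std::size_t rowMajorIndex(std::size_t row, std::size_t col,
                                 std::size_t width) {
  assert(col < width);
  return row * width + col;
}

// Column-major: elements of the same column are adjacent in memory.
inline std::size_t colMajorIndex(std::size_t row, std::size_t col,
                                 std::size_t height) {
  assert(row < height);
  return col * height + row;
}
```

Moving one column to the right changes the row-major index by 1 but the column-major index by `height`. Which form coalesces depends on which dimension neighbouring work-items differ in. In SYCL the last dimension of an `id` generally varies fastest between adjacent work-items, so mapping that dimension to the contiguous index is usually the coalesced choice.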

#### Row-major vs Column-major

![SYCL](../common-revealjs/images/row_col_1.png "SYCL")

![SYCL](../common-revealjs/images/row_col_2.png "SYCL")

![SYCL](../common-revealjs/images/row_col_3.png "SYCL")

#### AoS vs SoA

* Another area where this is a factor is when composing data structures.
* It's often instinctive to have a struct representing a collection of data and then have an array of these, often referred to as Array of Structs (AoS).
* But for data parallel architectures such as a GPU, it's more efficient to have sequential elements of the same type stored contiguously in memory, often referred to as Struct of Arrays (SoA).
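The two layouts might look like this in plain C++. The particle type, its four `float` fields, and the helper names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: one struct per element; the fields of one element sit together.
struct ParticleAoS {
  float x, y, z, w;
};

// SoA: one array per field; the same field of neighbouring elements sits together.
struct ParticlesSoA {
  std::vector<float> x, y, z, w;
};

// Distance, in floats, between the `x` of element i and the `x` of element i + 1
// in the AoS layout. In the SoA layout this distance is always 1.
constexpr std::size_t aosStrideInFloats() {
  return sizeof(ParticleAoS) / sizeof(float);
}

// Converts an AoS container into the equivalent SoA container.
inline ParticlesSoA toSoA(const std::vector<ParticleAoS>& aos) {
  ParticlesSoA soa;
  for (const ParticleAoS& p : aos) {
    soa.x.push_back(p.x);
    soa.y.push_back(p.y);
    soa.z.push_back(p.z);
    soa.w.push_back(p.w);
  }
  assert(soa.x.size() == aos.size());
  return soa;
}
```

If neighbouring work-items each read only `x`, the AoS layout spaces those reads `aosStrideInFloats()` floats apart, whereas the SoA layout places them next to each other.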

#### AoS vs SoA

![SYCL](../common-revealjs/images/soa_vs_aos_1.png "SYCL")

#### AoS vs SoA

![SYCL](../common-revealjs/images/soa_vs_aos_2.png "SYCL")

#### AoS vs SoA

![SYCL](../common-revealjs/images/soa_vs_aos_3.png "SYCL")

#### AoS vs SoA

![SYCL](../common-revealjs/images/soa_vs_aos_4.png "SYCL")

#### AoS vs SoA

![SYCL](../common-revealjs/images/soa_vs_aos_5.png "SYCL")

#### AoS vs SoA

![SYCL](../common-revealjs/images/soa_vs_aos_6.png "SYCL")

#### AoS vs SoA

![SYCL](../common-revealjs/images/soa_vs_aos_7.png "SYCL")

#### AoS vs SoA

![SYCL](../common-revealjs/images/soa_vs_aos_8.png "SYCL")

#### AoS vs SoA

![SYCL](../common-revealjs/images/soa_vs_aos_9.png "SYCL")

#### AoS vs SoA

![SYCL](../common-revealjs/images/soa_vs_aos_10.png "SYCL")

#### Coalesced image convolution performance

![SYCL](../common-revealjs/images/image_convolution_performance_coalesced.png "SYCL")

## Questions

#### Exercise

Code_Exercises/Coalesced_Global_Memory/source

Try inverting the dimensions when calculating the linear address in memory and measure the performance.