Iterative stencil computations are important in scientific computing and,
increasingly, in the embedded and mobile domains. Recent publications have
shown that tiling schemes that ensure concurrent start provide efficient ways
to execute these kernels. Diamond tiling and hybrid-hexagonal tiling are two
tiling schemes that enable concurrent start. Each has distinct
advantages: diamond tiling has been integrated in a general purpose optimization
framework and uses a cost function to choose among tiling hyperplanes,
whereas the greater flexibility with tile sizes for hybrid-hexagonal tiling has
been exploited for effective generation of GPU code.
In this paper, we undertake a comparative study of these two tiling approaches
and propose a hybrid approach that combines them.
We analyze the effects of tile size and wavefront choices on tile-level
parallelism, and formulate constraints for optimal diamond tile shapes. We
then extend, for the case of two dimensions, the diamond tiling formulation into
a hexagonal tiling one, which offers both the flexibility of hexagonal
tiling and the generality of the original diamond tiling implementation.
We also show how to
compute tile sizes that maximize the compute-to-communication ratio,
and apply this result to compare the best achievable ratio and
the associated synchronization overhead for diamond and hexagonal tiling.