How Are Threads Partitioned into Warps?

jielahou

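When optimizing CUDA programs, it is important to reason at the warp level. Before looking into this, though, I only knew that a warp contains 32 threads; I did not know which 32 threads end up in the same warp when blockDim is two-dimensional, so I was never quite sure what my code was actually doing. While reading the CUDA C Programming Guide recently, I came across the answer and am recording it here.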

Start with this passage from the SIMT Architecture section (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#simt-architecture):

When a multiprocessor is given one or more thread blocks to execute, it partitions them into warps and each warp gets scheduled by a warp scheduler for execution. The way a block is partitioned into warps is always the same; each warp contains threads of consecutive, increasing thread IDs with the first warp containing thread 0. Thread Hierarchy describes how thread IDs relate to thread indices in the block.

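The way a thread block is partitioned into warps is always the same. **Each warp contains threads with consecutive, increasing thread IDs (the first warp contains thread 0).** The Thread Hierarchy section describes how thread IDs relate to thread indices.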

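So once the thread hierarchy is clear, that is, the relationship between thread index and thread ID, we can tell which threads share a warp with a thread of a given index, and then optimize with warps in mind.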

How are thread index and thread ID related? (For the thread hierarchy, see the Thread Hierarchy section of the CUDA C++ Programming Guide on nvidia.com.)

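  • For a 1D thread index, the thread ID is the thread index itself.

  • For a 2D thread index (tx, ty) in a block of size (DX, DY), the thread ID is tx + ty * DX.

  • For a 3D thread index (tx, ty, tz) in a block of size (DX, DY, DZ), the thread ID is tx + ty * DX + tz * DX * DY.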
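The three formulas above can be collapsed into one, since the 1D and 2D cases are just the 3D case with the unused indices set to 0. Below is a minimal host-side C++ sketch of that arithmetic. `Dim3` and `linear_thread_id` are hypothetical names introduced here for illustration (they stand in for CUDA's `threadIdx` and `blockDim`); they are not part of the CUDA API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's threadIdx / blockDim (x, y, z components).
struct Dim3 {
    std::uint32_t x, y, z;
};

// Thread ID within a block: tx + ty * DX + tz * DX * DY.
// For a 1D block use block = {DX, 1, 1} and idx = {tx, 0, 0};
// the formula then reduces to tx.
inline std::uint32_t linear_thread_id(Dim3 idx, Dim3 block) {
    // Precondition: the index must lie inside the block.
    assert(idx.x < block.x && idx.y < block.y && idx.z < block.z);
    return idx.x + idx.y * block.x + idx.z * block.x * block.y;
}
```

Inside a real kernel the same expression would presumably be written with the built-ins directly, i.e. `threadIdx.x + threadIdx.y * blockDim.x + threadIdx.z * blockDim.x * blockDim.y`.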

With that, back to the question of this post: how are threads partitioned into warps?

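  • For a 1D thread index, threads are grouped 32 at a time in order (e.g. 0~31, 32~63, 64~95, ...).
  • For a 2D thread index, threads are ordered along x first, then along y (e.g. with a thread block of size [dx]16 * [dy]32, the threads (0,0), (1,0), ..., (14,0), (15,0), (0,1), (1,1), ..., (14,1), (15,1) belong to the same warp).
  • For a 3D thread index, threads are ordered along x first, then along y, and finally along z (example omitted).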
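To check which warp a given thread falls into, the rules above amount to computing the thread ID and dividing by the warp size. The following host-side C++ sketch does this. `WarpSlot`, `warp_slot`, and `kWarpSize` are hypothetical names used only for illustration, and the warp size is assumed to be 32 as stated at the start of this post (device code can read the built-in `warpSize` instead).

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 32 threads per warp, as stated in this post.
constexpr std::uint32_t kWarpSize = 32;

// Hypothetical result type: which warp of the block, and which lane in it.
struct WarpSlot {
    std::uint32_t warp;
    std::uint32_t lane;
};

// Maps a thread index (tx, ty, tz) in a block whose x and y sizes are
// dx and dy to its warp and lane.
inline WarpSlot warp_slot(std::uint32_t tx, std::uint32_t ty, std::uint32_t tz,
                          std::uint32_t dx, std::uint32_t dy) {
    // Precondition: x and y indices must lie inside the block.
    assert(tx < dx && ty < dy);
    // Thread ID: x varies fastest, then y, then z.
    const std::uint32_t tid = tx + ty * dx + tz * dx * dy;
    // Consecutive thread IDs are grouped 32 at a time, starting from 0.
    return WarpSlot{tid / kWarpSize, tid % kWarpSize};
}
```

For the 16 * 32 block from the example, `warp_slot(15, 1, 0, 16, 32)` lands in warp 0 at lane 31, and `warp_slot(0, 2, 0, 16, 32)` starts warp 1, so every two rows of x form one warp. When dx neither divides nor is a multiple of 32 (say dx = 24), the same arithmetic shows a warp straddling rows: warp 0 would hold all of row 0 plus the first 8 threads of row 1.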