The Unity Shaders Bible 10.0.1 | Compute shader structure.

Translation

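Up to this point, our study has focused on understanding Unlit and Surface shaders, which have a very similar structure; both are declared within ShaderLab, which, as we already know, is a declarative language that allows communication between the program and Unity. However, there is another type of shader, called Compute, whose structure is similar to those mentioned above but which does not include the built-in shader variables that make programming easier.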

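We will begin this section by creating a Compute shader. To do so, we follow these steps:

1. We go to our project folder in Unity.

2. Then, we right-click.

3. We select “Create / Shaders / Compute shader”.

4. We name it USB_simple_color_CS.

Later, we will use this shader to understand the basic structure of color, UV coordinates, and texture implementation. The result will therefore not be visually impressive, but it will serve to illustrate the syntax behind the program.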

Once we open the Compute shader, we will see the following:

// Each Kernel tells which function to compile; you can have many
// Kernels
#pragma kernel CSMain

// Create a RenderTexture with enableRandomWrite and set it
// with cs.SetTexture
RWTexture2D <float4> Result;

[numthreads(8, 8, 1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    // TODO: insert actual code here
    Result[id.xy] = float4(id.x & id.y, (id.x & 15)/15.0, (id.y & 15)/15.0, 0);
}

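In this example, we can see the basic Compute shader structure that Unity adds to the program by default. In it, we can find the following components:

• The declaration of the kernel: CSMain.

• A 2D texture for reading and writing, called Result.

• The number of threads per group (numthreads) that we will use to process the texels of the texture.

• A function called CSMain, which takes an argument bound to a semantic and writes an RGBA color to Result.

Such a structure has similarities to a vertex/fragment shader because both programs are written in HLSL. Therefore, to understand their basic setup, we will make an analogy using the CSMain kernel.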

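The vertex shader is configured as a stage when we declare the pragma associated with its function (as mentioned in chapter 1, section 3.3.2); this means, for example, that the vert function must be declared as “vertex” in the pragma so that the GPU can recognize its role within the rendering pipeline.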

// declare the vert function as vertex shader stage
#pragma vertex vert

// initialize the vert function
v2f vert(appdata v) { ... }

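We can find the same behavior in the fragment shader stage. If we want the default function called “frag” to compile as a fragment shader, we must declare it in its respective pragma.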

// declare the frag function as fragment shader stage
#pragma fragment frag

// initialize the frag function
fixed4 frag (v2f i) : SV_Target { ... }

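A Compute shader is no exception. If we want to send the “CSMain” function to the “compute units” (the physical units that perform the computation), then it must be defined as a “kernel” in the pragma.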

// declare the CSMain function as kernel
#pragma kernel CSMain

// initialize the CSMain function
[numthreads(8, 8, 1)]
void CSMain (uint3 id : SV_DispatchThreadID) { ... }

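The smallest unit that a Compute shader can process is an independent thread, and the attribute [numthreads(x, y, z)] is directly related to this.

The threads carry out the computation we want to perform; in the case of a texture, for example, they are in charge of processing each texel in the image.

By default, our program has a group of 64 threads. How can we determine this? Simply by multiplying the X, Y, and Z values included in the numthreads attribute.

• numthreads(x, y, z).

• 8 * 8 * 1 = 64 threads per group.

The above values can be read as:

• Eight columns of threads on the X-axis.

• Eight rows of threads on the Y-axis.

• One layer of threads on the Z-axis.

When working with threads, the hardware divides each group into sub-blocks called warps. For efficiency, the total number of threads per group should be a multiple of the warp size (32 threads per warp on NVIDIA cards) or of the wavefront size (64 threads per wavefront on ATI cards, now AMD). Unity defines eight threads in both X and Y (64 in total, which is a multiple of both 32 and 64) precisely so that the program runs well on both NVIDIA and ATI cards. Later in this chapter, we will review this and other attributes in detail, since some semantics are associated with the threads we will work with. For now, we will continue defining the internal structure of our program.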

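The RWTexture2D variable called “Result” refers to a 2D RGBA texture with read/write capability (RW). This allows data to be sent from the CPU to the GPU, processed in parallel, and then returned.

If we want to add a variable that only has the “read” capability, we must declare it without the RW prefix, e.g., Texture2D. Now, how do we determine whether a variable is needed in our program? For this, we will have to implement some functions in the Compute shader.

A vertex/fragment shader requires ShaderLab (a declarative language) for communication between Unity and the CGPROGRAM or HLSLPROGRAM. Analogously, a “.compute” shader requires a C# script for the same purpose. Every time we work with Compute shaders, we will have to create a C# script and associate it in our scene. This script declares the global variables and buffers that we will later connect with the HLSL program.

A function that we will see repeatedly throughout this chapter is “Dispatch”. This function is in charge of executing the CSMain kernel, launching a certain number of thread groups in its X, Y, and Z dimensions. We will discuss other concepts related to the operation of these types of shaders later in this book.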


Original text

Up to this point, we have focused our study on understanding Unlit and Surface shaders, which have a very similar structure; both are declared within ShaderLab, which, as we already know, is a declarative language that allows communication between the program and Unity. However, another type of shader called Compute exists, which has a similar structure to those mentioned above but does not include the built-in shader variables that facilitate programming.


We will begin this section by creating a Compute shader. To do so:

1. We will go to our project folder in Unity.

2. Then, we right-click.

3. And select Create / Shaders / Compute shader.

4. We will call it USB_simple_color_CS.

Later, we will work with this shader to understand the basic structure of color, UV coordinates, and texture implementation. The result will therefore not be visually impressive, but it will serve to illustrate the syntax behind the program.


Once opened, we will get a structure like the following:

// Each Kernel tells which function to compile; you can have many
// Kernels
#pragma kernel CSMain

// Create a RenderTexture with enableRandomWrite and set it
// with cs.SetTexture
RWTexture2D <float4> Result;

[numthreads(8, 8, 1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    // TODO: insert actual code here
    Result[id.xy] = float4(id.x & id.y, (id.x & 15)/15.0, (id.y & 15)/15.0, 0);
}


In the example, we can see the basic structure that Unity adds to the program by default. In it, we can find the following components:

• The declaration of the CSMain kernel.

• A 2D texture for writing and reading, called Result.

• The number of threads per group (numthreads) that we will use to process the texels of the texture.

• A function called CSMain, which takes an argument bound to a semantic and writes an RGBA color to Result.

Such a structure has similarities to a vertex/fragment shader because both programs are written in the HLSL language. Therefore, to understand their basic setup, we will make an analogy using the CSMain kernel.

The vertex shader is configured as a stage when we declare the pragma associated with its function (as mentioned in chapter 1, section 3.3.2); this means, e.g., that the vert function must be declared as “vertex” in the pragma so that the GPU can recognize its role within the rendering pipeline.

// declare the vert function as vertex shader stage
#pragma vertex vert

// initialize the vert function
v2f vert(appdata v) { ... }

We can find the same behavior in the fragment shader stage. If we want the default function called “frag” to compile as a fragment shader, we must declare it in its respective pragma.

// declare the frag function as fragment shader stage
#pragma fragment frag

// initialize the frag function
fixed4 frag (v2f i) : SV_Target { ... }


A Compute shader is no exception. If we want to send the “CSMain” function to the “compute units” (the physical units that perform the computation), then it must be defined as a “kernel” in the pragma.

// declare the CSMain function as kernel
#pragma kernel CSMain

// initialize the CSMain function
[numthreads(8, 8, 1)]
void CSMain (uint3 id : SV_DispatchThreadID) { ... }
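As the template comment notes, a single .compute file can hold many kernels, each with its own pragma. A minimal sketch of that idea follows; the kernel names CSFillRed and CSFillGradient are hypothetical and are not part of the book's example.

```hlsl
// Two kernels in one .compute file. Both names are hypothetical.
#pragma kernel CSFillRed
#pragma kernel CSFillGradient

// Shared read/write output texture, as in the default template.
RWTexture2D<float4> Result;

[numthreads(8, 8, 1)]
void CSFillRed (uint3 id : SV_DispatchThreadID)
{
    // Every thread writes the same solid color to its own texel.
    Result[id.xy] = float4(1, 0, 0, 1);
}

[numthreads(8, 8, 1)]
void CSFillGradient (uint3 id : SV_DispatchThreadID)
{
    // Repeats a 0..1 grayscale ramp every 16 texels along X.
    float ramp = (id.x & 15) / 15.0;
    Result[id.xy] = float4(ramp, ramp, ramp, 1);
}
```

Only functions named in a kernel pragma are compiled as kernels; any other function in the file is just a helper that kernels may call.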

The smallest unit that a Compute shader can process corresponds to an independent thread, and the attribute [numthreads(x, y, z)] is directly related to this.

The threads perform the computation of the operation that we want to carry out, e.g., in the case of a texture, they are in charge of processing each texel that the image has.

By default, our program has a group of 64 threads. How can we determine this? Basically, by multiplying the values in X, Y and Z included in the numthreads attribute.
• numthreads (x, y, z).
• 8 * 8 * 1 = 64 threads per group.


The above values can be translated as:
• Eight columns of threads in the X-axis.
• Eight rows of threads in the Y-axis.
• One set of threads in the Z-axis.
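To see how the 8 * 8 * 1 layout maps onto individual threads, we can sketch a kernel that reads the standard HLSL thread semantics and writes them out as colors. The kernel name CSThreadIds is hypothetical; the semantics themselves are part of HLSL.

```hlsl
#pragma kernel CSThreadIds

RWTexture2D<float4> Result;

[numthreads(8, 8, 1)]
void CSThreadIds (
    uint3 groupId       : SV_GroupID,          // which group this thread belongs to
    uint3 groupThreadId : SV_GroupThreadID,    // position inside the group: x, y in 0..7, z = 0
    uint  groupIndex    : SV_GroupIndex,       // flattened position inside the group: 0..63
    uint3 id            : SV_DispatchThreadID) // position across the whole dispatch
{
    // For numthreads(8, 8, 1) the following relations hold:
    //   id         == groupId * uint3(8, 8, 1) + groupThreadId
    //   groupIndex == groupThreadId.y * 8 + groupThreadId.x
    // Red and green show the position inside the group;
    // blue shows the flattened index.
    Result[id.xy] = float4(groupThreadId.x / 7.0,
                           groupThreadId.y / 7.0,
                           groupIndex / 63.0,
                           1);
}
```

Written to a texture, this should produce a repeating pattern of 8 x 8 tiles, one tile per thread group.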


When working with threads, the hardware divides the groups into sub-blocks called warps. For efficiency, the total number of threads per group should be a multiple of the warp size (32 threads per warp on NVIDIA cards) or a multiple of the wavefront size (64 threads per wavefront on ATI cards, now AMD). Unity defines eight threads in both X and Y, 64 in total, precisely so that the program runs well on both NVIDIA and ATI cards. Later in this chapter, we will review this and other attributes in detail, since some semantics are associated with the threads we will operate with. For now, we will continue defining the internal structure of our program.


The RWTexture2D variable called “Result” refers to a 2D RGBA texture with read/write capability (RW). This feature allows data to be sent from the CPU to the GPU, processed in parallel, and then returned.

If we want to add a variable that only has the “read” capability, we would have to declare it without the RW prefix, e.g., Texture2D. Now, how could we determine the need for a variable in our program? For this, we will have to implement some functions in the Compute shader.
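The difference between the two declarations can be sketched as follows: a Texture2D declared without the RW prefix can be read but not written, while an RWTexture2D can be both read and written. The names Source and CSInvert are hypothetical, and the sketch assumes both textures have the same size.

```hlsl
#pragma kernel CSInvert

// Read-only input (hypothetical name): it can be loaded from but never written to.
Texture2D<float4> Source;

// Read/write output, as in the default template.
RWTexture2D<float4> Result;

[numthreads(8, 8, 1)]
void CSInvert (uint3 id : SV_DispatchThreadID)
{
    // Load the input texel at this thread's coordinates.
    float4 c = Source[id.xy];

    // Write the inverted color; alpha is left untouched.
    Result[id.xy] = float4(1.0 - c.rgb, c.a);
}
```

An assignment such as Source[id.xy] = c would be rejected by the compiler, which is one simple way to protect input data from accidental writes.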


A vertex/fragment shader requires the declarative language ShaderLab for communication between Unity and the CGPROGRAM or HLSLPROGRAM. Analogously, a “.compute” shader requires a C# script for the same purpose. Every time we work with Compute shaders, we will have to create a C# script and associate it in our scene. This script declares the global variables and buffers that we will connect later with the HLSL program.


A function that we will see recurrently throughout this chapter is “Dispatch”. This function is in charge of executing the CSMain kernel, launching a certain number of thread groups in its XYZ dimensions. There are other concepts associated with the operation of these types of shaders that we will discuss later in this book.
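Because Dispatch launches whole groups, a texture whose size is not a multiple of eight ends up with a few threads that fall outside of it. A common pattern, sketched below as an assumption rather than the book's own code, is for the C# side to pass ceil(width / 8) and ceil(height / 8) as the X and Y group counts of ComputeShader.Dispatch, and for the kernel to skip the extra threads.

```hlsl
#pragma kernel CSMain

RWTexture2D<float4> Result;

[numthreads(8, 8, 1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    // Query the size of the bound texture.
    uint width, height;
    Result.GetDimensions(width, height);

    // Threads that fall outside the texture have nothing to do.
    if (id.x >= width || id.y >= height)
        return;

    // Simple UV-style gradient: red grows along X, green along Y.
    Result[id.xy] = float4(id.x / (float)width, id.y / (float)height, 0, 1);
}
```

For a 100 x 100 texture, for example, this assumes 13 x 13 x 1 groups are dispatched, which covers 104 x 104 threads; the guard discards the four extra columns and rows.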
