LiteFX 0.3.1.2022
Computer Graphics Engine
This guide walks you through the steps required to write an application that renders a simple primitive. It demonstrates the most important features and use-cases of the LiteFX engine. Before you start, make sure you've successfully set up a project by following the project setup guide.
At the core of each LiteFX application lies the Backend. In theory, an application can provide different back-ends; however, currently only one type of back-end is implemented: the RenderingBackend. This back-end comes in two flavors: VulkanBackend and DirectX12Backend. For now, let's create a simple app that uses the Vulkan backend and GLFW as a cross-platform window manager. In order to do this, we first need to extend the CMakeLists.txt file created in the project setup guide. Add a FIND_PACKAGE command below the line where you are searching for LiteFX:
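For example, if you installed GLFW through vcpkg, the package is exported as glfw3 (a minimal sketch; adjust to your package manager):

```cmake
FIND_PACKAGE(glfw3 CONFIG REQUIRED)
```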
In order to make GLFW available to your project, you also have to define the dependency using TARGET_LINK_LIBRARIES:
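A sketch, assuming your executable target is called MyLiteFXApp and you already link the LiteFX Vulkan backend as described in the project setup guide:

```cmake
TARGET_LINK_LIBRARIES(MyLiteFXApp PRIVATE LiteFX.Backends.Vulkan glfw)
```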
Also - if you are not using vcpkg - you need to make sure that the glfw shared library gets copied over to the build directory. To do this, extend the FOREACH loop at the bottom of your file:
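The loop from the setup guide copies the engine's shared libraries next to the executable; appending glfw to its item list might look like this (the exact loop in your file may differ):

```cmake
FOREACH(MODULE IN ITEMS LiteFX.Core LiteFX.Logging LiteFX.Math LiteFX.Rendering LiteFX.Backends.Vulkan glfw)
    ADD_CUSTOM_COMMAND(TARGET MyLiteFXApp POST_BUILD
        COMMAND ${CMAKE_COMMAND} -E copy_if_different $<TARGET_FILE:${MODULE}> $<TARGET_FILE_DIR:MyLiteFXApp>)
ENDFOREACH(MODULE)
```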
Re-configure your project, then edit the main.h and main.cpp files. Copy the following code into main.h:
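A sketch of the header; the App override signatures are approximations of the 0.3 API and may differ slightly in your engine version:

```cpp
#pragma once

#include <litefx/litefx.h>

#define GLFW_INCLUDE_NONE
#ifdef _WIN32
#define GLFW_EXPOSE_NATIVE_WIN32
#endif
#include <GLFW/glfw3.h>
#include <GLFW/glfw3native.h>

using namespace LiteFX;
using namespace LiteFX::Rendering;
using namespace LiteFX::Rendering::Backends;

// Smart pointer type that destroys the GLFW window when it goes out of scope.
struct GlfwWindowDeleter {
    void operator()(GLFWwindow* ptr) noexcept { ::glfwDestroyWindow(ptr); }
};

using GlfwWindowPtr = std::unique_ptr<GLFWwindow, GlfwWindowDeleter>;

class SimpleApp : public LiteFX::App {
public:
    String getName() const noexcept override { return "Simple App"; }
    AppVersion getVersion() const noexcept override { return AppVersion(1, 0, 0, 0); }

private:
    GlfwWindowPtr m_window;    // The window we render to.

public:
    SimpleApp(GlfwWindowPtr&& window) :
        App(), m_window(std::move(window)) {
        this->initialize();
    }

    // Overrides required by the LiteFX::App base class. The GLFW framebuffer
    // resize callback (registered in initialize) forwards to resize().
    void initialize() override;
    void run() override;
    void resize(int width, int height) override;
};
```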
And to the main.cpp file:
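Roughly like this; the builder calls follow the fluent pattern discussed below, and the exact sink and log level names are assumptions:

```cpp
#include "main.h"

int main(const int argc, const char** argv) {
    // Create a window. GLFW_NO_API is important, since the rendering backend
    // manages the graphics context itself.
    if (!::glfwInit())
        throw std::runtime_error("Unable to initialize GLFW.");

    ::glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
    auto window = GlfwWindowPtr(::glfwCreateWindow(800, 600, "Simple App", nullptr, nullptr));

    // Query the instance extensions GLFW requires to create a Vulkan surface.
    uint32_t extensionCount = 0;
    const char** extensionNames = ::glfwGetRequiredInstanceExtensions(&extensionCount);
    Array<String> requiredExtensions(extensionNames, extensionNames + extensionCount);

    // Build and run the app: forward the window to the SimpleApp constructor,
    // attach a console log target and initialize the Vulkan backend.
    App::build<SimpleApp>(std::move(window))
        .logTo<ConsoleSink>(LogLevel::Trace)
        .useBackend<VulkanBackend>(requiredExtensions)
        .go();

    return EXIT_SUCCESS;
}
```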
Let's take a look at the code together. First, we create a window to paint on using GLFW. This is straightforward, but any other window manager can be used as well; LiteFX does not restrict how the rendering surface is created. The only restriction is that it needs to be compatible with the rendering back-end. Since it is possible to request a HWND handle from GLFW using glfwGetWin32Window, we can use GLFW for both backends. The window pointer is passed to the App::build<SimpleApp>() call, which forwards the pointer to the SimpleApp constructor, where we store it.
Note on Vulkan: for the Vulkan backend, it is important to specify the mandatory extensions that are required to create a valid surface. We do this by calling glfwGetRequiredInstanceExtensions and passing the result to the VulkanBackend. It is, however, possible to specify those extensions manually if GLFW is not used. For Windows, those extensions are VK_KHR_surface and VK_KHR_win32_surface.
We then specify a log target (which is completely optional) to be a console window and tell the app to initialize a VulkanBackend for rendering. The .go() calls cause the builder to perform the actual object initialization and you will see them frequently when using the fluent API. Using the fluent builder syntax is optional, though - you could in fact create all the instances on your own. All classes follow the RAII idiom, so it is clear from the constructor parameters which objects are required to be initialized in which order.
Let's go on and take a look at the SimpleApp class. It implements the LiteFX::App base class, an abstract class that requires us to provide some overrides that implement the basic application control flow. Most notably, those are:

- initialize, which is called by the LiteFX::App constructor.
- run, which is called by the AppBuilder after the app has been initialized. Note that if you prefer not to use the fluent builder syntax, you have to call it on your own.
- resize, which is called if a resize event occurs.

Since the app itself is agnostic towards the actual window manager, we have to invoke the resize event manually. We do this by storing the application instance pointer using glfwSetWindowUserPointer and calling the resize method within the GLFW framebuffer resize event callback.
The app model automatically calls SimpleApp::run as soon as the app is ready. This is the main method that should implement the game loop. If it returns, the application will close. In its most simple form, the method can be implemented like this:
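A sketch; the backend lookup call is an assumption based on the engine's app model:

```cpp
void SimpleApp::run() {
    // Grab the rendering backend that was registered by the app builder.
    auto backend = this->findBackend<VulkanBackend>(BackendType::Rendering);

    // TODO: create a device and initialize the render pass, pipeline and buffers.

    // The game loop: poll window events and draw until the window closes.
    while (!::glfwWindowShouldClose(m_window.get())) {
        ::glfwPollEvents();
        // TODO: draw a frame.
    }

    // TODO: clean up.
}
```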
There are three to-dos here: creating the device and initializing the rendering resources, drawing each frame within the game loop, and cleaning up before the application closes. In the following, we will take a closer look at each of these steps.
Before we can do any rendering, we need to create a device. A device is the central instance for each GPU job. Conceptually, it represents the hardware interface for the application. It does, however, not represent the actual hardware itself - that's the purpose of Adapters. There can be multiple devices for an adapter, and there can be devices that make use of different adapters. For our example, we use the default adapter, which is typically the main GPU. To request the default adapter, we pass std::nullopt to the findAdapter method of the rendering backend of our app.
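In code, this looks roughly like:

```cpp
// Request the default adapter (typically the primary GPU).
auto adapter = backend->findAdapter(std::nullopt);
```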
The next thing we need is a surface, which is a way of telling the device where to render to. The way surfaces are created differs slightly between DirectX 12 and Vulkan. For DirectX 12, a surface is nothing more than a HWND. For Vulkan, however, we need to create a surface object, and we can use GLFW for it.
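A sketch; the createSurface overload taking a factory callback is an assumption based on the Vulkan backend samples:

```cpp
// Let GLFW create the VkSurfaceKHR from the backend's Vulkan instance.
auto surface = backend->createSurface([&](const VkInstance& instance) {
    VkSurfaceKHR surface;
    ::glfwCreateWindowSurface(instance, m_window.get(), nullptr, &surface);
    return surface;
});
```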
With the surface and adapter, we can now proceed to creating our device. Creating a device automatically initializes the Swap Chain, which we will talk about in detail later. We could simply create it with a default extent, but it is more efficient to directly tell the swap chain how large the surface is from the beginning. This way, we prevent it from being re-created after the window first gets drawn to. In order to do this, we can request the frame buffer size from GLFW. Note that the frame buffer size is not always equal to the window size: high-DPI monitors use a more coarse window coordinate system. You can read about it in more detail here.
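The GLFW call for this is:

```cpp
// The frame buffer size may differ from the window size on high-DPI displays.
int width, height;
::glfwGetFramebufferSize(m_window.get(), &width, &height);
```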
With the adapter, surface and frame buffer extent, we can go ahead and create our device. We also specify the output format (Format::B8G8R8A8_SRGB) and the number of frames we want to draw concurrently. This is commonly referred to as frames in flight or back buffers throughout the engine, though there is a slight difference: back buffers refer to the number of frame buffers in the swap chain, whilst frames in flight is a broader concept that, for example, influences how many buffers or descriptor sets you want to allocate later.
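A sketch of the call; the parameter order is an assumption:

```cpp
// Create the device with an sRGB output format and three frames in flight.
m_device = backend->createDevice(*adapter, *surface, Format::B8G8R8A8_SRGB, Size2d(width, height), 3);
```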
We store the device in a variable m_device, which we define as a member variable of SimpleApp, since we are going to make heavy use of it throughout the whole application.
Note how the createDevice method returns a UniquePtr. Receiving a unique pointer from any call transfers ownership to your application. This means that from now on, you are responsible for managing the device lifetime and for making sure that it gets released properly. We will do this later, when we talk about cleaning up. From now on, all member variables that should be stored are marked with the m_ prefix and their declaration will not be explicitly mentioned, as long as it only involves declaring a simple pointer or reference.
Each frame is drawn by at least one render pass. A render pass is a timeline entity of each frame: only one render pass can be active at a time. Like everything in modern graphics APIs, render passes are executed asynchronously, though. Each call that you perform on a render pass records a command, which is executed when the GPU allocates a time slot for it. By itself, the render pass does not store much information; it is rather an entity that is used to describe the flow of images through the render process of a frame. This flow is called a Render Graph. A render graph is described by a set of interdependent render passes.

Note that LiteFX does not automatically build the render graph for you. Instead, you are responsible for beginning and ending each render pass manually, as well as for mapping the input attachments to a render pass. It does, however, provide you with two tools to model the image flow: Render Targets and Input Attachment Mappings. Render targets describe the output of a render pass. Input attachment mappings do the exact opposite and tell a render pass to use the render target of another render pass as an input. This way, there's an implicit relationship between render passes: you have to end a render pass before you can use its render targets as input attachments for another render pass.
In our example, however, we do not use multiple render passes and instead only create one render pass that directly draws to our surface. We do this by defining a render pass with one render target that has the Present type. This type is special in that there can only be one present target across all render passes. The present target is what's written to the front buffer when a swap chain is swapped. Present targets can also not be used as input attachments. Other render target types are Color, which can store arbitrary data, and DepthStencil, which can only exist once per render pass and stores depth, stencil, or depth and stencil data.
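A sketch of the render pass definition, using the fluent builder (the renderTarget parameter list approximates the 0.3 API):

```cpp
// One render pass with a single present target that is cleared to black.
m_renderPass = m_device->buildRenderPass()
    .renderTarget(RenderTargetType::Present, Format::B8G8R8A8_SRGB, { 0.f, 0.f, 0.f, 0.f }, true, false, false)
    .go();
```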
The other values that are provided to a render target are:

- The clear values, which the target is cleared to when the render pass begins. Color targets are cleared to the given color; depth targets are cleared to 0.0.
- A flag that controls whether stencil values are cleared. It is only meaningful if the render target is a DepthStencil target and the format supports stencil values. It can be used to disable clearing stencil values and only clear depth values for depth/stencil targets.
- A flag that marks the target as volatile, which defaults to false.

Next on our list is creating the render pipeline. The singular here is important, because in most applications you will have many pipelines created from one render pass. A pipeline contains the whole state that is used to render a buffer. It tells the GPU how to interpret inputs, how to store the outputs and which shader programs to use. More importantly, it tells the GPU about the shader program parameters and how they are laid out. A pipeline itself is a state object that can be bound to a render pass. There can only be one pipeline active at a time, and since there can also be only one render pass active at a time, there's only one active pipeline at each point in the drawing timeline. Changing the pipeline state may be an expensive operation and you should not do this frequently. Instead, group together objects that can be rendered with the same pipeline and draw all of them before switching to another pipeline.
The first important states a pipeline stores are the Viewport and Scissor states. Both tell the GPU which pixels to draw to the output render area; however, a scissor does not imply a viewport transform (that is, a scaling of the image to fit the viewport). For our example, we set both the viewport and the scissor to the whole size of our frame buffer, using the width and height we've requested earlier. Note that it is possible to have multiple viewports and scissors. Furthermore, each pipeline can be assigned an ID, which must be unique for the render pass, and a name, which is solely used for debugging purposes. Let's start by creating our viewport and scissor and then continue with building our pipeline state.
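Sketched with the fluent builder (method names are approximations of the 0.3 API), the pipeline definition starts like this; the sub-states discussed in the following sections slot in where the placeholder comment sits:

```cpp
m_viewport = makeShared<Viewport>(RectF(0.f, 0.f, static_cast<Float>(width), static_cast<Float>(height)));
m_scissor  = makeShared<Scissor>(RectF(0.f, 0.f, static_cast<Float>(width), static_cast<Float>(height)));

m_pipeline = m_renderPass->makePipeline(0, "Basic Pipeline")    // Unique ID and debug name.
    .withViewport(m_viewport)
    .withScissor(m_scissor)
    // ... input assembler, rasterizer and layout states follow here ...
    .go();
```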
Next, we tell our pipeline how to handle geometry inputs. Geometry is described by vertices and indices and has a certain topology, which is described by the PrimitiveTopology enumeration. Our geometry is represented as a list of triangles, so we set it to TriangleList. Since the vertices of our geometry are not required to be stored in drawing order, we use an index buffer to tell the GPU which vertices to use for each triangle. We set the index type to UInt16, which can address up to 65536 vertices and is enough for our example.
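Inside the chain above, the input assembler state might look like this (again, an approximation):

```cpp
    .inputAssembler()
        .withTopology(PrimitiveTopology::TriangleList)    // Interpret the indices as a triangle list.
        .withIndexType(IndexType::UInt16)                 // 16-bit indices suffice for our geometry.
```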
Finally, we define our vertex buffer layout. This means that we tell the input assembler state about the memory layout of a single vertex. We use the Vertex object, defined in the LiteFX::Graphics namespace, but it is possible to use a custom structure, too. Each property of the vertex is mapped to an attribute, which describes the memory layout of the property and where to find it within the buffer. Furthermore, it tells the renderer where the attribute should be bound to. For example, the Position property of the vertex object is a 3-component 32-bit float vector, so its format is XYZ32F. In the shader, the position is the first element of the vertex, so we set its location to 0.
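Continuing the input assembler block, the vertex layout could be declared as follows (the attribute helper names are assumptions):

```cpp
        .addVertexBuffer(sizeof(Vertex), 0)                                     // One vertex buffer at binding 0.
            .addAttribute(0, BufferFormat::XYZ32F, offsetof(Vertex, Position))  // Location 0: position.
            .addAttribute(1, BufferFormat::XYZW32F, offsetof(Vertex, Color))    // Location 1: color.
            .go()
        .go()
```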
Next, we tell the pipeline how those primitives (i.e. triangles in our example) should be drawn. We want to draw solid faces, so we set the PolygonMode to Solid. Another property of the rasterizer state is the face culling state. First, we set the order of vertices, which dictates which side of the primitive is interpreted as the front and which one as the back. We set the CullOrder to ClockWise to tell the pipeline to treat this ordering as front-facing. Finally, we tell the pipeline to draw both sides of a polygon by setting the CullMode to Disabled.
Each pipeline is defined using a Shader Program and a Pipeline Layout. We start by defining the shader program, which in our simple example should contain two stages: Vertex and Fragment shaders (those are also called Pixel shaders in DirectX). A program is built from multiple modules, where each module type may only exist once within a program. The modules are loaded from files and must be in a compatible binary format. For Vulkan this format is SPIR-V, for DirectX it's DXIL. We define those shaders later, for now it is only important that they are written to the shaders directory and called vs.spv (vertex shader) and fs.spv (fragment shader).
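The shader program is defined on the pipeline layout; a sketch (builder method names approximate):

```cpp
    .layout()
        .shaderProgram()
            .addVertexShaderModule("shaders/vs.spv")      // Compiled vertex shader.
            .addFragmentShaderModule("shaders/fs.spv")    // Compiled fragment/pixel shader.
            .go()
```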
Finally, we need to tell the pipeline layout about the buffers that are used by the shader. Buffers are grouped into descriptor sets. Each descriptor set can contain multiple buffers and is visible to a pre-defined range of shader stages. Each buffer is bound to a certain location within the descriptor set. It is a good practice to group buffers into descriptor sets based on update frequency. We have two buffers in our example that are updated at different frequencies:
- CameraBuffer, which is only updated when the viewport changes (our example camera is static). The camera buffer will be bound to location 0 of set 0 and will be visible to the vertex and fragment shader stages.
- TransformBuffer, which is updated every frame (we want to draw a rotating triangle). The transform buffer will be bound to location 0 of set 1 and will only be visible to the vertex shader.

For now, we will only define the descriptor sets and take a look at the CameraBuffer and TransformBuffer objects later.
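Completing the pipeline layout, the two sets might be declared like this (addUniform and the stage flags are assumptions):

```cpp
        .addDescriptorSet(0, ShaderStage::Vertex | ShaderStage::Fragment)    // Camera data, rarely updated.
            .addUniform(0, sizeof(CameraBuffer))
            .go()
        .addDescriptorSet(1, ShaderStage::Vertex)                            // Per-frame transform data.
            .addUniform(0, sizeof(TransformBuffer))
            .go()
        .go()    // Completes the pipeline layout.
```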
For more details about buffers and descriptor sets, kindly refer to the project wiki or read the API documentation about descriptor sets.
We already told the pipeline to load the vertex and fragment shaders; however, we have not yet defined them. Create two new files in the project directory: vs.hlsl for the vertex shader and fs.hlsl for the fragment shader. First, let's take a look at the vertex shader:
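A sketch that matches the vertex layout and descriptor sets defined above:

```hlsl
#pragma pack_matrix(row_major)

struct VertexInput
{
    float3 Position : POSITION;    // Location 0, XYZ32F.
    float4 Color    : COLOR;       // Location 1, XYZW32F.
};

struct VertexData
{
    float4 Position : SV_POSITION;
    float4 Color    : COLOR;
};

struct CameraData    { float4x4 ViewProjection; };
struct TransformData { float4x4 Model; };

// The descriptor set ID maps to the register space, the binding to the register.
ConstantBuffer<CameraData>    camera    : register(b0, space0);
ConstantBuffer<TransformData> transform : register(b0, space1);

VertexData main(in VertexInput input)
{
    VertexData vertex;

    float4 position = mul(float4(input.Position, 1.0), transform.Model);
    vertex.Position = mul(position, camera.ViewProjection);
    vertex.Color = input.Color;

    return vertex;
}
```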
VertexInput corresponds to the definition we earlier passed to the input assembler. VertexData describes the output vertex of the vertex shader. The two constant buffers correspond to the descriptors we passed to the descriptor sets in the pipeline definition. Note how the descriptor set ID maps to the space in the shader. The main function is pretty straightforward, as it only performs the view/projection transform for the vertex and copies its color. Also note the #pragma pack_matrix(row_major): since we are going to use GLM in this tutorial and GLM produces row-major matrices, this is important. If you are using another math library, you can simply change this line accordingly.
The fragment shader is also pretty rudimentary for the moment:
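A sketch of fs.hlsl:

```hlsl
struct VertexData
{
    float4 Position : SV_POSITION;
    float4 Color    : COLOR;
};

struct FragmentData
{
    float4 Color : SV_TARGET;    // Maps to the first render target.
    float  Depth : SV_DEPTH;     // Maps to the depth/stencil target.
};

FragmentData main(VertexData input)
{
    FragmentData fragment;

    // No further computation: just forward the interpolated values.
    fragment.Depth = input.Position.z;
    fragment.Color = input.Color;

    return fragment;
}
```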
Again, the VertexData contains the data we are passing from the vertex to the fragment shader stage. FragmentData holds the information about the fragment, where the Color property maps to the first render target and the Depth property maps to the depth/stencil target. The shader does not do any further computations, but copies the values from the vertex input.
Usually you have to manually compile the shaders before you can use them. LiteFX is capable of doing this for you automatically and provides two helpers for CMake projects: ADD_SHADER_MODULE and TARGET_LINK_SHADERS. Let's head over to our CMakeLists.txt file and add the following lines below the TARGET_LINK_LIBRARIES:
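A sketch of how this might look; the exact argument keywords of the helpers are assumptions, so check the shader module targets wiki page for your version:

```cmake
ADD_SHADER_MODULE(MyVertexShader
    SOURCE "vs.hlsl" LANGUAGE HLSL TYPE VERTEX
    COMPILE_AS SPIRV SHADER_MODEL 6_3 COMPILER DXC)

ADD_SHADER_MODULE(MyFragmentShader
    SOURCE "fs.hlsl" LANGUAGE HLSL TYPE FRAGMENT
    COMPILE_AS SPIRV SHADER_MODEL 6_3 COMPILER DXC)

TARGET_LINK_SHADERS(MyLiteFXApp SHADERS MyVertexShader MyFragmentShader)
```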
First, we define two targets, MyVertexShader and MyFragmentShader, one for each shader module. The options specify the language, shader type, intermediate language (SPIRV for Vulkan; change it to DXIL for DirectX 12), the shader model and the compiler to use. DXC is recommended, unless you want to compile GLSL shaders, which can only be compiled using GLSLC. Note, however, that you cannot use GLSL shaders to target DirectX 12.
The shader helper attempts to find the GLSLC and DXC compilers automatically. If you have the Vulkan SDK installed, it looks for the compilers there. It prefers the Vulkan SDK, since this DXC distribution supports SPIR-V code generation and GLSLC is present too. If you only plan on using the DirectX 12 backend and don't have the Vulkan SDK installed, the helper falls back to the DXC distribution installed with the Windows 10 SDK. Note, however, that this distribution does not support SPIR-V code generation. It is possible to specify a custom location for each compiler by setting the BUILD_DXC_COMPILER and/or BUILD_GLSLC_COMPILER variables.
Using TARGET_LINK_SHADERS, we set up a dependency between the shaders and our application, so that the shaders are copied to the build directory properly. Note that by default, the shaders are copied into a shaders/ subdirectory. You can change this subdirectory through the SHADER_DEFAULT_SUBDIR variable. Keep in mind to also update the pipeline state definition if you do change the directory.
NOTE: If you want to learn more about how to write portable shaders, refer to the shader development guide. For more information on how to use the helpers, refer to the shader module targets page in the project wiki.
Next, we need to pass data to the GPU for it to process. LiteFX supports different data management strategies that you can choose from. The two most common ones are Write once/Read multiple, which is used for static resources (such as vertex and index buffers) that are staged once and then transferred into GPU-only memory, and Write once/Read once, which is used for frequently changing resources (such as per-frame transform buffers) that are written by the CPU every time the GPU reads them. Other scenarios are also possible; however, they are all implemented using the techniques demonstrated by those two scenarios.
Vertex and index buffers are two examples of data that is typically written once and read multiple times. We thus create a CPU-visible staging buffer for both, write the required data to it and transfer it to a GPU-visible resource. But first, let's define the vertex and index buffer contents:
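A sketch; the vertex positions and colors are arbitrary example values:

```cpp
// Three vertices with distinct colors; normal and texture coordinates are unused for now.
const Array<Vertex> vertices = {
    { { -0.5f, -0.5f, 0.5f }, { 1.0f, 0.0f, 0.0f, 1.0f }, { 0.0f, 0.0f, 0.0f }, { 0.0f, 0.0f } },
    { {  0.5f, -0.5f, 0.5f }, { 0.0f, 1.0f, 0.0f, 1.0f }, { 0.0f, 0.0f, 0.0f }, { 0.0f, 0.0f } },
    { {  0.0f,  0.5f, 0.5f }, { 0.0f, 0.0f, 1.0f, 1.0f }, { 0.0f, 0.0f, 0.0f }, { 0.0f, 0.0f } }
};

// One triangle, wound to match the clockwise cull order from the rasterizer state.
const Array<UInt16> indices = { 0, 1, 2 };
```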
We define a vertex buffer with three vertices, each with a different color. The last two vectors represent the normal and texture coordinate, which we do not want to use yet. It is totally possible to use a custom Vertex object; just keep in mind to change the input assembler state accordingly. We specify the index buffer to form a triangle from all three vertices. The order of the indices must respect the CullOrder we defined in the rasterizer state.
Next, let's transfer the buffers to the GPU. We start off by storing the input assembler reference (for easier access) and creating a command buffer to record all transfer commands. We do this right below the pipeline creation code:
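Roughly (the name of the transfer queue accessor is an assumption):

```cpp
// Keep the input assembler at hand and record all transfers into one command buffer.
auto inputAssembler = m_pipeline->inputAssembler();
auto commandBuffer = m_device->bufferQueue().createCommandBuffer(true);
```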
We then create a CPU-visible vertex buffer and copy the vertex data into it:
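A sketch, assuming the factory interface exposed by the device:

```cpp
// CPU-visible staging buffer for the vertex data.
auto stagedVertices = m_device->factory().createVertexBuffer(
    inputAssembler->vertexBufferLayout(0), BufferUsage::Staging, vertices.size());
stagedVertices->map(vertices.data(), vertices.size() * sizeof(Vertex), 0);
```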
The BufferUsage defines where the buffer should be visible from. Staging corresponds to a CPU-only visible buffer, whilst Resource is used for GPU-only visible buffers. We will use another buffer type (Dynamic) later to represent Write once/Read once scenarios. Finally, we copy the data to the vertex buffer by calling map. After this, we can create the GPU-visible vertex buffer and issue a transfer command:
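The transfer method name and parameter order are assumptions here:

```cpp
// GPU-only vertex buffer; the copy is recorded into the transfer command buffer.
m_vertexBuffer = m_device->factory().createVertexBuffer(
    inputAssembler->vertexBufferLayout(0), BufferUsage::Resource, vertices.size());
m_vertexBuffer->transferFrom(*commandBuffer, *stagedVertices, 0, 0, vertices.size());
```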
We store the vertex buffer in a member variable. We then go ahead and repeat the same process for the index buffer:
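Sketched with the same assumed calls as above:

```cpp
// Same pattern for the index buffer: stage, then transfer to a GPU-only resource.
auto stagedIndices = m_device->factory().createIndexBuffer(
    inputAssembler->indexBufferLayout(), BufferUsage::Staging, indices.size());
stagedIndices->map(indices.data(), indices.size() * sizeof(UInt16), 0);

m_indexBuffer = m_device->factory().createIndexBuffer(
    inputAssembler->indexBufferLayout(), BufferUsage::Resource, indices.size());
m_indexBuffer->transferFrom(*commandBuffer, *stagedIndices, 0, 0, indices.size());
```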
The same memory management concepts as for vertex and index buffers apply to shader resources (i.e. constant buffers, samplers or textures). However, they have one more aspect to them: descriptors. Descriptors are basically GPU pointers, in the sense that they point to a GPU-visible resource before a draw call is issued. They must, however, not change until the draw call has finished (i.e. until the end of the frame). Managing descriptors manually can be quite challenging. Luckily, LiteFX already implements flexible descriptor management strategies and the way you interact with descriptors is always the same when using LiteFX.
Descriptors are grouped into Descriptor Sets. We already defined two descriptor sets when setting up our pipeline. For now, both only contain one descriptor, a uniform buffer each. You can add as many descriptors to a set as you like, though there are some things to keep in mind. Firstly, you must not mix samplers and images in one descriptor set. This rule is not directly enforced by the Vulkan backend, however it's a strong requirement for the DirectX 12 backend, so you should follow it anyway to keep your pipeline definitions consistent. Secondly, you should define descriptor sets based on the frequency they are updated at. This is a good practice and generally helps to organize GPU workload.
Descriptor sets directly map to a GPU space. Within this space, a descriptor has a binding, which defines from which register the descriptor gets accessed by the shader. We defined both in the pipeline layout, as well as in the shader code. Feel free to go back to those sections to ensure that you understood how descriptors are mapped to the shader. If you think you understood the relationship, go ahead with the next sections.
We will first map a static CameraBuffer to the shader. This buffer contains the camera's view/projection matrix. In our example, the camera cannot move, making it a perfect fit for static buffers. As mentioned earlier, static buffers correspond to the Write once/Read multiple strategy and should be transferred to a GPU-visible resource. We will thus create two buffers again, as we did for the vertices and indices, and issue a transfer command to copy the data to the GPU. We will then allocate a descriptor set and update it accordingly, so that the descriptor points to the GPU camera buffer.
Let's start off by defining our camera buffer structure:
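It only needs to hold the pre-multiplied matrix:

```cpp
struct CameraBuffer {
    glm::mat4 ViewProjection;
} camera;
```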
Note that we are using glm to store the matrix here, but you can use any other representation that suits you. Keep in mind to specify the matrix order (#pragma pack_matrix) in the shader sources, if it is different to our example. If you want to follow this guide, you need to add an include for glm matrix transformations:
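```cpp
#define GLM_FORCE_RADIANS
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
```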
Next, we create the two buffers that should store the camera data:
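A sketch; the layout accessor names are assumptions:

```cpp
// The camera descriptor set layout (space 0) and its uniform buffer descriptor (binding 0).
auto& cameraBindingLayout = m_pipeline->layout().descriptorSet(0);
m_cameraStagingBuffer = m_device->factory().createBuffer(
    cameraBindingLayout.descriptor(0), BufferUsage::Staging, 1);
m_cameraBuffer = m_device->factory().createBuffer(
    cameraBindingLayout.descriptor(0), BufferUsage::Resource, 1);
```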
First, we request a reference of the descriptor set layout (at space 0), which contains the camera buffer descriptor layout (at binding point 0). We then create two constant buffers and store them in member variables, since we want to be able to update the camera buffer later (for example, if a resize event occurs). The camera buffer is still static, since such events occur infrequently.
Let's move on and compute the view and projection matrix and pre-multiply them together:
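The GLM calls are real; the camera position and projection parameters are example values:

```cpp
// A fixed look-at view and a perspective projection, pre-multiplied into one matrix.
glm::mat4 view = glm::lookAt(glm::vec3(1.5f, 1.5f, 1.5f), glm::vec3(0.0f), glm::vec3(0.0f, 0.0f, 1.0f));
float aspect = static_cast<float>(width) / static_cast<float>(height);
glm::mat4 projection = glm::perspective(glm::radians(60.0f), aspect, 0.0001f, 1000.0f);
camera.ViewProjection = projection * view;
```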
In the last line, we pre-multiply the view/projection matrix and store it in the camera buffer, which we can now transfer to the GPU:
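Again with the assumed map/transfer signatures:

```cpp
// Write the matrix into the staging buffer and record a copy to the GPU-only buffer.
m_cameraStagingBuffer->map(&camera, sizeof(camera), 0);
m_cameraBuffer->transferFrom(*commandBuffer, *m_cameraStagingBuffer, 0, 0, 1);
```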
The last thing we need to do is making the descriptor point to the GPU-visible camera buffer. We only need to do this once, since we do not change the buffer location on the GPU:
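A sketch; allocate and update are assumptions about the descriptor set interface:

```cpp
// Allocate the descriptor set and point its binding 0 at the GPU-side camera buffer.
m_cameraBindings = cameraBindingLayout.allocate();
m_cameraBindings->update(0, *m_cameraBuffer);
```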
Here we first allocate a descriptor set that holds our descriptor for the camera buffer. We then update the descriptor bound to register 0 to point to the GPU-visible camera buffer. Finally, with all the transfer commands being recorded to the command buffer, we can submit the buffer and wait for it to be executed:
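Whether ending a command buffer takes submit/wait flags or the queue exposes a separate submit call differs between versions, so treat this as a sketch:

```cpp
// End recording, submit the transfer commands and block until they have executed.
commandBuffer->end(true, true);    // Assumed signature: end(submit, wait).
```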
We also explicitly release the temporary staging buffers and the transfer command buffer here, since we do not need them anymore and we need to ensure that they are released before we close the application.
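Since all of them are owned through smart pointers, releasing is just:

```cpp
// The GPU copies have completed at this point, so the staging resources can go.
stagedVertices = nullptr;
stagedIndices = nullptr;
commandBuffer = nullptr;
```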
We now want to look at a different memory management strategy: Write once/Read once. For resources that change frequently (i.e. every frame), this strategy is more efficient than recording transfer commands and waiting for the transfer to happen. Instead, we rely on the graphics queue to ensure that our buffer is transferred automatically when it is needed.
Let's begin with defining our transform buffer structure:
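Analogous to the camera buffer, it holds one model matrix:

```cpp
struct TransformBuffer {
    glm::mat4 World;
} transform;
```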
Next, we create three Dynamic buffers and map them to the descriptor set at space 1, which holds the per-frame transform buffer descriptors. There are three buffers, since we have three frames in flight, i.e. three frames that are computed concurrently. This equals the number of back buffers in the swap chain we created earlier. Since we have three buffers, we also need three descriptor sets, each containing a descriptor that points to the buffer for the current frame. The three buffers are stored in one buffer array with three elements, so each descriptor points to an individual element in the transform buffer array.
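A sketch; allocateMultiple and the element-wise update call are assumptions:

```cpp
// One dynamic buffer array with three elements and one descriptor set per frame in flight.
auto& transformBindingLayout = m_pipeline->layout().descriptorSet(1);
m_transformBuffer = m_device->factory().createBuffer(
    transformBindingLayout.descriptor(0), BufferUsage::Dynamic, 3);
m_perFrameBindings = transformBindingLayout.allocateMultiple(3);

for (int i = 0; i < 3; ++i)    // Point each set's binding 0 at "its" array element.
    m_perFrameBindings[i]->update(0, *m_transformBuffer, i);
```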
With everything set up so far, we can now start the actual drawing. Navigate to the main application loop (look for the // TODO: draw frame. comment) and start by swapping out the current back buffer:
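```cpp
// Acquire the next back buffer; its index selects the per-frame resources below.
auto backBuffer = m_device->swapChain().swapBackBuffer();
```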
The back buffer describes the resources that are used for the frame that is currently computed. This happens concurrently, so while a frame is still waiting to be drawn, future frames can already be recorded by the CPU. This ensures that the GPU is always busy with rendering.
Each frame is drawn in one or multiple sequential render passes. We already defined a single render pass earlier, so all we need to do is tell the GPU to start it on the current back buffer:
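```cpp
m_renderPass->begin(backBuffer);
```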
In order to draw something, we need to acquire a command buffer to record drawing commands to. Each render pass stores a set of command buffers within the current (active) frame buffer. The right frame buffer is selected when passing the backBuffer to the begin method. A frame buffer can store multiple command buffers in order to allow multiple threads to record commands concurrently; however, in our example we only use one command buffer:
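Roughly (the frame buffer accessor is an assumption):

```cpp
// The active frame buffer was selected by the begin() call above.
auto commandBuffer = m_renderPass->activeFrameBuffer().commandBuffer(0);
```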
Next up, we want to handle drawing geometry. Each geometry draw call requires a certain state to let the GPU know how to handle the data we pass to it. This state is contained in the pipeline we defined earlier. In a real-world application, there may be many pipelines with different shaders, rasterizer and input assembler states. You should, however, always aim to minimize the number of pipeline switches. You can do this by pre-ordering the objects in your scene, so that you draw all objects that require the same pipeline state at the same time. In this example, however, we only have one pipeline state, and we now tell the GPU to use it for the subsequent workload:
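```cpp
m_pipeline->use();    // Assumed: records the pipeline bind into the active command buffer.
```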
Now it's time to update the transform buffer for our object. We want to animate a rotating triangle, so we can use a clock to dictate the amount of rotation. We use the duration since the application start to compute a rotation matrix, which we use to update the transform buffer:
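A sketch; the rotation speed is an arbitrary example value and the map signature is assumed:

```cpp
// Rotate around the z-axis based on the elapsed time since the first frame.
static auto start = std::chrono::high_resolution_clock::now();
auto now = std::chrono::high_resolution_clock::now();
auto time = std::chrono::duration<float>(now - start).count();    // Seconds.

transform.World = glm::rotate(glm::mat4(1.0f), time * glm::radians(42.0f), glm::vec3(0.0f, 0.0f, 1.0f));
m_transformBuffer->map(&transform, sizeof(transform), backBuffer);    // Write to this frame's array element.
```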
Before we can record the draw call, we need to make sure that the shader sees the right resources by binding all descriptor sets:
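```cpp
m_pipeline->bind(*m_cameraBindings);                  // Static camera data (space 0).
m_pipeline->bind(*m_perFrameBindings[backBuffer]);    // This frame's transform (space 1).
```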
Finally, we can record the actual draw call and end the render pass, which will cause the command buffer to be submitted to the graphics queue:
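Sketched with the same assumed pipeline-centric interface:

```cpp
// Bind the geometry buffers, draw one indexed triangle and end the render pass.
m_pipeline->bind(*m_vertexBuffer);
m_pipeline->bind(*m_indexBuffer);
m_pipeline->drawIndexed(m_indexBuffer->elements());
m_renderPass->end();    // Submits the recorded command buffer to the graphics queue.
```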
When you launch the app now, you should see a rotating triangle in all its beauty.
Before we can close our application, we need to ensure that all resources are properly released. Whilst not absolutely mandatory in release builds, this will satisfy the validation or debug layer (depending on your back-end):
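With the m_ members introduced throughout this guide, the cleanup might look like this:

```cpp
// Wait for the GPU, then release resources in reverse order of creation.
m_device->wait();

m_perFrameBindings.clear();
m_cameraBindings = nullptr;
m_transformBuffer = nullptr;
m_cameraBuffer = nullptr;
m_cameraStagingBuffer = nullptr;
m_vertexBuffer = nullptr;
m_indexBuffer = nullptr;
m_pipeline = nullptr;
m_renderPass = nullptr;
m_device = nullptr;
```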
First, we wait for the device to finish drawing the remaining frames. This ensures that we do not destroy resources that are still accessed by the GPU in any submitted command buffers. We then destroy all descriptors and buffers, before finally releasing the pipeline, render pass and device instances.
If you resize the window, you might notice that the backend returns an error. This is caused by the swap chain rendering to an outdated back buffer. In order to support window resize events, let's implement the SimpleApp::resize method:
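A sketch of the method skeleton:

```cpp
void SimpleApp::resize(int width, int height) {
    App::resize(width, height);    // Let the base class validate the render area.

    if (m_device == nullptr)
        return;    // Resize events can fire before initialization has happened.

    // ... swap chain and frame buffer handling follows below ...
}
```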
Since this method is inherited from the App base class, we first invoke the base class method. This ensures that the render area is valid. We then check if the device has already been initialized, since resize events may occur before any initialization has been done. If it is not initialized, there's no need for us to do anything else here. During rendering, however, it will be initialized, so let's continue with the implementation:
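The swap chain reset and frame buffer resize calls below are assumptions about the 0.3 interface:

```cpp
// Wait for outstanding work, then re-create the swap chain and frame buffers with the new extent.
m_device->wait();

auto surfaceFormat = m_device->swapChain().surfaceFormat();
auto renderArea = Size2d(width, height);
m_device->swapChain().reset(surfaceFormat, renderArea, 3);
m_renderPass->resizeFrameBuffers(renderArea);
```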
Again, we first wait for the device to finish all submitted work. This ensures that we do not destroy any back buffers that might still be used by command buffers that are yet to be executed. Next, we request the surface format from the current swap chain and initialize the new render area extent. We can then go ahead and re-create the swap chain, which causes the back buffers to be re-allocated with the new size and format. Furthermore, we can resize the frame buffers of our render pass. Note that you have to decide whether or not you want to do this, because you might have a render pass that renders into a target that is deliberately at a different size than the swap chain back buffer. However, you almost certainly want to at least resize the frame buffer of the render pass that writes your present target.
We then also resize the viewport and scissor rectangles, so that the image is drawn over the whole area of our resized window:
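Assuming the viewport and scissor expose a rectangle setter:

```cpp
m_viewport->setRectangle(RectF(0.f, 0.f, static_cast<Float>(width), static_cast<Float>(height)));
m_scissor->setRectangle(RectF(0.f, 0.f, static_cast<Float>(width), static_cast<Float>(height)));
```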
If you launch the application now and resize the window, it should already work. You might, however, notice that the image appears stretched. This is because we also need to adjust the aspect ratio in our view/projection matrix. To do this, we can re-use the code we've written earlier to compute the camera matrix and update the buffer once again:
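A sketch; the camera struct is assumed to be accessible here (e.g. stored as a member), and the transfer calls carry the same assumed signatures as before:

```cpp
// Re-compute the projection with the new aspect ratio; the view matrix is unchanged.
glm::mat4 view = glm::lookAt(glm::vec3(1.5f, 1.5f, 1.5f), glm::vec3(0.0f), glm::vec3(0.0f, 0.0f, 1.0f));
float aspect = static_cast<float>(width) / static_cast<float>(height);
camera.ViewProjection = glm::perspective(glm::radians(60.0f), aspect, 0.0001f, 1000.0f) * view;

m_cameraStagingBuffer->map(&camera, sizeof(camera), 0);
auto commandBuffer = m_device->bufferQueue().createCommandBuffer(true);
m_cameraBuffer->transferFrom(*commandBuffer, *m_cameraStagingBuffer, 0, 0, 1);
commandBuffer->end(true, true);    // Assumed signature: end(submit, wait).
```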
Note that we do not have to release the command buffer explicitly here, since it will go out of scope anyway and will be released automatically.
This quick start covered the basics on how to interact with the engine to write a modern graphics application. For more in-depth information about the inner workings of the engine, head over to the project wiki. If you have any problems or want to contribute to the development, feel free to open an issue or create a pull request.
Nevertheless, I hope you enjoy working with the project. Happy coding! 👩💻👨💻