Design and Implementation of a Vulkan Engine

Case Study of Capabilities and Performance

Master's Thesis, 2018

75 Pages, Grade: 1,3




1 Introduction

2 Why New API?
2.1 Origin and History of Vulkan
2.2 Vulkan - More Performance and Efficiency
2.3 Will OpenGL Get Outdated?

3 Vulkan API Overview
3.1 Layers
3.2 Extensions
3.3 Vulkan Instance
3.4 Devices
3.4.1 VkPhysicalDeviceProperties
3.4.2 VkPhysicalDeviceFeatures
3.4.3 VkPhysicalDeviceMemoryProperties
3.4.4 VkDevice
3.5 Queues
3.6 Window System Integration
3.7 Command Buffers
3.8 Render Passes
3.9 Framebuffers
3.10 Pipelines
3.11 Descriptors
3.12 Push Constants
3.13 Buffers
3.14 Images
3.15 Synchronization
3.15.1 Fences
3.15.2 Events
3.15.3 Semaphores
3.15.4 Barriers
3.16 SPIR-V Shaders

4 The Case Study Scenario
4.1 Deferred Shading with MSAA
4.2 Transparency Blending
4.3 FXAA
4.4 Bloom
4.5 Dynamic Panel Overlay

5 Engine Design and Implementation
5.1 Ocean Resources
5.1.1 Displacement Maps
5.1.2 Dy-Normalmap and Mipmap Generation
5.1.3 Scene Reflection/Refraction and Deferred Shading
5.2 Opaque Scene G-Buffer
5.3 Sample Coverage and Deferred Shading
5.4 Transparent Scene and Blending
5.5 FXAA and Post Processing
5.6 Panel Overlay
5.7 Presentation

6 Case Study: OpenGL vs. Vulkan

7 Evaluation

A Measured Simulation Data - OpenGL
B Measured Simulation Data - Vulkan


The Vulkan API, released in February 2016, is the Khronos Group’s answer to Microsoft’s Direct3D 12 API published in 2015. Due to the revolutionary capabilities the new APIs provide to the programmer, both releases were accompanied by enormous hype. Vulkan and Direct3D 12 give the programmer unprecedented control over the GPU and its memory, which might introduce a new era in GPU computing. This thesis deals with the design and implementation of a graphics engine with state-of-the-art rendering features using the Vulkan API. The Vulkan engine is built upon the OpenGL engine Oreon Engine [1], which was developed in a previous thesis and used in the research work Realtime GPGPU FFT Ocean Water Simulation [2]. Finally, an extensive study of the capabilities of the new Vulkan API and its performance advantage over OpenGL is presented.


I gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.


Chapter 1


Introduction

Over the past years, GPU computing has become more and more relevant in a wide range of application environments. Since the early 2000s, when the first GPUs with programmable shader units were launched, the range of GPU-accelerated applications has reached new dimensions. Since then, GPUs have increasingly taken on the "role as a processor" [3, p.18], because they are significantly faster than CPUs at solving tasks with high data parallelism. Figure 1.1 shows the evolution of GPU and CPU performance, measured in computing power (GFLOPS) from 2001 to 2014 and in data transfer rate (GB/s) from 2003 to 2013. As early as 2002 to 2003, GPU performance (in GFLOPS) rose by a factor of more than 6, while the clock rate even decreased by 20%, as listed in figure 1.2. Due to this enormous change in the role of GPUs and the resulting new possibilities in offscreen computing, many industrial sectors beyond the gaming and film industries took notice of this progress. As a result of the growing demand for GPU programmers, financial companies, for example, hired game programmers to meet these new challenges. [3] [4]

Over the following years, high-level GPU APIs (like CUDA, Stream or OpenCL) had to rely on the graphics APIs Direct3D or OpenGL, since Direct3D and OpenGL serve as the interface for communication between CPU and GPU. Direct3D and OpenGL were initially released in the 1990s. Computer hardware has advanced enormously since then, which reveals some disadvantages of still relying on the low-level GPU APIs Direct3D (prior to version 12) and OpenGL. Hence, Microsoft released Direct3D 12, an overhauled version of its graphics API Direct3D, as part of the DirectX 12 release. Soon after, in early 2016, the Khronos Group released Vulkan. Unlike Direct3D 11 (and earlier) and OpenGL, Vulkan and Direct3D 12 were designed with the architecture and features of modern computer hardware in mind, in order to unleash its full power. [3] [5] [6]

This thesis targets the development of a Vulkan graphics engine and a performance comparison of Vulkan and OpenGL. The next chapter introduces the Vulkan API and its history and points out the advantages of Vulkan over OpenGL and Direct3D 11 (and earlier) on modern computer hardware. Following this, chapter three describes the core elements of the Vulkan API specification used in the developed Vulkan engine. The subsequent two chapters explain and illustrate in detail the real-time simulation scenario of the case study, which includes a GPGPU FFT-generated ocean, and the design of the Vulkan engine implementation. Finally, the measured performance results of the simulation scenario with Vulkan and OpenGL are presented and compared.

[Figure not included in this excerpt]

Figure 1.1: GPU vs CPU Performance Scaling [4]

The two graphs show how much GPUs have outscaled CPUs in GFLOPS and GB/s over the past years. While the Pentium 4 and the GeForce FX5800 were head to head in 2002, the GeForce GTX 780 Ti reaches a performance up to around 900% higher than the Intel Ivy Bridge CPUs in 2013. The same holds for GB/s: while CPUs and GPUs were roughly on par in 2003, the 780 Ti has a lead of around 500% over the Ivy Bridge CPUs in 2013.

[Figure not included in this excerpt]

Figure 1.2: Comparison of Nvidia GPUs [3, p. 16]

The GFLOPS of the GeForce FX5900 increased by a factor of 2.5 compared to the GeForce FX5800, while the transistor count increased only by a factor of 1.074 and the clock rate even decreased by 5%. The GeForce 6800 reaches a performance of 50 GFLOPS, 2.65 times more than the FX5900 and 6.625 times more than the FX5800, while the transistor count increased by a factor of 1.7 and the clock rate decreased by 15.8% compared to the FX5900.
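As a quick consistency check of these factors (all inputs are taken from the paragraph above), the implied GFLOPS values of the two FX models and the combined changes from the FX5800 to the 6800 work out as follows, where f denotes the clock rate and N the transistor count:

$$
\text{FX5900: } \frac{50}{2.65} \approx 18.9 \text{ GFLOPS}, \qquad \text{FX5800: } \frac{50}{6.625} \approx 7.5 \text{ GFLOPS}, \qquad 2.5 \times 2.65 = 6.625
$$

$$
\frac{f_{\text{6800}}}{f_{\text{FX5800}}} = (1 - 0.05)(1 - 0.158) = 0.95 \times 0.842 \approx 0.80, \qquad \frac{N_{\text{6800}}}{N_{\text{FX5800}}} = 1.074 \times 1.7 \approx 1.83
$$

The clock rate thus dropped by about 20% while the GFLOPS rose by a factor of 6.625, which is consistent with the numbers quoted at the beginning of this chapter.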

Chapter 2

Why New API?

This chapter introduces Vulkan and its history and outlines the advantages of Vulkan over OpenGL. Finally, it addresses the question of whether OpenGL is still viable or will soon be made obsolete by Vulkan.

2.1 Origin and History of Vulkan

Vulkan is a cross-platform, next-generation 3D graphics and compute API and is considered the successor to OpenGL and OpenGL ES. The official 1.0 specification of Vulkan was released by the Khronos Group on the 16th of February 2016. [7]

In contrast to OpenGL, Vulkan is a minimal abstraction of the GPU hardware, which facilitates its portability across multiple GPU vendors and device types, such as desktop, mobile or embedded systems. The first discussions about the idea of a completely new graphics API were held as early as October 2012. Until the 1.0 release in February 2016, well-known hardware and software vendors like Nvidia, AMD, Lucasfilm Ltd., EA, Epic Games and many more contributed to the development of Vulkan. [8] [9]

In contrast to the competing APIs DirectX 12 (Microsoft Windows) and Metal (macOS, iOS), Vulkan is the only cross-platform next-generation graphics API. Moreover, Vulkan is supported not only across operating systems but also on mobile devices and embedded systems from multiple GPU hardware vendors. Figure 2.1 compares the platform support of the three next-generation APIs Vulkan, DirectX 12 and Metal. As of 2018, Vulkan can also be used on Apple platforms through MoltenVK, a library that maps Vulkan onto Metal. Additionally, many popular game engines like Unreal Engine, Unity, CryEngine and Xenko offer Vulkan support. [9] [10]

[Figure not included in this excerpt]

Figure 2.1: Platform Support of Next Generation GPU APIs [9, p. 21]

The illustration shows the range of Vulkan platform support in contrast to DirectX 12 and Metal. While Vulkan supports the latest four Windows versions, Ubuntu and Android (among others), DirectX 12 and Metal are dedicated to Windows 10 and Apple platforms, respectively. In addition, Vulkan now also runs on the Nintendo Switch and, via MoltenVK, on Apple platforms.

The next section explains why developers were so eager for a new graphics API and why Vulkan is considered next-generation. For this purpose, Vulkan and OpenGL are compared on a theoretical level in terms of performance and efficiency.

2.2 Vulkan - More Performance and Efficiency

The graphics API OpenGL was initially released in 1992. Since then, the architecture of GPUs and platforms has evolved enormously. OpenGL was fundamentally designed for fixed graphics workstations with single-threaded direct renderers and split memory. 25 years ago, multi-core CPU architectures and multithreading were not taken into consideration. Hence, OpenGL does not provide a parallel multithreaded execution model, and multi-core CPUs cannot be used effectively. Further, the OpenGL model does not match mobile device architectures and platforms. That is why OpenGL ES was split off as a standalone API for smartphones, tablets and video game consoles. However, OpenGL ES is based on OpenGL, which was not tailored for mobile GPU hardware. [11] [8] [12]

A further point is that GPU vendors are responsible for their individual OpenGL implementations within their GPU drivers, since OpenGL is just an API specification that describes the interface and its expected behavior. This leads to complex and unpredictable drivers with different bugs on different GPUs. Programmers do not really know what happens behind the OpenGL interface inside the driver and its hidden implementation. The driver has to do a lot of work, like state validation, dependency tracking and error checking. This driver overhead limits performance or even makes it unpredictable. In addition, each driver has to provide its own compiler for the GLSL shading language, so the same GLSL shader code can behave differently across GPU drivers. As a consequence, software that uses OpenGL must be tested against multiple GPU vendors, and vendor-specific code paths are often necessary. [11] [13] [5]

In summary, OpenGL can no longer be considered contemporary, and programmers therefore desired a completely new, next-generation GPU API. The answer to this demand is the Vulkan API. Since software in graphics, vision and deep learning across diverse devices and platforms will profit from GPU acceleration, a next-generation GPU API should be flexible and portable. In contrast to OpenGL, Vulkan is designed for modern cross-platform usage on cloud, desktop, console, mobile and embedded devices. Hence, separate APIs for desktop and mobile devices, such as OpenGL and OpenGL ES, are no longer necessary. Further, Vulkan makes efficient use of multi-core CPUs and provides a parallel multithreaded execution model. GPU vendors still ship their own Vulkan drivers, but because Vulkan specifies a thin, explicit interface, these drivers are much simpler and leave far less room for vendor-specific behavior than OpenGL drivers. Hence, the need to tailor software to different GPU vendors, as is often the case with OpenGL, is greatly reduced with Vulkan. Moreover, GPU drivers no longer need to include a compiler for a high-level shading language, because Vulkan consumes shaders that are precompiled to the SPIR-V intermediate representation. [5] [9]

However, the most important innovation of the Vulkan API is the principle of explicit control. As mentioned before, one big problem with OpenGL is the driver overhead. In Vulkan, the programmer must tell the driver explicitly and in advance everything the application is going to do. This explicitness simplifies the GPU driver and favors cross-vendor consistency, which makes Vulkan so portable and flexible. Moreover, explicit control reduces driver overhead and latency, which leads to a reduced CPU load and ultimately to better performance. In contrast, OpenGL allows the programmer to change the state at any time, which may result in huge performance costs, especially when the render state is changed just before a draw or compute command. [5] [9] [8]

As a final point, Vulkan allows the programmer to disable validation and error checking (for example in release builds), which reduces driver overhead even further. [5] Figure 2.2 illustrates the driver overhead of Vulkan compared to OpenGL/OpenGL ES.

[Figure not included in this excerpt]

Figure 2.2: Vulkan Explicit GPU Control [5, p. 16]

Figure 2.2 depicts the driver overhead (orange blocks) and the application responsibility (green blocks) of Vulkan and OpenGL/OpenGL ES on top of the GPU. Vulkan's application block is several times larger than that of OpenGL, which means that the programmer is responsible for many more tasks in order to run a Vulkan application efficiently. As a consequence, Vulkan's driver overhead is several times smaller than that of OpenGL/OpenGL ES, which leads to better performance and efficiency.

2.3 Will OpenGL Get Outdated?

After listing the strengths of Vulkan against the weaknesses of OpenGL, the question arises whether Vulkan will completely supersede OpenGL in the future. Even if a close examination makes it obvious that Vulkan has much more potential than OpenGL and addresses the problems of OpenGL discussed above, the answer to this question is most likely: no.

Using Vulkan is extremely challenging and requires much more effort than OpenGL. For purposes where the main focus is not on performance, OpenGL is still more appropriate, and the Khronos Group will not discontinue the development of OpenGL and OpenGL ES any time soon. Nevertheless, for applications whose work can be parallelized well and where CPU load and performance are critical, it is worth using Vulkan despite its many challenges and the considerable effort.

Chapter 3

Vulkan API Overview

This chapter introduces the core elements and functions of the Vulkan API. All of the following Vulkan elements are used in various ways by the implementation of the Vulkan engine developed in the scope of this thesis.

First, the naming conventions of the Vulkan API are outlined. Every Vulkan function starts with the prefix vk, Vulkan objects and structures start with Vk, and enumeration values start with VK. Information for object creation is generally passed in structures. These structures are created according to the following pattern in C++ and Java and are used as parameters for object creation functions with the prefix vkCreate.

[Listing not included in this excerpt]

Listing 3.1: Vulkan Structs Pattern

XXX is a placeholder for the name of the Vulkan information structure (e.g. VkFramebufferCreateInfo for creating a VkFramebuffer object). The sType member identifies the structure type with an enumeration value that again contains XXX as the placeholder (e.g. VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO). pNext is an optional pointer to an extension-specific structure; it is rarely used (and not used by the Vulkan implementation of this thesis), hence it can be set to a null pointer (C++) or to the constant VK_NULL_HANDLE (Java). Vulkan functions generally return a VkResult enumeration value, which is VK_SUCCESS on successful execution or a specific code identifying an error that occurred during the function call. [14] [15] [16]
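Because almost every Vulkan call returns a VkResult, engines commonly route return values through a small checking helper. The following C++ sketch shows one way to do this. To keep it compilable without the Vulkan SDK, it uses a hypothetical stand-in enum Result whose values mirror a subset of the VkResult codes from the specification; the helper names isError and check are illustrative and not part of Vulkan. The sketch relies on the convention that Vulkan error codes are negative, while zero and positive values report success or a status (e.g. VK_INCOMPLETE).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Stand-in for VkResult so the sketch compiles without <vulkan/vulkan.h>.
// The numeric values match the corresponding VK_* codes of the specification.
enum class Result : int32_t {
    Success = 0,                     // VK_SUCCESS
    NotReady = 1,                    // VK_NOT_READY
    Timeout = 2,                     // VK_TIMEOUT
    Incomplete = 5,                  // VK_INCOMPLETE
    ErrorOutOfHostMemory = -1,       // VK_ERROR_OUT_OF_HOST_MEMORY
    ErrorOutOfDeviceMemory = -2,     // VK_ERROR_OUT_OF_DEVICE_MEMORY
    ErrorInitializationFailed = -3,  // VK_ERROR_INITIALIZATION_FAILED
    ErrorDeviceLost = -4,            // VK_ERROR_DEVICE_LOST
    ErrorLayerNotPresent = -6,       // VK_ERROR_LAYER_NOT_PRESENT
    ErrorExtensionNotPresent = -7    // VK_ERROR_EXTENSION_NOT_PRESENT
};

// Error codes are negative; zero and positive values are success/status codes.
inline bool isError(Result r) {
    return static_cast<int32_t>(r) < 0;
}

// Hypothetical helper: throws with the name of the failed call on any error.
inline void check(Result r, const std::string& call) {
    if (isError(r)) {
        throw std::runtime_error(call + " failed with code " +
                                 std::to_string(static_cast<int32_t>(r)));
    }
}
```

With the real headers, a VkResult-based version of check would wrap calls such as vkCreateInstance directly, so that every creation function reports failures at the call site.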

3.1 Layers

As mentioned in section 2.2, all kinds of validation and error checking in Vulkan can be enabled or disabled. Since Vulkan is a minimal hardware abstraction designed to reduce overhead as much as possible, the driver has no built-in validation and error checking: it provides no feedback and assumes that the programmer does everything correctly. To still allow validation, Vulkan was designed as a layered architecture. Developers can enable validation by inserting layers between the application (top layer) and the Vulkan API (bottom layer). Layers facilitate development with Vulkan, since they provide validation and error checking by intercepting Vulkan function calls and modifying or evaluating them, so that developers can debug their application or obtain other kinds of feedback when calling Vulkan functions. It is even possible to cascade multiple layers into a layer chain. However, layer injection should be disabled in release versions, since layers decrease performance. Furthermore, developers can create their own layers for individual purposes. [17] [14]

A set of useful validation layers is provided by LunarG in its Vulkan SDK. The layer VK_LAYER_LUNARG_standard_validation bundles a layer chain for common usage and is sufficient in most use cases. Since SDK 1.0.68, VK_LAYER_LUNARG_assistant_layer is available, which provides feedback about potential performance issues or suspicious usage patterns. [18]

Figure 3.1 shows the interaction of Vulkan layers with Vulkan drivers and applications.

[Figure not included in this excerpt]

Figure 3.1: Vulkan Loader [17]

The Vulkan loader is the interface between the application and the Vulkan drivers (installable client drivers, ICDs) with their related devices. The loader injects the enabled layers between application and driver, so that a Vulkan function call passes through the enabled layers before it is dispatched to the specific ICD. The Vulkan loader is responsible for supporting one or more independent Vulkan ICDs on a system. [17]
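Before a layer is requested at instance creation, its presence should be verified, since vkCreateInstance fails with VK_ERROR_LAYER_NOT_PRESENT otherwise. The sketch below assumes the layer names have already been copied out of the layerName fields of the VkLayerProperties returned by vkEnumerateInstanceLayerProperties. The helper functions missingNames and layersToEnable are hypothetical, and the policy of enabling validation only in debug builds follows the recommendation above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Layers this (hypothetical) application wants in debug builds.
const std::vector<std::string> kWantedLayers = {"VK_LAYER_LUNARG_standard_validation"};

// Returns every requested name that does not appear in `available`.
std::vector<std::string> missingNames(const std::vector<std::string>& requested,
                                      const std::vector<std::string>& available) {
    std::vector<std::string> missing;
    for (const auto& name : requested) {
        if (std::find(available.begin(), available.end(), name) == available.end()) {
            missing.push_back(name);
        }
    }
    return missing;
}

// Release builds enable no layers (no validation overhead); debug builds
// enable every wanted layer that is actually installed on the system.
std::vector<std::string> layersToEnable(bool debugBuild,
                                        const std::vector<std::string>& available) {
    std::vector<std::string> enabled;
    if (!debugBuild) {
        return enabled;
    }
    for (const auto& name : kWantedLayers) {
        if (std::find(available.begin(), available.end(), name) != available.end()) {
            enabled.push_back(name);
        }
    }
    return enabled;
}
```

Silently skipping a missing layer is a design choice for this sketch; an engine could equally report the result of missingNames and abort in debug builds.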

3.2 Extensions

Extensions expand the functionality of the Vulkan API by providing new functions, structures or enumerations. Extensions can be created by independent developers and must be specified and registered with the Khronos Group in order to become official extensions within the published Vulkan specification. There are two types of extensions: instance-level extensions and device-level extensions. Instance-level extensions extend the functionality of VkInstance objects, while device-level extensions extend the functionality of VkDevice objects. If the programmer intends to use extensions in the application, their support must be queried and they must be enabled in advance at instance and device creation, respectively. [14] [17]

3.3 Vulkan Instance

The Vulkan object VkInstance is the starting point of every Vulkan application; an application typically creates exactly one instance. The VkInstance gathers information about the application, such as the application name, engine name and version. The extensions and layers to be used must also be specified in advance when the VkInstance is created with vkCreateInstance. Before extensions are specified, their availability on the platform should be queried with the vkEnumerateInstanceExtensionProperties function. Further, the available physical devices (GPUs) are queried through the Vulkan instance, and logical devices are created from them. [14] [16]

3.4 Devices

Devices in Vulkan represent GPUs within the operating system. The GPU representation is split into physical and logical devices, represented by the Vulkan objects VkPhysicalDevice (physical device) and VkDevice (logical device). The logical device object VkDevice, on which all common Vulkan operations like drawing, computing or memory allocations are executed, serves as the interface to the GPU. In order to create a VkDevice object for feeding the GPU with work, the available physical devices must first be queried with the vkEnumeratePhysicalDevices function. Since Vulkan applications can run on workstations, notebooks, tablets or mobile phones with very different graphics hardware, performance and capabilities, the GPU capabilities must be checked against the application’s needs. Apart from that, a system may have multiple GPUs installed, so the programmer needs to select the GPU that best suits the application’s needs. Once a VkPhysicalDevice object has been selected, its properties and features can be queried. [16]

3.4.1 VkPhysicalDeviceProperties

The VkPhysicalDeviceProperties structure holds general information about the related physical device and is retrieved by vkGetPhysicalDeviceProperties. The following listing shows the VkPhysicalDeviceProperties of the Titan Xp:

[Listing not included in this excerpt]

Listing 3.2: VkPhysicalDeviceProperties of the Titan Xp

The field apiVersion indicates the Vulkan version supported by the device. The following field driverVersion exposes the version of the installed GPU device driver. vendorID and deviceID are unique identifiers of the GPU vendor and its manufactured device. The value 0x10de (decimal 4318) identifies the Nvidia Corporation. The field deviceName denotes the vendor’s GPU model name and deviceType indicates the type of the GPU’s hardware architecture. [14]
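The apiVersion field is a packed integer. The sketch below decodes it with the same bit layout as the VK_MAKE_VERSION and VK_VERSION_MAJOR/MINOR/PATCH macros of the Vulkan 1.0/1.1 headers (10 bits major, 10 bits minor, 12 bits patch) and maps a few well-known PCI vendor IDs to names. driverVersion is deliberately not decoded, since its encoding is vendor-specific. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct Version {
    uint32_t major;
    uint32_t minor;
    uint32_t patch;
};

// Same packing as VK_MAKE_VERSION(major, minor, patch).
constexpr uint32_t makeVersion(uint32_t major, uint32_t minor, uint32_t patch) {
    return (major << 22) | (minor << 12) | patch;
}

// Inverse of makeVersion: bits 31..22 major, 21..12 minor, 11..0 patch.
inline Version decodeApiVersion(uint32_t v) {
    return Version{v >> 22, (v >> 12) & 0x3FFu, v & 0xFFFu};
}

// A few well-known PCI vendor IDs as reported in VkPhysicalDeviceProperties::vendorID.
inline std::string vendorName(uint32_t vendorID) {
    switch (vendorID) {
        case 0x10DE: return "NVIDIA";
        case 0x1002: return "AMD";
        case 0x8086: return "Intel";
        default:     return "unknown";
    }
}
```

For the Titan Xp, vendorName(4318) yields "NVIDIA", since 4318 is the decimal form of 0x10de.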

The enumeration VkPhysicalDeviceType lists all device types distinguished by Vulkan. Listing 3.3 shows the VkPhysicalDeviceType enumeration:

[Listing not included in this excerpt]

Listing 3.3: VkPhysicalDeviceType enumeration

As shown in Listing 3.2, the Titan Xp is of discrete type, which means that the GPU is separate from the CPU and has its own VRAM. In contrast, integrated GPUs are embedded in or located close to the CPU without dedicated RAM, since they share the system RAM with the CPU. Integrated GPUs are generally found in mobile devices or laptops. The type VK_PHYSICAL_DEVICE_TYPE_VIRTUAL_GPU denotes a virtual device, typically a node in a virtualization environment, and VK_PHYSICAL_DEVICE_TYPE_CPU denotes a device that runs on the same processors as the host, such as a software implementation. Further, the VkPhysicalDeviceProperties structure holds a VkPhysicalDeviceLimits member, which indicates the limitations of the device, such as the maximum framebuffer size or the maximum number of color attachments. [14]
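A simple way to pick among several physical devices is to rank them by deviceType. The following sketch uses a stand-in enum with the numeric values of VkPhysicalDeviceType and a hypothetical DeviceInfo struct in place of VkPhysicalDeviceProperties. The ranking (discrete before integrated before virtual before CPU) is an assumption suited to performance-oriented applications; a complete selection would additionally check the features, queue families and extensions the application requires.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for VkPhysicalDeviceType with the values defined by the specification.
enum class DeviceType : int32_t {
    Other = 0, IntegratedGpu = 1, DiscreteGpu = 2, VirtualGpu = 3, Cpu = 4
};

// Hypothetical subset of VkPhysicalDeviceProperties.
struct DeviceInfo {
    std::string name;
    DeviceType type;
};

// Hypothetical ranking: higher score means preferred.
inline int score(DeviceType t) {
    switch (t) {
        case DeviceType::DiscreteGpu:   return 4;
        case DeviceType::IntegratedGpu: return 3;
        case DeviceType::VirtualGpu:    return 2;
        case DeviceType::Cpu:           return 1;
        default:                        return 0;
    }
}

// Returns the index of the best-ranked device, or -1 if the list is empty.
inline int pickDevice(const std::vector<DeviceInfo>& devices) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(devices.size()); ++i) {
        if (best < 0 || score(devices[i].type) > score(devices[best].type)) {
            best = i;
        }
    }
    return best;
}
```

On a laptop with both an integrated and a discrete GPU, this ranking selects the discrete one.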

3.4.2 VkPhysicalDeviceFeatures

VkPhysicalDeviceFeatures, obtained from vkGetPhysicalDeviceFeatures, is a structure of boolean flags indicating the support of specific features (e.g. the availability of tessellation shaders or samplers with anisotropic filtering support). [14]
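Checking whether a device supports the features an application needs amounts to testing that every requested flag is also set in the device's VkPhysicalDeviceFeatures. The sketch below uses a hypothetical Features struct that holds four of the real VkBool32 members (geometryShader, tessellationShader, sampleRateShading, samplerAnisotropy) as plain bools; the real structure contains many more flags.

```cpp
#include <cassert>

// Hypothetical stand-in for a few members of VkPhysicalDeviceFeatures.
struct Features {
    bool geometryShader = false;
    bool tessellationShader = false;
    bool sampleRateShading = false;
    bool samplerAnisotropy = false;
};

// True if every feature set in `required` is also set in `available`.
inline bool supportsAll(const Features& available, const Features& required) {
    auto ok = [](bool avail, bool req) { return !req || avail; };
    return ok(available.geometryShader, required.geometryShader) &&
           ok(available.tessellationShader, required.tessellationShader) &&
           ok(available.sampleRateShading, required.sampleRateShading) &&
           ok(available.samplerAnisotropy, required.samplerAnisotropy);
}
```

If the check passes, only the required set, rather than every available feature, would be passed as pEnabledFeatures in VkDeviceCreateInfo.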

3.4.3 VkPhysicalDeviceMemoryProperties

The Vulkan execution model differentiates between the following three kinds of physical memory, where a host in Vulkan represents the CPU environment which hosts one or more devices:

- Device local is the video memory (VRAM). The VRAM is physically connected to the device (GPU) without direct access from the host (CPU).
- Device local, host visible is a unified memory physically connected to the device and host.
- Host local, host visible is the host’s system memory and physically connected to the host only but also accessible by the device.

The VkPhysicalDeviceMemoryProperties object exposes information about the available memories and is returned by vkGetPhysicalDeviceMemoryProperties. It contains two arrays of the structures VkMemoryHeap and VkMemoryType. The VkMemoryHeap structure contains the size of the memory and a bitmask composed of attribute flags. VkMemoryType holds an index identifying the VkMemoryHeap it corresponds to and a bitmask composed of property flags. [14]

Listing 3.4 shows the VkPhysicalDeviceMemoryProperties structure of the Titan Xp. It reveals two memory heaps (l. 1). The first memory heap has a size of 11.86 GB (l. 3) and the attribute VK_MEMORY_HEAP_DEVICE_LOCAL_BIT, which indicates that the memory is physically connected to the GPU. Further, eleven memory types are found (l. 9). The memory types at index seven and eight (ll. 13-20) belong to the first of the two memory heaps, since heapIndex = 0 (ll. 14, 18). Since the property flags of both types are VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT only (ll. 16, 20), it can be derived that the first memory heap is the local video memory (VRAM) of the graphics card. The second memory heap, with a size of 15.96 GB (l. 6), no attribute flags (l. 7) and the corresponding memory types at indices 0 to 6 (ll. 10-12) and 9 to 10 (ll. 21-31), is the host’s system memory, since the memory types at index 9 and 10 expose the property flag VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT (ll. 24, 29), which indicates that memory of these types can be mapped for access by the host. VK_MEMORY_PROPERTY_HOST_COHERENT_BIT (ll. 25, 30) specifies that no further host cache management commands are needed to flush host writes to device memory or to make device writes visible to the host. VK_MEMORY_PROPERTY_HOST_CACHED_BIT (l. 31) denotes that the memory type is cached on the host, which speeds up host access. The memory types with index 0 to 6 have no property flags; they are neither device-local nor host-visible and are therefore of little practical use for typical allocations. [14]

[Listing not included in this excerpt]

Listing 3.4: VkPhysicalDeviceMemoryProperties of the Titan Xp

[Listing not included in this excerpt]

Listing 3.5: VkPhysicalDeviceMemoryProperties of the Intel HD 620

In contrast to the system with the Titan Xp as a discrete GPU with its local VRAM, listing 3.5 shows the memory properties of the integrated Intel HD Graphics 620 with unified memory. Only one memory heap (lines 2-4) is found, and both of its memory types (lines 6-18) contain the flags VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT (lines 9, 15) and VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT (lines 10, 16). It follows that the device shares its memory with the host as unified memory, which is the common architecture for integrated GPUs. [14]

When allocating memory in a Vulkan application, it is important to consider all available VkMemoryTypes and to choose the type that best suits the intended use of the allocation, since this choice can affect performance significantly. [16]
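The usual way to choose a memory type combines two conditions: the type must be allowed by the memoryTypeBits mask that Vulkan reports for a resource in VkMemoryRequirements, and it must have all property flags the allocation needs. The sketch below implements this first-fit search with simplified stand-ins for VkMemoryType and the flag bits (same numeric values as in the specification). It also adds a heuristic, following the reasoning for Listing 3.5, that treats a device whose heaps are all device-local as having unified memory. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flag values as defined by the Vulkan specification.
constexpr uint32_t kHeapDeviceLocal = 0x1;  // VK_MEMORY_HEAP_DEVICE_LOCAL_BIT
constexpr uint32_t kDeviceLocal  = 0x1;     // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr uint32_t kHostVisible  = 0x2;     // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4;     // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
constexpr uint32_t kHostCached   = 0x8;     // VK_MEMORY_PROPERTY_HOST_CACHED_BIT

// Simplified stand-in for VkMemoryType.
struct MemoryType {
    uint32_t propertyFlags;
    uint32_t heapIndex;
};

// First fit: the lowest type index allowed by typeBits (the memoryTypeBits of
// VkMemoryRequirements) that has all `required` property flags, or -1.
int findMemoryType(const std::vector<MemoryType>& types, uint32_t typeBits,
                   uint32_t required) {
    for (std::size_t i = 0; i < types.size() && i < 32; ++i) {
        const bool allowed = (typeBits & (1u << i)) != 0;
        if (allowed && (types[i].propertyFlags & required) == required) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Heuristic: if every heap is device-local, host and device share one memory.
bool allHeapsDeviceLocal(const std::vector<uint32_t>& heapFlags) {
    return !heapFlags.empty() &&
           std::all_of(heapFlags.begin(), heapFlags.end(),
                       [](uint32_t f) { return (f & kHeapDeviceLocal) != 0; });
}
```

With the Titan Xp layout described for Listing 3.4 (types 0 to 6 without flags, 7 and 8 device-local, 9 host-visible and coherent, 10 additionally cached), a request for device-local memory yields type 7, host-visible and coherent memory yields type 9, and host-visible cached memory yields type 10.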

3.4.4 VkDevice

Once a physical device VkPhysicalDevice has been selected, the logical device object VkDevice can be created from it. Almost all GPU work in Vulkan is processed on logical devices and their corresponding queues. The work is submitted to queues as command buffers, which are then processed by the related device. Queues and command buffers are covered later in this chapter. Device-level extensions that are intended to be used by the application must be explicitly specified during logical device creation. For this purpose, the extension support of the physical device should be checked with the vkEnumerateDeviceExtensionProperties command. Further, the features from VkPhysicalDeviceFeatures that are to be used must be explicitly enabled during VkDevice creation. It also needs to be specified which queue families the application intends to use. [14] [16]

3.5 Queues

Queues in Vulkan (VkQueue objects) receive and process execution commands (like draw/compute commands or memory operations). Vulkan differentiates four types of queues. Every queue type has its individual capabilities and is dedicated to specific kinds of execution commands. The queues are divided into the following types:

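- Graphics queues are specialized for draw commands. Whether a queue can present images to the screen is a separate capability that is queried per queue family (see section 3.6); it is typically, but not necessarily, offered by graphics-capable families.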
- Compute queues are optimized for offscreen compute operations.
- Transfer queues are optimized for memory operations.
- Sparse queues support sparse memory operations.

Different GPUs possess individual sets of queues, which support one or more queue types. These queues are grouped into queue families. The available queue families of a physical device can be queried with vkGetPhysicalDeviceQueueFamilyProperties. [14] [16]

[Listing not included in this excerpt]

Listing 3.6: VkQueueFamilyProperties of the Titan Xp

Listing 3.6 shows the queue families of the Titan Xp. The first queue family (lines 1-5) supports graphics, compute, transfer and sparse queue types (line 2). This queue family provides 16 queues (line 3). The field timestampValidBits (line 4) indicates the number of meaningful bits in timestamps written by queues of this family. The field minImageTransferGranularity (line 5) indicates the minimum granularity of image texel block transfer operations in the x-, y- and z-dimensions of the image. However, timestampValidBits and minImageTransferGranularity are not further considered in this thesis. The second queue family supports transfer capabilities (line 7) and contains one queue (line 8). The third queue family provides eight queues (line 11) of compute type (line 12). [14]

The Vulkan queue model facilitates concurrency. For optimal usage, the application can split its work into segments and submit these segments to multiple queues. For best performance, the work segments can be categorized into graphics, compute, transfer or sparse type and submitted to a queue of the appropriate queue family. The work must be submitted to queues as VkCommandBuffer objects (more on command buffers in section 3.7). Since the queues process their work asynchronously, the synchronization of the segments is completely in the developer’s hands. Moreover, commands submitted to a queue start in submission order, but they are not guaranteed to complete in that order. For proper inter- and intra-queue synchronization, Vulkan provides various synchronization objects (more in section 3.15). As mentioned in section 3.4.4, the queue families that are intended to be used must be explicitly specified at VkDevice creation. After logical device creation, queues can be retrieved via vkGetDeviceQueue. [16]
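Selecting queue families from data like Listing 3.6 can be sketched as follows: a graphics family is required, while dedicated compute and transfer families (families without graphics capability) are preferred, with a fallback to any capable family. Because each VkDeviceQueueCreateInfo passed to vkCreateDevice must name a distinct queue family, the chosen indices are then de-duplicated. QueueFamily is a simplified stand-in for VkQueueFamilyProperties, the flag constants mirror VkQueueFlagBits, and the selection policy is an assumption rather than the engine's actual strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

constexpr uint32_t kGraphics = 0x1;  // VK_QUEUE_GRAPHICS_BIT
constexpr uint32_t kCompute  = 0x2;  // VK_QUEUE_COMPUTE_BIT
constexpr uint32_t kTransfer = 0x4;  // VK_QUEUE_TRANSFER_BIT
constexpr uint32_t kSparse   = 0x8;  // VK_QUEUE_SPARSE_BINDING_BIT

// Simplified stand-in for VkQueueFamilyProperties.
struct QueueFamily {
    uint32_t queueFlags;
    uint32_t queueCount;
};

// First family with at least one queue, all `wanted` bits and none of the
// `avoided` bits; -1 if there is none.
inline int findFamily(const std::vector<QueueFamily>& families,
                      uint32_t wanted, uint32_t avoided = 0) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const uint32_t f = families[i].queueFlags;
        if (families[i].queueCount > 0 && (f & wanted) == wanted && (f & avoided) == 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

struct FamilySelection { int graphics, compute, transfer; };

// Hypothetical policy: prefer dedicated families, otherwise any capable one.
inline FamilySelection selectFamilies(const std::vector<QueueFamily>& families) {
    FamilySelection s{};
    s.graphics = findFamily(families, kGraphics);
    s.compute = findFamily(families, kCompute, kGraphics);
    if (s.compute < 0) s.compute = findFamily(families, kCompute);
    s.transfer = findFamily(families, kTransfer, kGraphics | kCompute);
    if (s.transfer < 0) s.transfer = findFamily(families, kTransfer);
    return s;
}

// Distinct family indices, one VkDeviceQueueCreateInfo per entry.
inline std::set<int> uniqueFamilies(const FamilySelection& s) {
    std::set<int> out;
    for (int idx : {s.graphics, s.compute, s.transfer}) {
        if (idx >= 0) out.insert(idx);
    }
    return out;
}
```

For the three Titan Xp families described above, this selects family 0 for graphics, family 2 for compute and family 1 for transfer; on a GPU with a single all-purpose family, all three roles collapse onto one queue create info.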

3.6 Window System Integration

Since Vulkan is a platform-agnostic low-level hardware abstraction, the core API itself does not provide a mechanism to present images onto a screen. In order to display rendered images, a set of instance-level and device-level extensions must be explicitly enabled. The instance-level extension VK_KHR_surface provides a VkSurfaceKHR object, which represents a logical abstraction of a native platform surface. An additional instance-level extension must be enabled to connect the generic surface object VkSurfaceKHR to the platform-specific window system (e.g. VK_KHR_win32_surface on Windows). After enabling VK_KHR_surface and the appropriate platform surface extension at instance creation, the surface can be created with vkCreateWin32SurfaceKHR on Windows (and similarly on other platforms, e.g. with vkCreateAndroidSurfaceKHR on Android). [19] [20]
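The instance extensions needed for a surface therefore always consist of VK_KHR_surface plus exactly one platform-specific surface extension. The sketch below returns these names for a hypothetical Platform selector. The extension names are the official ones, but real code would normally use the corresponding *_EXTENSION_NAME macros from the Vulkan headers or ask the windowing library (GLFW, for instance, offers glfwGetRequiredInstanceExtensions).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical platform selector; real code would decide this at compile time.
enum class Platform { Windows, Android, Xlib, Xcb, Wayland };

// Generic surface extension plus the platform-specific one.
inline std::vector<std::string> surfaceExtensions(Platform p) {
    std::vector<std::string> ext = {"VK_KHR_surface"};
    switch (p) {
        case Platform::Windows: ext.push_back("VK_KHR_win32_surface");   break;
        case Platform::Android: ext.push_back("VK_KHR_android_surface"); break;
        case Platform::Xlib:    ext.push_back("VK_KHR_xlib_surface");    break;
        case Platform::Xcb:     ext.push_back("VK_KHR_xcb_surface");     break;
        case Platform::Wayland: ext.push_back("VK_KHR_wayland_surface"); break;
    }
    return ext;
}
```

These names would be checked against vkEnumerateInstanceExtensionProperties and passed to vkCreateInstance before the platform-specific surface creation function is called.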

For presenting images onto the surface, a swapchain provided by the device-level extension VK_KHR_swapchain is needed. Swapchains in Vulkan are responsible for displaying rendered images and must be created explicitly by the programmer. The Vulkan core API does not itself provide a swapchain (or a default framebuffer, as OpenGL does). Even though Vulkan is a graphics API, not every use case needs graphics output, such as GPGPU-accelerated applications. Further, different operating systems provide different window systems for displaying images on a monitor, whereas Vulkan is a platform-agnostic API. That is why surfaces and swapchains are provided by extensions and must be created explicitly, in an OS-dependent manner, by the programmer if monitor output is desired. In order to create a swapchain, a device must be selected that possesses a queue family capable of presenting images to the window surface, since not every queue family supports presentation to a given surface. The function vkGetPhysicalDeviceSurfaceSupportKHR, provided by the VK_KHR_surface extension, reports whether a given queue family of a physical device supports presentation to a given surface. [19] [16]
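Given, for each queue family, its queueFlags and the VkBool32 result of vkGetPhysicalDeviceSurfaceSupportKHR, the graphics and present families can be chosen as sketched below. The sketch prefers a single family that supports both, so that swapchain images do not have to be shared between two queue families (which would require VK_SHARING_MODE_CONCURRENT or queue family ownership transfers). FamilyCaps and pickGraphicsAndPresent are hypothetical names, and the flag value mirrors VK_QUEUE_GRAPHICS_BIT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint32_t kGraphicsBit = 0x1;  // VK_QUEUE_GRAPHICS_BIT

// Per-family input: queueFlags from VkQueueFamilyProperties and the result of
// vkGetPhysicalDeviceSurfaceSupportKHR for this family and surface.
struct FamilyCaps {
    uint32_t queueFlags;
    bool presentSupported;
};

// Returns {graphicsFamily, presentFamily}; a family supporting both wins.
// Otherwise the first graphics family and the first present-capable family
// are returned separately (-1 where none exists).
inline std::pair<int, int> pickGraphicsAndPresent(const std::vector<FamilyCaps>& fams) {
    int graphics = -1;
    int present = -1;
    for (std::size_t i = 0; i < fams.size(); ++i) {
        const bool g = (fams[i].queueFlags & kGraphicsBit) != 0;
        const int idx = static_cast<int>(i);
        if (g && fams[i].presentSupported) {
            return {idx, idx};
        }
        if (g && graphics < 0) graphics = idx;
        if (fams[i].presentSupported && present < 0) present = idx;
    }
    return {graphics, present};
}
```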

As mentioned, swapchains are responsible for the display output and hold a set of presentable images. To present something on the window surface, the application acquires an image from the swapchain, renders to the image and then returns it to the swapchain. A swapchain image is acquired with vkAcquireNextImageKHR and returned to the swapchain as a present request with vkQueuePresentKHR, for which a queue that supports presentation must be specified. The swapchain extension defines four presentation modes, which specify under which conditions images are presented. By selecting an appropriate mode, screen tearing can be prevented. Since not every device supports all four presentation modes (only FIFO is guaranteed to be available), the supported modes must be queried in advance. The presentation modes are described in the following. [19] [16] [20]
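Choosing a presentation mode then reduces to searching the list returned by vkGetPhysicalDeviceSurfacePresentModesKHR for a preferred mode and falling back to FIFO, the only mode every implementation must support. The sketch uses a stand-in enum with the numeric values of VkPresentModeKHR; the preference order is passed in by the caller, since it is an application decision (a latency-sensitive application might prefer mailbox or immediate, a power-conscious one FIFO).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for VkPresentModeKHR with the values from the specification.
enum class PresentMode : int32_t {
    Immediate = 0,    // VK_PRESENT_MODE_IMMEDIATE_KHR
    Mailbox = 1,      // VK_PRESENT_MODE_MAILBOX_KHR
    Fifo = 2,         // VK_PRESENT_MODE_FIFO_KHR
    FifoRelaxed = 3   // VK_PRESENT_MODE_FIFO_RELAXED_KHR
};

// Returns the first preferred mode that is available, otherwise FIFO.
inline PresentMode choosePresentMode(const std::vector<PresentMode>& available,
                                     const std::vector<PresentMode>& preferred) {
    for (PresentMode p : preferred) {
        if (std::find(available.begin(), available.end(), p) != available.end()) {
            return p;
        }
    }
    return PresentMode::Fifo;
}
```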


Immediate

In the immediate presentation mode, the presentation engine does not queue present requests. It displays an image immediately at the application's present request, without waiting for vertical synchronization. Hence, tearing may be noticeable, especially at high frame rates.

Figure 3.2 shows an immediate presentation scenario. The presentation engine represents the swapchain together with the surface or monitor. The swapchain holds seven images. Images 1, 5 and 7 are acquired by the application. Images 2, 3 and 6 are unused and ready to be acquired. Image 4 is currently displayed on the monitor. Once the application issues a present request for one of the acquired images, image 4 is immediately replaced by that image. Simultaneously, image 4 is moved to the unused image pool of the swapchain. [14] [20]


FIFO

In this mode, the swapchain holds a queue of presentable images and selects the image to display according to the FIFO (first in, first out) rule. Tearing does not occur, since the swapchain waits for the v-sync 1 signal of the monitor before replacing the displayed image with an image from the queue.

Figure 3.3 shows a FIFO presentation mode example. Images 1, 5 and 7 are acquired by the application. Images 2 and 3 are unused and ready to be acquired. Image 4 is currently displayed on the monitor. Image 6 is waiting in the FIFO queue to be displayed. Once the application issues a present request for an acquired image, this image is appended to the FIFO queue. After v-sync is signalled, image 4 is replaced by image 6, the first (and only) image in the queue. Simultaneously, image 4 is moved to the unused image pool of the swapchain. [14] [20]

[Figure not included in this excerpt]

Figure 3.2: Immediate Mode [20]

[Figure not included in this excerpt]

Figure 3.3: FIFO Mode [20]

FIFO Relaxed

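This mode is similar to FIFO, with one difference. If at least one v-sync signal has already passed since the displayed image was last updated, a newly presented image is displayed immediately instead of waiting for the next v-sync signal. Consequently, the FIFO Relaxed mode may expose tearing when the refresh rate of the monitor exceeds the framerate of the application. Hence, this mode is only reasonable for applications with high framerates. For such applications it avoids the stutter that FIFO would cause when a frame occasionally misses a v-sync signal. [20]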


Mailbox

The Mailbox mode is similar to FIFO, but only one presentable image waits to be displayed. If the application issues a present request for an acquired image, the image waiting to be presented is replaced by the newly presented image. Like FIFO, this mode does not cause tearing, but it reduces latency, because the most recently presented image is displayed at each v-sync signal.

Figure 3.4 shows such a mailbox presentation scenario. Image 6 is moved to the unused image pool instead of being presented, because the application's present request for an acquired image occurred before the v-sync signal. [14] [20]

[Figure not included in this excerpt]

Figure 3.4: Mailbox Mode [20]
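The difference between FIFO and Mailbox in the scenarios of Figure 3.3 and Figure 3.4 can be illustrated with a small toy model of the presentation engine. The model is a hypothetical simplification that tracks only image indices; PresentEngine, present and vsync are illustrative names, not Vulkan API. It shows two behaviours:

- FIFO appends presented images to a queue.
- Mailbox replaces the single waiting image and releases it to the unused pool.

```c
#include <stdbool.h>

#define MAX_PENDING 8

/* Toy model of the presentation engine (not Vulkan code). */
typedef struct {
    bool mailbox;             /* true: Mailbox mode, false: FIFO mode */
    int pending[MAX_PENDING]; /* images waiting for v-sync, oldest first */
    int pending_count;
    int displayed;            /* image currently on screen, -1 if none */
} PresentEngine;

/* Present request. Returns the image released to the unused pool because
   of this request, or -1 if none was released. */
static int present(PresentEngine *e, int image)
{
    if (e->mailbox && e->pending_count == 1) {
        /* Mailbox: the waiting image is replaced and released. */
        int replaced = e->pending[0];
        e->pending[0] = image;
        return replaced;
    }
    if (e->pending_count == MAX_PENDING) {
        return image; /* queue full in this toy model: request refused */
    }
    /* FIFO, or Mailbox with an empty slot: enqueue the image. */
    e->pending[e->pending_count++] = image;
    return -1;
}

/* V-sync: the oldest waiting image replaces the displayed one. Returns the
   previously displayed image (now unused), or -1 if nothing changed. */
static int vsync(PresentEngine *e)
{
    if (e->pending_count == 0) {
        return -1;
    }
    int previous = e->displayed;
    e->displayed = e->pending[0];
    for (int i = 1; i < e->pending_count; ++i) {
        e->pending[i - 1] = e->pending[i];
    }
    e->pending_count--;
    return previous;
}
```

The test scenario follows the figures: image 4 is displayed, image 6 is waiting, and the application presents image 7 before the next v-sync signal. In FIFO mode, image 6 is shown next and image 7 stays queued. In Mailbox mode, image 6 is released and image 7 is shown.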

The presentation of images requires additional synchronization between the swapchain and the logical device queue by means of semaphores [16]. Synchronization is covered in section 3.15.

3.7 Command Buffers

Command buffers, represented by VkCommandBuffer objects, are containers holding a sequence of commands for execution. Command buffers must be allocated from command pools (VkCommandPool). All kinds of GPU work are submitted to a device as command buffers via VkQueue objects. In OpenGL, commands are implicitly collected and submitted by the driver. In Vulkan, by contrast, multiple GPU commands can be recorded in advance into a single VkCommandBuffer and submitted all at once. Recorded command buffers can afterwards be reused as often as needed, unless they were recorded for one-time submission. Hence, command buffers considerably relieve the CPU by bundling multiple GPU commands into one VkCommandBuffer object, which can be submitted to the GPU with a single CPU call. Vulkan command recording functions have the prefix vkCmd.

Command buffers can be recorded concurrently in multiple threads and submitted from multiple threads. Further, Vulkan distinguishes between primary and secondary command buffers. Multiple secondary command buffers can be recorded in parallel threads. All concurrently recorded secondary command buffers are then executed from a primary command buffer with vkCmdExecuteCommands, and the primary command buffer is submitted to the GPU at once. However, a command pool must not be used by multiple threads simultaneously, so every recording thread needs its own command pool to allocate command buffers from. [14] [16]
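The record-once, submit-many principle can be illustrated with a toy model in plain C. Nothing here calls Vulkan: CommandBuffer, record_draw and submit are hypothetical stand-ins for VkCommandBuffer, vkCmdDraw and vkQueueSubmit. They are chosen only to show two properties:

- A single submission call replays every recorded command.
- A recorded buffer can be submitted repeatedly without recording it again.

```c
#include <stddef.h>

#define MAX_COMMANDS 16

/* Toy model only: a recorded draw command with its vertex count. */
typedef struct {
    int vertex_count;
} DrawCommand;

typedef struct {
    DrawCommand commands[MAX_COMMANDS];
    size_t count;
} CommandBuffer;

typedef struct {
    size_t submit_calls;      /* CPU-side submission calls */
    size_t executed_commands; /* commands run by the simulated GPU */
    long vertices_drawn;
} Queue;

/* Analogue of a vkCmd* recording call: nothing executes yet.
   Returns 0 if the toy buffer is full, 1 otherwise. */
static int record_draw(CommandBuffer *cb, int vertex_count)
{
    if (cb->count == MAX_COMMANDS) {
        return 0;
    }
    cb->commands[cb->count++].vertex_count = vertex_count;
    return 1;
}

/* Analogue of a queue submission: one call replays every recorded command. */
static void submit(Queue *q, const CommandBuffer *cb)
{
    q->submit_calls++;
    for (size_t i = 0; i < cb->count; ++i) {
        q->executed_commands++;
        q->vertices_drawn += cb->commands[i].vertex_count;
    }
}
```

Recording three draws once and submitting the buffer for three frames costs three submission calls, while nine commands are executed.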

3.8 Render Passes

Render passes in Vulkan define the scope of one or multiple rendering commands as a VkRenderPass object. This object consists of a set of attachments and subpasses with dependencies between these subpasses. A render pass specifies how its subpasses use the attachments. By means of render passes, the Vulkan driver is able to set up the hardware in advance, such that the rendering operations are executed under optimal conditions.

The attachments of a VkRenderPass object comprise all attachments that are used as framebuffer attachments (input, color, depth/stencil) in at least one of the subpasses. For each attachment, the initial and final image layouts must be specified along with the image format and the number of samples. Image formats and layouts in Vulkan are covered later in section 3.14. Based on the initial and final layouts, the driver implicitly executes image layout transitions on the attachments at the beginning and end of the render pass. These transitions ensure that the attachments are in the right layout at the start of the render pass. They also prepare the layouts required for later usage once the render pass has finished.

A render pass must contain at least one subpass. A subpass holds references to the attachments it intends to use. Further, the usage of each referenced attachment (input, color, depth/stencil) must be specified together with its image layout. Again, the driver implicitly performs an image layout transition to the specified layout before the subpass starts executing.

Subpass dependencies synchronize the access to attachments between subpasses. They also synchronize a subpass with the commands before or after the render pass (VK_SUBPASS_EXTERNAL). Typically, at least two such external dependencies are specified, one at the entry and one at the exit of the render pass; if they are omitted, the specification defines implicit external dependencies. As an example, suppose one subpass writes to an attachment while the next subpass reads from the same attachment. A subpass dependency then synchronizes the two subpasses by blocking the read access of the second subpass until the first subpass has finished writing to the attachment. [16] [21]

3.9 Framebuffers

Vulkan framebuffers, represented by VkFramebuffer objects, are closely connected to render passes. A VkFramebuffer object bundles the attachments used in a render pass as a set of VkImageView (3.14) references. Since a framebuffer always acts in conjunction with a specific render pass, a VkRenderPass reference must be specified at VkFramebuffer creation. The framebuffer can then be used with that render pass and with render passes compatible with it. [16]

3.10 Pipelines

A Vulkan pipeline specifies what the GPU is supposed to do in a render or compute operation. A VkPipeline object encompasses a programmable shader pipeline and a set of configurable fixed-function states, plus a render pass reference in the case of graphics pipelines. By means of a pipeline object, the GPU knows almost the entire configuration of a render or compute operation in advance. VkPipeline objects are immutable except for a few configurable dynamic states. If a shader or a fixed-function state needs to be changed even slightly, a separate VkPipeline object must be created. Hence, many pipeline objects need to be created in advance for all required combinations of shaders and configurations.

Pipelines are divided into graphics pipelines and compute pipelines. Graphics pipelines represent common draw operations with input assembly, graphics shader stages and rasterization. Compute pipelines define offscreen compute operations with a single compute shader module, without input assembly or rasterization. [14] [16]
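Every combination of shaders and fixed-function states requires its own VkPipeline, so engines often look pipelines up by a key that describes that combination. The sketch below is a hypothetical, simplified model of such a lookup:

- PipelineKey, PipelineCache and get_pipeline are illustrative names.
- The key covers only three states.
- Each pipeline is created lazily on first use. The text above instead creates pipelines in advance, which the same key lookup also supports.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PIPELINES 32

/* Hypothetical key: the few states that differ between draws in this example. */
typedef struct {
    int shader_id;   /* identifies a combination of shader modules */
    bool blending;   /* Color Blend State: blending on or off */
    bool depth_test; /* Depth and Stencil State: depth test on or off */
} PipelineKey;

typedef struct {
    PipelineKey keys[MAX_PIPELINES];
    size_t count;
    size_t creations; /* number of (expensive) pipeline creations */
} PipelineCache;

static bool key_equal(PipelineKey a, PipelineKey b)
{
    return a.shader_id == b.shader_id && a.blending == b.blending &&
           a.depth_test == b.depth_test;
}

/* Returns the index of the pipeline for key and "creates" it on first use.
   Returns -1 if the toy cache is full. */
static int get_pipeline(PipelineCache *cache, PipelineKey key)
{
    for (size_t i = 0; i < cache->count; ++i) {
        if (key_equal(cache->keys[i], key)) {
            return (int)i;
        }
    }
    if (cache->count == MAX_PIPELINES) {
        return -1;
    }
    cache->keys[cache->count] = key; /* stands in for vkCreateGraphicsPipelines */
    cache->creations++;
    return (int)cache->count++;
}
```

Toggling only the blending state already yields a second pipeline, which reflects why the number of pipeline objects grows with every state combination.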

The programmable shader pipeline of graphics and compute pipeline objects is similar to the OpenGL shader pipeline, with the shader stages vertex, tessellation, geometry, fragment and compute. The difference is that Vulkan consumes precompiled shader bytecode (SPIR-V, see section 3.16) instead of GLSL source files as OpenGL does. Vulkan shader stages with their precompiled bytecode are represented as VkShaderModule objects. At VkPipeline creation, multiple VkShaderModule objects (exactly one for compute pipelines) are specified as the shader pipeline.

The configurable fixed-function state consists of a set of pipeline configuration state objects. Most of these configuration state objects must be specified explicitly, even if the programmer does not care about some of them. They are listed and briefly explained below:

- Vertex Input State specifies the layout and format of the vertex buffer.
- Input Assembly State defines the primitive topology (points, lines, triangles etc.) of the vertices.
- Rasterization State adjusts the configuration of the rasterizer, such as the polygon cull mode.
- Color Blend State specifies the color and alpha blending functions to be applied.
- Multisample State sets the number of samples together with some additional multisampling configurations.
- Viewport State sets up the viewport size and offset as well as the scissor rectangles.
- Depth and Stencil State enables or disables the depth and stencil tests together with their compare function parameters.
- Tessellation State defines the number of control points per tessellation patch (ignored if tessellation is not used).
- Dynamic State enables dynamic state functions. Even though pipeline objects must be configured in advance and cannot be modified afterwards, a small set of states can be changed dynamically while recording command buffers, with vkCmdSet* commands such as vkCmdSetViewport. These modifiable states must be explicitly enabled within the dynamic state object.

In addition to the configuration state objects, a pipeline layout must be specified at VkPipeline creation. The pipeline layout contains a set of descriptor set layouts and, optionally, push constant ranges. Descriptors and push constants are covered in the following sections 3.11 and 3.12. [22] [16]

Figure 3.5 illustrates the Vulkan pipeline with its programmable shader stages and configurable fixed-function stages as a block diagram.

[Figure not included in this excerpt]

Figure 3.5: Vulkan Pipeline Block Diagram [23]

In the block diagram of the Vulkan graphics and compute pipelines, each block element represents a specific pipeline stage or resource. The left flow from Draw to Color/Blending Operations portrays the graphics pipeline, while the right flow from Dispatch to Compute Shader illustrates the compute pipeline.

An indirect buffer is an optional resource for the fixed-function stages Draw and Dispatch. Indirect buffers are needed for indirect draw and dispatch operations, where the draw or dispatch parameters are read from a buffer. As mentioned, graphics pipelines possess an Input Assembler with a vertex buffer and/or an index buffer as resources. Descriptor sets (3.11) and push constants (3.12) can be used as resources in all programmable stages of graphics and compute pipelines. The graphics pipeline can access framebuffer input attachments in the programmable Fragment Shader stage.

The Pre-Fragment Operations stage tests the fragments of the current primitive against the existing values in the framebuffer's Depth/Stencil Attachment (depth/stencil test). It discards or passes each fragment depending on the pipeline's Depth and Stencil State. In the Post-Fragment Operations stage, the depth and/or stencil values are written to the Depth/Stencil Attachment. Colors are written to one or multiple Color Attachments in the Color/Blending Operations stage. [23] [14]

3.11 Descriptors

Vulkan descriptors represent shader resources. Descriptors are grouped into descriptor sets, and the structure of each descriptor set is described by a descriptor set layout. The descriptor set layout specifies which types of resources (such as buffers, images or samplers) are contained in the related descriptor set, while the descriptor set holds references to the actual resource data. A shader pipeline can use multiple descriptor sets. Each set is referenced in the programmable shader stages by an index that corresponds to the position of its layout in the related pipeline layout. Consequently, the descriptor set layouts of all descriptor sets used by the shader pipeline must be added to the related pipeline layout. [16] [24]

Figure 3.6 illustrates the relation of descriptor set layouts and pipeline layouts.

[Figure not included in this excerpt]

Figure 3.6: Descriptor Set Layout and Pipeline Layout [24]

The »Alpha« descriptor set layout specifies the usage of a uniform buffer, a storage buffer and an image view in the related descriptor set. The »Beta« descriptor set layout contains a uniform buffer. »PipelineLayout B« specifies the descriptor set layouts »Alpha« and »Beta«, which implies that the related pipeline uses two uniform buffers, one storage buffer and one image view.

3.12 Push Constants

Push constants are small blocks of raw data that can be passed to shaders in a very straightforward manner. Unlike uniform and storage buffers, push constants are provided directly to shaders with the single command vkCmdPushConstants during command buffer recording. The data does not need to be encapsulated in a VkBuffer object and referenced in a descriptor set. However, the maximum data block size of push constants is highly limited compared to uniform or storage buffers, which can hold far more data. The Vulkan specification requires every implementation to support push constant blocks of at least 128 bytes (device limit maxPushConstantsSize).

The Titan Xp supports a maximum push constant size of 256 bytes. In contrast to uniform buffers, push constant data cannot be altered after the command buffer has been recorded. Nevertheless, if a command buffer is re-recorded anyway, updating data via push constants is more performant than updating uniform buffers. [16] [24]
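As an illustration of the size limit, the following C sketch defines a hypothetical push constant block holding a 4x4 matrix and a color. With 4-byte floats, the block is 80 bytes, and the sketch checks at compile time that it fits into the guaranteed 128 bytes. The struct and macro names are assumptions. The multiple-of-4 check reflects the rule that the offset and size of a VkPushConstantRange must be multiples of 4.

```c
#include <stddef.h>

/* Hypothetical per-draw data; assumes 4-byte floats as in GLSL. */
typedef struct {
    float model_view_projection[16]; /* 4x4 matrix, 64 bytes */
    float tint[4];                   /* vec4, 16 bytes */
} PushConstantBlock;

/* Smallest maxPushConstantsSize an implementation may report. */
#define GUARANTEED_PUSH_CONSTANT_BYTES 128u

/* Compile-time checks (C11): size is a multiple of 4 and fits the guarantee. */
_Static_assert(sizeof(PushConstantBlock) % 4 == 0,
               "push constant size must be a multiple of 4");
_Static_assert(sizeof(PushConstantBlock) <= GUARANTEED_PUSH_CONSTANT_BYTES,
               "push constant block exceeds the guaranteed 128 bytes");
```

Such a block would then be recorded with a call like `vkCmdPushConstants(commandBuffer, pipelineLayout, VK_SHADER_STAGE_VERTEX_BIT, 0, sizeof(PushConstantBlock), &block)`, provided the pipeline layout declares a matching push constant range.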

3.13 Buffers

Buffers in Vulkan are linear memory areas for arbitrary data that are accessible by the GPU and represented by VkBuffer objects. As mentioned in section 3.4.3, Vulkan differentiates between three kinds of physical memory. Allocating buffers in different kinds of memory entails different limitations and advantages for the usage of these buffers. Buffers allocated in discrete video memory (device local) are accessed much faster by the GPU than buffers in system memory (host local). However, the host usually cannot access discrete video memory directly and hence is not able to write to it. To circumvent this limitation, the host first writes the data to a buffer in host visible memory, called a staging buffer in this context. Afterwards, the device copies the data from the staging buffer to a preallocated device local buffer with the command vkCmdCopyBuffer. [16]
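Choosing the memory for the staging buffer and for the device local destination buffer typically follows the pattern below. The code iterates over the memory types of the physical device and takes the first one that is allowed for the resource and has all required property flags. The sketch is a hedged model with hypothetical names: find_memory_type and the MEMORY_* macros, whose values mirror VkMemoryPropertyFlagBits. In a real application, the type flags come from VkPhysicalDeviceMemoryProperties and type_bits comes from VkMemoryRequirements::memoryTypeBits.

```c
#include <stdint.h>

/* Flag values mirror VkMemoryPropertyFlagBits in vulkan_core.h. */
#define MEMORY_DEVICE_LOCAL  0x1u
#define MEMORY_HOST_VISIBLE  0x2u
#define MEMORY_HOST_COHERENT 0x4u

/* Returns the index of the first memory type that is permitted by
   type_bits (bit i set means type i is allowed) and has all required
   property flags, or -1 if there is none. type_count must not exceed 32. */
static int find_memory_type(const uint32_t *type_flags, uint32_t type_count,
                            uint32_t type_bits, uint32_t required)
{
    for (uint32_t i = 0; i < type_count; ++i) {
        if ((type_bits & (1u << i)) && (type_flags[i] & required) == required) {
            return (int)i;
        }
    }
    return -1;
}
```

For the staging scheme described above, the staging buffer would request `MEMORY_HOST_VISIBLE | MEMORY_HOST_COHERENT`, so that the host can write to it without explicit flushes. The destination buffer would request `MEMORY_DEVICE_LOCAL`.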

VkBuffer objects are used as vertex and index buffers for the input assembler of graphics pipelines. Further, VkBuffer objects serve as read and write resources for shaders in the form of uniform and storage buffers. Uniform and storage buffers can be much larger than push constants. The Titan Xp offers a maximum uniform buffer range of 65,536 bytes (64 KiB) and a maximum storage buffer range of around 4,295 megabytes (about 4 GiB), as reported by the device limits maxUniformBufferRange and maxStorageBufferRange. Uniform buffers provide read-only access to shaders. The host can update them directly if they are located in host visible memory; otherwise, a staging buffer is necessary for updating a uniform buffer. Storage buffers offer read and write access to shaders and are commonly much larger than uniform buffers. [16] [14]

3.14 Images

Images in Vulkan are represented by VkImage objects. A VkImage object defines the size, format and layout of an image. Among other uses, images serve as render targets of graphics pipelines. Before an image can be used as a render target, a VkImageView of the image must be created. A VkImageView object specifies which part of the image (mip levels, array layers and aspects) is accessed, together with the format used to interpret it. The VkImageView object can be referenced as a framebuffer attachment, which makes the image a render target. [16] [14]

Shaders can read from, sample from or write to images. Storage image descriptors offer shaders direct texel read/write access by referencing the VkImageView object of an image in a descriptor set. Shaders can also sample from an image through a combined image sampler descriptor, which holds a VkSampler reference along with a VkImageView reference in a descriptor set. The VkSampler object specifies the sampling parameters that control filtering and addressing of the retrieved texels. [16]

An important property of images in Vulkan is their image layout.


1 OpenGL ES is the OpenGL counterpart specification for embedded systems

1 Software company responsible for Vulkan runtime and SDK

1 Screen tearing is an effect where artifacts from multiple images are displayed at once.

1 Vertical synchronization prevents the displayed image data from being updated while the monitor builds up its screen.

Design and Implementation of a Vulkan Engine
Case Study of Capabilities and Performance
Hamburg University of Technology
Fynn Flügge (Author), 2018, Design and Implementation of a Vulkan Engine, Munich, GRIN Verlag,

