Efficient Data Input/Output (I/O) for Finite Difference Time Domain (FDTD) Computation on Graphics Processing Unit (GPU)

Master's Thesis, 2014

101 Pages, Grade: First







1 Introduction
1.1 Computation in Electromagnetism
1.1.1 Maxwell’s Equations
1.1.2 Finite-Difference Time-Domain (FDTD)
1.2 Computational Parallelization Techniques & GPGPU
1.2.1 Parallel Computer Architecture
1.2.2 Parallel Algorithms & Programs
1.2.3 Emerging Parallelization Techniques: GPGPU
1.3 The Problem and The Objective
1.4 Thesis Overview
1.5 Original Contribution

2 Electromagnetism & Finite-Difference Time-Domain - Overview
2.1 Maxwell’s Equations
2.2 Finite-Difference Time-Domain (FDTD)
2.2.1 Frequency Dependent Material Parameters & Frequency Dependent FDTD
2.2.2 Boundary Condition
2.3 Summary of Maxwell’s Equations and FDTD Method
2.4 Computer Implementation of FDTD Method
2.4.1 Basics of FORTRAN 90 Programming
2.4.2 Implementation of FDTD Method
2.5 Advantages and Limitations of FDTD Computation
2.6 Concluding Remarks

3 Computation of FDTD on GPGPU using CUDA Programming
3.1 GPGPU - The Parallelization Monster and Computation Techniques
3.2 CUDA and CUDA Fortran
3.3 CUDA Implementation of FDTD Method for GPGPU Computation
3.4 Computation on Nvidia’s General Purpose GPU
3.4.1 GPU Hardware and support for FDTD Computation
3.4.2 Memory Coalescing
3.5 Execution of FDTD Method on GPU Hardware
3.6 Concluding Remarks

4 The Solution to The Problem
4.1 The Problem - Revisited
4.2 The Solution
4.3 Programmatic Implementation of the Solution
4.3.1 Implementation
4.3.2 Invoking Buffer Kernel
4.4 Possible Limitations and their Solutions
4.5 Concluding Remarks

5 Evaluation and Validation of The Solution
5.1 Testing of the Implemented Solution
5.1.1 Input Parameters for FDTD Computation
5.1.2 Hardware Environment
5.1.3 Test Results
5.2 Critical Analysis & Evaluation of Test Results
5.2.1 Speed-Up Analysis
5.2.2 Evaluation and Comments

6 Conclusion and Future Scope
6.1 Future Scope
6.2 Conclusion


A Survey Questions posted
A.1 Survey Questions posted via email and Researchgate OSN platform

Word Count: 14,100

List of Tables

1.1 Major Abbreviations Used In Chapter 1

2.1 Major Abbreviations Used In Chapter 2

3.1 Major Abbreviations Used In Chapter 3

4.1 Major Abbreviations Used In Chapter 4

List of Figures

1.1 Block Diagram of CPU and Memory Architecture

1.2 Diagrammatic Representation of Shared Memory Parallel Computer Architecture consisting of two CPUs

1.3 Diagrammatic Representation of Split Shared Memory Parallel Computer Architecture consisting of 'P' number of CPUs connected to the Memory, which has 'N' number of Memory Banks, via Interconnect

1.4 Diagrammatic Representation of Distributed Memory Parallel Computer Architecture

1.5 Nvidia GPUs Fundamental Architecture with 16 Streaming Multiprocessors, each with 8 stream processors (SPs) (128 stream processors in total) and 8 parallel texture mapping units (TMUs), which are used to address and filter the textures of images on the graphics hardware

1.6 Parallel Execution on GPGPU

1.7 Result of survey question on ’Which one has more potential to be used in future?’

1.8 Communication between Memory-CPU-GPU via PCI-e Bus

1.9 View of 'Unified Memory' Concept of CUDA 6.0 Programming Language

2.1 Yee Cell, representing Yee's algorithm regarding electromagnetic fields in three dimensional space. Magnetic field is perpendicular to the surface on which electric field acts. The fields (electric and magnetic) follow each other alternatively and because of this interleaving, the magnetic field and the electric field are co-dependent

2.2 An Example of Orthogonal Mesh, with [i, j, k] indices where i = 0, 1, 2, ..., Nx; j = 0, 1, 2, ..., Ny; k = 0, 1, 2, ..., Nz

2.3 Yee Cell where H field component requires the values of four surrounding co-plane E field

2.4 Mesh where E field component requires the four surrounding H values on the two planes to which it belongs

2.5 Some components can not be calculated because of the boundary issues

3.1 Abstract view of the Tesla unified graphics and computing GPU architecture, where SM - Streaming Multiprocessor, SP - Streaming Processor and SFU - Special Function Unit

3.2 The CUDA grid organization in Tesla Architecture as mentioned in [1,2]

3.3 All threads of half-warp participate

3.4 Some Threads Do Not Participate

3.5 Unaligned Starting Address which is not a multiple of region size, i.e. 64 for this case

3.6 Permuted Access by Threads

3.7 Uncoalesced memory access for multiple threads (three in this case) in a block [6, b], where [a, b] means [blockIdx%x, blockIdx%y]

3.8 Coalesced memory access for multiple threads (three in this case) in a block [6, b], where [a, b] means [blockIdx%x, blockIdx%y]

4.1 Different types of memory access required for FDTD Computation on GPU

4.2 Three (A, B, C) Methods of Data-Output. Methods B and C implement a buffer to fetch output data to the CPU

4.3 Flowchart to show the use of Buffer in FDTD computation on GPU

4.4 Visual representation of data copied on Buffer based on code snippet 4.1: 2-D Buffer Implementation with un-coalesced memory access approach of FDTD computation on GPU

4.5 Excitation location at different axis and dimensions

5.1 Execution Time for Test 1 & 2 in seconds. *Less is better in performance*

5.2 Speedup result of Test 1 & 2. *More is better*

5.3 Speedup result of Test 3 & 4. *More is better*

A.1 Question Posted via Researchgate OSN platform


Due to recent advancements in technology, one of the popular ways of improving program execution time is to exploit the massive parallelism of GPU-based accelerator computing alongside CPU computing. In GPU-based accelerator computing, the data-intensive or computationally intensive parts of a program are computed on the GPU, whereas the remaining, largely serial instructions are computed on the CPU, in order to achieve a large speedup in the execution time of the program executed on the computer system.

In physics, and especially in electromagnetism, the Finite-Difference Time-Domain (FDTD) method is a popular numerical analysis technique used to solve the set of Maxwell's partial differential equations, which unify and relate the electric field with the magnetic field. Since the FDTD method is computationally intensive and exhibits a high level of parallelism in its computational implementation, researchers have for the past few years been computing the intensive parts of FDTD methods on the GPU instead of the CPU. Although computing the parallelized parts of FDTD algorithms on the GPU achieves very good performance, the overall speedup in execution time suffers because of the very high latency between the CPU and the GPU. The calculation results at each FDTD time step are supposed to be produced and saved to the hard disk of the system. This can be called the data output of the FDTD method, and this data output cannot be overlapped with the computation of the field values at the next time step. Because of this, and because of the latency gap between the CPU and the GPU, there is a bottleneck in the performance of the data output from the GPU. This problem can be regarded as the inefficient performance of the data input/output (I/O) of FDTD methods on the GPU.

Hence, this project focuses on the aforementioned problem and aims to find solutions that improve the efficiency of the data I/O of FDTD computation on the GPGPU (General Purpose Graphics Processing Unit).

Keywords: data I/O; buffer; finite difference methods; FDTD; time domain analysis; hardware; acceleration; high performance computing; parallel programming; parallel architectures; GPU; graphics processing unit; parallel computing; CUDA; OpenACC; multi-core computing


I would like to thank my Supervisor, Fumie Costen, for her endless support and patience in answering every question that I had while pursuing the project and preparing the thesis. Without her presence this thesis could not have been written properly.

I also want to mention my family, who have always supported me, especially my parents, Soma Dey and Sudip Dey, for giving me the blessings and boosts that drove me to success in life.

Finally, I would also like to thank Buraq Abdul Qader for providing valuable insights and technical knowledge that made it easier for me to pursue this project. A special thanks to Tanaya Gopal, Rashmi vivek Joshi, Madhuri Raju and Harshini M. Kumar for putting up with my nuisance and mischief, and for being there to support me at times when needed.

Chapter 1 Introduction

1.1 Computation in Electromagnetism

In the world of physics, there are four fundamental forces in nature: the strong force, the weak force, the gravitational force and the electromagnetic force. Electromagnetism is the field of study that reviews and relates different electromagnetic fields and studies the interactions between electrically charged particles, magnetic fields and electrical conductors. As the world advanced towards the age of digital technology, the techniques and methods related to electromagnetism also grew in complexity. It was no longer possible to evaluate the complex formulas associated with electromagnetism just by hand on paper. This created a demand for computing these complex techniques on computers by means of computer programs. Even though computers can execute and compute these methods efficiently and quickly, some of the techniques require far more computational resources and execution time.

1.1.1 Maxwell’s Equations

In the middle of the 19th century, the eminent mathematical physicist James C. Maxwell related the electric and magnetic field components and studied how these fields alter each other by means of charges and currents. He proposed a set of partial differential equations, which linked all electric and magnetic field components, and this set of partial differential equations is known as Maxwell's Equations. Maxwell's Equations, along with the Lorentz force[1] law, form the foundations of electromagnetism and classical electrodynamics. Maxwell's Equations unify and unite Faraday's Law, Ampère's Law, Gauss's Law for the electric field and for the magnetic field, Ohm's Law, etc.

Over the years many numerical analysis methods have been proposed and implemented to solve Maxwell's equations efficiently and accurately, and one such popular method is the Finite-Difference Time-Domain method.

1.1.2 Finite-Difference Time-Domain (FDTD)

The Finite-Difference Time-Domain (FDTD) method discretises Maxwell's partial differential equations using Yee's algorithm[3], proposed by Kane Yee, to solve both the electric and magnetic fields in time and space.

The abstraction performed by the FDTD method to solve Maxwell's equations makes it easy to compute and calculate the set of partial differential equations on a computer. The FDTD method consists of update equations, which update the values of the electric and magnetic fields based on the latest available values of the magnetic and electric fields respectively. For example, to calculate H (the magnetic field) at the (t + 0.5Δt) time step, the value of E (the electric field) at the t time step is required. In brief, the way in which FDTD is calculated is: calculate H from the most recently available E value, calculate D (the electric flux) from the most recently available H value, and calculate E from the most recently available D value. Because of the way the FDTD method is calculated, there is a high level of parallelism in the computation of the method.
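The update sequence described above can be sketched as a simple one-dimensional leapfrog loop. The following is only an illustrative Python sketch in assumed normalised units (the grid size, the Courant factor of 0.5 and the Gaussian source are arbitrary choices), not the Fortran/CUDA implementation discussed later in this thesis:

```python
import numpy as np

# Minimal 1-D FDTD sketch in normalised units (illustrative assumptions,
# not the thesis implementation).
nz, nt = 200, 100        # number of spatial cells and time steps
ez = np.zeros(nz)        # electric field at integer grid points
hy = np.zeros(nz - 1)    # magnetic field at half-integer grid points

for n in range(nt):
    # Update H from the most recently available E values (t -> t + 0.5*dt)
    hy += 0.5 * (ez[1:] - ez[:-1])
    # Update E from the freshly computed H values (t + 0.5*dt -> t + dt)
    ez[1:-1] += 0.5 * (hy[1:] - hy[:-1])
    # Excite the grid with a Gaussian pulse at the centre
    ez[nz // 2] += np.exp(-((n - 30.0) ** 2) / 100.0)
```

Each cell update needs only the immediately neighbouring field values, which is exactly the structural property that makes the method so amenable to parallelisation.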

Maxwell's Equations and the FDTD method are described comprehensively in chapter 2, which supports the motive and objective of this thesis.

1.2 Computational Parallelization Techniques & GPGPU

"For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers in such a manner as to permit cooperative solution."

- Gene M. Amdahl (1967)[4]

Amdahl's Law, which is a de facto law of the parallel and multicore computing world, states that even if an algorithm is parallelized, its execution starts serially and ends serially, however large the parallelized portion is. Thus, even for a parallelized algorithm, the performance with respect to execution time depends on both the serial and the parallel parts. For the FDTD method, the major part of the calculations involved is parallelizable in nature, and hence very good performance with respect to execution time can be expected if the parallel execution part is correctly implemented.
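Amdahl's observation can be stated quantitatively: if a fraction p of a program is parallelisable and runs on n processors, the overall speedup is bounded by 1/((1 - p) + p/n). A small illustrative sketch (the fraction 0.95 below is an arbitrary example, not a measured value):

```python
def amdahl_speedup(p, n):
    """Upper bound on overall speedup when a fraction p of the work
    is parallelised across n processors and the rest stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelised, the serial 5% caps the gain:
print(round(amdahl_speedup(0.95, 8), 2))      # 8 processors
print(round(amdahl_speedup(0.95, 10**6), 2))  # practically unlimited processors
```

However many processors are added, the speedup can never exceed 1/(1 - p), here 20: in the limit, the serial part dominates.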

Now, the performance of a parallelized task depends on various factors, but the major ones are good parallel hardware support and good parallel algorithm and programming techniques.

1.2.1 Parallel Computer Architecture

According to study materials published by J. R. Gurd[5], a basic computer architecture consists of the CPU (Central Processing Unit) and the memory. Here, the memory consists of two parts, the fixed Code and the changeable Data, which define the program being executed. When execution of a program starts on a computer, the Code part of the memory holds the fixed portion of the program, which will not change during execution, such as the formulas required for calculation, while the initial data of the program, which defines the computation, such as the input data and the initial values of required variables, is held in the Data part of the memory. The CPU contains a Program Counter (PC), which starts by pointing to the first instruction to be executed. As the program progresses, the current instruction is fetched from the Code memory and the PC is incremented to point to the next instruction to be executed. All the CPU state such as registers, condition codes, etc., except the PC, is treated as part of the Data memory. The CPU follows an Instruction Execution Cycle at the program level, repeatedly fetching and executing instructions. The execution of a program may consist of the following steps:

- Reading data from the Data memory, considered as a 'Read Access'
- Performing the required operations
- Writing the result back to the Data memory, considered as a 'Write Access'
- Incrementing the program counter, or assigning a new value to the PC

During the execution, all of the steps mentioned above, or only a few of them, may be performed depending on the program. This is commonly known as the 'von Neumann Computational Model'[6,7], which consists of sequential computation.

With careful analysis it can be found that when a program is executed, the processor/CPU spends more of its time moving data to and from the memory than calculating. For example, suppose a processor has to perform the following tasks:

a = 6
b = 12
c = a + b

If the time taken to execute these tasks together is assumed to be T secs, then the processor first has to fetch the value '6' from the memory and assign it to 'a', then perform the same fetch-and-assign operations for variable 'b'. After that it calculates 'a+b' and assigns the resulting value to 'c'. It should be kept in mind that 'a', 'b' and 'c' are also variables that reside in the memory. Now, suppose the time taken to fetch the values and copy the values back to the memory is t1 secs, and the time taken to perform the mathematical operation of 'a+b', i.e. adding the values, is t2 secs, such that the total time taken by the operations performed is t1 + t2 = T. Then t2 << T, implying that t2 is very much smaller than T and most of the execution time is spent dealing with memory-related operations rather than mathematical operations. Therefore, it can be said that the performance of a program with respect to execution time actually depends on the performance of the memory in the computer system.

This performance delay is incurred due to the memory latency gap. Before any operation is executed, its operands first have to be fetched from the main memory, and only then can it be executed in the CPU. Because of the gap between the memory and the CPU[8,9], the time of execution is increased. Please refer to Fig. 1.1 for a basic block diagram of the CPU and memory, and the memory interface that connects the two. It should also be kept in mind that although the gap between the memory and the CPU increases the time of execution, this is hardly noticed because the total execution time may only be in seconds. But compared to the time taken by mathematical operations, or operations that do not include memory access, the time taken by operations that perform some kind of memory access is huge. In 1996, Hennessy and Patterson[7]

[Figure not included in this reading sample]

Figure 1.1: Block Diagram of CPU and Memory Architecture

introduced the concept of Cycles Per Instruction (CPI), which is the average number of clock cycles required to execute one instruction. Now, if it is assumed that performing a mathematical operation such as an addition, i.e. '+', takes 1 CPI, then fetching data from the memory can take around 10 CPI, which in comparison is very high. This gives a fair idea of how the execution time is affected by the memory latency gap.
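The effect of this imbalance on overall execution time can be seen through a weighted-average CPI. The instruction mix below is invented purely for illustration, reusing the 1-CPI arithmetic and 10-CPI memory-access figures assumed above:

```python
# Hypothetical instruction mix: (fraction of instructions, cycles per instruction)
mix = [
    (0.6, 1),   # arithmetic operations, 1 CPI as assumed above
    (0.3, 10),  # memory loads/stores, ~10 CPI as assumed above
    (0.1, 2),   # other instructions, e.g. branches
]

effective_cpi = sum(frac * cpi for frac, cpi in mix)
memory_share = (0.3 * 10) / effective_cpi

print(round(effective_cpi, 2))  # average cycles per instruction
print(round(memory_share, 2))   # fraction of cycles spent on memory access
```

Although only 30% of the instructions in this made-up mix touch memory, they account for roughly 79% of the clock cycles, which is exactly the memory-bound behaviour described above.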

To deal with this problem, many techniques have been introduced as technology advanced. One of the best known methods is to accommodate faster memory on the CPU chip, which is called cache memory[3]. Since cache memory is not the focus of this thesis, the performance gained by using caches is not discussed in detail here.

In the modern computing era (post 1960), due to the increase in complex workloads and parallelizable tasks, there has been a rise in parallel and multi-core computing. Nowadays computer systems hardly ever consist of a single-core CPU. Since gaining performance with respect to execution time is of utmost importance now, the main focus is on parallel computing (parallel computers and parallel algorithmic techniques).

Parallel computer systems[5] may have many different architectures, but all of these architectures can be generalized into the following:

- Shared Memory Multiprocessor Architecture

[Figure not included in this reading sample]

Figure 1.2: Diagrammatic Representation of Shared Memory Parallel Computer Architecture consisting of two CPUs

- Distributed Memory Multicomputer (Distributed Memory Parallel Computer Architecture)

Again, in general, the shared memory multiprocessor can be of two types:

- Shared Memory Parallel Computer Architecture

- Split Shared Memory Parallel Computer Architecture

Shared Memory Parallel Computer Architecture: In this architecture, the system consists of more than one CPU, but the memory interface is the same as in the sequential computer architecture. Write access to the memory is the same as in the sequential architecture, but read access to the memory is different: read access requests are tagged with the identity of the issuing CPU so that the data can be returned to the correct CPU. Refer to Fig. 1.2.

Split Shared Memory Parallel Computer Architecture: In this architecture, the memory is split into many memory banks, which are accessed by the many CPUs in the system. The CPUs are connected to the memory banks by an 'interconnect'. The interconnect directs each memory access from a CPU to the appropriate memory bank, and memory addresses are allocated across the memory banks in different ways, such as interleaved, in blocks, etc., as long as there is no clash or contention of memory accesses from different CPUs. The interconnect can be implemented in hardware in different ways, but the most popular method is to implement it as a bus, which is cheap to

[Figure not included in this reading sample]

Figure 1.3: Diagrammatic Representation of Split Shared Memory Parallel Computer Architecture consisting of 'P' number of CPUs connected to the Memory, which has 'N' number of Memory Banks, via Interconnect

implement. The bus interconnect has its own limitations, such as the limited number of CPUs and memory banks that can be connected using it. Refer to Fig. 1.3.

Distributed Memory Parallel Computer Architecture: In this architecture, the memory is distributed physically amongst 'P' number of CPUs. Each 'CPU with memory' pair thus resembles the von Neumann, i.e. sequential, computer architecture, and the system is called a Distributed Memory Multicomputer[4]. Refer to Fig. 1.4. This architecture is actually motivated by the following:

- Co-locating a CPU and a memory bank, and making them share the same bus interface, increases the effective capacity of the bus.
- During the execution of a program, many of the required variables are private to each thread[5]. Placing these private variables in the co-located memory therefore reduces the use of the bus interconnect.

There are many variants of the Distributed Memory Parallel Computer Architecture, such as Distributed Shared Memory (DSM), where the available addresses are shared across

[Figure not included in this reading sample]

Figure 1.4: Diagrammatic Representation of Distributed Memory Parallel Computer Architecture

the memory banks with a single address space, or Distributed Memory (DM) with multiple address spaces across the system, where each CPU is only allowed to issue addresses to its own local memory bank. Another popular trend is cluster computing, where the memory is shared at the computer level in sub-parts to make a parallel structure. Since these hardware techniques are not within the scope of this thesis, they are not discussed in detail.

In the next section, the different ways of parallelising algorithms and implementing them as programming techniques are described.

1.2.2 Parallel Algorithms & Programs

In the study materials provided by J. Gurd[5], it was noted that although different programs/applications in the world of High Performance Computing (HPC) have different structures, they still share some common characteristics, such as:

- Use of discrete approximation for calculation purposes
- A foundation of a mathematical model underneath the application for the calculation
- An algorithm to 'animate' the underpinning mathematical model so that it can be used for digital computation

Parallel Algorithms: With a closer analysis of parallel applications, it can be noticed that there are some evident patterns in which programs are parallelised, which can be classified as follows:

- Data parallelism: In this type, there exists some large data domain, and a similar kind of computation is to be done on the data of that domain. Sometimes the data can be computed in parallel very easily, and such cases are called embarrassingly parallel.
- Reduction parallelism: In this type, there exists some large data domain over which it is necessary to compute some kind of global function common to the entire domain. This type of parallelism is not always as obvious as data parallelism; it depends on the problem being computed.
- Divide-and-conquer parallelism: This type of parallelism is more of an algorithmic approach than a pattern. Many sequential tasks can be visualized as an aggregate of many smaller parallelisable tasks. In order to solve the problem as a whole, the whole task is divided into small tasks, which are then computed in parallel. To approach a problem like this, however, it is the programmer's responsibility to devise the parallelising technique algorithmically.
- Task parallelism: Here, the task consists of multiple distinct computations, each of which can be computed simultaneously. In this way, different procedures can be applied to parts of the data domain to create parallel tasks, and hence each computation may take a different execution time.

The aforementioned parallelising algorithmic techniques can be implemented as a computer program in three popular ways:

- Data Sharing (Thread-Based): When a process consists of smaller light-weight tasks, commonly known as threads (as mentioned earlier in sub-section 1.2.1), and these threads are executed on the same memory by using separate blocks/priorities for each thread, this parallel programming style can be called thread-based data sharing. One of the most popular application programming interfaces (APIs) in academia and industry that supports shared memory parallel programming is OpenMP[10]. The OpenMP API manages the threads implicitly. There are a few other libraries and APIs that support this methodology, but they are not as popular as OpenMP. OpenMP is multi-platform and supports programming languages such as C, C++ and Fortran.

- Message Passing (Process-Based): When a large problem consists of several processes, it is possible to execute more than one process at the same time. This is achieved by coordinating the execution of the different processes and by executing these processes on separate processors with separate memories. The processors coordinate the execution by passing messages, and hence the name. One of the most popular message passing programming techniques devised is the Message Passing Interface (MPI)[11]. The MPI library supports major programming languages such as C, C++, Fortran and Java.
- Hybrid Programming Technique: In a hybrid programming scheme, both of the techniques mentioned above can be implemented together in order to achieve even better performance, although the performance will depend entirely on the problem itself and on the way the problem is parallelised. One example of this technique is to use MPI to compute separate processes in parallel, while using OpenMP inside MPI at the same time to parallelise each process into multiple threads and compute them simultaneously.

It should be kept in mind that there are many other variants and features related to the aforementioned programming techniques, but those are not within the scope of this thesis and hence are not discussed further here.

Now, although a high performance system can consist of a mixture of the parallel hardware architectures and parallel programming techniques mentioned above (sub-sections 1.2.1, 1.2.2), the performance of the system cannot be evaluated unless there are firm methods to compare it against standardised measures.

Measuring Performance: The performance of a program in the HPC world is evaluated in two ways: either by assessing the floating point operations performed per second (flop/s), or by assessing the execution time. The term flop/s is mostly used to evaluate the performance of parallel architecture hardware, which is done by measuring how many floating point calculations are performed in a second. The other method is to measure the execution time of the program using P processors and then compare it with the execution time for 1 processor. The main concern is to assess the performance and success of a parallel program by evaluating its performance on a variety of different parallel configurations.
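The second method described above, comparing the execution time on P processors against one processor, is conventionally reported as speedup and parallel efficiency. A minimal sketch with made-up placeholder timings:

```python
def speedup(t1, tp):
    """Speedup S = T1 / TP: time on one processor over time on P processors."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Parallel efficiency E = S / P: how well the P processors are used."""
    return speedup(t1, tp) / p

# Hypothetical timings: 120 s on one processor, 20 s on eight processors
print(speedup(120.0, 20.0))        # prints 6.0
print(efficiency(120.0, 20.0, 8))  # prints 0.75
```

An efficiency below 1.0 reflects the serial fraction and the communication overheads discussed earlier in this section.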

1.2.3 Emerging Parallelization Techniques: GPGPU

The hardware and algorithmic techniques already available for parallelising complex tasks are more than sufficient to compute most algorithms efficiently, but the advancement of technology and the never ending thirst for developing more advanced technology have led parallelization down different routes.

Post 1980, the world of technology saw the development of a new technology which changed the way people visualise computer graphics. That innovative technology was the Graphics Processing Unit (GPU), used to accelerate graphics processing in order to enhance visual representation on a computer. In a GPU[12-14] the graphics are processed by multiple shader cores, which are the basic processing units of a GPU. These shader cores are used to shade an image (pixels) or to provide effects to an image by means of a computer programming technique called 'shaders'[12]. A shader calculates rendering[6] effects on the graphics hardware and produces the image as an output. A basic GPU architecture consists of multiple shader cores, which are capable of shading multiple pixels of an image. The GPU thus had the inherent property of parallel computing from the beginning, and this ideology of computing multiple pixels using shader cores gave rise to the idea of the General Purpose Graphics Processing Unit for computing multiple calculations at the same time. The notion of a General Purpose Graphics Processing Unit (GPGPU) means being able to compute general calculations on the GPU, instead of using the GPU just for graphical processing purposes.

Modern GPUs[15-17] have many processing cores, which are called streaming multiprocessors (SMs), and each SM may have one or multiple stream processors (SPs) (refer to Fig. 1.5). When a parallelisable workload is computed on GPGPUs, the CPU computes the serial parts of the program/application being executed and the parallelised (repetitive) parts are computed on the GPU (refer to Fig. 1.6). For this thesis, GPGPUs from Nvidia are used. CUDA[1,2,15,18,19] is a parallel computing platform and programming model invented by NVIDIA, which enables programmers and users to significantly increase computing performance by harnessing the parallelization power of GPUs. More details regarding parallel computing on GPGPU are provided in Chapter 3.

As a part of the research conducted for the purpose of this thesis, a survey (refer to appendix A) was pursued to evaluate the performance, ease of use and future scope of

[Figure not included in this reading sample]

Figure 1.5: Nvidia GPUs Fundamental Architecture with 16 Streaming Multiprocessors, each with 8 stream processors (SPs) (128 stream processors in total) and 8 parallel texture mapping units (TMUs), which are used to address and filter the textures of images on the graphics hardware

[Figure not included in this reading sample]

Figure 1.6: Parallel Execution on GPGPU

CUDA programming (GPGPU Computing) along with other popular parallel programming techniques such as OpenMP and MPI. The survey was distributed by email and over the ResearchGate website, which is a social networking site for scientists and researchers to share their research findings and ideas. The survey was completed by 39 participants, who are professionals from industry and academia related to High Performance Computing. The typical questions in the survey asked respondents to evaluate the performance gained, the ease of implementation and the future scope of wide usability of each of the OpenMP, MPI and CUDA parallel computing techniques. The outcome of the survey was:


OpenMP

- Ease of Use/Implementation: 72% agreed strongly
- Performance Gained: 14% agreed strongly
- Has potential to be used more in future: 9% agreed strongly


MPI

- Ease of Use/Implementation: 23% agreed strongly
- Performance Gained: 73% agreed strongly
- Has potential to be used more in future: 0% agreed strongly

CUDA Programming

- Ease of Use/Implementation: 73% agreed strongly
- Performance Gained: 91% agreed strongly
- Has potential to be used more in future: 91% agreed strongly (see Fig. 1.7)

[In the above survey results, the percentage (%) represents the percentage of the survey respondents/participants. It should also be kept in mind that the performance gained by the MPI technique is comparable with the performance gained by the CUDA GPGPU technique, but the future scope of CUDA is much more evident than MPI's according to the survey. The survey was based entirely on the study of the ease of implementation and the performance gained from that implementation.]

Although the pool of participants for this survey is too small to determine the actual trend in the industry, it still hints at where the trend is leading.

[Figure not included in this excerpt]

Figure 1.7: Result of the survey question 'Which one has more potential to be used in the future?'

CUDA programming for GPGPU is easy for programmers to implement, and the implementation yields very good performance; perhaps for these reasons more people will prefer CUDA as a parallel programming technique in the near future. In conclusion, this survey reassures that the topic pursued in this thesis is of high importance and will help programmers using the CUDA programming technique.

Since GPGPU computing, as part of high performance computing, is relatively new in the industry, it has both advantages and limitations.

1.3 The Problem and The Objective

GPGPU computing is also called accelerator-based computing, i.e. a GPGPU is called an accelerator, which accelerates the computation. When a computation is performed using a GPGPU, the CPU is regarded as the 'host' and the GPU as the 'device'. In the typical high performance computing world, the bottleneck in performance is mainly created by the latency gap between the memory and the CPU, as discussed in sub-section 1.2.1, and the same is true for GPGPU computing.

The GPU is connected to the chipset[7] of the computer system via the PCI Express bus, also known as Peripheral Component Interconnect Express (PCI-e). The GPU, the CPU and the main memory (RAM) all interact over this PCI-e bus, and because of the latency gap between these three, there is a bottleneck in the overall performance gain.

When a parallelized program is computed on the GPGPU, the data is first copied from the memory to the GPU, and after computation the data is written back to the

[Figure not included in this excerpt]

Figure 1.8: Communication between Memory-CPU-GPU via PCI-e Bus

memory from the GPU using the PCI-e bus (refer to Fig. 1.8). Thus for every computation, data has to be copied to and fro between device, host and memory[8]. Although the computation itself is very fast on the GPGPU, the communication between device, host and memory via PCI-e generates a bottleneck in the overall performance. So far no effective method has been devised to deal with this problem. Moreover, CUDA programming for GPGPU computing requires the programmer to manage the data copied to and from the GPU by invoking CUDA functions such as cudaMemcpy(), which makes it even more difficult to write a program in CUDA. To ease this, Nvidia introduced a new programming model in CUDA 6.0, named Unified Memory[20]. In the Unified Memory technique, the whole memory is viewed as a single unified block of memory (see Fig. 1.9), but this technique still does not remove the aforementioned memory latency gap. The Unified Memory concept was introduced to reduce development effort; to gain good performance, the programmer still needs to manage the data copied to and fro.
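To illustrate the two models just described, the following CUDA Fortran sketch contrasts explicit host-device transfers with a managed (Unified Memory) declaration. The array names are hypothetical and the kernel launches are elided; this is a sketch of the two programming models, not code from this thesis.

```fortran
! Sketch only: explicit copies versus Unified (managed) memory.
program memory_models
  use cudafor
  implicit none
  integer, parameter :: n = 1024
  real, dimension(n)          :: ex     ! host array
  real, device, dimension(n)  :: ex_d   ! device array (explicit model)
  real, managed, dimension(n) :: ey     ! managed array (Unified Memory)

  ex = 0.0
  ! Explicit model: each assignment below becomes a cudaMemcpy over
  ! PCI-e, and the programmer must issue it around every kernel launch.
  ex_d = ex            ! host -> device
  ! ... kernel launches operating on ex_d would go here ...
  ex = ex_d            ! device -> host

  ! Unified Memory model (CUDA 6.0): ey is addressable from both host
  ! and device; the runtime migrates its pages on demand. The PCI-e
  ! traffic still occurs, it is merely hidden from the programmer.
  ey = 1.0
end program memory_models
```

Note that in both models the bytes still cross the PCI-e bus; Unified Memory changes who issues the copies, not whether they happen.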

In the FDTD method, to calculate and update the H, E and D values, data have to be copied to and written back between the memory and the GPU (device). Although the performance of computing the FDTD algorithm on a GPGPU is very good, the bottleneck generated by the latency gap between the GPU and the memory hinders the true performance that might have been achieved with GPGPU computing in the absence of this problem.
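A rough sketch shows why this per-time-step traffic hurts: a naive CUDA Fortran time loop that copies the field arrays in and out on every iteration spends much of its wall-clock time on the PCI-e bus. The kernel and array names below are hypothetical placeholders, not the kernels developed in this thesis.

```fortran
! Hypothetical sketch of a naive FDTD time loop (CUDA Fortran).
! update_h / update_e are placeholder kernel names.
do n = 1, nsteps
   hx_d = hx                            ! host -> device over PCI-e
   ez_d = ez                            ! host -> device over PCI-e
   call update_h<<<grid, tblock>>>(hx_d, ez_d)
   call update_e<<<grid, tblock>>>(ez_d, hx_d)
   hx = hx_d                            ! device -> host over PCI-e
   ez = ez_d                            ! device -> host over PCI-e
end do
! The copies dominate the run time: the fields should instead stay
! resident on the device for the whole run, with only the sampled
! output points (the data I/O this thesis targets) moved each step.
```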

[Figure not included in this excerpt]

Figure 1.9: View of ’Unified Memory’ Concept of CUDA 6.0 Programming Language

The Objective: Therefore the main objective of this thesis is to develop, implement and evaluate a method/technique to reduce the bottleneck created by the latency gap between device, host and memory, in order to achieve the highest performance and efficiency from GPGPU computation for the data input/output (I/O) of the FDTD method. Another objective is that the proposed method should work properly when accommodated into the FDTD algorithm, even for random data points on the grid for the data I/O of the FDTD method.

In this thesis, a novel approach to this problem is proposed, and the evaluation results of the proposed technique are discussed later in Chapters 4 and 5 in order to evaluate and validate its effectiveness.

While preparing this thesis, there was no single academic manuscript that compiled all the fundamental logic of the FDTD method, its computation on GPGPU, and ways of increasing the efficiency of the data input/output of FDTD computation on GPGPU. Therefore, this thesis covers the following areas:

- Provide the fundamental logic behind the Finite-Difference Time-Domain (FDTD) method
- Provide a basic understanding of computing the FDTD method on the CPU and on the GPGPU
- Provide a solution to the throughput problem, i.e. the latency gap between Memory-CPU-GPU, in order to increase the efficiency of data input/output (I/O) of the FDTD method

1.4 Thesis Overview

In the Introduction (Chapter 1), a few of the basic concepts of parallel computing that are essential for this thesis are presented, along with a brief review of the problem addressed in this research project. In Chapter 2 the important equations related to the FDTD method are discussed, along with guidelines for implementing the FDTD algorithm on a computer. Chapter 3 explores the methods of computing the FDTD algorithm on the GPGPU using CUDA programming. Chapter 4 discusses the solution proposed to bridge the latency gap between the memory and the device (GPU) and to make the data input/output (I/O) of FDTD computation on the GPGPU more efficient. In Chapter 5, the proposed method is evaluated for performance and validated for accuracy, and the results are discussed. Finally, in Chapter 6 the conclusion for the overall thesis is provided, along with the future scope of the proposed methodology.

1.5 Original Contribution

This thesis presents the author's original contribution in the following areas:

- A survey to evaluate the future scope of CUDA technology and GPGPU computing in the parallel computing and high performance computing world
- Development and implementation of the solution to increase the efficiency of data input/output (I/O) of FDTD computation on GPGPU (main objective)
- Evaluation and validation of the implemented solution

Table 1.1: Major Abbreviations Used In Chapter 1

[Table not included in this excerpt]


[1] In electromagnetism, the Lorentz force combines the electric and magnetic forces on a point charge due to electromagnetic fields

[2] In sequential computing a processor and a CPU are considered to be the same thing. But with the advancement of technology and the incorporation of many CPUs in one system, i.e. multi-core, the terms CPU and processor have diverged. Some processor manufacturers now call each core a CPU and the chip containing these cores a processor. For this example, however, the CPU and the processor should be considered the same.

[3] Cache is a smaller, faster memory used by CPU to reduce the average time of accessing data from the main memory.

[4] Here the term computer in the word 'multicomputer' refers to a CPU with some memory attached to it.

[5] A thread is the smallest sequence of a program, which can be managed independently by a scheduler of an Operating System and run in a shared memory space. Whereas, a process has its own memory space and when many processes are executed together, they run in separate memory spaces.

[6] Rendering is the process of generating an image from a model

[7] A chipset is a set of electronic components in the integrated circuit of the motherboard, which manages communication between the CPU, memory and other peripheral devices.

[8] Device is the GPU, Host is the CPU

University of Manchester  (School of Computer Science)
Advanced Computer Science: Computer Systems Engineering
Keywords: HPC, GPU, FDTD, GPGPU, CPU, electromagnetics, Fortran, CUDA, finite difference methods, time domain analysis, parallel programming, parallel computing, OpenACC, multi-core computing, parallel architectures, high performance computing, data I/O, buffer
Somdip Dey (Author), 2014, Efficient Data Input/Output (I/O) for Finite Difference Time Domain (FDTD). Computation on Graphics Processing Unit (GPU), Munich, GRIN Verlag, https://www.grin.com/document/462250

