UNIT V Parallel Programming Patterns in CUDA (T2 Chapter 7) - P P With CUDA
Chapter 7
(Learn CUDA Programming)
Ajeet K Jain
CSE, KMIT, Hyderabad
We will cover parallel programming patterns that help us understand
how to parallelize different algorithms and optimize them in CUDA.
The techniques covered can be applied to a variety of
problems:
An N-body simulation is a simulation of a dynamical system that evolves under
the influence of physical forces. The system is approximated numerically as the
bodies continuously interact with each other. N-body simulation is used
extensively in physics and astronomy, for example, so that scientists can
understand the dynamics of particles in the Universe. N-body simulations are also
used in many other domains, including computational fluid dynamics, where they
help in understanding turbulent fluid flow.
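A relatively easy method for solving an N-body simulation is a brute-force
technique that computes all pairwise interactions and has O(N^2) complexity.
This approach is embarrassingly parallel in nature, because the force on each
body can be computed independently of the forces on the other bodies.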
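The brute-force approach described above can be sketched as a CUDA kernel in which each thread handles one body. This is a minimal illustration, not the chapter's implementation: the names allPairsAccel and computeAccelerations, the float4 layout (x, y, z, mass), the SOFTENING constant, and the choice G = 1 are all assumptions made for the example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed softening term: keeps the i == j term finite (it adds zero force).
#define SOFTENING 1e-9f

// One thread per body: thread i sums the acceleration on body i caused by
// every body j. This is the O(N^2) all-pairs method.
// Assumed layout: pos[k] = (x, y, z, mass); the constant G is taken as 1.
__global__ void allPairsAccel(const float4 *pos, float3 *acc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 pi = pos[i];
    float3 a = make_float3(0.0f, 0.0f, 0.0f);
    for (int j = 0; j < n; ++j) {
        float4 pj = pos[j];
        float dx = pj.x - pi.x;
        float dy = pj.y - pi.y;
        float dz = pj.z - pi.z;
        float distSqr = dx * dx + dy * dy + dz * dz + SOFTENING;
        float invDist = rsqrtf(distSqr);
        float s = pj.w * invDist * invDist * invDist;  // m_j / r^3
        a.x += dx * s;
        a.y += dy * s;
        a.z += dz * s;
    }
    acc[i] = a;
}

// Hypothetical host helper: copies the bodies to the GPU, runs the kernel,
// and returns one acceleration per body. Error checking is omitted.
std::vector<float3> computeAccelerations(const std::vector<float4> &bodies)
{
    int n = static_cast<int>(bodies.size());
    std::vector<float3> result(n);
    if (n == 0) return result;

    float4 *dPos = nullptr;
    float3 *dAcc = nullptr;
    cudaMalloc(&dPos, n * sizeof(float4));
    cudaMalloc(&dAcc, n * sizeof(float3));
    cudaMemcpy(dPos, bodies.data(), n * sizeof(float4), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    allPairsAccel<<<blocks, threads>>>(dPos, dAcc, n);

    // This copy waits for the kernel on the default stream to finish.
    cudaMemcpy(result.data(), dAcc, n * sizeof(float3), cudaMemcpyDeviceToHost);
    cudaFree(dPos);
    cudaFree(dAcc);
    return result;
}
```

Because no thread writes to another thread's output, the kernel needs no synchronization. Every thread still reads all N positions from global memory, which is the usual starting point for further optimization.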
There are various algorithmic optimizations that can reduce the computational
complexity. Instead of applying the all-pairs method to the whole simulation, it
can be used to determine forces in close-range interactions only. Even in this
case, creating a kernel for solving the forces on CUDA is very useful as it will
also improve the performance of far-field components. Accelerating one component
offloads work from the other components, so the entire application benefits from
accelerating one kernel.
Histogram calculation
Dynamic parallelism
Dynamic parallelism allows the threads within a kernel to launch new kernels
from the GPU without returning control back to the CPU. The word dynamic
comes from the fact that the launches are decided at run time, based on the
runtime data. Multiple kernels can be launched by threads at once. The following
diagram simplifies this explanation:
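In code, a data-dependent child launch can be sketched as follows. This is a minimal example under stated assumptions: the kernel names childScale and parentLaunch, the host helper runSegments, and the segment layout (offsets and counts arrays) are hypothetical and chosen only for illustration. Dynamic parallelism requires a GPU of compute capability 3.5 or higher and relocatable device code, for example nvcc -rdc=true file.cu -lcudadevrt.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Child kernel: doubles `count` elements starting at `data`.
__global__ void childScale(int *data, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) data[i] *= 2;
}

// Parent kernel: each thread owns one segment and launches a child grid
// whose size depends on runtime data (the segment's element count).
// Empty segments launch nothing.
__global__ void parentLaunch(int *data, const int *offsets,
                             const int *counts, int numSegments)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= numSegments) return;

    int count = counts[s];
    if (count > 0) {
        int threads = 64;
        int blocks = (count + threads - 1) / threads;
        // Launched from the GPU; control never returns to the CPU here.
        childScale<<<blocks, threads>>>(data + offsets[s], count);
    }
}

// Hypothetical host helper: runs the parent kernel over all segments and
// copies the result back into `data`. Error checking is omitted.
void runSegments(std::vector<int> &data,
                 const std::vector<int> &offsets,
                 const std::vector<int> &counts)
{
    int n = static_cast<int>(data.size());
    int numSegments = static_cast<int>(counts.size());
    if (n == 0 || numSegments == 0) return;

    int *dData = nullptr, *dOffsets = nullptr, *dCounts = nullptr;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dOffsets, numSegments * sizeof(int));
    cudaMalloc(&dCounts, numSegments * sizeof(int));
    cudaMemcpy(dData, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dOffsets, offsets.data(), numSegments * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dCounts, counts.data(), numSegments * sizeof(int),
               cudaMemcpyHostToDevice);

    int threads = 32;
    int blocks = (numSegments + threads - 1) / threads;
    parentLaunch<<<blocks, threads>>>(dData, dOffsets, dCounts, numSegments);

    // A parent grid is complete only after all of its child grids finish,
    // so one host-side synchronization covers the children as well.
    cudaDeviceSynchronize();

    cudaMemcpy(data.data(), dData, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dOffsets);
    cudaFree(dCounts);
}
```

Here several parent threads launch child grids at once, and the host sees only a single kernel launch. Launching one child grid per thread is done for clarity; each device-side launch has overhead, so very small child grids may not pay off in practice.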
Dynamic parallelism guidelines and constraints
Though dynamic parallelism provides us with an opportunity to port algorithms
such as Quicksort to the GPU, there are some fundamental rules and guidelines
that need to be followed.