CS0051 - M1-Parallel Computing Hardware
CCS0049
CS Elective 1
Sequential vs. Parallel Computing
Real Life Analogy of Parallel Computing
• A computer program is just a list of instructions that tells a computer what to do, like the
steps in a recipe that tell us what to do when cooking. Like a computer, we simply follow
those instructions to execute the program.
• So to execute the program or recipe to make a salad, we'll start by chopping some lettuce and
putting it on a plate. Then we'll slice up a cucumber and add it. Next we'll slice and add a few
chunks of tomato. We'll slice the onion and finally, we add the dressing.
• A single cook working alone in the kitchen is a single processor, executing this program in a
sequential manner.
• The program is broken down into a sequence of discrete instructions that we execute one
after another, and we can only execute one instruction at any given moment. There's no
overlap between them.
• This type of serial or sequential programming is how software has traditionally been written,
and it's how new programmers are usually taught to code, because it's easy to understand,
but it has its limitations.
• The time it takes for a sequential program to run is limited by the speed of the processor
and how fast it can execute that series of instructions.
• We can break down the salad recipe and execute some of those steps in parallel.
– While person A chops the lettuce, person B slices the cucumber.
– When person A is done chopping lettuce, he can slice the tomatoes, while person B chops the onion and finally adds the dressing.
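The two-cook version of the recipe can be sketched with Java threads. This is a minimal illustration, not from the slides; the class and task names are made up for the example.

```java
// Two "cooks" (threads) prepare different parts of the salad at the same time.
public class ParallelSalad {
    public static String prepare() throws InterruptedException {
        Thread cookA = new Thread(() -> {
            System.out.println("A: chopping lettuce");
            System.out.println("A: slicing tomatoes");
        });
        Thread cookB = new Thread(() -> {
            System.out.println("B: slicing cucumber");
            System.out.println("B: chopping onion");
            System.out.println("B: adding dressing");
        });
        cookA.start();   // both cooks begin working...
        cookB.start();   // ...at the same time
        cookA.join();    // wait until both are finished
        cookB.join();
        return "Salad is ready";
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(prepare());
    }
}
```

Because the two threads run concurrently, the order in which their messages interleave can differ from run to run, which is exactly the point of the analogy.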
• Working together, we've broken the recipe into independent parts that can be executed
simultaneously by different processors.
• Adding a second cook in the kitchen doesn't necessarily mean we'll make the salad twice as
fast, because having extra cooks in the kitchen adds complexity. We have to spend extra
effort to communicate with each other to coordinate actions.
• There might be times when a processor has to wait for the other one to finish a certain
step before moving on.
• Those coordination challenges are part of what make writing parallel programs harder than
simple sequential programs, but that extra work can be worth the effort because when done
right, parallel execution increases the overall throughput of a program.
– Breaking down large tasks lets us accomplish them faster.
– We can accomplish more tasks in a given amount of time.
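The "one processor waits for another" situation shows up directly in code as a thread blocking until a coworker signals it is done. A hedged sketch, using `java.util.concurrent.CountDownLatch`; the names are illustrative:

```java
import java.util.concurrent.CountDownLatch;

// Cook B cannot add the dressing until cook A has finished the ingredients,
// so B blocks on a latch; this waiting is coordination overhead.
public class SaladCoordination {
    static final CountDownLatch ingredientsDone = new CountDownLatch(1);

    public static String run() throws InterruptedException {
        StringBuilder log = new StringBuilder();
        Thread cookA = new Thread(() -> {
            // A adds the ingredients, then signals B
            ingredientsDone.countDown();
        });
        Thread cookB = new Thread(() -> {
            try {
                ingredientsDone.await();   // B waits here until A is done
                log.append("dressing added after ingredients");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        cookA.start();
        cookB.start();
        cookA.join();
        cookB.join();
        return log.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

The time cook B spends blocked in `await()` is pure coordination cost: it buys correctness, not speed.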
Parallel Computing Architectures
• Parallel computing requires parallel hardware, with multiple processors to execute different
parts of a program at the same time.
Shared vs. Distributed Memory
Memory
• It's important to understand how the memory is organized, and how the computer accesses
data.
• You could put a billion processors in a computer, but if they can't access memory fast enough to
get the instructions and data they need, then you won't gain anything from having all those
processors.
• Parallel computers organize memory in one of two main ways:
– Shared Memory
– Distributed Memory
• Shared Memory
– In a shared memory system, all processors have access to the same memory as part of a global address space.
– Although each processor operates independently, if one processor changes a memory
location, all of the other processors will see that change.
– The term shared memory doesn't necessarily mean all of this data exists on the same
physical device. It could be spread across a cluster of systems.
– The key is that all processors see everything that happens in the shared memory space.
– Shared memory is often classified into one of two categories, which are based on how the
processors are connected to memory and how quickly they can access it:
• Uniform Memory Access or UMA System
– All of the processors have equal access to the memory, meaning they can access it equally fast.
– There are several types of UMA architectures, but the most common is a symmetric multiprocessing
system or SMP.
– An SMP system has two or more identical processors which are connected to a single shared
memory often through a system bus.
• Uniform Memory Access or UMA System
– In the case of modern multicore processors, which you find in everything from desktop computers to
cell phones, each of the processing cores is treated as a separate processor.
– In most modern processors, each core has its own cache, which is a small, very fast piece of
memory that only it can see, and it uses it to store data that it's frequently working with.
– However, caches introduce a challenge: if one processor copies a value from the shared main
memory and then changes it in its local cache, that change needs to be written back to the
shared memory before another processor reads the old value, which is no longer current.
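From a program's point of view, each core in such a system is simply another available processor. In Java, the standard way to query this count is `Runtime.getRuntime().availableProcessors()`:

```java
// Query how many processing cores (logical processors) the JVM can use.
public class CoreCount {
    public static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Available processors: " + cores());
    }
}
```

This number is what a parallel program typically uses to decide how many worker threads to create.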
– Cache coherency is an issue handled by the hardware in multicore processors.
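Cache coherency itself is handled by the hardware, but the same stale-value hazard is visible at the language level. In Java, marking a field `volatile` guarantees that a write becomes visible to other threads; the sketch below is a minimal illustration of that publication pattern, not from the slides:

```java
// Without "volatile" on the "ready" flag, the reading thread could keep
// seeing a stale cached value and spin forever. The volatile write also
// makes the earlier write to "data" visible (happens-before ordering).
public class Visibility {
    static volatile boolean ready = false;
    static int data = 0;

    public static int readAfterWrite() throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;        // write the value...
            ready = true;     // ...then publish it via the volatile flag
        });
        writer.start();
        while (!ready) {      // spin until the write becomes visible
            Thread.onSpinWait();
        }
        writer.join();
        return data;          // guaranteed to observe 42
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(readAfterWrite());
    }
}
```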
• Nonuniform Memory Access or NUMA System
– A NUMA system is often built by physically connecting multiple SMP systems together.
– The access is nonuniform because some processors will have quicker access to certain parts of
memory than others.
– It takes longer to access things over the bus. But overall, every processor can still see everything in
memory.
• Shared Memory
– Advantage:
• Easier to program with regard to memory, because it's easier to share data between different parts
of a parallel program.
– Disadvantage:
• They don't always scale well.
• Adding more processors to a shared memory system will increase traffic on the shared memory bus, and
if you factor in maintaining cache coherency, it becomes a lot of communication that needs to happen
between all the parts.
• Shared memory puts responsibility on the programmer to synchronize memory accesses to ensure
correct behavior.
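That last point, the programmer's responsibility to synchronize shared-memory accesses, can be illustrated with a shared counter. Without the `synchronized` keyword, the two threads' read-modify-write steps could interleave and lose increments; this is an illustrative sketch, not from the slides:

```java
// Two threads increment one shared counter; "synchronized" serializes the
// read-modify-write so no increments are lost.
public class SharedCounter {
    private int count = 0;

    public synchronized void increment() {
        count++;              // read, add one, write back -- as one atomic step
    }

    public synchronized int get() {
        return count;
    }

    public static int demo() throws InterruptedException {
        SharedCounter counter = new SharedCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return counter.get(); // always 200000 with synchronization in place
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

Removing `synchronized` from `increment()` would typically make the final count come out below 200000, which is exactly the kind of bug shared memory leaves to the programmer.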
• Distributed Memory
– In a distributed memory system, each processor has its own local memory with its own address space, so the concept of a global address space doesn't exist. All the processors are connected through a network, which can be as simple as Ethernet.
– Advantage:
• It's scalable. When you add more processors to the system, you get more memory too.
• This structure makes it cost-effective to use commodity, off-the-shelf computers and networking equipment to build large distributed memory systems. Most supercomputers use some form of distributed memory architecture or a hybrid of distributed and shared memory.
– Disadvantage:
• Each processor operates independently, and if it makes a change to its local memory, that change is not automatically reflected in the memory of the other processors.
• If one node changes the data in its own memory, the other processors are oblivious to that change. It's up to the programmer to explicitly define how and when data is communicated between the nodes in a distributed system, and that's often a disadvantage.
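In a real distributed system this communication is done with a message-passing library or network sockets. The sketch below only simulates two "nodes" inside one JVM, using a `BlockingQueue` as a stand-in for the network; it is entirely illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Each "node" has its own local variable (its own memory); the only way
// node B learns node A's result is an explicit send over the "network".
public class MessagePassing {
    public static int receive() throws InterruptedException {
        BlockingQueue<Integer> network = new ArrayBlockingQueue<>(1);

        Thread nodeA = new Thread(() -> {
            int localResult = 6 * 7;        // computed in A's local memory
            try {
                network.put(localResult);   // explicit send
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        nodeA.start();

        int received = network.take();      // explicit receive on B's side
        nodeA.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Node B received: " + receive());
    }
}
```

The key contrast with the shared-memory examples above is that nothing is shared implicitly: if node A never calls `put`, node B never sees the result.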
References
• Kirk, D. (2016). Programming Massively Parallel Processors: A Hands-On Approach. USA:
Morgan Kaufmann.
• Balaji, P. (2015). Programming Models for Parallel Computing (Scientific and Engineering
Computation). Massachusetts: The MIT Press.
• Barlas, G. (2015). Multicore and GPU Programming (An Integrated Approach). USA: Morgan
Kaufmann.
• Stone, B. (2019). Parallel and Concurrent Programming with Java 1. LinkedIn Learning, viewed
31 March 2020, <https://www.linkedin.com/learning/parallel-and-concurrent-programming-
with-java-1>.