CS0051 - M1-Parallel Computing Hardware

MODULE 1

Parallel Computing Hardware


Module 1A

Sequential vs. Parallel Computing

CCS0049
CS Elective 1
Learning Objectives

To differentiate sequential from parallel computing


To identify and understand the different parallel computing architectures

Sequential vs. Parallel Computing
Real Life Analogy of Parallel Computing
• A computer program is just a list of instructions that tells a computer what to do, like the
steps in a recipe that tell a cook what to do in the kitchen. Like a computer, the cook simply
follows those instructions to execute the program.

• So to execute the program or recipe to make a salad, we'll start by chopping some lettuce and
putting it on a plate. Then we'll slice up a cucumber and add it. Next we'll slice and add a few
chunks of tomato. We'll slice the onion and finally, we add the dressing.
Sequential vs. Parallel Computing
Real Life Analogy of Parallel Computing
• A single cook working alone in the kitchen is a single processor, executing this program in a
sequential manner.

• The program is broken down into a sequence of discrete instructions that we execute one
after another, and we can only execute one instruction at any given moment. There's no
overlap between them.

• This type of serial or sequential programming is how software has traditionally been written,
and it's how new programmers are usually taught to code, because it's easy to understand,
but it has its limitations.
Sequential vs. Parallel Computing
Real Life Analogy of Parallel Computing
• The time it takes for a sequential program to run is limited by the speed of the processor
and how fast it can execute that series of instructions.

• Each step takes some amount of time.
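
To make the idea concrete, here is a minimal sketch in Java (the language of the Stone reference below); the step names and timings are invented for the illustration, and Thread.sleep stands in for the real work of each step.

```java
// Sequential "salad recipe": one cook (one thread) executes each step in order.
public class SequentialSalad {

    // Simulate a step that takes some amount of time (the durations are made up).
    static void step(String name, long millis) throws InterruptedException {
        System.out.println("Starting: " + name);
        Thread.sleep(millis);               // stand-in for the actual work
        System.out.println("Finished: " + name);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();

        step("chop lettuce", 300);
        step("slice cucumber", 200);
        step("slice tomato", 200);
        step("chop onion", 250);
        step("add dressing", 100);

        // Total time is roughly the sum of all steps: there is no overlap between them.
        System.out.println("Total: " + (System.currentTimeMillis() - start) + " ms");
    }
}
```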


Sequential vs. Parallel Computing
Real Life Analogy of Parallel Computing
• Two cooks in the kitchen represent a system with multiple processors.

• We can break down the salad recipe and execute some of those steps in parallel.
– While person A chops the lettuce, person B slices the cucumber.
– When person A is done chopping the lettuce, he can slice the tomatoes, while person B chops
the onion and finally adds the dressing.
Sequential vs. Parallel Computing
Real Life Analogy of Parallel Computing
• Working together, we've broken the recipe into independent parts that can be executed
simultaneously by different processors.

• Adding a second cook in the kitchen doesn't necessarily mean we'll make the salad twice as
fast, because having extra cooks in the kitchen adds complexity. We have to spend extra
effort to communicate with each other to coordinate actions.
Sequential vs. Parallel Computing
Real Life Analogy of Parallel Computing
• There might be times when a processor has to wait for the other one to finish a certain
step before moving on.

• Those coordination challenges are part of what make writing parallel programs harder than
simple sequential programs, but that extra work can be worth the effort because when done
right, parallel execution increases the overall throughput of a program.
– Enables breaking down large tasks to accomplish them faster
– Accomplish more tasks in a given amount of time.
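
As a rough sketch of the same recipe with two "cooks", the Java fragment below splits the steps across two threads; the step names and timings are again invented, and join() is the simple coordination point where the program waits for both cooks before the final, dependent step of adding the dressing.

```java
// Two "cooks": two threads execute independent parts of the recipe at the same time.
public class ParallelSalad {

    static void step(String who, String name, long millis) {
        try {
            System.out.println(who + " starting: " + name);
            Thread.sleep(millis);           // stand-in for the actual work
            System.out.println(who + " finished: " + name);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();

        Thread cookA = new Thread(() -> {
            step("Cook A", "chop lettuce", 300);
            step("Cook A", "slice tomato", 200);
        });
        Thread cookB = new Thread(() -> {
            step("Cook B", "slice cucumber", 200);
            step("Cook B", "chop onion", 250);
        });

        cookA.start();
        cookB.start();

        // Coordination: wait for both cooks before the final, dependent step.
        cookA.join();
        cookB.join();
        step("Cook B", "add dressing", 100);

        System.out.println("Total: " + (System.currentTimeMillis() - start) + " ms");
    }
}
```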
Sequential vs. Parallel Computing
Parallel Computing Architectures
• Parallel computing requires parallel hardware, with multiple processors to execute different
parts of a program at the same time.

• Flynn's Taxonomy distinguishes four classes of computer architecture based on two factors:
the number of concurrent instruction (or control) streams, and the number of data streams.
Sequential vs. Parallel Computing
Flynn’s Taxonomy
The class names are usually written as four letter acronyms that indicate whether they have
single or multiple instruction streams and data streams.

1. Single Instruction, Single Data or SISD Architecture


– The simplest of these four classes
– A sequential computer with a single processor unit.
– At any given time, it can only execute one series of instructions, and it can only act on one element of data at
a time
– It's simple, like an old single-processor computer.
Sequential vs. Parallel Computing
Flynn’s Taxonomy
2. Single Instruction, Multiple Data, or SIMD Architecture
– A type of parallel computer with multiple processing units.
– All of its processors execute the same instruction at any given time, but each can operate on a different
data element. In an SIMD computer with two processors, both execute the same instruction, and they execute
those instructions in sync with each other.
– This type of SIMD architecture is well-suited for applications that perform the same handful of operations on
a massive set of data elements, like image processing, and most modern computers use graphics processing
units or GPUs with SIMD instructions to do just that.
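
Plain Java does not expose SIMD instructions directly, but the access pattern that SIMD hardware accelerates is easy to show: the same operation applied to every element of a large array, as in this hypothetical image-brightening loop. On a GPU or a vector unit, many of these element-wise additions would be performed by a single instruction.

```java
// The SIMD pattern: one operation applied uniformly to many data elements.
// Plain Java is shown only to illustrate the pattern; on SIMD hardware or a GPU,
// many of these element-wise additions would happen in the same instruction.
public class BrightenImage {
    public static void main(String[] args) {
        int[] pixels = new int[1_000_000];   // pretend grayscale image data

        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = Math.min(255, pixels[i] + 40);   // same instruction, different data
        }
    }
}
```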
Sequential vs. Parallel Computing
Flynn’s Taxonomy
3. Multiple Instruction, Single Data or MISD Architecture
– In an MISD architecture, each processing unit independently executes its own separate series of instructions.
– However, all of those processors are operating on the same single stream of data.
– MISD doesn't make much practical sense, so it's not a commonly used architecture.
Sequential vs. Parallel Computing
Flynn’s Taxonomy
4. Multiple Instruction, Multiple Data or MIMD Architecture
– Every processing unit can be executing a different series of instructions, and at the same time, each of those
processors can be operating on a different set of data.
– MIMD is the most commonly used architecture in Flynn's taxonomy, and you'll find it in everything from
multicore PCs to network clusters and supercomputers.
Sequential vs. Parallel Computing
Flynn’s Taxonomy
4. Multiple Instruction, Multiple Data or MIMD Architecture
– Now, that broad MIMD category is sometimes further subdivided into two parallel programming models,
which also have four letter names:

• Single Program, Multiple Data or SPMD Architecture

• Multiple Program, Multiple Data or MPMD Architecture


Sequential vs. Parallel Computing
Flynn’s Taxonomy

Single Program, Multiple Data or SPMD Architecture


– Multiple processing units are executing a copy of the same single program simultaneously. However, they
can each use different data.
– That might sound a lot like the SIMD architecture, but it's different because although each processor is
executing the same program, they do not have to be executing the same instruction at the same time.
– The processors can run asynchronously and the program usually includes conditional logic that allows
different tasks within the program to only execute specific parts of the overall program.
– This is the most common style of parallel programming.
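
A rough SPMD sketch in Java: every worker thread runs a copy of the same code, asynchronously, and conditional logic on an invented worker ID (analogous to a rank in MPI-style programs) decides which part of the work each copy performs.

```java
// SPMD sketch: every worker runs a copy of the same program, asynchronously,
// and conditional logic on the worker's ID decides which part of the work it does.
public class SpmdSketch {
    public static void main(String[] args) throws InterruptedException {
        int numWorkers = 4;
        Thread[] workers = new Thread[numWorkers];

        for (int id = 0; id < numWorkers; id++) {
            final int myId = id;                   // analogous to an MPI "rank"
            workers[id] = new Thread(() -> {
                if (myId == 0) {
                    System.out.println("Worker 0: doing the coordination part");
                } else {
                    System.out.println("Worker " + myId + ": processing my slice of the data");
                }
            });
            workers[id].start();
        }
        for (Thread w : workers) {
            w.join();
        }
    }
}
```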
Sequential vs. Parallel Computing
Flynn’s Taxonomy

Multiple Program, Multiple Data or MPMD Architecture


– If each of our processors is executing a different process, that represents the Multiple Program, Multiple Data
or MPMD model.
– Processors can be executing different, independent programs at the same time while also operating on
different data.
– Typically in this model, one processing node will be selected as the host or manager, which runs one
program that farms out data to the other nodes running a second program. Those other nodes do their work
and return their results to the manager.
– MPMD is not as common as SPMD, but it can be useful for some applications that lend themselves to
functional decomposition.
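
The manager/worker shape of MPMD can be sketched, very loosely, inside a single Java process: one thread runs the "manager program" that farms out data and collects results, while another thread runs a different "worker program". In a real MPMD system these would be separate programs on separate nodes; the queues here are only stand-ins for the network between them.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// MPMD sketch: the manager runs one program that distributes data; the worker runs a
// different program that processes it and returns results.
public class MpmdSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> work = new ArrayBlockingQueue<>(10);
        BlockingQueue<Integer> results = new ArrayBlockingQueue<>(10);

        Thread manager = new Thread(() -> {          // "program 1": distribute and collect
            try {
                for (int i = 1; i <= 5; i++) work.put(i);
                for (int i = 0; i < 5; i++) System.out.println("Result: " + results.take());
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread worker = new Thread(() -> {           // "program 2": process the data
            try {
                for (int i = 0; i < 5; i++) {
                    int item = work.take();
                    results.put(item * item);        // pretend computation
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        manager.start();
        worker.start();
        manager.join();
        worker.join();
    }
}
```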
Module 1B

Shared vs. Distributed Memory

CCS0049
CS Elective 1
Learning Objectives

To differentiate shared from distributed memory

Shared vs. Distributed Memory
Memory
• It's important to understand how the memory is organized, and how the computer accesses
data.

• You could put a billion processors in a computer, but if they can't access memory fast enough to
get the instructions and data they need, then you won't gain anything from having all those
processors.
Shared vs. Distributed Memory
Memory

Memory Speed < Processor Speed


• Computer memory usually operates at a much slower speed than processors do and when
one processor is reading or writing to memory, that often prevents any other processors from
accessing that same memory element.
Shared vs. Distributed Memory
Memory
• There are two main memory architectures that exist for parallel computing:

– Shared Memory

– Distributed Memory
Shared vs. Distributed Memory
Memory
• Shared Memory
– In a shared memory
system, all processors
have access to the same
memory as part of a
global address space.
Shared vs. Distributed Memory
Memory
• Shared Memory
– Although each processor operates independently, if one processor changes a memory
location, all of the other processors will see that change.
– The term shared memory doesn't necessarily mean all of this data exists on the same
physical device. It could be spread across a cluster of systems.
– The key is that all of the processors see everything that happens in the shared memory space.
Shared vs. Distributed Memory
Memory
• Shared Memory
– Shared memory is often classified into one of two categories, which are based on how the
processors are connected to memory and how quickly they can access it:
• Uniform Memory Access or UMA System
– All of the processors have equal access to the memory, meaning they can access it equally fast.
– There are several types of UMA architectures, but the most common is a symmetric multiprocessing
system or SMP.
– An SMP system has two or more identical processors which are connected to a single shared
memory often through a system bus.
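
On such a system, the JVM reports how many processors (cores) it can schedule work on; a one-line check looks like this.

```java
// Query how many processors (cores) the runtime sees on this machine.
public class CountCores {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available processors: " + cores);
    }
}
```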
Shared vs. Distributed Memory
Memory
• Shared Memory
• Uniform Memory Access or UMA System
– In the case of modern multicore processors, which you find in everything from desktop computers to
cell phones, each of the processing cores is treated as a separate processor.
– In most modern processors, each core has its own cache, which is a small, very fast piece of
memory that only it can see, and it uses it to store data that it's frequently working with.
– However, caches introduce the challenge that if one processor copies a value from the shared main
memory, and then makes a change to it in its local cache, then that change needs to be updated
back in the shared memory before another processor reads the old value, which is no longer
current.
Shared vs. Distributed Memory
Memory
• Shared Memory
• Uniform Memory Access or UMA System
– Cache coherency is an issue handled by the hardware in multicore processors.
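
A related, software-level analogue in Java is the visibility problem: without volatile (or other synchronization), a reader thread is allowed to keep using a cached copy of a variable and may never observe another thread's update. This sketch only illustrates that idea; the hardware's cache-coherency protocol and the Java memory model are distinct mechanisms.

```java
// Without "volatile" (or other synchronization), the reader thread may keep using a
// stale copy of "done" and never observe the writer's update.
public class VisibilityDemo {
    static volatile boolean done = false;   // remove "volatile" and the loop may spin forever

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!done) {
                // busy-wait until the update becomes visible
            }
            System.out.println("Reader saw the update");
        });
        reader.start();

        Thread.sleep(100);
        done = true;                        // writer publishes the change
        reader.join();
    }
}
```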
Shared vs. Distributed Memory
Memory
• Shared Memory
• Nonuniform Memory Access or NUMA System
– This is often made by physically connecting multiple SMP systems together.
– The access is nonuniform because some processors will have quicker access to certain parts of
memory than others.
– It takes longer to access memory that sits across the bus in another SMP unit, but overall, every
processor can still see everything in memory.
Shared vs. Distributed Memory
Memory
• Shared Memory
– Advantage:
• Easier programming with regard to memory, because it's easier to share data between different parts
of a parallel program.
– Disadvantage:
• They don't always scale well.
• Adding more processors to a shared memory system will increase traffic on the shared memory bus, and
if you factor in maintaining cache coherency, it becomes a lot of communication that needs to happen
between all the parts.
• Shared memory puts responsibility on the programmer to synchronize memory accesses to ensure
correct behavior.
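
For example, in Java the programmer might guard shared data with synchronized (or an atomic class); removing the keyword in this sketch would let the two threads lose updates to the shared counter.

```java
// The programmer must synchronize access to shared data. Without "synchronized",
// two threads incrementing the same counter can lose updates.
public class SharedCounter {
    private int count = 0;

    synchronized void increment() {         // one thread at a time may update the counter
        count++;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedCounter counter = new SharedCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) counter.increment();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("Count: " + counter.count);   // 200000 only if access is synchronized
    }
}
```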
Shared vs. Distributed Memory
Memory
• Distributed Memory
– Advantage:
• It's scalable. When you add more processors to the system, you get more memory too.
• This structure makes it cost-effective to use commodity, off-the-shelf computers, and networking
equipment to build large distributed memory systems. Most supercomputers use some form of
distributed memory architecture or a hybrid of distributed and shared memory.
Shared vs. Distributed Memory
Memory
• Distributed Memory
– Disadvantage:
• Each processor has its own local memory with its own address space, so the concept of a global
address space doesn't exist. All the processors are connected through a network, which can be as
simple as Ethernet.
• Each processor operates independently, and if it makes changes to its local memory, that change is not
automatically reflected in the memory of other processors.
• If a change is made to the data in one processor's local memory, the other processors are oblivious to
that change. It's up to the programmer to explicitly define how and when data is communicated between
the nodes in a distributed system, and that's often a disadvantage.
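
As a rough illustration of explicit communication, the sketch below runs a "receiver node" and a "sender node" as two threads in one process and passes a value over a plain TCP socket (port 5000 is an arbitrary choice); in a real distributed-memory program this role is usually played by a message-passing library such as MPI.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// In a distributed-memory system the programmer must move data explicitly.
// A TCP socket stands in here for the cluster network between two nodes.
public class MessagePassingSketch {
    public static void main(String[] args) throws Exception {
        Thread receiver = new Thread(() -> {
            try (ServerSocket server = new ServerSocket(5000);
                 Socket conn = server.accept();
                 DataInputStream in = new DataInputStream(conn.getInputStream())) {
                int value = in.readInt();                    // explicit receive
                System.out.println("Node B received: " + value);
            } catch (Exception e) { e.printStackTrace(); }
        });
        receiver.start();

        Thread.sleep(200);                                   // crude wait for the server to start
        try (Socket socket = new Socket("localhost", 5000);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
            out.writeInt(42);                                // explicit send of local data
        }
        receiver.join();
    }
}
```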
References
• Kirk, D. (2016). Programming Massively Parallel Processors: A Hands-On Approach. USA: Morgan Kaufmann.
• Balaji, P. (2015). Programming Models for Parallel Computing (Scientific and Engineering Computation). Massachusetts: The MIT Press.
• Barlas, G. (2015). Multicore and GPU Programming (An Integrated Approach). USA: Morgan Kaufmann.
• Stone, B. (2019). Parallel and Concurrent Programming with Java 1, LinkedIn Learning, viewed 31 March 2020, <https://www.linkedin.com/learning/parallel-and-concurrent-programming-with-java-1>.
