Parallel Computing Lecture # 6: Parallel Computer Memory Architectures

This document discusses parallel computer memory architectures, including shared memory, distributed memory, and hybrid architectures. Shared memory architectures allow all processors to access a global address space, but lack scalability. Distributed memory assigns separate memory to each processor requiring explicit communication between tasks. Hybrid architectures combine shared memory within nodes and distributed memory between nodes for scalability.

Parallel Computing

Lecture # 6

Parallel Computer Memory Architectures
Shared Memory
General Characteristics:
• Shared memory parallel computers vary widely, but generally have in common the ability for all processors to access all memory as a global address space.
• Multiple processors can operate independently but share the same memory resources.
• Changes in a memory location effected by one processor are visible to all other processors.
• Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA.
Shared Memory (UMA)
Shared Memory (NUMA)
Uniform Memory Access (UMA):
• Most commonly represented today by Symmetric Multiprocessor (SMP) machines
• Identical processors
• Equal access and access times to memory
• Sometimes called CC-UMA (Cache Coherent UMA). Cache coherent means that if one processor updates a location in shared memory, all the other processors know about the update. Cache coherency is accomplished at the hardware level.
Non-Uniform Memory Access (NUMA):
• Often made by physically linking two or more SMPs
• One SMP can directly access the memory of another SMP
• Not all processors have equal access time to all memories
• Memory access across the link is slower
• If cache coherency is maintained, the machine may also be called CC-NUMA (Cache Coherent NUMA)
Advantages:
• Global address space provides a user-friendly programming perspective to memory
• Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs
Disadvantages:
• The primary disadvantage is the lack of scalability between memory and CPUs. Adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache coherent systems, geometrically increase the traffic associated with cache/memory management.
• The programmer is responsible for synchronization constructs that ensure "correct" access of global memory.
• Expense: it becomes increasingly difficult and expensive to design and produce shared memory machines with ever increasing numbers of processors.
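The synchronization responsibility noted above can be sketched with threads sharing one global counter: the read-modify-write in `add` could interleave across threads and lose updates, so the programmer must supply a lock to make access "correct". Names here are illustrative.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:  # synchronization construct supplied by the programmer
            counter += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 with the lock; without it, updates may be lost
```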
Distributed Memory
General Characteristics:
• Like shared memory systems, distributed memory systems vary widely but share a common characteristic. Distributed memory systems require a communication network to connect inter-processor memory.
• Processors have their own local memory. Memory addresses in one processor do not map to another processor, so there is no concept of a global address space across all processors.
• Because each processor has its own local memory, it operates independently. Changes it makes to its local memory have no effect on the memory of other processors. Hence, the concept of cache coherency does not apply.
Distributed Memory (cont.)
• When a processor needs access to data in another processor, it is usually the task of the programmer to explicitly define how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility.
• The network "fabric" used for data transfer varies widely, though it can be as simple as Ethernet.
Distributed Memory (cont.)
Advantages:
• Memory is scalable with the number of processors. Increase the number of processors and the size of memory increases proportionately.
• Each processor can rapidly access its own memory without interference and without the overhead incurred with trying to maintain cache coherency.
• Cost effectiveness: can use commodity, off-the-shelf processors and networking
Distributed Memory (cont.)
Disadvantages:
• The programmer is responsible for many of the details associated with data communication between processors.
• It may be difficult to map existing data structures, based on global memory, to this memory organization.
• Non-uniform memory access (NUMA) times
Hybrid Distributed-Shared Memory
The largest and fastest computers in the world today employ both shared and distributed memory architectures.
Hybrid Distributed-Shared Memory (cont.)
• The shared memory component is usually a cache coherent SMP machine. Processors on a given SMP can address that machine's memory as global.
• The distributed memory component is the networking of multiple SMPs. SMPs know only about their own memory, not the memory on another SMP. Therefore, network communications are required to move data from one SMP to another.
Hybrid Distributed-Shared Memory (cont.)
• Current trends seem to indicate that this type of memory architecture will continue to prevail and increase at the high end of computing for the foreseeable future.
• Advantages and Disadvantages: whatever is common to both shared and distributed memory architectures.
