Parallel and Distributed Computing
Lecture 01 – Introduction
Background – Serial Computing
• Conventionally, computer software was written for serial computing.
• Standard computing is also known as "serial computing".
• This meant that, to solve a problem, an algorithm divides the problem into smaller, discrete instructions.
• These discrete instructions are then executed on the Central Processing Unit (CPU) of a computer one by one.
• Only after one instruction finishes does the next one start.
Background – Serial Computing
• A real-life example would be people standing in a queue waiting for a movie ticket when there is only one cashier. The cashier issues tickets to one person at a time. The situation becomes even worse when there are 2 queues and still only one cashier.
• So, in short, serial computing works as follows:
1. A problem statement is broken into discrete instructions.
2. The instructions are executed one by one.
3. Only one instruction is executed at any moment in time.
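• A minimal C sketch of serial execution (not from the slides; the array and its values are illustrative): the instructions run on the CPU strictly one after another.

#include <stdio.h>

int main(void) {
    int data[8] = {3, 1, 4, 1, 5, 9, 2, 6};
    int sum = 0;

    /* Serial computing: discrete instructions executed one by one. */
    for (int i = 0; i < 8; i++) {
        sum += data[i];   /* the next addition starts only after this one finishes */
    }

    printf("sum = %d\n", sum);
    return 0;
}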
Background – Serial Computing
• Look at point 3. This was causing a huge problem in the computing industry: only one instruction was executed at any moment, so most of the hardware sat idle while a single part of it ran a particular instruction.
• As problem statements grew heavier and bulkier, so did the time needed to execute them. Examples of such serial processors are the Pentium 3 and Pentium 4.
• Now let’s come back to our real-life problem. The complexity clearly decreases when there are 2 queues and 2 cashiers issuing tickets to 2 people simultaneously. This is an example of parallel computing.
Parallel Computer
• Virtually all stand-alone computers today are parallel from a hardware
perspective:
- Multiple functional units (L1 cache, L2 cache, branch, prefetch, decode,
floating-point, graphics processing (GPU), integer, etc.)
- Multiple execution units/cores
- Multiple hardware threads
Parallel Computer
• Networks connect multiple stand-alone computers (nodes) to make larger
parallel computer clusters.
Parallel Computing
• A kind of computing architecture in which large problems are broken into independent, smaller, usually similar parts that can be processed simultaneously.
• The work is done by multiple CPUs communicating via shared memory, and the results are combined upon completion.
• It helps in performing large computations by dividing the problem between more than one processor.
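• A minimal parallel sketch in C with OpenMP (not from the slides; assumes an OpenMP-capable compiler, e.g. compiled with -fopenmp; the array size and values are illustrative). The loop is divided among the available cores and the partial results are combined into one sum.

#include <stdio.h>
#include <omp.h>

int main(void) {
    enum { N = 1000000 };
    static double data[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) data[i] = 1.0;   /* fill the array (serially) */

    /* The large problem is divided between multiple cores; the reduction
       combines the per-thread partial sums when all threads finish. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        sum += data[i];
    }

    printf("sum = %.0f (using up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}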
Parallel Computing
• Parallel computing also helps in faster application processing and task resolution by increasing the available computation power of systems.
• Most supercomputers employ parallel computing principles to operate.
• Parallel processing is commonly used in operational scenarios that need massive processing power or computation.
Parallel Computing
• Typically, this infrastructure is housed in a server rack where multiple processors are installed; an application server distributes the computational requests into small chunks, which are then processed simultaneously on each server.
• The earliest computer software was written for serial computation, executing a single instruction at a time; parallel computing is different in that several processors execute an application or computation at the same time.
Parallel Computing – Flynn’s Taxonomy
• The best-known classification scheme for parallel computers.
• Classifies a machine by the parallelism it exhibits in its
- Instruction stream
- Data stream
• A sequence of instructions (the instruction stream) manipulates a sequence
of operands (the data stream)
• The instruction stream (I) and the data stream (D) can be either single (S) or
multiple (M)
Parallel Computing – Flynn’s Taxonomy
• Four combinations: SISD, SIMD, MISD, MIMD
Parallel Computing – Flynn’s Taxonomy
• SISD (Single Instruction, Single Data): the serial computer
• Single-CPU systems
- i.e., uniprocessors
- Note: co-processors don’t count as additional processors
• Examples: older-generation mainframes, workstations, PCs
Parallel Computing – Flynn’s Taxonomy
• SIMD (Single Instruction, Multiple Data)
• On a memory access, all active processors must access the same location in their local memory.
• The data items form an array, and an instruction can act on the complete array in one cycle.
Parallel Computing – Flynn’s Taxonomy
- Analogy: the commander barks out an order, and all the active soldiers execute it synchronously.
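• A minimal SIMD-style sketch in C (not from the slides; the element-wise addition is illustrative, and the OpenMP simd directive is only a hint to the compiler). One instruction is applied across a whole array of data, like one order executed by all soldiers in lockstep.

#include <stdio.h>

#define N 16

int main(void) {
    float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f; }

    /* SIMD: the same "add" instruction acts on many array elements at once. */
    #pragma omp simd
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }

    printf("c[0] = %.1f, c[%d] = %.1f\n", c[0], N - 1, c[N - 1]);
    return 0;
}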
Parallel Computing – Flynn’s Taxonomy
• MIMD (Multiple Instruction, Multiple Data)
• Processors are asynchronous, since they can independently execute different programs on different data sets.
• MIMD machines are considered by most researchers to be the most powerful, least restricted class of parallel computers.
Parallel Computing – Parallel Programming Model
• Data Parallel Model
- Most of the parallel work performs operations on a data set organized into a common structure, such as an array.
- A set of tasks works collectively on the same data structure, with each task working on a different partition.
- Tasks perform the same operation on their own partition.
- On shared-memory architectures, all tasks may access the data structure through global memory. On distributed-memory architectures, the data structure is split up and resides as "chunks" in the local memory of each task.
• Parallel programs can also take advantage of non-local resources when local resources are limited.
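• A minimal data-parallel sketch in C with OpenMP (not from the slides; the squaring operation, array size, and partitioning scheme are illustrative): one shared array, each task works on its own partition, and every task performs the same operation.

#include <stdio.h>
#include <omp.h>

#define N 100

int main(void) {
    int a[N];

    /* Data parallel model: a common data structure (the array 'a'),
       partitioned among tasks that all apply the same operation. */
    #pragma omp parallel
    {
        int ntasks = omp_get_num_threads();
        int id     = omp_get_thread_num();
        int chunk  = (N + ntasks - 1) / ntasks;      /* size of each partition */
        int lo     = id * chunk;
        int hi     = (lo + chunk < N) ? lo + chunk : N;

        for (int i = lo; i < hi; i++)
            a[i] = i * i;                            /* same operation, different partition */
    }

    printf("a[0] = %d, a[%d] = %d\n", a[0], N - 1, a[N - 1]);
    return 0;
}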
Parallel Computing – Parallel Programming Model
• Hybrid
- combines various models, e.g. MPI/OpenMP
• Single Program Multiple Data (SPMD)
- A single program is executed by all tasks simultaneously
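• A minimal hybrid SPMD sketch in C (not from the slides; assumes an MPI library plus OpenMP, e.g. built with mpicc -fopenmp and launched with mpirun; the output format is illustrative). The same program runs on every MPI rank, and each rank spawns OpenMP threads internally, i.e. MPI/OpenMP combined.

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int rank, nranks;

    MPI_Init(&argc, &argv);                    /* every rank executes this same program */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    #pragma omp parallel                       /* threads within each rank (hybrid part) */
    {
        printf("rank %d of %d, thread %d of %d\n",
               rank, nranks, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}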
Petaflops = 10^15 floating-point operations per second
Petabytes = 10^15 bytes
Parallel Computing – Why?
• Provide Concurrency
- A single compute resource can only do one thing at a time; multiple compute resources can do many things simultaneously.
- Example: Collaborative Networks provide a global venue where people from around the world can meet and conduct work "virtually".
Parallel Computing – Why?
• Take Advantage of non-local Resources
- Using compute resources on a wide area network, or even the
Internet when local compute resources are scarce or insufficient.
- Example: SETI@home (setiathome.berkeley.edu) has over 1.7
million users in nearly every country in the world. (May, 2018).
Parallel Computing – Why?
• Better Use of Underlying Parallel Hardware
- Modern computers, even laptops, are parallel in architecture with multiple
processors/cores.
- Parallel software is specifically intended for parallel hardware with
multiple cores, threads, etc.
- In most cases, serial programs run on modern computers "waste"
potential computing power.
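• A minimal C/OpenMP sketch (not from the slides) that simply reports how many processors are available, which a purely serial program would leave mostly idle.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int procs = omp_get_num_procs();   /* processors visible to the runtime */

    printf("processors available: %d\n", procs);
    printf("a purely serial program would keep %d of them idle\n", procs - 1);
    return 0;
}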
Parallel Computing – Who is using?
• Science and Engineering
- Historically, parallel computing has been considered to be "the high end of
computing", and has been used to model difficult problems in many areas
of science and engineering:
- Atmosphere, Earth, Environment
- Physics - applied, nuclear, particle, condensed matter, high pressure,
fusion, photonics
- Bioscience, Biotechnology, Genetics
- Chemistry, Molecular Sciences
Parallel Computing – Who is using?
• Science and Engineering
- Geology, Seismology
- Mechanical Engineering - from prosthetics to spacecraft
- Electrical Engineering, Circuit Design, Microelectronics
- Computer Science, Mathematics
- Defense, Weapons
Parallel Computing – Who is using?
• Industrial and Commercial
- Today, commercial applications provide an equal or greater driving force in the development of faster computers. These applications require the processing of large amounts of data in sophisticated ways. For example:
- "Big Data", data mining
- Artificial Intelligence (AI)
- Oil exploration
Parallel Computing – Who is using?
• Industrial and Commercial
- Web search engines, web based business services
- Medical imaging and diagnosis
- Pharmaceutical design
- Financial and economic modeling
- Management of national and multi-national corporations
- Advanced graphics and virtual reality, particularly in the entertainment
industry
- Networked video and multi-media technologies
- Collaborative work environments
Parallel Computing – Who is using?
• Global Applications
- Parallel computing is now being used extensively around the world, in a
wide variety of applications
Parallel Computing – Limitations
• It introduces challenges such as communication and synchronization between multiple sub-tasks and processes, which are difficult to achieve.
• Algorithms must be structured so that they can be handled by a parallel mechanism.
• The algorithms or programs must have low coupling and high cohesion, but it is difficult to create such programs.
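• A minimal C/OpenMP sketch of the synchronization problem (not from the slides; the shared counter is illustrative): without coordination, concurrent updates race and produce a wrong total; the atomic directive fixes this, but the extra synchronization is exactly the overhead described above.

#include <stdio.h>

int main(void) {
    long counter = 0;

    #pragma omp parallel for
    for (long i = 0; i < 1000000; i++) {
        /* Remove the 'atomic' directive and the updates race,
           usually giving a result well below 1000000.          */
        #pragma omp atomic
        counter++;
    }

    printf("counter = %ld (expected 1000000)\n", counter);
    return 0;
}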
[Figure: block diagram of a computer. The processor (CPU) contains a control unit, which executes the program, and an ALU, which does the arithmetic/logic operations requested by the program. The processor is connected to memory, which stores data and the program, and to input-output devices that communicate with the "outside world", e.g. screen, keyboard, storage devices, ...]
von Neumann Computer
• All computers are more or less based on the same basic design, the von Neumann Architecture!
Personal Computer
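• A toy C sketch of the von Neumann fetch-decode-execute cycle (not from the slides; the 3-instruction "machine" is entirely hypothetical): program and data share one memory, the control unit fetches and decodes each instruction, and the ALU executes it.

#include <stdio.h>

enum { LOAD = 0, ADD = 1, HALT = 2 };            /* hypothetical instruction set */

int main(void) {
    int memory[] = { LOAD, 5, ADD, 7, HALT };    /* instructions and data in one memory */
    int acc = 0, pc = 0, running = 1;

    while (running) {
        int instr = memory[pc++];                /* fetch and decode */
        switch (instr) {                         /* execute          */
            case LOAD: acc  = memory[pc++]; break;
            case ADD:  acc += memory[pc++]; break;
            case HALT: running = 0;         break;
        }
    }

    printf("accumulator = %d\n", acc);           /* prints 12 */
    return 0;
}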
Motivations for Parallel Computing
• Fundamental limits on single processor speed
• Heat dissipation from CPU chips
• Disparity between CPU & memory speeds
• Distributed data communications
• Need for very large scale computing platforms
Motivations for Parallel Computing
• Fundamental limits – Cycle Speed
- Intel 8080 2MHz 1974
- ARM 2 8MHz 1986
- Intel Pentium Pro 200MHz 1996
- AMD Athlon 1.2GHz 2000
- Intel QX6700 2.66GHz 2006
- Intel Core i7 3770k 3.9GHz 2013
- Speed of light: 30cm in 1ns
Moore’s Law
• Moore’s observation in 1965: the number of transistors per square
inch on integrated circuits had doubled every year since the
integrated circuit was invented
• Moore’s revised observation in 1975: the pace had slowed a bit, but data density had doubled approximately every 18 months
• How about the future? (Will the price of computing power keep falling by half every 18 months?)
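• A small worked example in C of the 18-month doubling figure quoted above (the starting count of 1,000,000 transistors is illustrative only): density follows roughly N(t) = N0 * 2^(t / 1.5) with t in years.

#include <stdio.h>
#include <math.h>

int main(void) {
    double n0 = 1e6;   /* illustrative starting transistor count */

    for (int years = 0; years <= 15; years += 3)
        printf("after %2d years: about %.0f transistors\n",
               years, n0 * pow(2.0, years / 1.5));
    return 0;
}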
Moore’s Law – Held for Now
Power Wall Effect in Computer Architecture
• Too many transistors in a given chip die area
• Tremendous increase in power density
• Increased chip temperature
• High temperature slows down the transistor switching rate and the overall
speed of the computer
• Chip may melt down if not cooled properly
• Efficient cooling systems are expensive
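• The usual first-order model behind the power wall is P ≈ alpha * C * V^2 * f: dynamic power grows linearly with clock frequency. A small C sketch with purely illustrative numbers, not from the slides:

#include <stdio.h>

int main(void) {
    double alpha = 0.2;     /* activity factor (illustrative)                 */
    double C     = 1e-9;    /* switched capacitance in farads (illustrative)  */
    double V     = 1.2;     /* supply voltage in volts (illustrative)         */

    /* Doubling the clock roughly doubles dynamic power, hence the wall. */
    for (double f = 1e9; f <= 4e9; f += 1e9)
        printf("f = %.0f GHz  ->  P ~ %.2f W\n", f / 1e9, alpha * C * V * V * f);
    return 0;
}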
Cooling Computer Chips
• Some people suggest putting computer chips in liquid nitrogen to cool them
Solutions
• Use multiple inexpensive processors
• A processor with multiple cores
A Multi-Core Processor
CPU and Memory Speed
• In 20 years, CPU speed (clock rate) has increased by a factor of 1000
• DRAM speed has increased only by a factor of less than 4
• How can data be fed fast enough to keep the CPU busy?
• CPU speed: 1-2 ns
• DRAM speed: 50-60 ns
• Cache: 10 ns
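• A quick back-of-the-envelope calculation in C using the figures above (cycle time taken as 1 ns): it shows how many CPU cycles each kind of access costs, which is why the CPU often sits waiting for data.

#include <stdio.h>

int main(void) {
    double cycle_ns = 1.0;    /* CPU cycle time (1-2 ns on the slide)  */
    double cache_ns = 10.0;   /* cache access                          */
    double dram_ns  = 55.0;   /* DRAM access (50-60 ns on the slide)   */

    printf("cache access costs ~%.0f CPU cycles\n", cache_ns / cycle_ns);
    printf("DRAM access costs  ~%.0f CPU cycles\n", dram_ns / cycle_ns);
    return 0;
}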
Memory Access and CPU Speed
CPU, Memory and Disk Speed
Possible Solutions
• A hierarchy of successively faster memory devices (multilevel caches)
• Locality of data reference (in code)
• Efficient programming can be an issue
• Parallel systems may provide
1. a larger aggregate cache
2. higher aggregate bandwidth to the memory system
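• A minimal C sketch of locality of reference (not from the slides; the matrix and its size are illustrative): C stores 2-D arrays row by row, so the row-wise traversal below reuses cached lines, whereas swapping the two loops (column-wise access) would miss the cache far more often.

#include <stdio.h>

#define N 1024

static double m[N][N];

int main(void) {
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i][j];      /* stride-1, cache-friendly access */

    printf("sum = %.1f\n", sum);
    return 0;
}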
Parallel Computing – Useful Terms
• Concurrent - Events or processes which seem to occur or progress at the
same time.
• Parallel – Events or processes which occur or progress at the same time
• Parallel programming (also, unfortunately, sometimes called concurrent
programming), is a computer programming technique that provides for the
execution of operations concurrently, either
- within a single parallel computer OR
- across a number of systems.
• In the latter case, the term distributed computing is used.