
Computer Architecture

Notes of Computer Architecture

Uploaded by

Mr X
Copyright © All Rights Reserved

Computer Architecture

100 DAYS CHALLENGE TEAM


100 Days Job Prep Challenge

Computer Architecture

EEE | CSE | ICT | ECE | SE | ETE | IT | CS

© 100 DAYS CHALLENGE TEAM

Enroll Now: CSE/IT Job Preparation Course


Content

Computer Architecture

Module 1: Basic Computer Fundamentals

§ Generations of Computers
§ Types of Computers
§ Enterprise Computing
§ Data Center
§ Cloud Computing
§ Virtual Machine
§ Servers and Workstations

Module 2: Architecture of Computer System

§ Computer Architecture
§ Architectural Unit
§ System Bus
§ Computer Performance
§ Performance Calculation Problems

Module 3: Processing Unit

§ Processor Instruction Cycle
§ Processor Performance
§ Processor Components
§ CISC and RISC Processors
§ Pipelining
§ Related Math Problems

Module 4: Memory Organization

§ Introduction
§ Features of Memory
§ Classification of Memory
§ Primary or Main Memory
§ Secondary Memory
§ Cache Register
§ Related Math Problems

Module 5: Cache Memory

§ Cache Memory
§ Cache Hit Ratio
§ Cache Mapping Techniques
§ Related Math Problems

Preface
A government job is like a golden deer: it does not let itself be caught easily. Competition for government jobs is fierce. Thousands of candidates compete for a single post. To survive the competition, you must be strategic, practice a great deal, and reach the highest level of qualification.

Our Problems

Job circulars come and go, but the desired job does not materialize. We have identified some of the main reasons behind this. First, starting preparation only after the exam date is announced is one of our big problems. Our second biggest problem is not finding the right resources. After that come problems with the preparation plan, strategy, revision, time management, and so on.
“The first step in solving a problem is admitting there is a problem to be solved.”— Pete Seeger

Solving the Problems

Preparing for all CSE/IT jobs with one comprehensive plan, instead of preparing circular by circular, will free us from the first problem. To provide the perfect resource for exam preparation, we are trying to build a premium resource set. We believe this resource will be better than any book or resource currently on the market. A perfect resource will solve all our remaining problems (preparation plan, strategy, revision, time management, etc.), Insha'Allah.

100 Days Job Preparation

We all know the important topics for CSE/IT jobs. But to prepare, you have to be strategic and smart. Our 100-day preparation plan will help you prepare efficiently and effectively. This 14-week plan is arranged so that you can prepare even while working a job.

The copyright of this e-book/premium resource belongs solely to ©100 DAYS CHALLENGE TEAM. Editing, selling, or sharing this resource (by any means) is strictly prohibited.

If you spot any mistakes, please let us know. We will update this e-book/premium resource regularly. To get future editions, join our group and take part in the regular classes, practice sessions, and discussions. Thank you.



Our Premium Resources

• Computer Architecture
• Computer Operating System
• Computer Network
• Analog Communication
• Digital Communication
• Optical & Telecommunication
• Database & SQL
• Linux Command
• Data Structure & Algorithm
• Programming
• Cyber Security
• Digital Logic Circuit
• Basic Electrical and Electronics
• Advanced Engineering Technology


Collect All Resources: 01521331257
Gift and Bonus
Thank you for enrolling in our course. This hand note/resource is a gift for those who have joined our course. You will also get a 10% referral cashback: when someone new registers, they must enter your name and mobile number in the "reference person" field.

To Those Who Are Enrolled in the Course

Thank you for enrolling in our course. This is a paid resource, and this hand note/resource gift is entrusted to you. Editing, selling, or sharing it (by any means) would be a breach of that trust; in that case, you will remain indebted at the rate of 300 taka for each person who receives it from you. We hope you will take advantage of the bonus and refrain from this dishonest act. Thank you.

To Those Who Are Not Enrolled in the Course

Thank you for reading this resource even without enrolling in our course. We invite you to join our group for free. Enroll in the course and collect the updated hand note/resource. Otherwise, you too will remain indebted by 300 taka.

How to repay the debt: 01521331257 (bKash/Nagad)


Module 1: Computer Types

| Generation | Period | Technology | Examples |
|---|---|---|---|
| 1st | 1946-1959 | Vacuum tube based | ENIAC, EDVAC, UNIVAC, IBM-701, IBM-650 |
| 2nd | 1959-1965 | Transistor based | IBM 1620, IBM 7094, CDC 1604, CDC 3600, UNIVAC 1108 |
| 3rd | 1965-1971 | Integrated circuit (IC) based | IBM-360 series, Honeywell-6000 series, PDP (Programmed Data Processor), IBM-370/168 |
| 4th | 1971-1980 | VLSI microprocessor based | STAR 1000, CRAY-X-MP (supercomputer), DEC 10, PDP 11, CRAY-1. This generation of computers had the first "supercomputers". |
| 5th | 1980 onwards | ULSI microprocessor based | Intel Pentium series, Intel Core i3/i5/i7/i9, AMD Athlon, etc. |


Types of computers and their specifications:

1. PC (Personal Computer): A single-user computer system with a moderately powerful microprocessor.

2. Workstation: Also a single-user computer system, similar to a personal computer but with a more powerful microprocessor.

3. Minicomputer: A multi-user computer system capable of supporting hundreds of users simultaneously.

4. Mainframe:
• A multi-user computer system capable of supporting hundreds of users simultaneously. Its software technology is different from that of a minicomputer.
• It is used primarily by large organizations for critical applications such as bulk data processing for censuses, industry and consumer statistics, enterprise resource planning, and large-scale transaction processing.
• Mainframes are commonly used as enterprise servers.

5. Supercomputer:
• An extremely fast computer that can execute hundreds of millions of instructions per second.
• A supercomputer has so much processing power that mainframes and commodity servers don't come close to matching it.
• Supercomputers tend to be designed for academic or research purposes rather than for hosting the workloads found in a typical business.

©100 Days challenge Join Our Job Prep group


Enterprise computing refers to the information technology infrastructure, systems, and applications that businesses and large organizations use to manage and process their data on a large scale. Some key components of enterprise computing include:
• Data centers
• VMware (virtualization)
• Cloud
• Storage
• Servers



Data centers:
Enterprise data centers are privately owned and operated by companies, institutions, governments, or other business entities. These enterprise facilities provide internal data transactions and processing, as well as web-based services through either intranets or extranets.

Disaster recovery (DR):
DR sites are locations that organizations can temporarily use after a disaster event; they contain backups of data, systems, and other technology infrastructure.

Three-tier data center network architecture:
• Access layer: Also known as the edge layer, it is the lowest tier in the three-tier data center network architecture. It serves as the entry point into the network for servers, storage systems, and other devices (end nodes).
• Aggregation layer: The bridge between the access layer and the core layer. It controls and shapes network traffic by implementing policies, load balancing, quality of service (QoS), packet filtering, and queuing.
• Core layer: Also known as the heart or backbone of the network, it is high-capacity and specifically designed to be highly redundant and resilient. This layer interlinks with the aggregation layer, facilitating efficient traffic routing between them.



Data center tiers are a system used to describe specific kinds of data center infrastructure in a consistent
way. Tier 1 is the simplest infrastructure, while Tier 4 is the most complex and has the most redundant
components. Each tier includes the required components of all the tiers below it.
Four Data Center Tiers

Tier 1: A Tier 1 data center has a single path for power and cooling and few, if any, redundant and
backup components. It has an expected uptime of 99.671% (28.8 hours of downtime annually).

Tier 2: A Tier 2 data center has a single path for power and cooling and some redundant and backup
components. It has an expected uptime of 99.741% (22 hours of downtime annually).

Tier 3: A Tier 3 data center has multiple paths for power and cooling and systems in place to update and
maintain it without taking it offline. It has an expected uptime of 99.982% (1.6 hours of downtime
annually).

Tier 4: A Tier 4 data center is built to be completely fault tolerant and has redundancy for every
component. It has an expected uptime of 99.995% (26.3 minutes of downtime annually).
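The downtime figures follow from the uptime percentages: downtime = (1 - uptime) x hours in a year. A minimal Python sketch that reproduces them, assuming a 365-day year (8,760 hours); the function name `annual_downtime_hours` is illustrative, not from the source:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year (8,760 hours)


def annual_downtime_hours(uptime_percent: float) -> float:
    """Expected downtime per year, in hours, for a given uptime percentage."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


TIER_UPTIME = {1: 99.671, 2: 99.741, 3: 99.982, 4: 99.995}

for tier, uptime in TIER_UPTIME.items():
    hours = annual_downtime_hours(uptime)
    print(f"Tier {tier}: {uptime}% uptime -> {hours:.1f} h ({hours * 60:.0f} min) of downtime per year")
```

Tier 2 works out to about 22.7 hours, which the list above rounds to 22 hours.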

A redundant data center architecture duplicates critical components, such as UPS systems, cooling systems, backup generators, and networking devices, so that data center operations can continue even if a component fails. The common redundancy models are N, N+1, N+2, 2N, and 2N+1:
• N: The minimum capacity needed to run the load. N does not include any redundancy (single points of failure).
• N+1: One extra component, so the system can survive a single failure or allow one machine to be serviced.
• N+2: Two extra components.
• 2N: A mirror image of the entire infrastructure, providing full fault tolerance.
• 2N+1: The fully fault-tolerant 2N architecture plus one extra component for an added layer of protection.
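The models differ in how many units are installed relative to N, the number needed to carry the full load. A small Python sketch, assuming N identical units (for example, UPS modules); `units_installed` and `spare_units` are hypothetical helper names:

```python
def units_installed(model: str, n: int) -> int:
    """Total units installed under a redundancy model, where n units carry the full load."""
    totals = {
        "N": n,             # no spare: any failure reduces capacity
        "N+1": n + 1,       # one spare unit
        "N+2": n + 2,       # two spare units
        "2N": 2 * n,        # a complete mirrored second set
        "2N+1": 2 * n + 1,  # mirrored set plus one extra unit
    }
    return totals[model]


def spare_units(model: str, n: int) -> int:
    """Units installed beyond the n required."""
    return units_installed(model, n) - n


for model in ["N", "N+1", "N+2", "2N", "2N+1"]:
    print(f"{model:>5}: {units_installed(model, 4)} units installed, {spare_units(model, 4)} spare")
```

A simple count like this does not capture how the units are wired (for example, whether the mirrored set in 2N is fully independent), only how many there are.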



Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet).

Cloud models fall into two groups:
• Deployment models: Public, Private, Hybrid
• Service models: SaaS, PaaS, IaaS

Private cloud: The private cloud can be physically located at your organization's on-site datacenter, or it can be hosted by a third-party service provider. In either case, the services and infrastructure are always maintained on a private network, and the hardware and software are dedicated solely to your organization.

Public cloud: The cloud resources (like servers and storage) are owned and operated by a third-party cloud service provider and delivered over the internet. With a public cloud, all hardware, software, and other supporting infrastructure are owned and managed by the cloud provider, such as Amazon Web Services (AWS) or Microsoft Azure.

Hybrid cloud: Hybrid clouds combine both private and public cloud models for maximum flexibility. A common example: an organization runs its IT workloads in a private cloud environment and complements that infrastructure with public cloud resources to accommodate occasional spikes in network traffic.



IaaS (Infrastructure as a Service):
• IaaS is on-demand access to cloud-hosted computing infrastructure (servers, storage capacity, and networking resources) that customers can provision, configure, and use in much the same way as they use on-premises hardware.
• Examples: Amazon Web Services, Google Cloud, IBM Cloud, Microsoft Azure

PaaS (Platform as a Service):
• PaaS provides a cloud-based platform for developing, running, and managing applications. The cloud service provider hosts, manages, and maintains all the hardware and software included in the platform. Users access the PaaS through a graphical user interface (GUI), where development or DevOps teams can collaborate on all their work across the entire application lifecycle, including coding, integration, testing, delivery, deployment, and feedback.
• Examples: AWS Elastic Beanstalk, Google App Engine, Microsoft Windows Azure, and Red Hat OpenShift on IBM Cloud.

SaaS (Software as a Service):
• SaaS is cloud-hosted, ready-to-use application software. Users pay a monthly or annual fee to use a complete application from within a web browser, desktop client, or mobile app. All of the infrastructure required to deliver it (servers, storage, networking, middleware, application software, data storage) is hosted and managed by the SaaS vendor.
• Examples: Google services (Gmail, Classroom, Drive), Adobe Creative Suite, Salesforce (customer relationship management software), HubSpot (marketing software), Trello (workflow management), Slack (collaboration and messaging), Canva (graphics).
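To keep the three service models apart, it can help to ask which layers the customer still manages. The sketch below encodes that split for a simplified stack (networking, storage, servers, operating system, middleware, application, data); the layer list and the cut-off points are an illustrative assumption, not any provider's official shared-responsibility model:

```python
# Simplified stack, bottom to top (an illustrative assumption, not an official model).
LAYERS = ["networking", "storage", "servers", "operating system",
          "middleware", "application", "data"]

# Index of the lowest layer the customer manages; the provider manages everything below it.
FIRST_CUSTOMER_LAYER = {
    "IaaS": LAYERS.index("operating system"),  # provider supplies servers, storage, networking
    "PaaS": LAYERS.index("application"),       # provider also runs the platform software
    "SaaS": len(LAYERS),                       # vendor hosts and manages the whole stack
}


def provider_managed(model: str) -> list:
    """Layers the cloud provider hosts and manages under the given service model."""
    return LAYERS[:FIRST_CUSTOMER_LAYER[model]]


def customer_managed(model: str) -> list:
    """Layers left for the customer to provision, configure, or build."""
    return LAYERS[FIRST_CUSTOMER_LAYER[model]:]


for model in ("IaaS", "PaaS", "SaaS"):
    print(f"{model}: customer manages {customer_managed(model) or 'none of the stack'}")
```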



Virtualization is the process of creating a software-based, or "virtual", version of a computer, with dedicated amounts of CPU, memory, and storage that are "borrowed" from a physical host computer, such as your personal computer and/or a remote server such as a server in a cloud provider's datacenter.
A hypervisor is a specialized software program that runs on the physical host and interacts with both the host machine and the VMs. VMs can't directly access the host's hardware resources; there is a layer of abstraction between a VM and its physical host, and this abstraction layer is called the hypervisor.

Key properties of virtual machines:
• Partitioning: Run multiple operating systems on one physical machine.
• Isolation: Provide fault and security isolation at the hardware level.
• Encapsulation: Save the entire state of a virtual machine to files.
• Hardware independence: Provision or migrate any virtual machine to any physical server.
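The partitioning idea (each VM gets dedicated CPU and memory borrowed from the host) can be illustrated with a toy model of a hypervisor's admission check. This is a hypothetical sketch, not any real hypervisor's API: the `Host` and `VM` classes are invented for illustration, and real hypervisors typically also support overcommitting resources, which the sketch deliberately leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    vcpus: int
    memory_gb: int


@dataclass
class Host:
    """A physical host whose CPUs and memory are partitioned among VMs (no overcommit)."""
    cpus: int
    memory_gb: int
    vms: list = field(default_factory=list)

    def free_resources(self) -> tuple:
        used_cpus = sum(vm.vcpus for vm in self.vms)
        used_memory = sum(vm.memory_gb for vm in self.vms)
        return self.cpus - used_cpus, self.memory_gb - used_memory

    def create_vm(self, vm: VM) -> bool:
        """Start the VM only if enough dedicated CPU and memory remain on the host."""
        free_cpus, free_memory = self.free_resources()
        if vm.vcpus > free_cpus or vm.memory_gb > free_memory:
            return False
        self.vms.append(vm)
        return True


host = Host(cpus=16, memory_gb=64)
print(host.create_vm(VM("web", vcpus=4, memory_gb=16)))    # True
print(host.create_vm(VM("db", vcpus=8, memory_gb=32)))     # True
print(host.create_vm(VM("batch", vcpus=8, memory_gb=32)))  # False: only 4 CPUs and 16 GB left
print(host.free_resources())                               # (4, 16)
```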



An enterprise server is a computer that stores programs serving the collective needs of an enterprise rather than a single user or department. Historically, mainframe computers have functioned as enterprise servers.

• Rack-mounted: The rack-mounted shape is the most conventional shape servers come in. It allows data center technicians to stack 3-4 physical servers on top of each other in a server rack. It is most often seen in small to medium businesses that require multiple servers, but no more than five.

• Tower: The tower shape is similar to a desktop computer. Small businesses tend to pick this shape over the others because it is more cost-effective, especially if they need only one or two servers for all their operations. Thus, this shape is recommended for them.

• Blade: Blade servers are a form factor developed by IBM to create a modular and efficient design that allows as many physical servers as possible to be stacked within a single server rack. Each module is called a blade. Despite their modular and efficient design, blade servers have their share of drawbacks.

Common types of servers:
• Web Server
• File Server
• Email Server
• Database Server
• Application Server
• Proxy Server
• DNS Server
• DHCP Server
• Print Server
• FTP Server
• VoIP Server
• Virtualization Server
• Collaboration Server
• Backup Server



| S.No | Server | Workstation |
|---|---|---|
| 1 | A server is a device that responds to clients' requests by providing services. | A workstation performs dedicated tasks and has enhanced features. |
| 2 | Examples: FTP, web, DNS servers, etc. | Examples: video and audio workstations, etc. |
| 3 | Operating systems used on servers: Linux, Solaris, and Windows. | Operating systems used on workstations: Unix, Linux, or Windows NT. |
| 4 | A graphical user interface (GUI) is optional. | A GUI is installed. |
| 5 | A server cannot be a workstation. | A workstation can be a server. |



How to Prepare
This week, read this chapter and tick it off once you have finished. Complete the 5 modules over 5 days of the week, and spend the remaining 2 days solving answers to various questions and revising.

Inviting You to Join Our Group Study Platform

“We believe these resources will be unparalleled”

◼◼◼ Let's Start the Module!!!




Module 3: PROCESSOR

• The CPU is the brain of a computer; it does all the processing.
• It is the most important component in determining your computer's performance.
• It is installed in a socket on the motherboard.

All processors, regardless of their origin or type, perform a basic instruction cycle that consists of three steps: fetch, decode, and execute.
• Fetch: Fetching an instruction from memory, which involves supplying the address and receiving the instruction from memory.
• Decode: Decoding the instruction, which involves interpreting the instruction and reading/retrieving the required data from their addresses.
• Execute: Executing the instruction; the CPU carries out the required action. Each part of the CPU is activated to carry out the instruction.
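The cycle can be illustrated with a toy accumulator machine. The instruction set (LOAD, ADD, STORE, HALT), the tuple encoding, and the `run` function are hypothetical, chosen only to make the three steps visible; to keep it short, operands are read in the execute step rather than during decode.

```python
def run(memory: list) -> int:
    """Run a program stored in `memory` until HALT; return the accumulator."""
    acc = 0  # accumulator register
    pc = 0   # program counter: address of the next instruction
    while True:
        # Fetch: supply the address in the PC and receive the instruction from memory.
        instruction = memory[pc]
        pc += 1
        # Decode: interpret the instruction as an opcode and an operand address.
        opcode, address = instruction
        # Execute: carry out the action the opcode requires.
        if opcode == "LOAD":
            acc = memory[address]
        elif opcode == "ADD":
            acc += memory[address]
        elif opcode == "STORE":
            memory[address] = acc
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


# Program in addresses 0-3, data in addresses 4-6: computes memory[6] = memory[4] + memory[5].
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
print(run(memory), memory[6])  # 5 5
```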

Processor performance can be affected by the following parameters:

• Core count: Most modern CPUs have multiple cores (4, 6, 8, and up to 32 or 64). Each core works like a separate CPU, so multiple cores within a CPU can execute multiple programs in parallel.
• Hyper-threading support: Hyper-Threading allows each core on the CPU to act as 2 cores, running two threads simultaneously and thereby making the CPU faster. A 4-core CPU with hyper-threading support will appear to have 4 x 2 = 8 cores. The more generic term is multithreading.
• Processor speed/clock cycle: Clock speed is measured in cycles per second (Hz). A CPU with a clock speed of 2 gigahertz (GHz) can carry out two thousand million (two billion) cycles per second. For the same processor design, the higher the clock speed, the faster it can process instructions.
• Turbo Boost (Intel) / Turbo Core (AMD): When the CPU is running heavy programs or applications, it automatically raises its frequency to do more processing in the same amount of time. This is also called "algorithmic overclocking".
• Overclocking support: Overclocking is the process of increasing the boost clock speed of a CPU beyond the limits set or specified by the manufacturer.
• Caches (L1, L2, L3): Most modern CPUs have 3 levels of cache to store data needed while executing program instructions. They are named L1, L2, and L3, with capacity increasing at each level.
• Memory speed: Memory speed, given in MT/s (megatransfers per second), determines the rate of data transfer between the CPU and RAM. 3200 MT/s means the memory can potentially perform 3200 million data transfers per second.
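A few of these parameters reduce to simple arithmetic. The helper names below are illustrative; the bandwidth function additionally assumes a 64-bit (8-byte) memory channel, the usual data width of a DDR4 DIMM, which the text above does not state.

```python
def logical_processors(cores: int, threads_per_core: int = 2) -> int:
    """Cores the operating system sees when each physical core runs several threads."""
    return cores * threads_per_core


def cycle_time_ns(clock_ghz: float) -> float:
    """Duration of one clock cycle in nanoseconds (t = 1 / f)."""
    return 1 / clock_ghz


def peak_bandwidth_gb_s(transfers_mt_s: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak transfer rate of one memory channel, in GB/s (10^9 bytes/s)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


print(logical_processors(4))      # 8, the 4 x 2 example above
print(cycle_time_ns(2.0))         # 0.5 ns per cycle at 2 GHz
print(peak_bandwidth_gb_s(3200))  # about 25.6 GB/s for 3200 MT/s
```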



Single Core:

• A core is a processing unit within a CPU.
• A chip that has a single CPU or processing unit is called a single-core processor.
• At any given time, only one process is in execution.

Multi-Core / Parallel Execution:

• A multicore CPU contains multiple CPU cores on a single chip.
• Each core can work on different tasks independently, improving overall performance.

Thread:

• Threads are virtual components, or virtual cores, that divide a physical CPU core into multiple virtual cores.
• A single CPU core can typically run up to 2 threads.

Hyper-Threading (Simultaneous Multithreading):

• Hyper-Threading technology allows a single CPU core to handle multiple threads.
• It enhances multitasking and can improve performance for certain workloads.
• Intel CPUs use Hyper-Threading (Intel's name for the technique), and AMD CPUs use simultaneous multithreading (SMT).
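To see multiple cores at work, a CPU-bound job can be split across worker processes. This is a minimal sketch using Python's standard `concurrent.futures.ProcessPoolExecutor` and `os.cpu_count()` (which reports logical processors, so hyper-threads are counted); the prime-counting task is just an arbitrary CPU-heavy example.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_in_parallel(limits: list) -> list:
    """Run one count_primes call per limit, spread over one worker per logical processor."""
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    print("Logical processors reported by the OS:", os.cpu_count())
    print(count_in_parallel([20_000, 40_000, 60_000, 80_000]))
```

Processes are used instead of threads because, in the standard CPython build, threads do not execute Python bytecode in parallel on separate cores.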

ALU (Arithmetic Logic Unit):

• The ALU performs arithmetic and logical operations.

• It's responsible for mathematical calculations and comparisons.
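The ALU's job can be mimicked in a few lines. This hypothetical 8-bit ALU (the operation names and flag set are assumptions for illustration, not any real processor's) returns a result plus zero and carry flags; the zero flag after a subtraction is what a comparison such as "a equals b" relies on.

```python
def alu(op: str, a: int, b: int, bits: int = 8) -> tuple:
    """Hypothetical fixed-width ALU returning (result, zero_flag, carry_flag)."""
    mask = (1 << bits) - 1            # e.g. 0xFF for 8 bits
    a, b = a & mask, b & mask
    if op == "ADD":
        raw = a + b
    elif op == "SUB":
        raw = a + ((~b) & mask) + 1   # two's-complement subtraction: a + NOT(b) + 1
    elif op == "AND":
        raw = a & b
    elif op == "OR":
        raw = a | b
    elif op == "XOR":
        raw = a ^ b
    else:
        raise ValueError(f"unsupported operation {op!r}")
    result = raw & mask               # keep only the low `bits` bits
    carry = raw > mask                # carry out of the top bit (for SUB: set means no borrow)
    zero = result == 0
    return result, zero, carry


print(alu("ADD", 200, 100))        # (44, False, True): 300 overflows 8 bits
print(alu("SUB", 7, 7))            # (0, True, True): equal operands set the zero flag
print(alu("AND", 0b1100, 0b1010))  # (8, False, False)
```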

Pipeline

• The CPU pipeline is a series of stages through which instructions are processed.
• It improves instruction throughput by allowing multiple instructions to be in different stages simultaneously.



RISC stands for Reduced Instruction Set Computer. It is designed to reduce execution time by simplifying the computer's instruction set. In a RISC processor, each instruction is designed to execute in one clock cycle, which results in uniform execution time. The drawback is that programs need more lines of code, so more RAM is needed to store the instructions, and the compiler has to do more work to convert high-level language instructions into machine code. Examples of RISC processors are Sun's SPARC, PowerPC, Microchip PIC processors, and RISC-V.

CISC stands for Complex Instruction Set Computer. It is designed to minimize the number of instructions per program, ignoring the number of cycles per instruction. The emphasis is on building complex instructions directly into the hardware. The compiler has to do very little work to translate a high-level language into assembly language/machine code, and because the resulting code is relatively short, very little RAM is required to store the instructions. Examples of CISC processors include VAX, Intel and AMD x86, and System/360.
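The trade-off described above (CISC: fewer instructions but more cycles each; RISC: more instructions but about one cycle each) is usually quantified with the CPU performance equation, CPU time = instruction count x cycles per instruction (CPI) x clock cycle time. The instruction counts and CPI values below are invented for illustration only, not measurements of any real processor.

```python
def cpu_time_ns(instruction_count: int, cpi: float, clock_ghz: float) -> float:
    """CPU time = instruction count x cycles per instruction x cycle time (1 / f)."""
    cycle_time_ns = 1 / clock_ghz
    return instruction_count * cpi * cycle_time_ns


# Hypothetical figures for the same program built for each style of instruction set.
risc = cpu_time_ns(instruction_count=1_500, cpi=1.0, clock_ghz=2.0)  # more, simpler instructions
cisc = cpu_time_ns(instruction_count=1_000, cpi=2.5, clock_ghz=2.0)  # fewer, multi-cycle instructions
print(f"RISC: {risc} ns, CISC: {cisc} ns")  # RISC: 750.0 ns, CISC: 1250.0 ns
```

With these made-up numbers the RISC version wins despite executing 50% more instructions; a lower CISC CPI or a larger instruction-count gap could reverse the result, which is why neither approach is faster in every case.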

| RISC | CISC |
|---|---|
| Reduced Instruction Set Computer. | Complex Instruction Set Computer. |
| Emphasizes software to optimize the instruction set. | Emphasizes hardware to optimize the instruction set. |
| Execution time is very short. | Execution time is longer. |
| Requires multiple register sets to store instructions. | Requires a single register set to store instructions. |
| Instruction decoding is simple. | Instruction decoding is complex. |
| Pipelining is simple to use. | Pipelining is difficult to use. |
| Uses a limited number of instructions that take less time to execute. | Uses a large number of instructions that take more time to execute. |
| Fixed-format instructions. | Variable-format instructions. |
| More transistors are spent on memory registers. | Transistors are spent on storing complex instructions. |
| Programs take more space in memory. | Programs take less space in memory. |
| Used in high-end applications such as telecommunication, image processing, and video processing. | Used in low-end applications such as home automation and security systems. |
| LOAD and STORE are independent instructions; other operations work register-to-register. | LOAD and STORE are used in the memory-to-memory interactions of a program. |
| Hard-wired control unit. | Microprogrammed control unit. |
| Examples: ARM, PA-RISC, Power Architecture, Alpha, AVR, ARC, and SPARC. | Examples: VAX, Motorola 68000 family, System/360, AMD and Intel x86 CPUs. |

[Figure: RISC architecture vs. CISC architecture]



Pipelining is a technique for breaking down a sequential process into sub-operations and executing each sub-operation in its own dedicated segment, which runs in parallel with all the other segments.

1. Non-Pipelined Execution-
All the instructions of a program are executed sequentially, one after the other. A new instruction starts executing only after the previous instruction has completely finished. This style of executing instructions is highly inefficient.
Example- Consider a program consisting of five instructions. In a non-pipelined architecture, these instructions execute one after the other.
If the time taken to execute one instruction = t, then the time taken to execute n instructions = n x t.

2. Pipelined Execution-
In a pipelined architecture, multiple instructions execute in parallel (overlapped in time). This style of executing instructions is highly efficient.
Instruction Pipelining- Instruction pipelining is a technique that implements a form of parallelism called instruction-level parallelism within a single processor. A pipelined processor does not wait until the previous instruction has completely executed. Rather, it fetches the next instruction and begins its execution. In a pipelined architecture,
• The hardware of the CPU is split up into several functional units.
• Each functional unit performs a dedicated task.
• These functional units are called the stages of the pipeline.



Four-Stage Pipeline- In a four-stage pipelined architecture, the execution of each instruction is completed in the following 4 stages:
1. Instruction Fetch (IF)
2. Instruction Decode (ID)
3. Instruction Execute (IE)
4. Write Back (WB)
To implement a four-stage pipeline,
• The hardware of the CPU is divided into four functional units.
• Each functional unit performs a dedicated task.

Phase-Time Diagram -
A phase-time diagram shows the execution of instructions in a pipelined architecture. The following diagram shows the execution of three instructions (A, B, C) in a four-stage pipeline architecture.
[Figure: phase-time diagram of instructions A, B, and C in a four-stage pipeline]
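The phase-time diagram for instructions A, B, and C can also be produced as text. The sketch assumes an ideal pipeline (one stage per clock cycle, no stalls or hazards) and lays it out with one row per clock cycle and one column per stage; `phase_time_diagram` is a hypothetical helper.

```python
def phase_time_diagram(instructions: list, stages: list) -> list:
    """Rows of an ideal phase-time diagram: one row per clock cycle, one column per stage."""
    k, n = len(stages), len(instructions)
    rows = []
    for cycle in range(k + n - 1):      # the last instruction leaves after k + n - 1 cycles
        row = []
        for s in range(k):
            i = cycle - s               # stage s works on the instruction that entered s cycles ago
            row.append(instructions[i] if 0 <= i < n else "-")
        rows.append(row)
    return rows


stages = ["IF", "ID", "IE", "WB"]
print("Cycle " + " ".join(f"{s:>3}" for s in stages))
for number, row in enumerate(phase_time_diagram(["A", "B", "C"], stages), start=1):
    print(f"{number:>5} " + " ".join(f"{x:>3}" for x in row))
```

The three instructions finish in 4 + 3 - 1 = 6 cycles instead of 3 x 4 = 12, and 12 of the 6 x 4 = 24 boxes in the diagram are occupied.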



Let us learn how to calculate certain important parameters of a pipelined architecture. Cycle time is the duration of one clock cycle. Consider:

• A pipelined architecture consisting of a k-stage pipeline
• Total number of instructions to be executed = n

Non-Pipelined Cycle Time

Cycle time = Sum of the delays of all stages

Pipelined Cycle Time

Cycle time = Maximum delay of any stage + Delay of the (latch) register

Frequency of Clock

Frequency of the clock (f) = 1 / Cycle time

Non-Pipelined Execution Time

Total number of instructions x Clock cycles taken to execute one instruction = n x k clock cycles

Pipelined Execution Time

Time taken to execute the first instruction + Time taken to execute the remaining instructions
= 1 x k clock cycles + (n - 1) x 1 clock cycle
= (k + n - 1) clock cycles

The following parameters serve as criteria to estimate the performance of pipelined execution:

• Speed Up
• Efficiency
• Throughput

1. Speed Up- It gives an idea of "how much faster" pipelined execution is compared to non-pipelined execution. It is calculated as:

$$ S = \frac{\text{Non-pipelined execution time}}{\text{Pipelined execution time}} $$

2. Efficiency- The efficiency of pipelined execution is calculated as:

$$ \eta = \frac{S}{k} = \frac{\text{Number of boxes utilized in the phase-time diagram}}{\text{Total number of boxes in the phase-time diagram}} $$

3. Throughput- Throughput is defined as the number of instructions executed per unit time. It is calculated as:

$$ \text{Throughput} = \frac{\text{Number of instructions executed}}{\text{Total time taken}} $$
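The formulas above can be collected into one function. This sketch follows the clock-cycle versions given here (n x k cycles without pipelining, k + n - 1 cycles with it, both at the same cycle time) and assumes an ideal pipeline with no stalls; the worked problems that follow instead take the non-pipelined time as the sum of the stage delays without latch overhead, so their speed-up figures come out differently. `pipeline_metrics` is a hypothetical name.

```python
def pipeline_metrics(k: int, n: int, cycle_time: float) -> dict:
    """Ideal k-stage pipeline (one stage per cycle, no stalls) executing n instructions."""
    non_pipelined_cycles = n * k   # every instruction uses all k cycles, one after another
    pipelined_cycles = k + n - 1   # k cycles for the first, then 1 per remaining instruction
    speed_up = non_pipelined_cycles / pipelined_cycles
    return {
        "frequency": 1 / cycle_time,
        "non_pipelined_time": non_pipelined_cycles * cycle_time,
        "pipelined_time": pipelined_cycles * cycle_time,
        "speed_up": speed_up,
        "efficiency": speed_up / k,  # equals n / (k + n - 1), the share of used boxes
        "throughput": n / (pipelined_cycles * cycle_time),
    }


m = pipeline_metrics(k=4, n=3, cycle_time=10.0)  # times in ns (hypothetical values)
print(m["speed_up"], m["efficiency"], m["throughput"])  # 2.0 0.5 0.05
```

As n grows, the speed-up approaches k: with k = 4 and n = 1000 it is 4000 / 1003, about 3.99.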



PRACTICE PROBLEMS BASED ON PIPELINING

Problem-01: Consider a pipeline having 4 phases with durations of 60, 50, 90, and 80 ns. The latch delay is 10 ns. Calculate:
1) Pipeline cycle time
2) Non-pipeline execution time
3) Speed up ratio
4) Pipeline time for 1000 tasks
5) Sequential time for 1000 tasks
6) Throughput

Given-

• Four-stage pipeline is used
• Delay of stages = 60, 50, 90, and 80 ns
• Latch delay, or delay due to each register = 10 ns

Part-01: Pipeline Cycle Time-
Cycle time = Maximum delay due to any stage + Delay due to its register
= Max {60, 50, 90, 80} + 10 ns
= 90 ns + 10 ns
= 100 ns

Part-02: Non-Pipeline Execution Time-
Non-pipeline execution time for one instruction = 60 ns + 50 ns + 90 ns + 80 ns = 280 ns

Part-03: Speed Up Ratio-
For a large number of tasks, the pipeline completes one task per cycle (100 ns), so

$$ S = \frac{\text{Non-pipelined execution time}}{\text{Pipelined execution time}} = \frac{280\ \text{ns}}{100\ \text{ns}} = 2.8 $$

Part-04: Pipeline Time For 1000 Tasks-
Pipeline execution time for 1000 tasks
= Time taken for 1st task + Time taken for remaining 999 tasks
= 1 x 4 clock cycles + 999 x 1 clock cycle
= 4 x cycle time + 999 x cycle time
= 4 x 100 ns + 999 x 100 ns
= 400 ns + 99900 ns
= 100300 ns

Part-05: Sequential Time For 1000 Tasks-
Non-pipeline time for 1000 tasks
= 1000 x Time taken for one task
= 1000 x 280 ns
= 280000 ns

Part-06: Throughput-
Throughput for pipelined execution
= Number of instructions executed per unit time
= 1000 tasks / 100300 ns
≈ 0.00997 tasks/ns (about 9.97 million tasks per second)
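The six parts of Problem-01 can be checked with a few lines of Python, using the same approach as the solution (pipelined time = (k + n - 1) x cycle time; non-pipelined time = sum of the stage delays per task):

```python
stage_delays_ns = [60, 50, 90, 80]
latch_delay_ns = 10
n = 1000
k = len(stage_delays_ns)

cycle_time = max(stage_delays_ns) + latch_delay_ns   # Part-01: 100 ns
non_pipelined_one = sum(stage_delays_ns)             # Part-02: 280 ns
speed_up = non_pipelined_one / cycle_time            # Part-03: 2.8 (steady state)
pipelined_total = (k + n - 1) * cycle_time           # Part-04: 100300 ns
sequential_total = n * non_pipelined_one             # Part-05: 280000 ns
throughput = n / pipelined_total                     # Part-06: tasks per ns

print(cycle_time, non_pipelined_one, speed_up, pipelined_total, sequential_total)
print(f"Throughput: {throughput:.5f} tasks/ns = {throughput * 1e9:,.0f} tasks/s")
print(f"Actual speed-up for 1000 tasks: {sequential_total / pipelined_total:.2f}")
```

The last line shows that for exactly 1000 tasks the speed-up is about 2.79, slightly below the steady-state ratio of 2.8 because of the cycles spent filling the pipeline.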



Problem-02: A four-stage pipeline has the stage delays as 150, 120, 160 and 140 ns respectively.
Registers are used between the stages and have a delay of 5 ns each. Assuming constant
clocking rate, what is the total time taken to process 1000 data items on the pipeline?

Solution- Given-
• Four stage pipeline is used
• Delay of stages = 150, 120, 160 and 140 ns
• Delay due to each register = 5 ns
• 1000 data items or instructions are processed
Cycle Time-
Cycle time = Maximum delay due to any stage + Delay due to its register
= Max {150, 120, 160, 140} + 5 ns
= 160 ns + 5 ns
= 165 ns
Pipeline Time To Process 1000 Data Items-
Pipeline time to process 1000 data items
= Time taken for 1st data item + Time taken for remaining 999 data items
= 1 x 4 clock cycles + 999 x 1 clock cycle
= 4 x cycle time + 999 x cycle time
= 4 x 165 ns + 999 x 165 ns
= 660 ns + 164835 ns
= 165495 ns
= 165.5 μs
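A quick check of Problem-02 with the same formula, total time = (k + n - 1) x cycle time:

```python
stage_delays_ns = [150, 120, 160, 140]
register_delay_ns = 5
n = 1000

cycle_time = max(stage_delays_ns) + register_delay_ns         # 165 ns
total_ns = (len(stage_delays_ns) + n - 1) * cycle_time        # 1003 cycles
print(cycle_time, total_ns, total_ns / 1000, "microseconds")  # 165 165495 165.495 microseconds
```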

Problem-03: A non-pipelined single-cycle processor operating at 100 MHz is converted into a synchronous pipelined processor with five stages requiring 2.5 ns, 1.5 ns, 2 ns, 1.5 ns, and 2.5 ns, respectively. The delay of the latches is 0.5 ns. What is the speed up of the pipelined processor for a large number of instructions?

Non-Pipelined Processor-
Frequency of the clock = 100 MHz
Cycle time = 1 / frequency = 1 / (100 MHz) = 0.01 μs = 10 ns
Non-pipeline execution time to process 1 instruction = 1 clock cycle = 10 ns

Pipelined Processor-
Cycle time = Max delay in a stage + Delay due to its register
= Max {2.5, 1.5, 2, 1.5, 2.5} + 0.5 ns
= 2.5 + 0.5 ns = 3 ns
Pipeline execution time to process 1 instruction (for a large number of instructions) = 1 clock cycle = 3 ns

Speed up = Non-pipeline execution time / Pipeline execution time
= 10 ns / 3 ns
= 3.33 Ans

Problem-04: The stage delays in a 4-stage pipeline are 800, 500, 400, and 300 picoseconds. The first stage is replaced with a functionally equivalent design involving two stages with respective delays of 600 and 350 picoseconds. The throughput increase of the pipeline is _____%.

Execution Time in the 4-Stage Pipeline-
Cycle time = Max delay in a stage + Delay due to its register
= Max {800, 500, 400, 300} + 0
= 800 picoseconds
Throughput = Number of instructions executed per unit time = 1 instruction / 800 picoseconds

Execution Time in the Replaced-Stage Pipeline-
Cycle time = Max delay in a stage + Delay due to its register
= Max {600, 350, 500, 400, 300} + 0
= 600 picoseconds
Throughput = Number of instructions executed per unit time = 1 instruction / 600 picoseconds

Throughput Increase-
= {(Final throughput - Initial throughput) / Initial throughput} x 100
= ((1/600 - 1/800) / (1/800)) x 100
= 33.33 % Ans
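Both answers can be verified numerically. For a large number of instructions, each design completes one instruction per cycle, so the speed-up and the throughput change depend only on the cycle times:

```python
# Problem-03: 100 MHz single-cycle processor vs. 5-stage pipeline with 0.5 ns latches.
non_pipelined_cycle_ns = 1000 / 100                         # 1 / (100 MHz) = 10 ns
pipelined_cycle_ns = max([2.5, 1.5, 2, 1.5, 2.5]) + 0.5     # 3 ns
speed_up = non_pipelined_cycle_ns / pipelined_cycle_ns

# Problem-04: throughput = 1 instruction per cycle, so it scales with 1 / cycle time.
old_cycle_ps = max([800, 500, 400, 300]) + 0                # 800 ps
new_cycle_ps = max([600, 350, 500, 400, 300]) + 0           # 600 ps
increase_pct = (1 / new_cycle_ps - 1 / old_cycle_ps) / (1 / old_cycle_ps) * 100

print(f"Problem-03 speed up: {speed_up:.2f}")                  # 3.33
print(f"Problem-04 throughput increase: {increase_pct:.2f}%")  # 33.33%
```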



Problem-05: We have 2 designs D1 and D2 for a synchronous pipeline processor. D1 has 5
stage pipelines with execution time of 3 ns, 2 ns, 4 ns, 2 ns and 3 ns. While the design D2 has 8
pipeline stages each with 2 ns execution time. How much time can be saved using design D2
over design D1 for executing 100 instructions?

D1-Pipelined Processor-
Cycle time = Max delay in stage + Delay due to its register
= Max {3, 2, 4, 2, 3} + 0
= 4 ns
Execution Time For 100 Instructions
= Time taken for 1st instruction + Time taken for remaining 99 instructions
= 1 x 5 clock cycles + 99 x 1 clock cycle
= 5 x cycle time + 99 x cycle time
= 5 x 4 ns + 99 x 4 ns
= 20 ns + 396 ns
= 416 ns

D2-Pipelined Processor-
Cycle time = Max delay in stage + Delay due to its register
= 2 ns + 0 [each stage's delay is 2 ns]
= 2 ns
Execution Time For 100 Instructions
= Time taken for 1st instruction + Time taken for remaining 99 instructions
= 1 x 8 clock cycles + 99 x 1 clock cycle
= 8 x cycle time + 99 x cycle time
= 8 x 2 ns + 99 x 2 ns
= 16 ns + 198 ns
= 214 ns

Time saved-
= Execution time in design D1 – Execution time in design D2
= 416 ns – 214 ns
= 202 ns Ans
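The "first instruction takes k cycles, every later one finishes one cycle after the previous" rule used above can be written as (k + n - 1) x cycle time. A minimal Python sketch, assuming no stalls or hazards; `pipeline_time` is a hypothetical helper name:

```python
def pipeline_time(n_instructions, n_stages, cycle_time):
    """Time for n instructions on a k-stage pipeline with no stalls:
    the 1st instruction takes k cycles, each later one completes 1 cycle after."""
    return (n_stages + n_instructions - 1) * cycle_time


d1_cycle = max([3, 2, 4, 2, 3])   # ns, slowest D1 stage (no register delay given)
d2_cycle = max([2] * 8)           # ns, all 8 D2 stages take 2 ns

t1 = pipeline_time(100, 5, d1_cycle)
t2 = pipeline_time(100, 8, d2_cycle)
print(t1, t2, t1 - t2)            # 416 214 202
```

The deeper D2 pipeline pays more cycles to fill, but its shorter cycle time dominates once many instructions are in flight.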
Problem-06: The 5 stages of the processor have latencies of 300 ps, 400 ps, 350 ps, 550 ps and 100 ps. Assume that, when pipelining, each pipeline stage costs an extra 20 ps for the registers between pipeline stages.
1. Non-pipelined processor: What is the cycle time? What is the latency of an instruction? What is the throughput?

2. Pipelined processor: What is the cycle time? What is the latency of an instruction? What is the throughput?

Non-pipelined processor:
Cycle time = Sum of the delays offered by all stages
Cycle time = 300 + 400 + 350 + 550 + 100 = 1700 ps
Throughput = 1/1700 inst/ps
Latency = 1700 ps
[The latency is the same as the cycle time, since it takes the instruction one cycle to go from the beginning of fetch to the end of writeback.]

Pipelined processor:
Cycle time = Maximum delay offered by any stage + the delay of the register
Cycle time = 550 + 20 = 570 ps
Throughput = 1/570 inst/ps
Latency = 5 * 570 = 2850 ps
[Latency becomes CT * N, where N is the number of stages, as one instruction needs to go through each of the stages and each stage takes one cycle.]
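The same numbers can be reproduced in a few lines of Python. This is a sketch that hard-codes the five stage latencies and the 20 ps register cost from the problem; variable names are illustrative.

```python
stages_ps = [300, 400, 350, 550, 100]
reg_ps = 20

# Non-pipelined: one long cycle covers every stage, no pipeline registers.
np_cycle = sum(stages_ps)              # 1700 ps
np_latency = np_cycle                  # one cycle per instruction
np_throughput = 1 / np_cycle           # instructions per ps

# Pipelined: the slowest stage plus its register sets the clock.
p_cycle = max(stages_ps) + reg_ps      # 570 ps
p_latency = len(stages_ps) * p_cycle   # 2850 ps, one cycle per stage
p_throughput = 1 / p_cycle

print(np_cycle, np_latency, p_cycle, p_latency)  # 1700 1700 570 2850
```

Pipelining raises latency (2850 ps vs 1700 ps) but roughly triples throughput, because a new instruction can start every 570 ps.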



How to Prepare
This week, read this chapter thoroughly. Finish the 5 modules in 5 days of the week. Spend the remaining 2 days of the week solving various questions and answers, and revise.



Module 4: Memory Organization

In a computer system, memory is needed to store various types of data such as text, images, video, audio and documents. The memory block is split into a number of small components called cells. Each cell has a unique address, ranging from zero to the memory size minus one. For example, if the size of computer memory is 64K words, the memory has 64 * 1024 = 65536 locations or cells, and the addresses of the cells range from 0 to 65535.
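The cell count, highest address and number of address bits for the 64K-word example can be computed directly. A minimal sketch; `memory_cells` is a hypothetical helper, and the address-bit figure uses the log2 rule that appears later in this module:

```python
def memory_cells(words):
    """Return (number of cells, highest address, address bits) for `words` cells."""
    cells = words
    highest = cells - 1                  # addresses run from 0 to size - 1
    address_bits = highest.bit_length()  # equals log2(cells) when cells is a power of 2
    return cells, highest, address_bits


print(memory_cells(64 * 1024))  # (65536, 65535, 16)
```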

Following are the different features of the memory system:
1. Location: whether the memory is internal or external to the computer.
2. Capacity: The storage capacity of external devices is measured in bytes, whereas internal memory is measured in bytes or words.
3. Access Methods: Memory can be accessed in four ways.
a. Direct Access Method: each block has a unique address; access jumps directly to the vicinity of the block and then searches sequentially within it (e.g., disks).
b. Sequential Access Method: stored data is read sequentially, in order (e.g., tape).
c. Random Access Method: any location can be accessed directly, in the same time regardless of previous accesses; the opposite of sequential access.
d. Associative Access Method: a word is located by comparing part of its contents rather than by its address, which optimizes search performance.
4. Unit of transfer: the number of bits that can be read from or written into the memory device at a time. For internal memory this is usually equal to the word size; a unit of transfer larger than a word is called a block.
5. Performance: The performance of memory is mainly described by three parameters.
a. Access Time: the time taken by the memory device to perform a read or write operation.
b. Memory Cycle Time: the access time plus any additional time required before a second access can start.
c. Transfer Rate: the rate at which data can be transferred into or out of an internal or external memory device.



Primary memory, also known as the computer system's main memory, communicates directly with the CPU. Primary memory is further divided into two parts:
1. RAM (Random Access Memory)
2. ROM (Read Only Memory)

Random Access Memory (RAM)

RAM is one of the faster types of main memory and is accessed directly by the CPU. It is the hardware in a computer that temporarily stores data, programs and program results. Data can be read from and written to it as long as the machine is running. It is volatile, which means that if a power failure occurs or the computer is turned off, the information stored in RAM is lost. Any location in RAM can be read or written at any time, in any order.
There are two types of RAM:
• SRAM (Static Random-Access Memory)
• DRAM (Dynamic Random-Access Memory)



SRAM | DRAM
It is Static Random-Access Memory. | It is Dynamic Random-Access Memory.
The access time of SRAM is low [10 ns]. | The access time of DRAM is high [90 ns].
It uses flip-flops to store each bit of information. | It uses a capacitor to store each bit of information.
It does not require periodic refreshing to preserve the information. | It requires periodic refreshing to preserve the information.
It is used in cache memory. | It is used in main memory.
SRAM is expensive. | DRAM is less expensive.
It has a complex structure. | Its structure is simple.
It has low power consumption. | It has higher power consumption.

[Figure: Structure of Memory]



Formula
$$\text{No. of chips} = \frac{\text{Memory capacity (bytes)}}{\text{Chip capacity (bytes)}}$$
$$\text{No. of bits} = \log_2(\text{Addressable memory})$$
1 kilobyte (KB) = 2^10 bytes (1024 bytes)
1 megabyte (MB) = 2^20 bytes (1,048,576 bytes)
1 gigabyte (GB) = 2^30 bytes (1,073,741,824 bytes)

Problem: What is the number of address lines needed for a 4 Mbit memory chip: a) organized as 4M x 1; b) organized as 1M x 4?

a) 4M = 2^22 locations; hence the number of address lines is log2(2^22) = 22 Ans

b) 1M = 2^20 locations; therefore, the number of address lines is log2(2^20) = 20 Ans
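The address-line formula, log2 of the number of addressable locations, can be sketched in Python with integer arithmetic so that no floating-point rounding is involved. `address_lines` is a hypothetical helper name:

```python
def address_lines(locations):
    """Address lines = log2(number of addressable locations), rounded up.
    (n - 1).bit_length() gives ceil(log2(n)) exactly for integers n >= 1."""
    return (locations - 1).bit_length()


M = 2 ** 20
# In an N x W organisation, N (the number of locations) sets the address lines;
# the word width W only sets the number of data lines.
print(address_lines(4 * M))  # 4M x 1 -> 22
print(address_lines(1 * M))  # 1M x 4 -> 20
```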

Problem: A computer has 32 MB of memory. How many bits are needed to address any single byte in memory?

The memory address space is 32 MB, or 2^25 bytes (2^5 × 2^20).
This means that we need log2(2^25), or 25 bits, to address each byte. Ans

Problem: A computer has 128 MB of memory. Each word in this computer is eight bytes. How
many bits are needed to address any single word in memory?

The memory address space is 128 MB, which means 2^7 x 2^20 = 2^27 bytes.
However, each word is eight (2^3) bytes, which means that we have 2^27 / 2^3 = 2^24 words.
This means that we need log2(2^24), or 24 bits, to address each word. Ans

Problem: A computer has 64 MB (megabytes) of memory. Each word is 4 bytes. How many bits are needed to address each single word in memory?

If the computer has 64 MB of memory and each word is 4 bytes, then there are 64 MB / 4 bytes = 16M = 2^24 words.
This means that we need log2(2^24), or 24 bits, to address each word. Ans

Problem: A computer has 512 MB of memory. Each word in this computer is 32 bytes. How many bits are needed to address any single word in memory?

512 MB / 32 bytes = 2^29 / 2^5 = 2^24 possible words. To address all possible words, log2(2^24) = 24 bits are required to select a single word of memory. Ans
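All four word-addressing problems follow one rule: bits = log2(memory size / word size). A short Python sketch that reproduces them; `word_address_bits` is a hypothetical name, and byte addressing is assumed when the word size is 1:

```python
MB = 2 ** 20


def word_address_bits(memory_bytes, word_bytes=1):
    """Bits to address every word = log2(memory size / word size), rounded up."""
    words = memory_bytes // word_bytes
    return (words - 1).bit_length()


print(word_address_bits(32 * MB))       # bytes         -> 25
print(word_address_bits(128 * MB, 8))   # 8-byte words  -> 24
print(word_address_bits(64 * MB, 4))    # 4-byte words  -> 24
print(word_address_bits(512 * MB, 32))  # 32-byte words -> 24
```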



Problem: How many chips are necessary to implement a 4 MBytes memory:
1) using 64 Kbit SRAM;
2) using 1Mbit DRAM;


1) Using 64 Kbit SRAM:
Memory capacity = 4 Mbytes = 4 ∗ 2^20 bytes = 2^22 bytes
Chip capacity = 64 Kbit = 2^6 Kbit / 2^3 = 2^3 Kbytes = 2^13 bytes
Ans: no. of chips needed = 2^22 / 2^13 = 2^9 = 512 chips.

2) Using 1 Mbit DRAM:
Memory capacity = 4 Mbytes = 4 ∗ 2^20 bytes = 2^22 bytes
Chip capacity = 1 Mbit = 2^20 bits / 2^3 = 2^17 bytes
Ans: no. of chips needed = 2^22 / 2^17 = 2^5 = 32 chips.
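The chip-count formula (memory capacity / chip capacity, both in bytes) in a short Python sketch. `chips_needed` is a hypothetical helper; it converts the chip size from bits to bytes and rounds up in case the division is not exact:

```python
K, M = 2 ** 10, 2 ** 20


def chips_needed(memory_bytes, chip_bits):
    """No. of chips = memory capacity / chip capacity, both in bytes (rounded up)."""
    chip_bytes = chip_bits // 8
    return -(-memory_bytes // chip_bytes)  # ceiling division


print(chips_needed(4 * M, 64 * K))  # 64 Kbit SRAM -> 512
print(chips_needed(4 * M, 1 * M))   # 1 Mbit DRAM  -> 32
```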

Design of 512×8 RAM using 128×8 RAM

Memory capacity = 512 bytes = 2^9 bytes
Chip capacity = 128 bytes = 2^7 bytes
No. of chips needed = 2^9 / 2^7 = 4 chips.

So, 2 chip-select lines are needed to address 4 chips.
[Chip-select lines = log2(4) = log2(2^2) = 2]
Decoder size = n x 2^n (n inputs, 2^n outputs)
We need a decoder with 4 outputs to address 4 chips.
So, decoder size = 2 x 2^2 = 2 x 4 (a 2-to-4 decoder)
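One way to see how the 2-to-4 decoder is used is to split a 9-bit address into a chip number and an in-chip address. The sketch below assumes the common arrangement where the two high-order address bits drive the decoder and the low 7 bits go to every chip; the source does not show the wiring, and `decode` is a hypothetical name.

```python
TOTAL_WORDS, CHIP_WORDS = 512, 128           # 512 x 8 memory built from 128 x 8 chips

n_chips = TOTAL_WORDS // CHIP_WORDS          # 4 chips
select_bits = (n_chips - 1).bit_length()     # 2 chip-select lines -> 2-to-4 decoder
offset_bits = (CHIP_WORDS - 1).bit_length()  # 7 address lines go to every chip


def decode(address):
    """Split a 9-bit address: high bits -> decoder input (which chip),
    low bits -> byte address inside the selected chip."""
    chip = address >> offset_bits
    offset = address & (CHIP_WORDS - 1)
    return chip, offset


print(n_chips, select_bits, offset_bits)     # 4 2 7
print(decode(0), decode(200), decode(511))   # (0, 0) (1, 72) (3, 127)
```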



Read-Only Memory (ROM) is a memory device or storage medium used to store information permanently inside a chip. Its stored information, data or programs can only be read; we cannot write or modify anything. A ROM contains important instructions or program data that are required to start or boot a computer. It is a non-volatile memory.

BIOS: A computer's basic input/output system (BIOS) is a program that's stored in nonvolatile
memory such as read-only memory (ROM) or flash memory, making it firmware. The BIOS
(sometimes called ROM BIOS) is always the first program that executes when a computer is
powered up.

MROM (Masked Read Only Memory):

Data is pre-configured by the integrated circuit manufacturer at the time of manufacturing. Therefore, the program or instructions cannot be changed by the user.

PROM (Programmable Read Only Memory):

The user can write any type of information or program only once, using a special PROM programmer or PROM burner device; after that, the data or instructions cannot be changed or erased.

EPROM (Erasable and Programmable Read Only Memory):

Data stored in EPROM can be erased and reprogrammed multiple times. To erase the stored data and reprogram it, the chip is first exposed to ultraviolet light for about 40 minutes to erase the data; after that, new data can be written into the EPROM.

EEPROM (Electrically Erasable and Programmable Read Only Memory):

In EEPROM, the stored data can be erased and reprogrammed up to 10 thousand times, and data is erased one byte at a time. The stored data is erased using a high-voltage electrical charge and then reprogrammed.

Flash ROM:

Flash memory is a non-volatile memory chip that can be written or programmed in units called blocks or sectors. Flash memory is a form of EEPROM, and its contents are not lost when the power source is turned off. It is also used to transfer data between the computer and digital devices.



RAM | ROM
It is a Random-Access Memory. | It is a Read Only Memory.
Read and write operations can be performed. | Only read operations can be performed.
It is volatile: data is lost when the power supply is turned off. | It is non-volatile: data is not lost when the power supply is turned off.
It is a faster and more expensive memory. | It is a slower and less expensive memory.
Stored data must be refreshed (in DRAM). | Stored data does not need to be refreshed.
The chip is bigger than a ROM chip storing the same amount of data. | The chip is smaller than a RAM chip storing the same amount of data.
Types of RAM: DRAM and SRAM | Types of ROM: MROM, PROM, EPROM, EEPROM



Secondary memory is a permanent storage space that holds a large amount of data. It is also known as external memory, on which computer data and programs can be saved on a long-term basis. Unlike primary memory, secondary memory cannot be accessed directly by the CPU. Instead, data from secondary memory is first loaded into RAM (Random Access Memory) and then sent to the processor to be read and updated.
Features of Secondary Memory
▪ It is slower than primary/main memory.
▪ Stored data is not lost, even when the power is turned off, because it is non-volatile, permanent storage.
▪ It can store large collections of different types of data, such as audio, video, pictures, text and software.
▪ It uses various optical and magnetic media to store data.

Hard Disk: A hard disk is a computer's permanent storage device. It is typically installed inside the computer and connected to the motherboard, and it stores and retrieves data using one or more rigid, fast-rotating disk platters inside an air-sealed casing.

Floppy Disk: a secondary storage device consisting of a thin, flexible, magnetically coated disk that holds electronic data such as computer files; it can store up to 1.44 MB.

CD (Compact Disc): an optical disc storage device that can store approximately 783 MB of data. It uses laser light to read and write data. CDs are divided into three types: CD-ROM (Compact Disc Read Only Memory), CD-R (Compact Disc Recordable) and CD-RW (Compact Disc Rewritable).

DVD Drive/Disc: an optical disc storage device; DVD stands for Digital Video Disc or Digital Versatile Disc. DVDs are divided into three types: DVD-ROM (Read Only Memory), DVD-R (Recordable) and DVD-RW (Rewritable or Erasable). A DVD can store 4.7 GB to 17 GB of data.

Pen Drive: a portable device used to store data permanently, also known as a USB flash drive. It has no moving parts. The storage capacity of pen drives ranges from 64 MB to 128 GB or more.



Primary Vs. Secondary Memory

Primary Memory | Secondary Memory
It is known as temporary memory. Data can be accessed directly by the processor or CPU. | It is known as permanent memory. Data cannot be accessed directly by the processor or CPU.
It has limited storage capacity, and the stored data can be in volatile or non-volatile memory. | It has large storage capacity and is non-volatile.
It is a more costly and faster memory. | It is a less costly and slower memory.
Its volatile part (RAM) requires power to retain data. Examples: RAM, PROM, EPROM and cache memory. | It does not require power to retain data. Examples: CD, DVD, HDD.

HDD vs SSD

HDD: A hard disk drive uses magnetism to store data on a rotating platter. A read/write head floats above the spinning platter to read and write data. The faster the platter spins, the faster an HDD can perform.

SSD: Solid-state drives are a type of storage medium with no moving parts. They generally last longer and perform better than traditional hard disk drives.

mSATA SSDs use only the SATA interface.

M.2 SSDs support SATA or PCIe.

NVMe technology uses the PCIe bus instead of the SATA bus.

PATA vs SATA vs PCIe

PATA: It stands for Parallel Advanced Technology Attachment. It uses a 40-pin connector. It is expensive. Its data transfer speed is low. It consumes more power. The cable is big. It does not support hot swapping. External hard drives cannot be used with PATA.

SATA: It stands for Serial Advanced Technology Attachment. It uses a 7-pin connector. It is cheap. Its data transfer speed is high. It consumes less power. The cable is small. It supports hot swapping. External hard drives can be used with SATA.

PCIe: Peripheral Component Interconnect Express, also known as PCI Express. It is a high-speed interface whose slots on a PC motherboard are used to connect everything from graphics cards to solid-state drives.



Cache memory is a small, chip-based computer memory that lies between the CPU and the main memory. It is a fast, high-performance, temporary memory that enhances the performance of the CPU. It stores the data and instructions that are frequently used by the CPU, which reduces the time needed to access data from main memory. It is faster than main memory, and it is sometimes called CPU memory because it is located very close to the CPU. The levels of cache memory are as follows.
1. L1 Cache: The L1 cache is also known as the onboard, internal, or primary cache. It is built into the CPU and is the fastest cache. Its size varies from 8 KB to 128 KB.
2. L2 Cache: It is also known as the external or secondary cache. In older systems it was built on a separate chip on the motherboard rather than in the CPU; modern processors usually place it on the CPU chip. The size of the L2 cache may be 128 KB to 1 MB.
3. L3 Cache: The L3 cache is generally used in high-performance, high-capacity computers. Older systems placed it on the motherboard; in modern processors it is usually on the CPU chip. It is the slowest cache level, with a maximum size of up to 8 MB.

Register memory is a temporary storage area for storing and transferring data and instructions within a computer. It is the smallest and fastest memory of a computer. It is the part of computer memory located in the CPU in the form of registers. Registers are typically 16, 32 or 64 bits in size. They temporarily store data, instructions and memory addresses that are used repeatedly, to provide a faster response to the CPU.

Cookies: Cookies are small files of data and information that may be useful to the websites a user visits. They can include passwords, the browser used, visited pages and preferences, IP address, and more. Every time a user loads a website, the browser sends that website's cookies to its server. In this way, the website stays aware of the user's previous activity. This helps websites display relevant ads, shorten login time, load pages faster and display related content.



Module 5: Cache Memory

Whenever a program has to be executed, it is first loaded into main memory. The portion of the program that is most likely to be executed in the near future is kept in the cache memory. This allows the CPU to access that portion at a faster speed.

Step-01: Register

Whenever the CPU requires any word of memory, it is first searched for in the CPU registers. Now, there are two cases possible-
• If the required word is found in the CPU registers, it is read from there.
• If the required word is not found in the CPU registers, Step-02 is followed.

Step-02: Cache

When the required word is not found in the CPU registers, it is searched for in the cache memory. Now, there are two cases possible-
• If the required word is found in the cache memory, the word is delivered to the CPU. This is known as a cache hit.
• If the required word is not found in the cache memory, Step-03 is followed. This is known as a cache miss.

Step-03: Main Memory

When the required word is not found in the cache memory, it is searched for in the main memory. Now, there are two cases possible-

• If the page containing the required word is found in the main memory, the page is mapped from the main memory to the cache memory. This mapping is performed using cache mapping techniques. Then, the required word is delivered to the CPU.
• If the page containing the required word is not found in the main memory, a page fault occurs. The page containing the required word is mapped from the secondary memory to the main memory. Then, the page is mapped from the main memory to the cache memory. Then, the required word is delivered to the CPU.
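The three-step search can be sketched as a single lookup function. This is a simplified model, not how hardware is built: plain dictionaries keyed by address stand in for the registers, cache, main memory and secondary storage, and single words are copied where a real system would move whole blocks or pages. `read_word` and the sample contents are hypothetical.

```python
def read_word(addr, registers, cache, main_memory, secondary):
    """Return (value, outcome) after searching registers, cache, main memory,
    then secondary storage, copying the word into faster levels on the way."""
    if addr in registers:                       # Step-01: CPU registers
        return registers[addr], "register"
    if addr in cache:                           # Step-02: cache hit
        return cache[addr], "cache hit"
    if addr in main_memory:                     # Step-03: found in main memory
        outcome = "cache miss"
    else:                                       # page fault: load from secondary
        main_memory[addr] = secondary[addr]
        outcome = "page fault"
    cache[addr] = main_memory[addr]             # map into the cache, then deliver
    return cache[addr], outcome


secondary = {0x10: 7, 0x20: 9}   # hypothetical contents
main_memory = {0x10: 7}
cache, registers = {}, {}
print(read_word(0x10, registers, cache, main_memory, secondary))  # (7, 'cache miss')
print(read_word(0x10, registers, cache, main_memory, secondary))  # (7, 'cache hit')
print(read_word(0x20, registers, cache, main_memory, secondary))  # (9, 'page fault')
```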



Cache Hit Ratio (CHR) is a measurement that monitors the efficiency of a cache system. It quantifies the proportion of requests that a cache is able to fulfill successfully out of the total requests it receives.
$$\text{Cache Hit Ratio} = \frac{\text{Number of cache hits}}{\text{Number of cache hits} + \text{Number of cache misses}}$$
$$\text{Cache Miss Ratio} = \frac{\text{Number of cache misses}}{\text{Number of cache hits} + \text{Number of cache misses}}$$

If the execution of a program records 39 cache hits and 2 cache misses in a given timeframe, then the CHR is 39 divided by 41, or 0.951. As a percentage, this is a cache hit ratio of 95.1%.
This indicates that, during the program's execution, the cache was able to deliver the requested data 95.1% of the time.
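The two ratio formulas translate directly into Python; the functions below are illustrative names, checked against the 39-hit, 2-miss example:

```python
def hit_ratio(hits, misses):
    """Fraction of requests served from the cache."""
    return hits / (hits + misses)


def miss_ratio(hits, misses):
    """Fraction of requests not found in the cache (= 1 - hit ratio)."""
    return misses / (hits + misses)


h = hit_ratio(39, 2)
print(f"{h:.3f} -> {h:.1%}")       # 0.951 -> 95.1%
print(f"{miss_ratio(39, 2):.3f}")  # 0.049
```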



When the word required by the CPU is not found in the cache memory, it is searched for in the main memory. If the page containing the required word is found in the main memory, the page is mapped from the main memory to the cache memory. Cache mapping defines how a block from the main memory is mapped to the cache memory in case of a cache miss.

[Figure: the cache mapping process]

• Main memory is divided into equal-size partitions called blocks or frames.
• Cache memory is divided into partitions of the same size as the blocks, called lines.
• During cache mapping, a block of main memory is simply copied to the cache.

Cache mapping is performed using the following three techniques:
1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping



In direct mapping, a particular block of main memory can map only to a particular line of the cache: block j maps to line (j mod number of lines in the cache).
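The placement rule can be sketched as a tiny simulator. It assumes the usual direct-mapping rule that block b goes to line b mod (number of lines), with the high-order part of the block number kept as the tag; the function name and the block trace are made up for illustration.

```python
def simulate_direct_mapped(block_trace, num_lines):
    """Direct mapping: block b can only go to line b % num_lines.
    Each line remembers the tag (b // num_lines) of the block it holds."""
    lines = [None] * num_lines        # stored tag per line, None = empty
    hits = misses = 0
    for b in block_trace:
        line, tag = b % num_lines, b // num_lines
        if lines[line] == tag:
            hits += 1
        else:                         # miss: the block replaces whatever was there
            misses += 1
            lines[line] = tag
    return hits, misses


print(simulate_direct_mapped([0, 4, 0, 1, 1, 8, 0], num_lines=4))  # (1, 6)
```

Blocks 0, 4 and 8 all compete for line 0, so they keep evicting each other even though lines 2 and 3 stay empty. This inflexibility is what the associative techniques below relax.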

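Division of Physical Address-

In direct mapping, the physical address is divided as: Tag | Line number | Block offset

Formula

$$\text{No. of bits (main memory)} = \log_2(\text{main memory size})$$
$$\text{No. of bits (main memory)} = \text{Tag} + \text{Line number} + \text{Block offset}$$
$$\text{No. of bits (block offset)} = \log_2(\text{block size}) \quad [\text{Block size} = \text{Frame size} = \text{Line size}]$$
$$\text{Number of lines} = \frac{\text{cache size}}{\text{line size}} = \frac{\text{cache size}}{\text{block size}} \;\rightarrow\; \text{Line index} = \log_2(\text{number of lines})$$
$$\text{Tag directory size} = \text{Number of tags} \times \text{Tag size (bits)} = \text{Number of cache lines} \times \text{Tag size (bits)}$$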
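The direct-mapping formulas can be collected into one helper. This is a sketch assuming byte-addressable memory and power-of-two sizes, as in the problems that follow; `log2_exact` and `direct_mapped` are hypothetical names.

```python
KB = 2 ** 10


def log2_exact(n):
    """log2 of a power of two, using integer arithmetic."""
    if n <= 0 or n & (n - 1):
        raise ValueError("size must be a power of two")
    return n.bit_length() - 1


def direct_mapped(main_memory_bytes, cache_bytes, block_bytes):
    """Return (tag bits, line bits, offset bits, tag directory size in bits)."""
    pa_bits = log2_exact(main_memory_bytes)
    offset_bits = log2_exact(block_bytes)
    num_lines = cache_bytes // block_bytes
    line_bits = log2_exact(num_lines)
    tag_bits = pa_bits - line_bits - offset_bits
    return tag_bits, line_bits, offset_bits, num_lines * tag_bits


print(direct_mapped(128 * KB, 16 * KB, 256))  # (3, 6, 8, 192)
```

Given the main memory size, the function yields the tag width; going the other way (tag width known, memory size unknown) is just the sum tag + line + offset bits.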



Problem-01: Consider a direct mapped cache of size 16 KB with block size 256 bytes. The size
of main memory is 128 KB. Find-
1. Number of bits in tag
2. Tag directory size
Solution-
Given-
• Cache memory size = 16 KB
• Block size = Frame size = Line size = 256 bytes
• Main memory size = 128 KB
We consider that the memory is byte addressable.

Number of Bits in Physical Address-


We have,
Size of main memory = 128 KB = 2^17 bytes
Thus, Number of bits in physical address = 17 bits

Number of Bits in Block Offset-


We have,
Block size = 256 bytes = 2^8 bytes
Thus, Number of bits in block offset = 8 bits

Number of Bits in Line Number-


Total number of lines in cache = Cache size / Line size
= 16 KB / 256 bytes
= 2^14 bytes / 2^8 bytes = 2^6 lines
Thus, Number of bits in line number = 6 bits

Number of Bits in Tag-


Number of bits in tag
= physical address – (line number + block offset)
= 17 bits – (6 bits + 8 bits) = 17 bits – 14 bits = 3 bits

Tag Directory Size-


Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^6 x 3 bits = 192 bits = 24 bytes



Problem-02: Consider a direct mapped cache of size 512 KB with block size 1 KB. There are 7
bits in the tag. Find-
1. Size of main memory
2. Tag directory size
Solution-
Given-

• Cache memory size = 512 KB


• Block size = Frame size = Line size = 1 KB
• Number of bits in tag = 7 bits
We consider that the memory is byte
addressable.

Block size = 1 KB = 2^10 bytes
Thus, Number of bits in block offset = 10 bits
Total number of lines in cache = Cache size / Line size = 512 KB / 1 KB = 2^9 lines
Thus, Number of bits in line number = 9 bits
Number of bits in physical address = tag bits + line index + block offset
Thus, Number of bits in physical address = 7 bits + 9 bits + 10 bits = 26 bits Ans
Thus, Size of main memory = 2^26 bytes = 64 MB Ans
Tag directory size = Number of tags x Tag size = Number of lines in cache x Tag size
Tag directory size = 2^9 x 7 bits = 3584 bits = 448 bytes Ans

Problem 3: Suppose we have 16 KB of data in a direct mapped cache with 4-word blocks. Determine the size of the tag, index and offset fields if we are using a 32-bit architecture.
Solution-
Given-

• Cache memory size = 16 KB
• Block size = 4 words = 4 x 4 bytes = 16 bytes [in a 32-bit architecture, 1 word = 4 bytes; memory is byte addressable]
• Number of bits in physical address = 32 bits
No. of bits (block offset) = log2(block size) = log2(2^4) = 4 bits
Total number of lines in cache = Cache size / Line size = 16 KB / 16 B = 2^10 lines
Number of bits in line index = 10 bits
Number of bits in physical address = tag bits + line index + block offset
Number of bits in Tag = 32 bits - 10 bits - 4 bits = 18 bits
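A sketch that computes the three field widths and then cuts an actual 32-bit address into tag, index and offset. The field widths depend on how many bytes a "word" is: with 4-byte words (16-byte blocks) the split is 18/10/4, and if a word is taken to be 1 byte the same function gives 18/12/2; the tag width is 18 bits either way. The helper names and the sample address 0x12345678 are hypothetical.

```python
def field_widths(cache_bytes, block_bytes, address_bits=32):
    """(tag, index, offset) widths for a byte-addressable direct-mapped cache."""
    offset_bits = (block_bytes - 1).bit_length()                # log2(block size)
    index_bits = (cache_bytes // block_bytes - 1).bit_length()  # log2(number of lines)
    return address_bits - index_bits - offset_bits, index_bits, offset_bits


def split_address(addr, index_bits, offset_bits):
    """Cut a physical address into its (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (index_bits + offset_bits)
    return tag, index, offset


KB = 2 ** 10
print(field_widths(16 * KB, 4 * 4))  # 4-byte words: (18, 10, 4)
print(field_widths(16 * KB, 4 * 1))  # 1-byte words: (18, 12, 2)

tag_bits, index_bits, offset_bits = field_widths(16 * KB, 16)
print(split_address(0x12345678, index_bits, offset_bits))  # (18641, 359, 8)
```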



In fully associative mapping, a block of main memory can map to any line of the cache that is freely available at that moment. This makes fully associative mapping more flexible than direct mapping.

Example-
[Figure: a fully associative cache in which all lines are free]

Here,
• All the lines of the cache are freely available.
• Thus, any block of main memory can map to any line of the cache.
• If all the cache lines were occupied, one of the existing blocks would have to be replaced.

Division of Physical Address

In fully associative mapping, the physical address is divided as: Tag | Block offset
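Because there is no line field, the tag takes every address bit except the block offset. A minimal Python sketch of that split, assuming byte-addressable memory and power-of-two sizes; the helper names are hypothetical:

```python
KB = 2 ** 10


def log2_exact(n):
    """log2 of a power of two, using integer arithmetic."""
    if n <= 0 or n & (n - 1):
        raise ValueError("size must be a power of two")
    return n.bit_length() - 1


def fully_associative(main_memory_bytes, cache_bytes, block_bytes):
    """Return (tag bits, offset bits, tag directory size in bits); no line field."""
    pa_bits = log2_exact(main_memory_bytes)
    offset_bits = log2_exact(block_bytes)
    tag_bits = pa_bits - offset_bits
    num_lines = cache_bytes // block_bytes
    return tag_bits, offset_bits, num_lines * tag_bits


print(fully_associative(128 * KB, 16 * KB, 256))  # (9, 8, 576)
```

Compared with the direct-mapped cache of the same size, the tag grows from 3 to 9 bits, so the tag directory is three times larger. This is the cost of letting a block go anywhere.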



Problem-01: Consider a fully associative mapped cache of size 16 KB with block size 256
bytes. The size of main memory is 128 KB. Find- 1. Number of bits in tag 2. Tag directory size

Solution-
Given-
• Cache memory size = 16 KB
• Block size = Frame size = Line size = 256 bytes
• Main memory size = 128 KB
We consider that the memory is byte addressable.
Number of Bits in Physical Address-
We have,
Size of main memory = 128 KB = 2^17 bytes
Thus, No. of bits in physical address = 17 bits
Number of Bits in Block Offset-
We have,
Block size = 256 bytes = 2^8 bytes
Thus, Number of bits in block offset = 8 bits
Number of Bits in Tag-
Tag bit = Number of bits in physical address – Number of bits in block offset
= 17 bits – 8 bits = 9 bits
Thus, Number of bits in tag = 9 bits

Number of Lines in Cache-


Total number of lines in cache
= Cache size / Line size = 16 KB / 256 bytes = 2^14 bytes / 2^8 bytes = 2^6 lines
Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag = 2^6 x 9 bits = 576 bits = 72 bytes
Thus, size of tag directory = 72 bytes



Problem-02: Consider a fully associative mapped cache of size 512 KB with block size 1 KB. There are 17 bits in the tag. Find- 1. Size of main memory 2. Tag directory size

Solution-
Given-

• Cache memory size = 512 KB


• Block size = Frame size = Line size = 1 KB
• Number of bits in tag = 17 bits
We consider that the memory is byte addressable.

Number of Bits in Block Offset-


We have,
Block size = 1 KB = 2^10 bytes
Thus, Number of bits in block offset = 10 bits
Number of Bits in Physical Address-
Number of bits in physical address
= Tag bit + Block offset bit
= 17 bits + 10 bits = 27 bits
Thus, Number of bits in physical address = 27 bits

Size of Main Memory-


We have,
Number of bits in physical address = 27 bits
Thus, Size of main memory = 2^27 bytes = 128 MB
Number of Lines in Cache-
Total number of lines in cache = Cache size / Line size = 512 KB / 1 KB
= 512 lines = 2^9 lines
Tag Directory Size-
Tag directory size
= Number of tags x Tag size = Number of lines in cache x Number of bits in tag
= 2^9 x 17 bits = 8704 bits = 1088 bytes
Thus, size of tag directory = 1088 bytes



In k-way set associative mapping, the cache lines are grouped into sets, where each set consists of k lines. A given main memory block can map only to one particular cache set, but to any line within that set.

The formulas for k-way set associative mapping are as follows:

• Physical address = tag + set index + block offset
• K-way set associative cache size = number of sets x lines per set x size of line
• Total number of sets = total number of lines / K

Problem 1. When the cache is 64 kilobytes in size and the size of a block/line is 8 bytes, how many bits are required to represent the lines (set index) of a 4-way set-associative cache?
Solution: Total number of lines = Size of cache / Size of block = 64 kilobytes / 8 bytes = 2^16 / 2^3 = 2^13 lines
Thus, 13 bits would be required to represent the individual lines in the cache.
Total number of sets = Total number of lines / K = 2^13 / 2^2 = 2^11 sets
Thus, a total of 11 bits are required to represent the sets in the cache.
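The line and set counts from Problem 1 in a short Python sketch; `set_associative` is a hypothetical helper that assumes power-of-two sizes:

```python
KB = 2 ** 10


def set_associative(cache_bytes, block_bytes, k):
    """Return (number of lines, number of sets, set-index bits) for a k-way cache."""
    num_lines = cache_bytes // block_bytes
    num_sets = num_lines // k            # each set holds k lines
    set_bits = (num_sets - 1).bit_length()  # log2(number of sets)
    return num_lines, num_sets, set_bits


print(set_associative(64 * KB, 8, 4))  # (8192, 2048, 11)
```

Setting k = 1 gives the direct-mapped case (set index = line index), and setting k equal to the number of lines gives the fully associative case (0 set bits).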

Problem 2. If there are 10 bits for the set index in a 4-way set-associative cache and the block size is 16 kilobytes, what is the cache size?
Solution:
Block size = 16 kilobytes = 2^14 bytes
K-way set associative cache size = number of sets x number of lines per set x size of line
Size of cache = 2^10 x 4 x 2^14 bytes = 2^26 bytes = 64 megabytes
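Working the cache-size formula backwards from the set-index width, as a minimal Python sketch (hypothetical helper name):

```python
def set_associative_size(set_bits, k, block_bytes):
    """Cache size = number of sets x lines per set x line size."""
    return (2 ** set_bits) * k * block_bytes


size = set_associative_size(10, 4, 16 * 2 ** 10)
print(size, size // 2 ** 20, "MB")  # 67108864 64 MB
```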
