PCI Bus Architecture
Introduction
• A computer bus transfers data between devices on the motherboard and the
central processing unit, where calculations take place.
• Two different parts of a bus:
– Address bus‐transfers information about where the data should go.
– Data bus‐transfers the actual data.
History
• PCI (Peripheral Component Interconnect) bus, originating from ISA
(Industry Standard Architecture) and VL (VESA Local) buses, was
introduced by Intel in 1992.
• It evolved into version 2.1, a 64-bit standard.
• Notable for being an industry standard, it grants direct system memory
access.
• Devices connect via a bridge to the front side bus, enhancing performance.
• Initially running at 33 MHz, it later increased to 66 MHz.
• Its prominence surged with Windows 95 and the advent of "Plug and Play"
technology, leveraging PCI's architecture.
PCI System Bus Performance
• What makes the PCI bus one of the fastest I/O buses in use today?
• Three features make this possible:
– Burst Mode: allows multiple sets of data to be sent
– Full Bus Mastering: the ability of devices on the PCI bus to perform
transfers directly.
– High Bandwidth Options: allow for increased speed of the PCI bus.
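The peak transfer rates quoted for these buses follow directly from bus width times clock rate, assuming one transfer per clock in burst mode. A quick illustrative sketch:

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in MB/s, assuming one transfer per clock cycle."""
    return width_bits / 8 * clock_mhz

# Classic PCI: 32 bits at 33 MHz
print(peak_bandwidth_mb_s(32, 33))   # 132.0 MB/s
# Extended PCI: 64 bits at 66 MHz
print(peak_bandwidth_mb_s(64, 66))   # 528.0 MB/s
```

Real sustained throughput is lower, since arbitration and non-burst transactions consume cycles.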
How PCI Compares to Other Buses
Table 1: How PCI compares to other buses
Bus Type     Bus Width  Bus Speed  MB/sec    Advantages                  Disadvantages
ISA          16 bits    8 MHz      16 MBps   low cost, compatibility,    low speed, jumpers & DIP
                                             widely used                 switches, becoming obsolete
PCI          64 bits    133 MHz    1 GBps    very high speed, Plug &     incompatible with older
                                             Play, dominant board-       systems, can cost more
                                             level bus
Compact PCI  64 bits    33 MHz     132 MBps  designed for industrial     lower speed than PCI, needs
                                             use, hot swapping/Plug &    adapter for PC use,
                                             Play, ideal for embedded    incompatible with older
                                             systems                     systems
Plug and Play
• Requirements for full implementation:
– Plug and Play BIOS
– Extended System Configuration Data (ESCD)
– Plug and Play operating system
• Tasks it automates:
– Interrupt Requests (IRQ)
– Direct Memory Access (DMA)
– Memory Addresses
– Input/Output (I/O) Configuration
How PCI Works:
Installing A New Device
Once a new device has been inserted into a PCI slot on the motherboard, the
Basic Input/Output System (BIOS) initiates the Plug and Play (PnP) sequence:
1. PnP BIOS scans PCI bus for new hardware, requests identification if found.
2. Device sends ID to BIOS.
3. PnP checks Extended System Configuration Data (ESCD) for existing
config data.
4. PnP assigns settings to the card and stores them in the ESCD.
5. Windows checks PCI bus and ESCD for new hardware, alerts user.
6. Windows attempts driver installation, prompting the user for a driver if needed.
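The BIOS side of these steps (scan, look up, assign, store) can be sketched as follows. This is a hypothetical illustration: the classes, resource values, and function names are stand-ins, not a real BIOS or ESCD interface.

```python
# Illustrative sketch of the PnP enumeration flow above; all names and
# resource values here are hypothetical, not a real firmware API.
class Device:
    def __init__(self, dev_id):
        self.dev_id = dev_id     # ID the card reports when queried
        self.settings = None

    def apply(self, settings):
        self.settings = settings

def assign_resources(device):
    # A real PnP BIOS would pick a free IRQ, DMA channel, I/O range,
    # and memory range; fixed example values are used here.
    return {"irq": 11, "dma": 5, "io": 0xD000, "mem": 0xE0000000}

def configure_new_devices(devices, escd):
    for dev in devices:                      # 1. scan the PCI bus
        settings = escd.get(dev.dev_id)      # 3. check ESCD for prior config
        if settings is None:
            settings = assign_resources(dev) # 4a. assign new settings...
            escd[dev.dev_id] = settings      # 4b. ...and store them in ESCD
        dev.apply(settings)

escd = {}
sound = Device("vendor:1234 device:5678")
configure_new_devices([sound], escd)
print(sound.settings["irq"])   # 11
```

On a second boot the ESCD lookup succeeds, so the card keeps the same settings without reassignment.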
How a Device Works
Example: PCI-based sound card
1. The sound card converts analog signals to digital.
2. Digital audio data travels through the PCI bus to the bus controller, which
prioritizes devices for CPU access or system memory.
3. In recording mode, the bus controller prioritizes sound card data, sending it
to the system bus via the bus bridge.
4. The system bus stores the data in system memory. After recording, the user
chooses to save data to the hard drive or keep it in memory for further
processing.
Future of PCI:
Requirements
• Support multiple market segments.
• Backwards compatible
• Scalable performance
• Advanced features including QoS, power management, and data integrity.
PCI C/BE commands:
All commands except for the Configuration commands are optional. All devices
other than host bridges are _required_ to respond to configuration commands.
C/BE (command/byte enables) Value Name
0b0000 Interrupt Acknowledge
0b0001 Special Cycle
0b0010 I/O Read
0b0011 I/O Write
0b0100 Reserved
0b0101 Reserved
0b0110 Memory Read
0b0111 Memory Write
0b1000 Reserved
0b1001 Reserved
0b1010 Configuration Read
0b1011 Configuration Write
0b1100 Memory Read Multiple
0b1101 Dual Address Cycle
0b1110 Memory Read Line
0b1111 Memory Write and Invalidate
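The encoding table above amounts to a 4-bit lookup; a small sketch of a decoder, with the Reserved codes falling through to a default:

```python
# The C/BE[3:0] command encodings from the table above as a lookup table.
PCI_COMMANDS = {
    0b0000: "Interrupt Acknowledge",
    0b0001: "Special Cycle",
    0b0010: "I/O Read",
    0b0011: "I/O Write",
    0b0110: "Memory Read",
    0b0111: "Memory Write",
    0b1010: "Configuration Read",
    0b1011: "Configuration Write",
    0b1100: "Memory Read Multiple",
    0b1101: "Dual Address Cycle",
    0b1110: "Memory Read Line",
    0b1111: "Memory Write and Invalidate",
}

def decode_command(cbe: int) -> str:
    """Map a 4-bit C/BE value to its command name; gaps are Reserved."""
    return PCI_COMMANDS.get(cbe & 0xF, "Reserved")

print(decode_command(0b1011))  # Configuration Write
print(decode_command(0b0100))  # Reserved
```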
Computer Memory System
Contents of Lecture:
Computer Memory System Overview
Characteristics of Memory Systems
The Memory Hierarchy
Cache Memory Principles
References for This Lecture:
William Stallings, Computer Organization and Architecture: Designing for Performance,
9th Edition, Chapter 4: Cache Memory, pages 112 to 123.
Characteristics of Memory Systems:
The complex subject of computer memory is made more manageable if we classify memory
systems according to their key characteristics.
The most important of these are listed in the following table (Table 4.1).
Location
Location refers to whether memory is internal or external to the computer.
Internal memory:
Internal memory is often equated with main memory.
The processor requires its own local memory, in the form of registers.
The control unit portion of the processor may also require its own internal
memory.
Cache is another form of internal memory.
External memory consists of peripheral storage devices, such as disk and tape, that
are accessible to the processor via I/O controllers.
Capacity
For internal memory, this is typically expressed in terms of bytes (1 byte = 8 bits) or
words. Common word lengths are 8, 16, and 32 bits.
External memory capacity is typically expressed in terms of bytes.
Unit of transfer
For internal memory, the unit of transfer is equal to the number of electrical lines
into and out of the memory module. This may be equal to the word length, but is
often larger, such as 64, 128, or 256 bits.
For main memory, this is the number of bits read out of or written into memory at a
time. The unit of transfer need not equal a word or an addressable unit.
For external memory, data are often transferred in much larger units than a word,
and these are referred to as blocks.
Access Methods
Sequential access:
Start at the beginning and read through in order
Access time depends on location of data and previous location
The time to access an arbitrary record is highly variable, since intervening
data must be passed over.
e.g. tape
Direct access:
Individual blocks have unique address
Access is by jumping to vicinity plus sequential search
Access time depends on location and previous location
e.g. disk
Random access:
Each location has its own addressing mechanism
Individual addresses identify locations exactly
Access time is independent of location or previous access
e.g. RAM
Associative:
Each location has its own addressing mechanism
Data is located by a comparison with contents of a portion of the store
A word is retrieved based on a portion of its contents rather than its address
Access time is independent of location or previous access
e.g. cache
Performance:
Access time (latency): Time between presenting the address and getting the valid
data
For random-access memory, this is the time it takes to perform a read or
write operation, that is, the time from the instant that an address is presented
to the memory to the instant that data have been stored or made available for
use.
For non-random-access memory, access time is the time it takes to position
the read–write mechanism at the desired location.
Memory cycle time:
Additional time may be required for the memory to “recover” before the next access
Cycle time = access time + recovery time
Transfer rate:
This is the rate at which data can be transferred into or out of a
memory unit
For random-access memory, it is equal to 1/(cycle time).
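The relation between access time, cycle time, and transfer rate can be put into numbers; the 60 ns and 10 ns values below are assumed example figures, not from the text:

```python
# Numeric sketch of the timing relations above (values are illustrative).
access_time_ns = 60      # address presented -> data valid
recovery_ns = 10         # memory must "recover" before the next access

# Cycle time = access time + recovery time
cycle_time_ns = access_time_ns + recovery_ns
print(cycle_time_ns)                 # 70

# Transfer rate for random-access memory = 1 / cycle time
transfers_per_second = 1 / (cycle_time_ns * 1e-9)
print(round(transfers_per_second))   # 14285714 transfers per second
```

Note that the usable transfer rate is set by the cycle time, not the (shorter) access time.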
Physical Types:
A variety of physical types of memory have been employed.
The most common today are semiconductor memory, magnetic surface memory, used
for disk and tape, and optical and magneto-optical
Physical Characteristics:
In a volatile memory, information decays naturally or is lost when electrical power is
switched off.
In a nonvolatile memory, information once recorded remains without deterioration
until deliberately changed; no electrical power is needed to retain information.
Magnetic-surface memories are nonvolatile.
Semiconductor memory (memory on integrated circuits) may be either
volatile or nonvolatile.
Nonerasable memory cannot be altered, except by destroying the storage unit.
Semiconductor memory of this type is known as read-only memory (ROM). Of
necessity, a practical nonerasable memory must also be nonvolatile.
Organization:
For random-access memory, the organization is a key design issue.
In this context, organization refers to the physical arrangement of bits to form words.
The obvious arrangement is not always used.
The Memory Hierarchy:
To achieve the greatest performance, the memory must be able to keep up with the processor.
That is, as the processor is executing instructions, we would not want it to have to pause
waiting for instructions or operands.
For a practical system, the cost of memory must be reasonable in relationship to other
components.
As might be expected, there is a trade-off among the three key characteristics of memory:
capacity, access time, and cost.
A variety of technologies are used to implement memory systems, and across this spectrum
of technologies, the following relationships hold:
Faster access time, greater cost per bit
Greater capacity, smaller cost per bit
Greater capacity, slower access time
A typical hierarchy is illustrated in following Figure (Figure 4.1). As one goes down the
hierarchy, the following occur:
Decreasing cost per bit
Increasing capacity
Increasing access time
Decreasing frequency of access of the memory by the processor
Memory Hierarchy – Diagram:
During the course of the execution of a program, memory references tend to cluster.
e.g., in loops or subroutines there are repeated references to a small set of instructions.
Accordingly, it is possible to organize data across the hierarchy such that the percentage of
accesses to each successively lower level is less than that of the level above.
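This clustering is what makes the hierarchy pay off, and the standard two-level average-access-time formula quantifies it. The hit ratio and timings below are assumed example values:

```python
def average_access_time(hit_ratio, t1_ns, t2_ns):
    """Two-level memory: t1 = fast level, t2 = slower level.
    A miss costs t1 + t2: check the fast level, then access the slow one."""
    return hit_ratio * t1_ns + (1 - hit_ratio) * (t1_ns + t2_ns)

# Example: 95% of references hit a 10 ns level backed by a 100 ns level.
print(round(average_access_time(0.95, 10, 100), 2))   # 15.0 ns
```

With locality pushing the hit ratio toward 1, the average time approaches that of the fast level even though most of the capacity is in the slow level.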
Hierarchy List:
Registers
Cache
Main memory
Disk
Optical
Tape
Cache Memory Principles:
Cache memory is designed to combine the access time of expensive, high-speed
memory with the large capacity of less expensive, lower-speed memory.
Small amount of fast memory
Sits between main memory and CPU
Used to hold segments of program and data of the main memory.
May be located on CPU chip or module
It is transparent to the programmers.
Multi-level cache:
The L2 cache is slower and typically larger than the L1 cache, and the L3
cache is slower and typically larger than the L2 cache.
cache/main-memory structure:
The figure above (Figure 4.4) depicts the structure of a cache/main-memory system:
Main memory consists of up to 2^n addressable words, with each word having a
unique n-bit address.
For mapping purposes, this memory is considered to consist of a number of fixed-
length blocks of K words each.
That is, there are M = 2^n/K blocks in main memory.
The cache consists of m blocks, called lines. Each line contains K words, plus a tag
of a few bits.
The line size may be as small as 32 bits, with each “word” being a single byte; in this
case the line size is 4 bytes.
The number of lines is less than the number of main memory blocks (m << M).
At any time, some subset of the blocks of memory resides in lines in the cache.
If a word in a block of memory is read, that block is transferred to one of the lines of
the cache. Because there are more blocks than lines, an individual line cannot be
uniquely and permanently dedicated to a particular block.
Thus, each line includes a tag that identifies which particular block is currently being
stored.
The tag is usually a portion of the main memory address, as described later in this
section.
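The block/line bookkeeping above can be checked with concrete numbers; the address width, block size, and line count below are assumed example values:

```python
# Sketch of the cache/main-memory structure above (sizes are assumed).
n = 16           # address bits -> 2**n addressable words in main memory
K = 4            # words per block (and per cache line)
m = 128          # number of cache lines

M = 2**n // K                    # M = 2^n / K blocks in main memory
print(M)                         # 16384 blocks
print(m < M)                     # True: far fewer lines than blocks
# Because m << M, each line must carry a tag naming the block it holds.
```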
Cache Read operation:
CPU requests contents of memory location
Check cache for this data
If present (called a cache hit), get from cache (fast).
If not present (called a cache miss), read the required block from main memory
into the cache (slow).
Then deliver from cache to CPU
Cache includes tags to identify which block of main memory is in each cache slot
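The read sequence above can be sketched with a minimal direct-mapped cache. Direct mapping is one mapping choice among several; the block size and line count here are chosen purely for illustration:

```python
# Minimal direct-mapped cache sketch of the read operation above.
K = 4        # words per block
LINES = 8    # number of cache lines

tags = [None] * LINES            # tag stored alongside each line

def read(addr):
    block = addr // K            # which main-memory block holds this word
    line = block % LINES         # direct mapping: block -> one fixed line
    tag = block // LINES         # tag identifies the block in that line
    if tags[line] == tag:
        return "hit"             # deliver the word from cache (fast)
    tags[line] = tag             # miss: fetch the block from main memory
    return "miss"

print(read(0))    # miss (cold cache)
print(read(2))    # hit  (word 2 is in the same block 0)
print(read(40))   # miss (block 10 maps to line 2)
```

Because several blocks share each line, a later read of a different block mapping to the same line evicts the current occupant, which is exactly why the tag is needed.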