Lecture 05 ARM Processors
Outline
• Introduction
• ARM Architecture Overview
• ARMv7-AR Architecture
• Cortex A9 processor
• Programmer’s Model
• Memory Systems
• ARM System Design
• AMBA bus protocol
ARM Ltd
http://en.wikipedia.org/wiki/ARM_architecture
RISC Design Philosophy
“The architectural simplicity of ARM processors leads to very
small implementations, and small implementations mean
devices can have very low power consumption.
Implementation size, performance, and very low power
consumption are key attributes of the ARM architecture.”
ARM is RISC
• Uniform register file
• Load/store architecture
• Simple addressing
Development of the ARM Architecture
[Figure: timeline of ARM architecture versions v4, v5, v6, v7 and v8]
Cortex A9
• The ARM Cortex-A9 processor is the high-performance choice in a family of low-power, cost-sensitive devices.
[Figure: Cortex-A9 pipeline block diagram (Instruction Fetch, Decode, Memory)]
Instruction Fetch
• Register Renaming
- Resolves data dependencies and unrolls small loops in hardware
Issue
• What is NEON?
• NEON is a wide SIMD data processing architecture
• 32 registers, 64 bits wide, or 16 registers, 128 bits wide
• NEON instructions perform “packed SIMD” processing
• Registers can be considered as “vectors” of elements of the same data type
• Instructions perform the same operation in all lanes
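To make the lane idea concrete, here is a portable C model (not NEON intrinsics) of one 128-bit Q register viewed as four 32-bit lanes. The type `q_reg` and the function `vadd_s32x4` are hypothetical names for illustration; on real hardware a single `VADD.I32 Qd, Qn, Qm` performs all four lane additions at once.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit NEON Q register: four 32-bit lanes. */
typedef struct {
    int32_t lane[4];
} q_reg;

/* Packed SIMD add: the same operation is applied independently in every
   lane, like VADD.I32. Each lane wraps modulo 2^32 and never carries
   into its neighbour. */
q_reg vadd_s32x4(q_reg n, q_reg m)
{
    q_reg d;
    for (int i = 0; i < 4; i++) {
        d.lane[i] = (int32_t)((uint32_t)n.lane[i] + (uint32_t)m.lane[i]);
    }
    return d;
}
```

The loop stands in for hardware that processes the four lanes in parallel; the point is that lanes are isolated from each other.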
Execute (4)
• Data prefetcher
• monitors cache line requests by the processor and cache misses to determine how much data to prefetch
• can prefetch up to 8 independent data streams
• prefetches and allocates data in the L1 data cache as long as accesses keep hitting in the prefetched cache lines
• When does it stop prefetching?
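As a rough illustration of how a prefetcher can decide what to fetch from observed misses, below is a generic stride-detection model for a single stream. It is an assumption-laden sketch, not the Cortex-A9's documented algorithm: the names `stream_t` and `observe_miss` and the confirmation threshold of 2 are hypothetical; only the 32-byte line length comes from the L1 cache description.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 32u  /* Cortex-A9 L1 line length */

/* Hypothetical state for one tracked data stream. */
typedef struct {
    uint32_t last_line;   /* line number of the previous miss */
    int32_t  stride;      /* detected stride, in cache lines */
    int      confidence;  /* how many times the stride has repeated */
} stream_t;

/* Feed one miss address into the tracker. Returns true and sets
   *prefetch_addr once the stride has repeated often enough (assumed
   threshold: 2); a change of stride resets confidence to zero. */
bool observe_miss(stream_t *s, uint32_t addr, uint32_t *prefetch_addr)
{
    uint32_t line = addr / LINE_BYTES;
    int32_t stride = (int32_t)(line - s->last_line);

    if (stride != 0 && stride == s->stride) {
        s->confidence++;
    } else {
        s->stride = stride;
        s->confidence = 0;
    }
    s->last_line = line;

    if (s->confidence >= 2) {
        *prefetch_addr = (uint32_t)((int64_t)line + s->stride) * LINE_BYTES;
        return true;
    }
    return false;
}
```

In this model prefetching stops as soon as the access pattern breaks the detected stride, because the confidence counter is reset.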
Memory Hierarchy
[Figure: Cortex-A9 MPCore memory hierarchy: processors, Snoop Control Unit (SCU), Accelerator Coherence Port (ACP), L2 cache, main memory]
L1 caches
[Figure: Cortex-A9 MPCore with four CPUs, each with a separate D$ and I$, connected through the SCU (with ACP) over 64-bit AXI read/write buses to the L2 cache and main memory]
• Non-unified (separate instruction and data caches)
- 32-byte line length
- can be disabled independently
• 16, 32 or 64 KB
• 4-way set associative
• Support for Security Extensions
• I-cache: VIPT (virtually indexed, physically tagged)
• D-cache: PIPT (physically indexed, physically tagged)
- reduces the number of cache flushes and refills and saves energy
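To see how an address selects a line, the sketch below splits a 32-bit address into tag, set index and byte offset. It assumes the 32 KB configuration (one of the listed sizes), 4 ways and 32-byte lines, which gives 32768 / (4 × 32) = 256 sets: 5 offset bits, 8 index bits and 19 tag bits. The names `cache_addr` and `split_address` are hypothetical. For the PIPT D-cache these fields come from the physical address; for the VIPT I-cache the index comes from the virtual address.

```c
#include <stdint.h>

/* Assumed configuration: 32 KB, 4-way set associative, 32-byte lines. */
#define CACHE_BYTES (32u * 1024u)
#define WAYS        4u
#define LINE_BYTES  32u
#define SETS        (CACHE_BYTES / (WAYS * LINE_BYTES))   /* 256 sets */

#define OFFSET_BITS 5u   /* log2(LINE_BYTES) */
#define INDEX_BITS  8u   /* log2(SETS) */

typedef struct {
    uint32_t tag;     /* upper 19 bits, compared against each way */
    uint32_t index;   /* selects one of the 256 sets */
    uint32_t offset;  /* byte within the 32-byte line */
} cache_addr;

cache_addr split_address(uint32_t addr)
{
    cache_addr a;
    a.offset = addr & (LINE_BYTES - 1u);
    a.index  = (addr >> OFFSET_BITS) & (SETS - 1u);
    a.tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    return a;
}
```

For example, address 0x12345678 maps to offset 0x18, set 0xB3 and tag 0x91A2; the four ways of set 0xB3 are then searched for that tag.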
L2 cache
[Figure: Cortex-A9 MPCore connected to the L2 cache and main memory]
Snoop Control Unit (1)
[Figure: Cortex-A9 MPCore connected through the SCU over two 64-bit AXI read/write buses to the L2 cache and main memory]
Snoop Control Unit (2)
• SCU functions:
- maintains data cache coherency
- initiates L2 memory accesses
- arbitrates between the processors’ simultaneous requests for L2 accesses
- manages accesses from the ACP
- provides access to on-chip ROM and RAM
• does not support instruction cache coherency
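Data cache coherency of this kind is usually explained with a MESI-style state machine. The sketch below is a simplified textbook MESI model for one line in one CPU's L1 data cache, assuming a MESI-based protocol; it does not reproduce the SCU's actual implementation. The enum and function names are hypothetical, and `others_have_copy` stands in for the result of the SCU's snoop.

```c
#include <stdbool.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

/* Local CPU reads the line. others_have_copy: hypothetical snoop result
   saying whether another CPU's L1 already holds the line. */
mesi_t on_local_read(mesi_t s, bool others_have_copy)
{
    if (s == INVALID)
        return others_have_copy ? SHARED : EXCLUSIVE;
    return s;   /* read hit in S, E or M: no state change */
}

/* Local CPU writes the line. From S or I, other copies must be
   invalidated first; from E the upgrade to M is silent. */
mesi_t on_local_write(mesi_t s)
{
    (void)s;
    return MODIFIED;
}

/* Another CPU reads the line. A MODIFIED line supplies or writes back
   its data before both copies become SHARED. */
mesi_t on_snoop_read(mesi_t s)
{
    return (s == INVALID) ? INVALID : SHARED;
}

/* Another CPU writes the line: this copy is invalidated. */
mesi_t on_snoop_write(mesi_t s)
{
    (void)s;
    return INVALID;
}
```

Because the SCU does not maintain instruction cache coherency, software that writes code (for example a loader) still has to clean the data cache and invalidate the instruction cache itself.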
Accelerator Coherence Port
Data Sizes and Instruction Sets
• ARM is a 32-bit load / store RISC architecture
• The only memory accesses allowed are loads and stores
• Most internal registers are 32 bits wide
• Most instructions execute in a single cycle
Exception modes
Mode          Description
Supervisor    Entered on reset and when a Supervisor Call (SVC) instruction is executed
FIQ           Entered when a high-priority (fast) interrupt is raised
[Figure: program status registers: a single cpsr shared by all modes, and a banked spsr for each of the five exception modes]
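The current mode is held in the mode field, bits [4:0], of the CPSR. The sketch below decodes that field using the ARMv7-A encodings; the function name `cpsr_mode_name` is hypothetical. A typical value right after reset is 0x1D3: Supervisor mode with the A, I and F mask bits set.

```c
#include <stdint.h>

/* Decode the processor mode field, CPSR bits [4:0] (ARMv7-A encodings). */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x16: return "Monitor";     /* Security Extensions */
    case 0x17: return "Abort";
    case 0x1A: return "Hyp";         /* Virtualization Extensions */
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "Reserved";
    }
}
```

User and System mode have no SPSR; the exception modes (FIQ, IRQ, Supervisor, Abort, Undefined) each save the interrupted CPSR in their own banked SPSR.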
• Syntax:
• LDR{<size>}{<cond>} Rd, <address>
• STR{<size>}{<cond>} Rd, <address>
• Example:
• LDRB r0, [r1] ; load the byte at the address in r1 into the bottom byte of r0,
; zero-extending it (bits [31:8] of r0 are cleared)
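The zero-extension behaviour of byte loads can be modelled in a few lines of C. The array `mem` and the functions `ldrb` and `strb` are hypothetical stand-ins for memory and the two instructions, and addresses are assumed to fall inside the 16-byte array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte memory for illustration. */
static uint8_t mem[16];

/* LDRB Rd, [Rn]: load one byte and zero-extend it to 32 bits. */
uint32_t ldrb(uint32_t rn)
{
    assert(rn < sizeof mem);
    return (uint32_t)mem[rn];
}

/* STRB Rd, [Rn]: store only the bottom byte of Rd. */
void strb(uint32_t rd, uint32_t rn)
{
    assert(rn < sizeof mem);
    mem[rn] = (uint8_t)(rd & 0xFFu);
}
```

Storing 0x1234ABCD with `strb` writes only 0xCD, and loading it back yields 0x000000CD. A byte of 0x80 also loads as 0x00000080; sign extension would need `LDRSB` instead.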
Multiple Register Data Transfer
These instructions move data between multiple registers and memory.
Syntax
<LDM|STM>{<addressing_mode>}{<cond>} Rb{!}, <register list>
4 addressing modes:
IA   increment after (default)
IB   increment before
DA   decrement after
DB   decrement before
[Figure: placement of r0, r1 and r4 in memory relative to the base register (Rb) for each addressing mode, with increasing addresses indicated]
Example
LDM r10, {r0,r1,r4} ; load registers, using r10 base
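The four addressing modes differ only in where the block of words starts and how the base is updated. The sketch below computes both for `n` registers; the enum `ldm_mode` and both function names are hypothetical. Registers always occupy consecutive words with the lowest-numbered register at the lowest address, and the written-back base (with `!`) moves by 4 × n bytes.

```c
#include <stdint.h>

typedef enum { IA, IB, DA, DB } ldm_mode;

/* Address of the lowest-numbered register in the list, for n registers
   and base address rb. The rest follow at +4, +8, ... */
uint32_t ldm_start_address(ldm_mode mode, uint32_t rb, uint32_t n)
{
    switch (mode) {
    case IA: return rb;               /* increment after */
    case IB: return rb + 4u;          /* increment before */
    case DA: return rb - 4u * n + 4u; /* decrement after */
    case DB: return rb - 4u * n;      /* decrement before */
    }
    return rb;
}

/* New value of Rb when the '!' write-back suffix is used. */
uint32_t ldm_writeback(ldm_mode mode, uint32_t rb, uint32_t n)
{
    return (mode == IA || mode == IB) ? rb + 4u * n : rb - 4u * n;
}
```

For `LDM r10, {r0,r1,r4}` with r10 = 0x100 (IA), r0, r1 and r4 are loaded from 0x100, 0x104 and 0x108; with DB they would come from 0xF4, 0xF8 and 0xFC.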
Subroutines
• Implementing a conventional subroutine call requires two steps:
• Store the return address
• Branch to the address of the required subroutine
• Both steps are carried out in one instruction, BL:
• The return address is stored in the link register (lr/r14)
• Branch to an address (range dependent on instruction set and width)
• Return is by branching to the address in lr

C source:
void func1 (void)
{
    :
    func2();
    :
}

Assembly:
func1                func2
    :                    :
    BL func2             :
    :                    BX lr
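The BL/BX pairing can be modelled with two registers. This sketch assumes ARM state (4-byte instructions); the struct `cpu_t` and the functions `bl` and `bx_lr` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state. */
typedef struct {
    uint32_t pc;   /* address of the current instruction */
    uint32_t lr;   /* link register, r14 */
} cpu_t;

/* BL target: save the address of the next instruction in lr, then branch. */
void bl(cpu_t *cpu, uint32_t target)
{
    cpu->lr = cpu->pc + 4u;   /* ARM state: instructions are 4 bytes */
    cpu->pc = target;
}

/* BX lr: return by branching to the address in lr. In hardware, bit 0
   of the target selects Thumb state; this model simply clears it. */
void bx_lr(cpu_t *cpu)
{
    cpu->pc = cpu->lr & ~1u;
}
```

If func2 itself called another function with BL, lr would be overwritten, so a non-leaf function must save lr (typically on the stack) before making the call.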
Supervisor Call (SVC)
SVC{<cond>} <SVC number>
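The SVC number is not passed in a register; it is encoded in the instruction itself. On entry to Supervisor mode, lr holds the address of the instruction after the SVC, so a handler can fetch the SVC instruction from lr - 4 (ARM state) or lr - 2 (Thumb state) and extract the immediate. The helper names below are hypothetical; the encodings are the ARM-state (cond, 1111, imm24) and 16-bit Thumb (0xDF00 | imm8) forms.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARM-state SVC: cond[31:28], 1111 in bits [27:24], imm24 in [23:0]. */
bool is_arm_svc(uint32_t insn)
{
    return ((insn >> 24) & 0xFu) == 0xFu;
}

/* SVC number from an ARM-state instruction read at lr - 4. */
uint32_t arm_svc_number(uint32_t insn)
{
    return insn & 0x00FFFFFFu;
}

/* SVC number from a 16-bit Thumb instruction (1101 1111 imm8) read at lr - 2. */
uint32_t thumb_svc_number(uint16_t insn)
{
    return insn & 0xFFu;
}
```

For example, an unconditional `SVC #0x12` in ARM state encodes as 0xEF000012, from which the handler recovers 0x12.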
[Figure: NEON instruction operating lane by lane, writing one result per lane into the destination register Dd]
NEON Coprocessor registers
• NEON has a 256-byte register file
• Separate from the core registers (r0-r15)
• Extension to the VFPv2 register file (VFPv3)
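• Enables register trade-offs: the same storage can be viewed as 32 64-bit D registers or 16 128-bit Q registers (e.g. Q1 overlaps D2 and D3)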
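The D/Q overlap means Qn is simply the pair D(2n) (low half) and D(2n+1) (high half). The sketch models the 256-byte register file as 32 64-bit words; the array `dreg` and the functions `write_q` and `read_d` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the 256-byte NEON register file: 32 x 64-bit
   D registers. Qn aliases D(2n) (low 64 bits) and D(2n+1) (high 64 bits). */
static uint64_t dreg[32];

/* Write a 128-bit Q register as two 64-bit halves (n = 0..15). */
void write_q(unsigned n, uint64_t low, uint64_t high)
{
    dreg[2u * n]      = low;
    dreg[2u * n + 1u] = high;
}

/* Read a 64-bit D register (n = 0..31). */
uint64_t read_d(unsigned n)
{
    return dreg[n];
}
```

Writing Q1 therefore changes D2 and D3, which is why a program can trade 16 wide registers for 32 narrow ones but cannot use both views of the same storage independently.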
Memory Types
• Each defined memory region will specify a memory type
[Figure: memory hierarchy from the ARM core (D-cache RAM, bus interface unit BIU) through on-chip SRAM to off-chip memory, with levels labeled L1, L2, L3]
• Cache lockdown
• Prevents line eviction from a specified cache way (discussed later)
• Streaming, critical-word-first
• Cache data is forwarded to the core as soon as the requested word is received in the linefill buffer
• Any word in the cache line can be requested first using a ‘WRAP’ burst on the bus
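With a wrapping burst, the line fill starts at the requested (critical) word and wraps around at the line boundary, so the core can resume after the first transfer. The sketch assumes 8 words per line; `wrap_order` is a hypothetical name.

```c
#include <stdint.h>

#define WORDS_PER_LINE 8u   /* assumed: 8 words (32 bytes) per line */

/* Order in which the words of a line arrive when the fill starts at the
   critical word and wraps at the line boundary. */
void wrap_order(uint32_t critical_word, uint32_t order[WORDS_PER_LINE])
{
    for (uint32_t i = 0; i < WORDS_PER_LINE; i++) {
        order[i] = (critical_word + i) % WORDS_PER_LINE;
    }
}
```

A miss on word 5 produces the order 5, 6, 7, 0, 1, 2, 3, 4: word 5 is forwarded to the core immediately while the rest of the line streams into the linefill buffer.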
[Figure: cache organization: the address is split into tag, index and word fields; each cache line holds a tag, a valid bit (v), dirty bit(s) (d) and data; a counter selects the victim line on replacement]
• The cache has 8 words of data in each line
• Each cache line contains dirty bit(s), which indicate whether that line was modified by the ARM core
• Each cache line can be valid or invalid; an invalid line is not considered when performing a cache lookup
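The roles of the valid bit, the dirty bit and the victim counter can be shown with one cache set. This sketch assumes a 4-way set and a round-robin victim counter; the types `line_t` and `set_t` and the functions `lookup` and `allocate` are hypothetical, and the data words themselves are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4u

/* Hypothetical per-line state (data words omitted). */
typedef struct {
    uint32_t tag;
    bool valid;   /* v: invalid lines are ignored on lookup */
    bool dirty;   /* d: line was modified by the core */
} line_t;

typedef struct {
    line_t way[WAYS];
    uint32_t victim;   /* round-robin victim counter */
} set_t;

/* Returns the way that hits, or -1 on a miss. */
int lookup(const set_t *s, uint32_t tag)
{
    for (uint32_t w = 0; w < WAYS; w++) {
        if (s->way[w].valid && s->way[w].tag == tag)
            return (int)w;
    }
    return -1;
}

/* Allocate a line for tag in the way chosen by the victim counter.
   Returns true if the evicted line was valid and dirty, i.e. its data
   must be written back to memory first. */
bool allocate(set_t *s, uint32_t tag)
{
    line_t *v = &s->way[s->victim];
    bool writeback = v->valid && v->dirty;
    v->tag = tag;
    v->valid = true;
    v->dirty = false;
    s->victim = (s->victim + 1u) % WAYS;
    return writeback;
}
```

Clean lines can simply be overwritten on eviction; only dirty lines cost a write-back, which is why the dirty bit is tracked per line.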
Interrupt Controller
Example ARM-based system
• ARM core deeply embedded within an SoC
• Design can have both external and internal memories
• Varying width, speed and size, depending on system requirements
• Can include ARM peripherals
• Can include on-chip memory from ARM Artisan Physical IP Libraries
• Elements connected using AMBA (Advanced Microcontroller Bus Architecture)
• External debug and trace via JTAG or CoreSight interface
[Figure: ARM-based SoC: ARM processor core with clocks and reset controller, DMA port, debug, and nIRQ/nFIQ from the CoreLink interrupt controller; AMBA AXI connects the external memory interface (FLASH, SDRAM), on-chip memory and an APB bridge; AMBA APB connects CoreLink and custom peripherals]
Buses 101
• A bus is a multiwire path on which related information is
delivered
– Address, data, and control buses
• Processor and peripherals communicate through buses
• Peripherals may be classified as:
– Arbiter, master, slave, or master/slave (bridge)
• AXI characteristics:
• Read and write data channels are separate
• For each channel, the address/control phase is separate from the data phase
• Byte strobes enable unaligned data transfers
• Multiple outstanding addresses can be issued
• Transactions can be completed out of order
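Byte strobes mark which byte lanes of the write data bus carry valid data, which is how a narrow or unaligned write is expressed on a wide bus. The sketch computes the strobe mask for a single-beat write, assuming little-endian lane numbering, a power-of-two bus width of at most 32 bytes, and a transfer that does not cross a bus-width boundary; `wstrb` is a hypothetical helper, not part of any AXI library.

```c
#include <stdint.h>

/* Write strobe for a single-beat write of `size` bytes at `addr` on a
   data bus `bus_bytes` wide. Bit i of the result enables byte lane i. */
uint32_t wstrb(uint32_t addr, uint32_t size, uint32_t bus_bytes)
{
    uint32_t lane = addr & (bus_bytes - 1u);   /* first active byte lane */
    uint32_t mask = (size >= 32u) ? 0xFFFFFFFFu : ((1u << size) - 1u);
    return mask << lane;
}
```

A 2-byte write to address 0x1001 on a 32-bit (4-byte) bus enables lanes 1 and 2, giving a strobe of 0b0110.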
An Example AMBA System
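[Figure: an ARM processor and other high-performance masters on a high-bandwidth AXI4 interconnect with an external memory interface; an APB bridge connects the APB peripherals (UART, timer, keypad)]
Inter-connection architecture
[Figure: masters (ARM, Master 2) and slaves linked through master and slave interfaces]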
AXI Protocols: Write and Read Channels
• Separate write and read channels between master and slaves
• Separate address and data buses
• Out-of-order completion
• Data buses 8-1024 bits wide
• Bursts of 1-16 data transfers (AXI3; AXI4 extends INCR bursts to up to 256 transfers)
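Within a burst, the master sends one address and the slave derives the rest. For an incrementing (INCR) burst, the first transfer may be unaligned and every later transfer is at the aligned start address plus i × size. The sketch below follows that rule; the function names are hypothetical, and the length check uses the 1-16 limit listed above, which is the AXI3 rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address of transfer i (0-based) in an INCR burst. `size` is the number
   of bytes per transfer (a power of two). Only the first transfer may be
   unaligned; later ones are aligned to `size`. */
uint32_t incr_beat_address(uint32_t start, uint32_t size, uint32_t i)
{
    uint32_t aligned = start & ~(size - 1u);
    return (i == 0u) ? start : aligned + i * size;
}

/* Burst length check using the 1-16 transfer limit (AXI3). */
bool valid_axi3_length(uint32_t beats)
{
    return beats >= 1u && beats <= 16u;
}
```

A 4-beat burst of 4-byte transfers starting at 0x1000 touches 0x1000, 0x1004, 0x1008 and 0x100C; starting at 0x1002 instead, the first transfer is at 0x1002 and the next at 0x1004.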
AXI4 transaction (1)