ISAs and Y86-64
Samira Khan
Agenda
• ISA vs Microarchitecture
• ISA Tradeoffs
• Y86-64 ISA
• Y86-64 Format
• Y86-64 Encoding/Decoding
LEVELS OF TRANSFORMATION
• ISA
• Agreed-upon interface between software and hardware
• SW/compiler assumes, HW promises
• What the software writer needs to know to write system/user programs
• Microarchitecture
• Specific implementation of an ISA
• Not visible to the software
• Microprocessor
• ISA, uarch, circuits
• "Architecture" = ISA + microarchitecture
[Figure: levels of transformation, top to bottom: Problem, Algorithm, Program/Language, ISA, Microarchitecture, Logic, Circuits]
ISA VS. MICROARCHITECTURE
• What is part of ISA vs. Uarch?
• Gas pedal: interface for “acceleration”
• Internals of the engine: implements “acceleration”
• Add instruction vs. Adder implementation
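To make "add instruction vs. adder implementation" concrete, here is a minimal Python sketch (all names are hypothetical). `isa_addq` states only what a 64-bit add must produce, which is the ISA's promise. `ripple_carry_add` is one possible implementation built from chained full adders, which is a microarchitectural choice that software cannot observe. Inputs are assumed to be unsigned 64-bit values.

```python
MASK64 = (1 << 64) - 1


def isa_addq(a: int, b: int) -> int:
    """ISA-level contract: a 64-bit add returns (a + b) mod 2**64."""
    return (a + b) & MASK64


def ripple_carry_add(a: int, b: int) -> int:
    """One possible implementation: 64 full adders chained through their carries."""
    result, carry = 0, 0
    for i in range(64):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i          # full-adder sum bit (uses incoming carry)
        carry = (x & y) | (carry & (x ^ y))     # full-adder carry-out
    return result                               # carry out of bit 63 is discarded


# Software only sees the contract, so both versions must agree.
for a, b in [(1, 2), (MASK64, 1), (123456789, 987654321)]:
    assert ripple_carry_add(a, b) == isa_addq(a, b)
```

A carry-lookahead adder would be an equally valid microarchitecture behind the same instruction; only speed and cost would change.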
ISA
• Instructions
• Opcodes, Addressing Modes, Data Types
• Instruction Types and Formats
• Registers, Condition Codes
• Memory
• Address space, Addressability, Alignment
• Virtual memory management
• Call, Interrupt/Exception Handling
• Access Control, Priority/Privilege
• I/O
• Task Management
• Power and Thermal Management
• Multi-threading support, Multiprocessor support
Example ISAs
• x86 — dominant in desktops, servers
• ARM — dominant in mobile devices
• POWER — Wii U, IBM supercomputers and some servers
• MIPS — common in consumer wifi access points
• SPARC — some Oracle servers, Fujitsu supercomputers
• z/Architecture — IBM mainframes
• Z80 — TI calculators
• SHARC — some digital signal processors
• Itanium — some HP servers (being retired)
• RISC-V — some embedded systems
• …
ISA: INSTRUCTION LENGTH
• Fixed length: Length of all instructions the same
+ Easier to decode single instruction in hardware
+ Easier to decode multiple instructions concurrently
-- Wasted bits in instructions (Why is this bad?)
-- Harder-to-extend ISA (how to add new instructions?)
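Why fixed length eases decoding: with fixed-length instructions the start of instruction i is simply i times the length, so many starts can be computed at once. With variable-length instructions each start depends on decoding the previous instruction. The sketch below uses the Y86-64 instruction lengths (covered later) as the variable-length case; the 4-byte fixed length and the three-instruction program are assumptions chosen for illustration.

```python
# Length in bytes of each Y86-64 instruction, keyed by the high nibble of byte 0.
Y86_LENGTH = {0x0: 1, 0x1: 1, 0x2: 2, 0x3: 10, 0x4: 10, 0x5: 10,
              0x6: 2, 0x7: 9, 0x8: 9, 0x9: 1, 0xA: 2, 0xB: 2}


def fixed_starts(n_instrs: int, length: int = 4) -> list[int]:
    # Every start is independent of the others: no decoding needed.
    return [i * length for i in range(n_instrs)]


def variable_starts(code: bytes) -> list[int]:
    # Each start depends on the length of the instruction before it,
    # so boundaries must be found sequentially.
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += Y86_LENGTH[code[pc] >> 4]
    return starts


# Hypothetical program: irmovq $1, %rax ; addq %rdi, %rax ; ret
code = bytes.fromhex("30f00100000000000000" "6070" "90")
print(fixed_starts(3))        # [0, 4, 8]
print(variable_starts(code))  # [0, 10, 12]
```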
ISA: ADDRESSING MODES
• Addressing mode specifies how to obtain an operand of an instruction
• Register
• Immediate
• Memory (displacement, register indirect, indexed, absolute, memory indirect,
autoincrement, autodecrement, …)
• x86-64: 10(%r11,%r12,4)
• ARM: %r11 << 3 (shift register value by constant)
• VAX: ((%r11)) (register value is pointer to pointer)
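The x86-64 operand 10(%r11,%r12,4) names memory at displacement + base + index × scale, i.e. 10 + %r11 + 4 × %r12. A minimal sketch of that address computation; the register contents are arbitrary assumptions.

```python
MASK64 = (1 << 64) - 1


def effective_address(disp: int, base: int, index: int, scale: int) -> int:
    """x86-64 disp(base, index, scale) addressing: disp + base + index * scale."""
    assert scale in (1, 2, 4, 8)            # the only scales x86-64 encodes
    return (disp + base + index * scale) & MASK64


regs = {"%r11": 0x1000, "%r12": 3}          # hypothetical register contents
addr = effective_address(10, regs["%r11"], regs["%r12"], 4)
print(hex(addr))  # 0x1016
```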
ISA: Condition Codes
cmpq %r11, %r12
je somewhere
• could instead do a single compare-and-branch:
/* Branch if EQual */
beq %r11, %r12, somewhere
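The two designs differ in where the comparison result lives. With condition codes, cmpq writes flag state that a later je reads; with compare-and-branch, beq does both in one instruction and leaves no flag state behind. A minimal Python model; the `Machine` class and its method names are hypothetical.

```python
class Machine:
    """Toy model with one condition code (ZF); names are illustrative only."""

    def __init__(self, regs: dict[str, int]):
        self.regs = dict(regs)
        self.zf = False                     # condition-code state, visible in the ISA

    # Condition-code style: cmpq writes ZF, a later je reads it.
    def cmpq(self, a: str, b: str) -> None:
        self.zf = (self.regs[b] - self.regs[a]) == 0   # AT&T order: computes b - a

    def je(self, target: str, fallthrough: str) -> str:
        return target if self.zf else fallthrough

    # Compare-and-branch style: one instruction, no flag state left behind.
    def beq(self, a: str, b: str, target: str, fallthrough: str) -> str:
        return target if self.regs[a] == self.regs[b] else fallthrough


m = Machine({"%r11": 5, "%r12": 5})
m.cmpq("%r11", "%r12")
print(m.je("somewhere", "next"))                    # somewhere
print(m.beq("%r11", "%r12", "somewhere", "next"))   # somewhere
```

Because ZF is architectural state, it has to be saved and restored with the registers on a context switch; the compare-and-branch design has no such extra state.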
ISA-LEVEL TRADEOFFS: SEMANTIC GAP
• Where to place the ISA? Semantic gap
• Closer to high-level language (HLL) or closer to hardware control
signals? → Complex vs. simple instructions
• RISC vs. CISC vs. HLL machines
• FFT, QUICKSORT, POLY, FP instructions?
• VAX INDEX instruction (array access with bounds checking)
• e.g., A[i][j][k] in one instruction with bounds checking
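As a rough idea of the work an INDEX-style instruction folds into hardware, the sketch below computes a bounds-checked, row-major byte offset for A[i][j][k], one subscript per step. Each step is modeled loosely on VAX INDEX, which, as we understand it, range-checks a subscript and computes (indexin + subscript) × size; the function names, zero-based bounds, and raising `IndexError` in place of a hardware trap are assumptions.

```python
def index_step(subscript: int, low: int, high: int, size: int, indexin: int) -> int:
    """One INDEX-like step: range-check subscript, then return (indexin + subscript) * size."""
    if not low <= subscript <= high:
        raise IndexError(f"subscript {subscript} outside [{low}, {high}]")  # stands in for a trap
    return (indexin + subscript) * size


def offset_3d(i: int, j: int, k: int, dims: tuple[int, int, int], elem_size: int) -> int:
    """Byte offset of A[i][j][k] in a row-major array of shape dims, bounds-checked."""
    d1, d2, d3 = dims
    t = index_step(i, 0, d1 - 1, d2, 0)             # i * d2
    t = index_step(j, 0, d2 - 1, d3, t)             # (i*d2 + j) * d3
    return index_step(k, 0, d3 - 1, elem_size, t)   # ((i*d2 + j)*d3 + k) * elem_size


print(offset_3d(1, 2, 3, (2, 3, 4), 8))   # 184
```

Without such an instruction, the compiler emits the compares, branches, multiplies, and adds itself, and is then free to optimize them (for example, hoisting checks out of a loop). That is the compiler-vs-hardware optimization tradeoff.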
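SEMANTIC GAP
[Figure: the ISA sits between the high-level language (software) and the hardware control signals; the distance between the high-level language and the ISA is the semantic gap. A CISC ISA sits closer to the high-level language, a RISC ISA closer to the hardware control signals.]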
ISA-LEVEL TRADEOFFS: SEMANTIC GAP
• Tradeoffs:
• Simple compiler, complex hardware vs. complex compiler, simple hardware
• Burden of backward compatibility
• Performance?
• Optimization opportunity: with the VAX INDEX instruction, who (compiler vs. hardware) puts more effort into optimization?
• Instruction size, code size
SMALL SEMANTIC GAP EXAMPLES IN VAX
• FIND FIRST
• Find the first set bit in a bit field
• Helps OS resource allocation operations
• SAVE CONTEXT, LOAD CONTEXT
• Special context switching instructions
• INSQUEUE, REMQUEUE
• Operations on doubly linked list
• INDEX
• Array access with bounds checking
• STRING Operations
• Compare strings, find substrings, …
• Cyclic Redundancy Check Instruction
• EDITPC
• Implements editing functions to display fixed format output
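FIND FIRST replaces a bit-scanning loop with one instruction, which is handy when an OS searches an allocation bitmap for a free resource. A minimal Python sketch of the loop it replaces, next to a common bit trick; this is simplified (no start-position or field-size operands), and returning -1 for an all-zero field is our own convention, not VAX behavior.

```python
def find_first_set_loop(field: int, width: int = 32) -> int:
    """Scan from the least significant bit; -1 if no bit is set."""
    for pos in range(width):
        if (field >> pos) & 1:
            return pos
    return -1


def find_first_set_trick(field: int) -> int:
    """field & -field keeps only the lowest set bit; bit_length() - 1 is its position."""
    return (field & -field).bit_length() - 1


free_map = 0b1011_0000   # hypothetical bitmap: 1 = free resource
print(find_first_set_loop(free_map), find_first_set_trick(free_map))  # 4 4
```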
CISC vs. RISC
• x86 (complex): one instruction copies a block of memory
    REP MOVS DEST SRC
• Simple instructions: the same copy as a loop
    X: MOV
       ADD
       COMP
       MOV
       ADD
       JMP X
• RISC motivated by
• Memory stalls (no work done in a complex instruction when there is a memory stall?)
• When is this correct?
• Simplifying the hardware → lower cost, higher frequency
• Enabling the compiler to optimize the code better
• Find fine-grained parallelism to reduce stalls
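The two styles above do the same memory copy. A minimal Python sketch that counts instructions executed; the list-as-memory model and counting six simple instructions per copied word (MOV, ADD, COMP, MOV, ADD, JMP) are simplifying assumptions, and the regions must not overlap.

```python
def rep_movs(mem: list[int], dst: int, src: int, count: int) -> int:
    """Complex style: one instruction copies `count` words (forward direction)."""
    for n in range(count):          # the hardware sequences these steps internally
        mem[dst + n] = mem[src + n]
    return 1                        # instructions fetched and decoded


def simple_loop_copy(mem: list[int], dst: int, src: int, count: int) -> int:
    """Simple style: the loop X: MOV, ADD, COMP, MOV, ADD, JMP X, once per word."""
    executed = 0
    for n in range(count):
        value = mem[src + n]        # MOV (load)
        mem[dst + n] = value        # MOV (store)
        executed += 6               # MOV, ADD, COMP, MOV, ADD, JMP
    return executed


mem_a = list(range(8)) + [0] * 8
mem_b = list(mem_a)
print(rep_movs(mem_a, 8, 0, 8), simple_loop_copy(mem_b, 8, 0, 8))  # 1 48
assert mem_a == mem_b
```

Fewer instructions fetched is not the same as less work: the single complex instruction still performs every load and store.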
Typical RISC ISA properties
• fewer, simpler instructions
• separate instructions to access memory
• fixed-length instructions
• more registers
• no instructions with two memory operands
• few addressing modes
Y86-64 instruction set
• based on x86
• omits most of x86's 1000+ instructions; keeps only:
addq subq andq xorq
jmp jCC call ret
pushq popq
cmovCC nop
movq (renamed: rrmovq, irmovq, rmmovq, mrmovq)
hlt (renamed: halt)
• much, much simpler encoding
Y86-64: movq
• Goal: %r12 ← %r12 + 1
• Invalid: addq $1, %r12 (OPq instructions take only register operands)
• Instead, need an extra register:
irmovq $1, %r11
addq %r11, %r12
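A tiny register-file model of the two-instruction sequence above. The dict of registers and the helper functions are hypothetical; only 64-bit wraparound is modeled, and condition codes are ignored here.

```python
MASK64 = (1 << 64) - 1
regs = {"%r11": 0, "%r12": 41}      # hypothetical starting values


def irmovq(value: int, rb: str) -> None:
    regs[rb] = value & MASK64       # immediate -> register


def addq(ra: str, rb: str) -> None:
    regs[rb] = (regs[rb] + regs[ra]) & MASK64   # rB <- rB + rA


irmovq(1, "%r11")
addq("%r11", "%r12")
print(regs["%r12"])  # 42
```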
Y86-64: condition codes
• ZF — value was zero?
• SF — sign bit was set, i.e., value was negative?
• this course: no OF, CF (to simplify assignments)
• set by addq, subq, andq, xorq
• not set by anything else
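A minimal sketch of how ZF and SF follow from an OPq result, treating values as 64-bit two's complement and omitting OF and CF as in this course. The function and table names are hypothetical; the function codes match the Y86-64 OPq encodings (addq 0, subq 1, andq 2, xorq 3).

```python
MASK64 = (1 << 64) - 1

# OPq function codes: rB <- rB op rA
OPS = {0: lambda ra, rb: rb + ra,   # addq
       1: lambda ra, rb: rb - ra,   # subq
       2: lambda ra, rb: rb & ra,   # andq
       3: lambda ra, rb: rb ^ ra}   # xorq


def opq(fn: int, ra_val: int, rb_val: int) -> tuple[int, dict[str, bool]]:
    """Return the 64-bit result and the ZF/SF condition codes it sets."""
    result = OPS[fn](ra_val, rb_val) & MASK64
    flags = {"ZF": result == 0,          # value was zero?
             "SF": bool(result >> 63)}   # sign bit set, i.e., negative?
    return result, flags


print(opq(1, 5, 3))   # subq: 3 - 5 = -2 -> (18446744073709551614, {'ZF': False, 'SF': True})
print(opq(3, 7, 7))   # xorq: 7 ^ 7 = 0  -> (0, {'ZF': True, 'SF': False})
```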
Y86-64: using condition codes
popq %rbx
• %rbx ← memory[%rsp]
• %rsp ← %rsp + 8
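The register transfers for popq can be modeled directly. Memory here is a hypothetical dict from addresses to 8-byte words. Reading the word before writing rA means popq %rsp leaves %rsp equal to the popped value, matching the x86-64 convention.

```python
regs = {"%rsp": 0x100, "%rbx": 0}
mem = {0x100: 0xDEADBEEF}           # hypothetical stack contents


def popq(ra: str) -> None:
    value = mem[regs["%rsp"]]       # read word at %rsp
    regs["%rsp"] += 8               # %rsp <- %rsp + 8
    regs[ra] = value                # rA <- old memory[%rsp]


popq("%rbx")
print(hex(regs["%rbx"]), hex(regs["%rsp"]))  # 0xdeadbeef 0x108
```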
Y86-64 Instruction Set
Byte 0 holds the instruction code (high 4 bits) and function code fn (low 4 bits); byte 1, when present, holds rA (high 4 bits) and rB (low 4 bits); V, D, and Dest are 8-byte constants.
halt                0 0
nop                 1 0
cmovXX rA, rB       2 fn rA rB
irmovq V, rB        3 0 F rB V
rmmovq rA, D(rB)    4 0 rA rB D
mrmovq D(rB), rA    5 0 rA rB D
OPq rA, rB          6 fn rA rB
jXX Dest            7 fn Dest
call Dest           8 0 Dest
ret                 9 0
pushq rA            A 0 rA F
popq rA             B 0 rA F

cmovXX function codes:
rrmovq 2 0   cmovle 2 1   cmovl 2 2   cmove 2 3   cmovne 2 4   cmovge 2 5   cmovg 2 6

jXX function codes:
jmp 7 0   jle 7 1   jl 7 2   je 7 3   jne 7 4   jge 7 5   jg 7 6
Encoding Registers
• Each register has a 4-bit ID
%rax 0 %r8 8
%rcx 1 %r9 9
%rdx 2 %r10 A
%rbx 3 %r11 B
%rsp 4 %r12 C
%rbp 5 %r13 D
%rsi 6 %r14 E
%rdi 7 No Register F
Encoded Representation
Add
addq rA, rB    6 0 rA rB
Subtract (rA from rB)
subq rA, rB    6 1 rA rB
And
andq rA, rB    6 2 rA rB
Exclusive-Or
xorq rA, rB    6 3 rA rB
Move Operations
Register → Register
rrmovq rA, rB       2 0 rA rB
Immediate → Register
irmovq V, rB        3 0 F rB V
Register → Memory
rmmovq rA, D(rB)    4 0 rA rB D
Memory → Register
mrmovq D(rB), rA    5 0 rA rB D
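A minimal encoder sketch for the four move instructions: byte 0 is the instruction/function code, byte 1 packs rA and rB as 4-bit register IDs (F for "no register"), and V or D follows as 8 little-endian bytes. The function names are hypothetical; the register IDs are those listed in the register table.

```python
import struct

MASK64 = (1 << 64) - 1
REG_ID = {"%rax": 0x0, "%rcx": 0x1, "%rdx": 0x2, "%rbx": 0x3,
          "%rsp": 0x4, "%rbp": 0x5, "%rsi": 0x6, "%rdi": 0x7,
          "%r8": 0x8, "%r9": 0x9, "%r10": 0xA, "%r11": 0xB,
          "%r12": 0xC, "%r13": 0xD, "%r14": 0xE}
NO_REG = 0xF


def const8(c: int) -> bytes:
    """8-byte little-endian two's-complement constant (V or D)."""
    return struct.pack("<Q", c & MASK64)


def regs_byte(ra: int, rb: int) -> int:
    return (ra << 4) | rb           # rA in the high nibble, rB in the low nibble


def encode_rrmovq(ra: str, rb: str) -> bytes:
    return bytes([0x20, regs_byte(REG_ID[ra], REG_ID[rb])])


def encode_irmovq(v: int, rb: str) -> bytes:
    return bytes([0x30, regs_byte(NO_REG, REG_ID[rb])]) + const8(v)


def encode_rmmovq(ra: str, d: int, rb: str) -> bytes:     # rmmovq rA, D(rB)
    return bytes([0x40, regs_byte(REG_ID[ra], REG_ID[rb])]) + const8(d)


def encode_mrmovq(d: int, rb: str, ra: str) -> bytes:     # mrmovq D(rB), rA
    return bytes([0x50, regs_byte(REG_ID[ra], REG_ID[rb])]) + const8(d)


print(encode_irmovq(1, "%rax").hex())           # 30f00100000000000000
print(encode_rmmovq("%rsi", 8, "%rsp").hex())   # 40640800000000000000
print(encode_mrmovq(-8, "%rbp", "%rax").hex())  # 5005f8ffffffffffffff
```

Note that a negative displacement such as -8 is stored in two's complement, so its low byte comes first as f8.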
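pushq rA    A 0 rA F
• Decrement %rsp by 8
• Store word from rA to memory at %rsp
• Like x86-64
popq rA     B 0 rA F
• Read word from memory at %rsp, save it in rA, increment %rsp by 8
ret         9 0
nop         1 0
• Don't do anything
halt        0 0
• Stop executing instructions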
jXX Dest    7 fn Dest
call Dest   8 0 Dest
Y86-64 encoding
addOne:
    irmovq $1, %rax
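Decoding reverses the process: the high nibble of byte 0 fixes each instruction's length and layout. Below is a minimal disassembler sketch in Python (names are hypothetical, and function codes outside the listed tables are not validated). It decodes `irmovq $1, %rax`; the two instructions after it (`addq %rdi, %rax`, `ret`) are a hypothetical continuation of `addOne`, not recovered from the original slide.

```python
import struct

REG_NAMES = ["%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
             "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14"]
LENGTH = {0x0: 1, 0x1: 1, 0x2: 2, 0x3: 10, 0x4: 10, 0x5: 10,
          0x6: 2, 0x7: 9, 0x8: 9, 0x9: 1, 0xA: 2, 0xB: 2}
OPS = ["addq", "subq", "andq", "xorq"]          # OPq function codes 0-3
COND = ["", "le", "l", "e", "ne", "ge", "g"]    # cmovXX / jXX function codes 0-6


def decode(code: bytes) -> list[str]:
    """Disassemble Y86-64 machine code into one assembly string per instruction."""
    out, pc = [], 0
    while pc < len(code):
        icode, ifun = code[pc] >> 4, code[pc] & 0xF
        if icode not in LENGTH:
            raise ValueError(f"invalid icode {icode:#x} at address {pc:#x}")
        ra = rb = c = None
        if LENGTH[icode] in (2, 10):                # these carry a register byte
            ra, rb = code[pc + 1] >> 4, code[pc + 1] & 0xF
        if icode in (0x3, 0x4, 0x5):                # V or D follows the register byte
            c = struct.unpack_from("<q", code, pc + 2)[0]
        elif icode in (0x7, 0x8):                   # Dest follows byte 0 directly
            c = struct.unpack_from("<Q", code, pc + 1)[0]

        if icode == 0x0:
            text = "halt"
        elif icode == 0x1:
            text = "nop"
        elif icode == 0x2:
            name = "rrmovq" if ifun == 0 else "cmov" + COND[ifun]
            text = f"{name} {REG_NAMES[ra]}, {REG_NAMES[rb]}"
        elif icode == 0x3:
            text = f"irmovq ${c}, {REG_NAMES[rb]}"
        elif icode == 0x4:
            text = f"rmmovq {REG_NAMES[ra]}, {c}({REG_NAMES[rb]})"
        elif icode == 0x5:
            text = f"mrmovq {c}({REG_NAMES[rb]}), {REG_NAMES[ra]}"
        elif icode == 0x6:
            text = f"{OPS[ifun]} {REG_NAMES[ra]}, {REG_NAMES[rb]}"
        elif icode == 0x7:
            name = "jmp" if ifun == 0 else "j" + COND[ifun]
            text = f"{name} {c:#x}"
        elif icode == 0x8:
            text = f"call {c:#x}"
        elif icode == 0x9:
            text = "ret"
        elif icode == 0xA:
            text = f"pushq {REG_NAMES[ra]}"
        else:                                       # icode 0xB
            text = f"popq {REG_NAMES[ra]}"
        out.append(text)
        pc += LENGTH[icode]
    return out


code = bytes.fromhex("30f00100000000000000" "6070" "90")
for line in decode(code):
    print(line)
# irmovq $1, %rax
# addq %rdi, %rax
# ret
```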