Risc V Asm Manual
LANGUAGE
Programmer Manual
Part I
contact @ shakti[dot]iitm[@]gmail[dot]com
Copyright
© 2020, Shakti @ IIT Madras.
All rights reserved. Information in this document is provided “as is”, with all faults.
Shakti @ IIT Madras expressly disclaims all warranties, representations, and conditions of
any kind, whether express or implied, including, but not limited to, the implied warranties or
conditions of merchantability, fitness for a particular purpose and non-infringement.
Shakti @ IIT Madras does not assume any liability arising out of the application or use of any
product or circuit, and specifically disclaims any and all liability, including without limitation
indirect, incidental, special, exemplary, or consequential damages.
Shakti @ IIT Madras reserves the right to make changes without further notice to any products
herein.
List of Figures
List of Tables
1 Introduction
1.1 RISC-V
1.2 Registers
1.2.1 Stack Pointer Register
1.2.2 Global Pointer Register
1.2.3 Thread Pointer Register
1.2.4 Return Address Register
1.2.5 Argument Register
1.2.6 Temporary Register
1.3 Privilege mode
1.4 Control and Status Registers (CSRs)
1.4.1 CSR Field Specifications
1.5 CSR Instructions
1.5.1 Register to Register instructions
1.5.2 Immediate Instructions
1.5.3 Machine Information Registers
3 Bitwise Instructions
3.1 RV 32I
3.1.1 Register to Register Instructions
3.1.2 Immediate instructions
3.2 RV 64I
3.2.1 Register to Register Instructions
3.2.2 Immediate instructions
4 Arithmetic Instructions
4.1 RV 32I
6 Traps in RISC-V
6.1 Exceptions
6.1.1 Illegal Instruction Exception
6.1.2 Instruction Address Misaligned Exception
6.1.3 Load Address Misaligned Exception
6.1.4 Store Address Misaligned Exception
6.1.5 Instruction Access Fault
6.1.6 Load Access Fault
6.1.7 Store Access Fault
6.1.8 Break Point
6.1.9 Environment Call
6.2 Handling Exceptions
6.2.1 Exception Handling Registers
6.2.2 MSTATUS
6.2.3 MRET
6.3 Understanding Stack in RISC-V
6.3.1 Stack
7 Interrupts
7.1 Timer Interrupts
7.1.1 mtime Register
7.1.2 mtimecmp Register
7.1.3 Timer Interrupt flow chart
7.2 External Interrupts
7.3 Software Interrupts
8 Assembler Directives
8.1 Object File section
8.1.1 .TEXT
8.1.2 .DATA
8.1.3 .RODATA
8.1.4 .BSS
8.1.5 .COMM
8.1.6 .COMMON
8.1.7 .SECTION
8.1.8 Miscellaneous Functions
List of Tables
1 List Of Abbreviations
GP Global Pointer
MIMPID Implementation ID
NA Not Applicable
SP Stack Pointer
TP Thread Pointer
RISC-V currently supports three privilege levels, viz. Machine, Supervisor and User, with each level
having dedicated Control and Status Registers (CSRs) for system state observation and manipulation.
In addition, RISC-V provides 31 read/write registers. While all can be used as general-purpose
registers, they have dedicated functions as well. RISC-V is divided into different categories based
on the maximum width of registers the architecture can support, for example, RV32 (RISC-V 32)
provides registers whose maximum width is 32-bits and RV64 (RISC-V 64) provides registers whose
maximum width is 64-bits. Processors with larger register widths can support instructions and data
of smaller widths. So an RV64 platform supports both RV32 and RV64.
Note: This book uses the term XLEN to refer to the platform register width, in bits.
PART-I of the RISC-V programmer’s manual details RISC-V assembly instructions, the registers in
use and the machine privilege level. Advanced concepts such as privilege levels, the Memory Management
Unit and trap delegation are dealt with in PART-II of the manual.
The objective of the RISC-V ASM (assembly language) programmer manual is to aid users in
writing extensive assembly programs and provide necessary information to write simple embedded
applications.
1.2 Registers
The RISC-V architecture provides 31 user-modifiable general-purpose (base) registers, namely x1 to x31,
and an additional read-only register x0, which is hard-wired to zero. One common use of the x0 register
is to aid in initializing other registers to zero.
In comparison to other ISAs, RISC-V uses a larger number of integer registers which helps in
performance, where extensive use of loop unrolling and software pipelining is required.
• There is one stack pointer, one global pointer and one thread pointer register.
• A return address register (x1) to store the return address in a function call.
• One program counter (pc). pc holds the address of the current instruction.
The Base registers can hold either data or a valid address and are usually identified with the letter
’x’ prefixing the register number. A brief description of the registers and their additional functions
are as follows.
In the RISC-V architecture, the x2 register is used as the Stack Pointer (sp) and holds the base address
of the stack. When programming directly in RISC-V assembly language, x2 must be loaded explicitly
with the stack base address, whereas the C/C++ compilers for RISC-V are always designed to use
x2 as the stack pointer. In addition, the stack base address must be aligned to 4 bytes; otherwise, a
load/store alignment fault may arise.
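The stack setup described above can be sketched as follows. This is a minimal illustration in GNU assembler syntax, not code from the manual: the labels stack_bottom and stack_top and the 1 KiB stack size are assumptions. The stack is aligned to 16 bytes, which satisfies the 4-byte requirement and also matches the standard RISC-V calling convention.

```asm
    .section .bss
    .balign 16                # 16-byte alignment also satisfies 4-byte alignment
stack_bottom:                 # hypothetical label: lowest address of the stack
    .space 1024               # assumed stack size of 1 KiB
stack_top:                    # hypothetical label: initial stack pointer value

    .section .text
    .globl _start
_start:
    la    sp, stack_top       # x2 (sp) <- stack base address
1:  j     1b                  # placeholder: the rest of the program goes here
```

The stack grows towards lower addresses, so sp starts at the highest address of the reserved area.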
• As a base register for load and store instruction. In this case, the load/store address must be
4 byte aligned.
Data is allocated to the memory when it is globally declared in an application. Using pc-relative
or absolute addressing mode leads to utilization of extra instructions, thus increasing the code size.
In order to decrease the code size, RISC-V places all the global variables in a particular area which
is pointed to, using the x3 (gp) register. The x3 register will hold the base address of the location
where the global variables reside.
In multi-threaded applications, each thread may have its own private set of variables which are
called “thread specific variables”. This set of variables will be pointed to by the register x4 (tp).
Hence, each thread will have a different value in its x4 register.
The x1 (ra) register is used to save the subroutine return addresses. Before a subroutine call is
performed, x1 is explicitly set to the subroutine return address which is usually ‘pc + 4’. The
standard software calling convention uses x1 (ra) register to hold the return address on a function
call.
In RISC-V, 8 argument registers, namely, x10 to x17 are used to pass arguments in a subroutine.
Before a subroutine call is made, the arguments to the subroutine are copied to the argument
registers. The stack is used in case the number of arguments exceeds 8.
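A short sketch of a subroutine call may help here. It assumes GNU assembler syntax and the ABI names a0 and a1 for x10 and x11; the subroutine name add_two is hypothetical. Under the standard calling convention the return value is also passed back in a0.

```asm
    .text
    .globl _start
_start:
    li    a0, 5               # first argument  (a0 = x10)
    li    a1, 7               # second argument (a1 = x11)
    jal   ra, add_two         # ra (x1) <- address of the next instruction
halt:
    j     halt                # a0 now holds the result, 12

add_two:                      # hypothetical subroutine: returns a0 + a1
    add   a0, a0, a1          # the return value is passed back in a0
    ret                       # jump to the address held in ra
```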
As the name suggests, the temporary registers are used to hold intermediate values during instruction
execution. There are seven temporary registers (t0 − t6) in RISC-V.
x1 ra Return Address
x2 sp Stack Pointer
x3 gp Global Pointer
x4 tp Thread Pointer
x9 s1 Saved Register
To maintain system and data integrity, a system must limit the extent to which each process can use
the system resources. Software is therefore grouped into different modes/levels, from low to high,
which possess varying levels of privilege. A higher privilege mode can access the resources of the
lower modes in addition to its own. A mode trying to access a region it has no permission for
causes an exception/trap. The three privilege levels are listed below.
Level            Value  Encoding  Abbreviation
User mode        0      00        U
Supervisor mode  1      01        S
Machine mode     3      11        M
With reference to Table 1.2, the value field states the value of a privilege level. The encoding is
used to represent the privilege level in CSRs. Machine level has the highest privilege and
is also mandatory. Machine mode is inherently trusted, as it has low-level access to the machine
implementation. All software by default starts in Machine Mode. This book deals with Machine
Mode. The other two modes are used for developing conventional applications and system software.
The Control and Status Registers (CSRs) are system registers provided by RISC-V to control and
monitor system states¹. CSRs can be read and written, and their bits can be set or cleared. RISC-V provides
distinct CSRs for every privilege level. Each CSR has a special name and is assigned a unique
function. In addition to the machine level CSRs described in this section, M-mode code can access
the CSRs at lower privilege levels. Other privilege levels and related CSRs are dealt with in Part
II of the manual.
Reading and/or writing to a CSR will affect processor operation. CSRs are used in operations
where a normal register cannot be used. For example, knowing the system configuration, handling
exceptions, switching to different privilege modes and handling interrupts are some tasks for which
a CSR is needed. A CSR cannot be read or written the way a general register can. A special set of
instructions called CSR instructions is used for this purpose. CSR instructions require
an intermediate base register to perform any operation on CSRs. Further, it is possible to
write immediate values to CSRs. Table 1.3 lists the CSRs present in machine mode.
¹ Here, system/processor refers to a computing system built using the RISC-V ISA.
Table 1.3: Machine mode CSRs
Register    Description
misa        Machine ISA
mvendorid   Machine Vendor ID
marchid     Machine Architecture ID
mimpid      Machine Implementation ID
mhartid     Machine Hardware thread ID
mstatus     Machine Status
mcause      Machine trap cause
mtvec       Trap vector base address
mepc        Machine exception program counter
mie         Machine interrupt enable
mip         Machine interrupt pending
mtval       Machine trap value
mscratch    Scratch register
An attempt to access a CSR that is not visible in the current mode of operation results in a privilege
violation. Similarly, in the current mode of operation, a privilege violation occurs when an attempt is
made to write to a CSR labeled “read-only”. This attempt results in an illegal instruction exception.
In addition to restrictions on how a CSR is accessed, fields within some registers come with
their own restrictions, which are listed as follows.
Read-only fields within some read-only and read/write registers, have been reserved for future use.
Such fields have been named as Reserved Writes Ignored, Reads Ignore Values (WIRI). A
read or write to these fields must be ignored. In case the entire CSR is a read-only register, an
attempt to write to the WIRI field will raise an illegal instruction exception.
Although there are fields labeled “read/write” in some registers, they are reserved for future use and
are not available for software modification. Such fields are called Reserved Writes Preserve
Values, Reads Ignore Values (WPRI). Values returned on reading such fields must be ignored,
while a write to the whole register containing such fields must preserve their original
value.
Some fields restrict the values that can be written to or read from them. Such values are called “legal”
values and are specified by the processor. Fields with this restriction are labeled as Write/Read
Only Legal Values (WLRL). A read of such a field returns a legal value if a legal value was last written
to it. Care should be taken to write only legal values, as a read after an illegal write may not return
a legal value.
Some read/write fields allow any value to be written to them, while reading them will only
return values which are legal. Such fields are labeled as Write Any Values, Reads Legal Values
(WARL). Implementations will not raise an exception on writes of unsupported values to a WARL
field. Implementations must always deterministically return the same legal value after a given illegal
value is written.
CSR instructions are used to read and write to CSR registers. These instructions are broadly
classified as register-register and register-immediate instructions.
Register-register instructions perform the indicated operation on two registers of the system and leave
the result in the specified register.
CSRRC
Syntax
csrrc rd, csr, rs1
Alias
csrc csr, rs1
where,
rd destination register
csr csr register
rs1 source register 1
Description
The CSRRC instruction clears selected bits of the specified CSR. The previous value of the CSR
is copied to the destination register and then the selected bits of the CSR are cleared to 0. The
value in (rs1 ) is used as a bit mask to select which bits are to be cleared in the CSR. Other bits are
unchanged. If (rs1 ) is x0, no update to the CSR occurs, so the instruction can also be used simply
to read a CSR. This is an atomic operation.
Usage
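As an illustration only (GNU assembler syntax, executed in machine mode), the sketch below clears bit 3 of mstatus, which is the global machine interrupt enable bit. The choice of mstatus and of registers t0 and t1 is an assumption made for this example.

```asm
    .text
    .globl _start
_start:
    li    t0, 0x8             # bit mask selecting bit 3 (MIE field of mstatus)
    csrrc t1, mstatus, t0     # t1 <- old mstatus, then clear bit 3 of mstatus
    csrc  mstatus, t0         # alias form: clear the same bit, old value discarded
1:  j     1b
```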
CSRR
Syntax
csrr rd, csr
where,
rd destination register
csr csr register
Description
The CSRR instruction is used to read the value of a CSR. The current value of the CSR is copied to
the destination register and the CSR is left unchanged. This is an atomic read operation.
Usage
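A minimal sketch of a CSR read, assuming GNU assembler syntax and machine mode; mhartid and t0 are chosen only for illustration.

```asm
    .text
    .globl _start
_start:
    csrr  t0, mhartid         # t0 <- current value of mhartid; CSR is unchanged
1:  j     1b
```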
CSRRW
CSR Read and Write (CSRRW) is used to read from and/or write to a CSR.
Syntax
csrrw rd, csr, rs1
Alias
csrw csr, rs1
where,
rd destination register
rs1 source register 1
csr csr register
Description
The previous value of the CSR is copied to the destination register and the value of the source register
(rs1 ) is copied to the CSR. To write a CSR without reading it, the destination register (rd) can be
specified as x0. Note that specifying the source register (rs1 ) as x0 writes zero to the CSR; to read a
CSR without writing to it, use CSRRS (or the CSRR alias) with x0 as the source register. This is an
atomic operation.
Usage
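The following sketch swaps a value into mscratch and then restores it. It assumes GNU assembler syntax and machine mode; the value 0x1234 and the registers are arbitrary choices for illustration.

```asm
    .text
    .globl _start
_start:
    li    t0, 0x1234
    csrrw t1, mscratch, t0    # t1 <- old mscratch; mscratch <- 0x1234
    csrw  mscratch, t1        # alias form (rd = x0): restore the old value
1:  j     1b
```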
Exceptions
In lower privilege modes some of the CSRs are inaccessible. An attempt to read from or write to
those CSR may cause an illegal instruction exception.
CSRRS
CSR Read and Set Bits (CSRRS) sets bits in the specified CSR.
Syntax
csrrs rd, csr, rs1
Alias
csrs csr, rs1
where,
rd destination register
csr csr register
rs1 source register 1
Description
The CSRRS instruction can be used to simply read a CSR without updating it. If (rs1 ) is x0, then no
update to the CSR will occur. The previous value of the CSR is copied to the destination register
and then some selected bits of the CSR are set to 1. The value in (rs1 ) is used as a bit mask to
select which bits are to be set in the CSR. Other bits are unchanged. This is an atomic operation.
Usage
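As a hedged example (GNU assembler syntax, machine mode), the sketch below sets bit 3 of mstatus, the global machine interrupt enable bit. The register choices are assumptions made for this example.

```asm
    .text
    .globl _start
_start:
    li    t0, 0x8             # bit mask selecting bit 3 (MIE field of mstatus)
    csrrs t1, mstatus, t0     # t1 <- old mstatus, then set bit 3 of mstatus
    csrs  mstatus, t0         # alias form: set the same bit, old value discarded
1:  j     1b
```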
CSRRCI
CSR Read and Clear Immediate (CSRRCI) clears bits in a CSR using a zero-extended immediate value
(imm[4:0]) encoded in the rs1 field, instead of a value from an integer register.
Syntax
csrrci rd, csr, imm
Alias
csrci csr, imm
where,
rd destination register
csr csr register
imm immediate value
Description
The CSRRCI instruction makes bits[4:0] in any CSR particularly easy to clear. The previous value
of the CSR is copied to the destination register and then some selected bits of the CSR are cleared
to 0. The 5-bit field that is normally used for rs1 is zero-extended and used as a bit mask to
select which bits are to be cleared in the CSR. This is an atomic operation.
Usage
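A minimal sketch, assuming GNU assembler syntax and machine mode: the immediate 0x8 selects bit 3 of mstatus (the global machine interrupt enable bit), so no mask register is needed.

```asm
    .text
    .globl _start
_start:
    csrrci t0, mstatus, 0x8   # t0 <- old mstatus, then clear bit 3 of mstatus
    csrci  mstatus, 0x8       # alias form: old value discarded
1:  j      1b
```

Only bits 0 to 4 can be reached this way, because the immediate is 5 bits wide.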
CSRRSI
Syntax
csrrsi rd, csr, imm
Alias
csrsi csr, imm
where,
rd destination register
csr csr register
imm immediate value
Description
The CSRRSI instruction makes bits[4:0] in any CSR particularly easy to set to “1”. The previous
value of the CSR is copied to the destination register and then some selected bits of the CSR are
set to 1. The 5-bit field that is normally used for rs1 is zero-extended and used as a bit mask to
select which bits are to be set in the CSR. This is an atomic operation.
Usage
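A minimal sketch, assuming GNU assembler syntax and machine mode: the immediate 0x8 selects bit 3 of mstatus (the global machine interrupt enable bit).

```asm
    .text
    .globl _start
_start:
    csrrsi t0, mstatus, 0x8   # t0 <- old mstatus, then set bit 3 of mstatus
    csrsi  mstatus, 0x8       # alias form: old value discarded
1:  j      1b
```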
CSRRWI
CSR Read and Write bits Immediate (CSRRWI) copies the old value of a CSR, then overwrites the
CSR with the specified immediate value.
Syntax
csrrwi rd, csr, imm
Alias
csrwi csr, imm
where,
where,
rd destination register
csr csr register
imm immediate value
Description
CSRRWI is a variant of the CSRRW instruction, which is used to overwrite a CSR with the
specified immediate value. The previous value of the CSR is copied to the destination register and
then the entire CSR is written. The 5-bit field that is usually used for the source register (rs1 ) is
zero-extended and used as the immediate value that is moved into the CSR. This is an atomic
operation.
Usage
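A minimal sketch, assuming GNU assembler syntax and machine mode; mscratch and the value 5 are arbitrary choices for illustration.

```asm
    .text
    .globl _start
_start:
    csrrwi t0, mscratch, 5    # t0 <- old mscratch; mscratch <- 5 (zero-extended)
    csrwi  mscratch, 0        # alias form: mscratch <- 0, old value discarded
1:  j      1b
```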
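MISA
The Machine Instruction Set Architecture (MISA) register describes the basic architecture of the
RISC-V processor.
Figure: misa register layout. Register width field (2 bits, most significant), reserved (XLEN-28 bits), Extensions (26 bits, least significant).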
Description
MISA also reports the register width and the RISC-V extensions that are implemented. Individual bits
in this CSR indicate which of the options and extensions detailed by the RISC-V specification have
been implemented.
The register width of the machine is encoded in the most significant two bits of this CSR. The MISA
register shows the widest register width that the core is capable of running. For example, an RV64
machine may be capable of running as an RV32 machine.
Of the XLEN bits, the lower-order 26 bits correspond to the letters A, B, . . . , Y, Z (“A”=bit 0, “B”=bit
1, etc.). Each bit is set to indicate that the corresponding RISC-V extension is implemented in
the core. For example, bit 5 is set if the core supports the “F” extension.
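The extension check described above can be sketched as follows, assuming GNU assembler syntax and machine mode. The labels has_f and no_f are hypothetical placeholders. Note that the RISC-V privileged specification allows misa to read as zero on implementations that do not provide it, in which case this test reports no extension.

```asm
    .text
    .globl _start
_start:
    csrr  t0, misa            # t0 <- misa
    andi  t1, t0, 0x20        # isolate bit 5, the "F" extension bit
    beqz  t1, no_f            # bit clear: "F" is not reported as implemented
has_f:                        # hypothetical label: "F" extension available
    j     has_f
no_f:                         # hypothetical label: no "F" extension
    j     no_f
```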
Write NA NA
Set NA NA
MVENDORID
Figure: mvendorid register layout. Bank: bits XLEN-1 to 7 (XLEN-7 bits); Offset: bits 6 to 0 (7 bits).
Description
MVENDORID stores the Identity number assigned to a vendor by the semiconductor engineering trade
organization called JEDEC. Research and non-commercial implementations will have zero encoded.
Write NA NA
Set NA NA
Clear NA NA
MARCHID
Machine Architecture Id (MARCHID) identifies the particular architecture of the part and is
essentially the “part number” or “model number”.
Figure 1.3: Machine Architecture ID Register (marchid). Architecture ID: bits XLEN-1 to 0 (XLEN bits).
Description
For commercial designs, this number is assigned by the vendor. For some non-commercial or open-
source projects, a number may be assigned by the RISC-V Foundation. Otherwise, this register will
contain zero.
Write NA NA
Set NA NA
Clear NA NA
MIMPID
Figure: mimpid register layout. Implementation: bits XLEN-1 to 0 (XLEN bits).
Description
Given a particular vendor (as identified in mvendorid) and a part/model number (as identified in
marchid), there may be several versions. MIMPID identifies the version of the implementation. It may
be zero.
Write NA NA
Set NA NA
Clear NA NA
MHARTID
Machine Hardware Thread Id (MHARTID) identifies which core is executing.
Figure: mhartid register layout. Hart ID: bits XLEN-1 to 0 (XLEN bits).
Description
The MHARTID register does not reflect a higher-level (e.g., operating system) concept of thread.
In a single-core system with a single, simple FETCH-DECODE-EXECUTE pipeline, there is only one
HART. In a multi-core system, where each core executes a single flow of control, each core has
its own HART. Each core’s HART executes concurrently with the other cores’ HARTs.
It may be important to identify one thread as a “master thread”. One HART must be given an
ID of zero. The number of hardware threads is fixed but the application software will need an
unpredictable and changing number of threads. The OS will map traditional OS threads onto the
available hardware threads.
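One common way to single out the hart with ID zero can be sketched as follows. This assumes GNU assembler syntax and machine mode; the labels primary and park are hypothetical, and parking the remaining harts with wfi is only one possible design.

```asm
    .text
    .globl _start
_start:
    csrr  t0, mhartid         # t0 <- ID of the hart running this code
    bnez  t0, park            # only hart 0 continues with initialization
primary:                      # hypothetical label: single-hart initialization
    j     primary
park:                         # hypothetical label: all other harts wait here
    wfi                       # wait for an interrupt
    j     park
```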
Write NA NA
Set NA NA
Clear NA NA
MSTATUS
Machine STATUS (MSTATUS) register details the machine status and helps in manipulating the state
of the machine. The mstatus register has several bits to operate the different states of the machine.
Figure: mstatus register layout, shown for RV64 (bits 63 to 0) and RV32 (bits 31 to 0).
Description
MSTATUS contains a number of fields that can be read and updated. By modifying these fields, the
software can do things like enable/disable interrupts and change the virtual memory model.
For example, by writing to this CSR, the software can turn on virtual memory and page-table
translation. Two of the fields are only used for 64 and/or 128 bit machines. These two fields reside
in bits positions [35:32], so they are not even present in 32-bit machines.
MCAUSE
Machine CAUSE (MCAUSE) register contains the reason for the exception or interrupt that happened
in the system.
Figure: mcause register layout. Bit XLEN-1 (1 bit) and bits XLEN-2 to 0 (XLEN-1 bits).
When a trap is taken into Machine mode, MCAUSE is written by hardware with a code indicating the
event that caused the trap. The numeric codes are listed below.
MTVEC
Machine Trap Vector Base Address (MTVEC) register is used to store the address of the trap
handler.
Figure: mtvec register layout. Bits XLEN-1 to 2 and bits 1 to 0.
The MTVEC register holds the address of the trap handler. When a trap occurs (and is to be handled,
not ignored), the hardware sets the program counter (PC) to the value in the MTVEC register.
This causes a jump to the first instruction in the trap handler routine.
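Installing a trap handler can be sketched as follows, assuming GNU assembler syntax and machine mode. The label trap_handler is hypothetical. The handler is aligned to 4 bytes so that the two low bits written to mtvec are zero, which, per the RISC-V privileged specification, selects direct mode (all traps jump to the base address).

```asm
    .text
    .globl _start
_start:
    la    t0, trap_handler    # t0 <- address of the handler routine
    csrw  mtvec, t0           # low two bits are 0: direct mode
1:  j     1b

    .balign 4
trap_handler:                 # hypothetical handler: loops forever
    j     trap_handler
```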
MEPC
Machine Exception Program Counter (MEPC) is an XLEN-bit read/write register, which holds
the address of the instruction which resulted in a trap.
Figure: mepc register layout. mepc: bits XLEN-1 to 0 (XLEN bits).
Description
When a trap (exception) is taken into machine mode, the virtual address of the instruction which
resulted in the exception is written into the mepc register. It serves the same purpose for the
exception handler that the return address (ra) register serves for subroutine calls. Certain traps
can lead to a system halt. In that case, MEPC cannot be used to return.
Exceptions
MEPC register cannot hold a program counter (pc) value that would cause an Instruction Address
Misaligned exception.
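As a hedged illustration, a handler that skips the instruction which caused an exception could look like the sketch below (GNU assembler syntax, machine mode). It assumes the trapping instruction is 4 bytes long, that is, not a compressed instruction, and for brevity it overwrites t0 without saving it, which a real handler must not do.

```asm
    .text
    .globl trap_handler
    .balign 4
trap_handler:                 # hypothetical handler entry point
    csrr  t0, mepc            # t0 <- address of the trapping instruction
    addi  t0, t0, 4           # assumption: the instruction is 4 bytes long
    csrw  mepc, t0            # resume at the following instruction
    mret                      # return from the trap to the address in mepc
```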
MIE
Machine Mode Interrupt Enable (MIE) is an XLEN-bit read/write register containing interrupt
enable bits. Bits which are read-only are hardwired to 0.
Figure: mie register layout. Bits 15 to 12 (4 bits) followed by twelve single-bit fields in bits 11 to 0.
Description
The MIE register contains bits that enable or disable interrupts. Using this register, Timer,
Software and External interrupts can be controlled individually. For the bits in the MIE register
to take effect, the MIE bit in the MSTATUS register has to be set. In general, the MIE bit in MSTATUS
controls interrupts at the global level, while the bits in the MIE register control interrupts at the local level.
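The two levels of enabling can be sketched as follows, assuming GNU assembler syntax and machine mode. The bit positions are taken from the RISC-V privileged specification: bit 7 of mie is the machine timer interrupt enable, and bit 3 of mstatus is the global MIE bit.

```asm
    .text
    .globl _start
_start:
    li    t0, 0x80            # bit 7 of mie: machine timer interrupt enable
    csrs  mie, t0             # local enable for the timer interrupt
    csrsi mstatus, 0x8        # bit 3 of mstatus: global interrupt enable
1:  j     1b
```

The mask 0x80 does not fit in a 5-bit immediate, so it is loaded into a register first.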
MIP
Machine Mode Interrupt Pending (MIP) is an XLEN-bit read/write register which holds the
information regarding interrupts which are pending.
Figure: mip register layout. Bits 15 to 12 (4 bits) followed by twelve single-bit fields in bits 11 to 0.
Description
The MIP register indicates pending interrupt requests. The interrupt cause number, as reported in
MCAUSE, corresponds to the same bit position in the MIP register. An interrupt will be considered if the
particular bit is set in both MIP and MIE, and when interrupts are globally enabled. Individual bits in MIP
may be writable or read-only. When the bit is writable, the pending interrupt can be cleared once
the interrupt is addressed. In case the bits are read-only, the implementation must provide means
to clear the pending interrupt.
Exceptions
Since the non-maskable interrupt is implicit, when executing the non-maskable interrupt (NMI)
handler, it is not made visible in MIP.
MTVAL
The Machine Trap Value (MTVAL) register holds exception specific information.
Figure: mtval register layout. mtval: bits XLEN-1 to 0 (XLEN bits).
Description
When an exception is encountered, this register can hold exception-specific information to assist
software in handling the trap. In the case of errors in the load-store unit MTVAL holds the address of
the transaction causing the error. If this transaction is misaligned, the MTVAL holds the address of
the missing transaction part. In the case of illegal instruction exceptions, it holds the actual faulting
instruction. For all other exceptions, MTVAL register is 0.
MSCRATCH
A Scratch Register (MSCRATCH) for the Machine Mode trap handler. This register allows us to store
the context of trap handlers in other privilege levels. It is mainly useful when the system
switches between privilege modes.
Figure: mscratch register layout. mscratch: bits XLEN-1 to 0 (XLEN bits).
Description
• When a machine mode trap handler is invoked, it needs at least one general purpose register to
work with, and using one without saving it first would overwrite and lose the previous value.
• MSCRATCH gives the software a register loaded with a base value, which can subsequently be
used to save all remaining processor state.
• Mostly, it may contain a frame or stack pointer to the “register save area”.
Exceptions
MSCRATCH is a read/write Register, which is never used directly by the hardware. It only serves as
an XLEN bit temporary scratch space to be used by the machine mode software. It is protected
from other privilege modes and can be accessed without destroying contents of any register using
CSR swap instructions.
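The swap technique mentioned above can be sketched as follows. This is an illustration under several assumptions: GNU assembler syntax, machine mode, an RV32 target (sw and lw; an RV64 handler would use sd and ld), hypothetical labels save_area and trap_handler, and mtvec already pointing at trap_handler.

```asm
    .section .bss
    .balign 16
save_area:                    # hypothetical register save area
    .space 16

    .section .text
    .globl _start
_start:
    la    t0, save_area
    csrw  mscratch, t0        # mscratch <- pointer to the register save area
1:  j     1b

    .balign 4
trap_handler:                 # hypothetical machine mode trap handler
    csrrw t0, mscratch, t0    # swap: t0 <- save-area pointer, mscratch <- old t0
    sw    t1, 0(t0)           # t1 is saved without losing any register value
    # ... handler body would go here ...
    lw    t1, 0(t0)           # restore t1
    csrrw t0, mscratch, t0    # swap back: t0 restored, mscratch <- pointer
    mret
```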
2 Load and Store Instructions
This section of the manual covers the memory access instructions available in the RISC-V architecture.
There are different instructions available for 8-bit, 16-bit, 32-bit and 64-bit accesses.
2.1 RV 32I
RV32I provides the 32-bit instructions that are used for load and store operations. The instructions
are broadly classified as register-register and immediate instructions.
Load-store instructions transfer data between memory and processor registers. The LW instruction
loads a 32-bit value from memory into the destination register (rd). LH loads a 16-bit value from
memory, then sign-extends it to 32 bits before storing it in rd. LHU loads a 16-bit value from memory
but then zero-extends it to 32 bits before storing it in rd. LB and LBU are for 8-bit values. The SW, SH,
and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of a register to memory.
The load or store address should always be aligned for each data type (i.e., on a four-byte boundary
for 32-bit accesses, and a two-byte boundary for 16-bit accesses). The processor will generate a
misaligned address exception if the addresses are not aligned properly. If the load or store instruction
tries to access an invalid memory location, a load/store access fault is generated. An invalid access can
arise because of PMP access controls or an unavailable memory address.
LB
The Load Byte (LB) instruction moves a byte from memory to a register. The instruction is used
for signed integers.
Syntax
lb rd, imm(rs1 )
where,
rd destination register
imm immediate data
rs1 source register 1
Description
LB is a data transfer instruction defined for 8-bit values. It works with signed integers, places the
result in the least significant byte of rd and fills the upper bits of rd with copies of the sign bit.
Usage
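A minimal sketch of a signed byte load, assuming GNU assembler syntax; the label value and the byte 0x80 are chosen for illustration so that the sign extension is visible.

```asm
    .data
value:                        # hypothetical label
    .byte 0x80                # -128 when read as a signed byte

    .text
    .globl _start
_start:
    la    t0, value           # t0 <- address of the byte
    lb    t1, 0(t0)           # t1 <- 0xffffff80 on RV32 (sign-extended)
1:  j     1b
```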
LBU
The Load Byte, Unsigned (LBU) instruction moves a byte from memory to a register. The instruction
is used for unsigned integers.
Syntax
lbu rd, imm(rs1 )
where,
rd destination register
imm immediate data
rs1 source register 1
Description
The LBU instruction is defined for 8-bit values. It works with unsigned integers, places the result
in the least significant byte of rd and zero-fills the upper bits of rd.
Usage
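A minimal sketch of an unsigned byte load, assuming GNU assembler syntax; the label value and the byte 0x80 are chosen for illustration.

```asm
    .data
value:                        # hypothetical label
    .byte 0x80                # 128 when read as an unsigned byte

    .text
    .globl _start
_start:
    la    t0, value           # t0 <- address of the byte
    lbu   t1, 0(t0)           # t1 <- 0x00000080 on RV32 (zero-extended)
1:  j     1b
```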
LH
In RISC-V, 16-bit numbers are known as half-words. The Load Half-Word signed (LH) instruction
loads a half-word from memory to a register. The instruction is used for signed integers.
Syntax
lh rd, imm(rs1 )
where,
rd destination register
imm immediate data
rs1 source register
Description
The LH instruction treats the half-word as a signed number and loads a half-word from memory,
placing it in the rightmost 16 bits of register rd, while the remaining upper bits of rd (16 bits on
RV32, 48 bits on RV64) are filled with copies of the sign bit.
Usage
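A minimal sketch of a signed half-word load, assuming GNU assembler syntax; the label value and the data 0x8000 are chosen for illustration, and the data is aligned to a two-byte boundary.

```asm
    .data
    .balign 2                 # half-word accesses need two-byte alignment
value:                        # hypothetical label
    .half 0x8000              # -32768 when read as a signed half-word

    .text
    .globl _start
_start:
    la    t0, value           # t0 <- address of the half-word
    lh    t1, 0(t0)           # t1 <- 0xffff8000 on RV32 (sign-extended)
1:  j     1b
```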
LHU
The Load Half-Word Unsigned (LHU) instruction loads a half-word from memory to a register. The
instruction is used for unsigned numbers.
Syntax
lhu rd, imm(rs1 )
where,
rd destination register
imm immediate data
rs1 source register 1
Description
The LHU instruction treats the half-word as an unsigned number and loads it from memory, placing
it in the rightmost 16 bits of register rd, while the remaining upper bits of rd (16 bits on RV32,
48 bits on RV64) are filled with zeros.
Usage
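A minimal sketch of an unsigned half-word load, assuming GNU assembler syntax; the label value and the data 0x8000 are chosen for illustration.

```asm
    .data
    .balign 2                 # half-word accesses need two-byte alignment
value:                        # hypothetical label
    .half 0x8000              # 32768 when read as an unsigned half-word

    .text
    .globl _start
_start:
    la    t0, value           # t0 <- address of the half-word
    lhu   t1, 0(t0)           # t1 <- 0x00008000 on RV32 (zero-extended)
1:  j     1b
```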
LW
The Load Word (LW) instruction moves a word (a 32-bit value) from memory to a register. The
instruction is used for signed values.
Syntax
lw rd, imm(rs1 )
where,
rd destination register
imm immediate data
rs1 source register 1
Description
The LW instruction is defined for 32-bit values. On RV32 the word fills the whole of rd. On RV64 it
is placed in the lower 32 bits of rd and the upper bits of rd are filled with copies of the sign bit.
Usage
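A minimal sketch of a word load, assuming GNU assembler syntax; the label value and the data are chosen for illustration, and the data is aligned to a four-byte boundary.

```asm
    .data
    .balign 4                 # word accesses need four-byte alignment
value:                        # hypothetical label
    .word 0x12345678

    .text
    .globl _start
_start:
    la    t0, value           # t0 <- address of the word
    lw    t1, 0(t0)           # t1 <- 0x12345678
1:  j     1b
```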
SB
The Store Byte (SB) instruction stores 8-bit values from a register to memory.
Syntax
sb rs2 , offset(rs1 )
where,
rs2 source register holding the data
offset immediate offset
rs1 base register
Description
SB is a store type instruction which stores 8-bit values from the low bits of register rs2 to
memory. The low-order byte of register rs2 is copied to memory, while the rest of the register
is ignored and is unchanged. The address at which the byte will be stored in memory is
calculated at run time by adding the offset to the base register rs1.
Usage
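A minimal sketch of a byte store, assuming GNU assembler syntax; the label buffer and the value 0x12345678 are chosen for illustration.

```asm
    .section .bss
buffer:                       # hypothetical label: destination in memory
    .space 4

    .section .text
    .globl _start
_start:
    la    t0, buffer          # t0 <- base address
    li    t1, 0x12345678
    sb    t1, 0(t0)           # byte at buffer <- 0x78 (low byte of t1)
1:  j     1b
```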
SH
The Store Half-word (SH) instruction stores 16-bit values from a register to memory.
Syntax
sh rs2 , offset(rs1 )
where,
rs2 source register holding the data
offset immediate offset
rs1 base register
Description
SH is a store type instruction which stores 16-bit values from the low bits of register rs2
to memory. The low-order half-word of register rs2 is copied to memory, while the rest of the
register is ignored and is unchanged. The address at which the half-word will be stored in
memory is calculated at run time by adding the offset to the base register rs1.
Usage
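A minimal sketch of a half-word store, assuming GNU assembler syntax; the label buffer and the value 0x12345678 are chosen for illustration.

```asm
    .section .bss
    .balign 2                 # half-word accesses need two-byte alignment
buffer:                       # hypothetical label: destination in memory
    .space 4

    .section .text
    .globl _start
_start:
    la    t0, buffer          # t0 <- base address
    li    t1, 0x12345678
    sh    t1, 0(t0)           # half-word at buffer <- 0x5678 (low 16 bits of t1)
1:  j     1b
```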
[Link] SW
Store Word (SW) instruction, stores 32-bit values from a register to memory.
Syntax
sw rs2 , offset(rs1 )
where,
Description
SW is a store-type instruction that stores a 32-bit value from the low bits of register rs2 to memory. The low-order word of rs2 is copied to memory. The address at which the word is stored is calculated at run time by adding the offset to the contents of the base register rs1.
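The store behaviour described above can be sketched with a small Python model. This is only an illustration, not part of any RISC-V toolchain: the helper names (store, sb, sh, sw) and the bytearray standing in for memory are hypothetical, and little-endian byte order is assumed.

```python
def store(mem, addr, value, nbytes):
    """Copy the low nbytes bytes of value into mem at addr (little-endian assumed)."""
    low = value & ((1 << (8 * nbytes)) - 1)  # upper register bits are ignored
    mem[addr:addr + nbytes] = low.to_bytes(nbytes, "little")

def sb(mem, rs2, base, offset):
    store(mem, base + offset, rs2, 1)  # address = base register + offset

def sh(mem, rs2, base, offset):
    store(mem, base + offset, rs2, 2)

def sw(mem, rs2, base, offset):
    store(mem, base + offset, rs2, 4)

mem = bytearray(8)
sw(mem, 0x11223344AABBCCDD, 0, 4)  # only the low word 0xAABBCCDD is written
sb(mem, 0x1234, 0, 0)              # only the low byte 0x34 is written
print(mem.hex())
```

The source register value is passed in unchanged, which mirrors the fact that a store never modifies rs2.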
Usage
Immediate instructions are those which contain the actual data to be operated upon, rather than the address of the data. The data is encoded directly as part of the instruction.
[Link] LUI
The Load Upper Immediate (LUI) instruction copies the 20-bit immediate value to the upper 20 bits of the destination register (rd) and resets the lower 12 bits to zero.
Syntax
lui rd, imm
where,
rd destination register
imm immediate Data
Description
The LUI instruction copies the 20-bit immediate value to bits 31 to 12 of the destination register (rd). The lower 12 bits of the destination register are reset to zero. This instruction is usually used when a register needs to be populated with a large value. The immediate value can be written in hexadecimal or decimal format. On RV64 systems, the 32-bit result is sign-extended: bit 31 is copied into the most significant 32 bits (bits 63 to 32). The destination register can be any of the base integer registers; x0 is hardwired to zero, so a result written to x0 is discarded.
Usage
# imm = 0x11000
lui x5, 0x11000 # x5 ←− 0x11000 << 12
After executing the above instruction, x5 holds the value 0x11000000, regardless of its previous contents.
# imm = 0x80011
lui x5, 0x80011 # x5 ←− 0x80011 << 12
On RV64 systems, x5 holds the value 0xffffffff80011000 after executing the above instruction, because bit 31 of the result is set and is sign-extended into the upper 32 bits. This example also demonstrates that the lower 12 bits are always reset to zero.
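The two results above can be reproduced with a minimal Python sketch of LUI on RV64. The function name lui_rv64 is hypothetical and exists only for this illustration.

```python
MASK64 = (1 << 64) - 1

def lui_rv64(imm20):
    """Model of LUI on RV64: place imm20 in bits 31..12, then sign-extend bit 31."""
    value32 = (imm20 & 0xFFFFF) << 12      # lower 12 bits are zero
    if value32 & 0x80000000:               # bit 31 set: fill bits 63..32 with ones
        value32 |= 0xFFFFFFFF00000000
    return value32 & MASK64

print(hex(lui_rv64(0x11000)))
print(hex(lui_rv64(0x80011)))
```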
[Link] AUIPC
Add Upper Immediate to PC (AUIPC) forms an offset from the 20-bit immediate value, adds it to the program counter (pc), and stores the result in the destination register (rd).
Syntax
auipc rd, imm
where,
rd destination register
imm immediate value
Description
AUIPC is used to build pc-relative addresses. AUIPC forms a 32-bit temporary offset by placing the 20-bit immediate value in the upper 20 bits of the offset and filling the lower 12 bits with zeros. The temporary offset is added to the pc (the address of the AUIPC instruction itself) to form the pc-relative address, and the result is placed in the destination register (rd). On a 64-bit architecture, the temporary offset is sign-extended to 64 bits before it is added to the pc. The destination register can be any of the base integer registers; x0 is hardwired to zero, so a result written to x0 is discarded.
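As a rough illustration of the address calculation described above, the following Python sketch models AUIPC on RV64. The function name auipc_rv64 and the pc value 0x80000100 are assumptions chosen for the example.

```python
MASK64 = (1 << 64) - 1

def auipc_rv64(pc, imm20):
    """Model of AUIPC on RV64: pc + sign-extended (imm20 << 12)."""
    offset = (imm20 & 0xFFFFF) << 12       # lower 12 bits of the offset are zero
    if offset & 0x80000000:                # sign-extend the 32-bit offset
        offset -= 1 << 32
    return (pc + offset) & MASK64

pc = 0x80000100                            # hypothetical address of the instruction
print(hex(auipc_rv64(pc, 0x1)))            # one 4 KiB page above pc
print(hex(auipc_rv64(pc, 0xFFFFF)))        # one 4 KiB page below pc
```

Because the lower 12 bits of the offset are always zero, the lower 12 bits of the result are the same as those of the pc.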
Usage
Assuming pc is at 0x800000ff.
2.2 RV 64I
RV 64I deals with the 64-bit instructions that are used for load and store operations. The instructions are broadly classified as register-register and immediate instructions.
Load-store instructions transfer data between memory and processor registers. The LD instruction loads a 64-bit value from memory into the destination register (rd). The SD instruction stores the 64-bit value in a register to memory.
The load or store address should always be aligned to 64 bits (8 bytes). If the address is not properly aligned, the access is misaligned and, depending on the implementation, the processor may raise an address-misaligned exception.
[Link] LD
The Load Double word (LD) instruction fetches a 64-bit value from memory and loads it into the destination register (rd).
Syntax
ld rd, offset(rs1 )
Description
A 64-bit value is fetched from memory and loaded into the destination register. The memory address is formed by adding the offset to the contents of rs1. This instruction is available only for 64-bit and 128-bit machines.
Usage
[Link] SD
The Store Double word (SD) instruction copies the 64-bit value in register (rs2) to memory, at an address formed from register (rs1).
Syntax
sd rs2 , offset(rs1 )
Description
A 64-bit value is copied from register rs2 and stored to memory. The memory address is formed by adding the offset to the contents of rs1. On a 128-bit machine the upper bits of the register are ignored. This instruction is available only for 64-bit and 128-bit machines.
Usage
2.2.2 LWU
The Load Word Unsigned (LWU) instruction fetches a 32-bit value from memory and loads it, zero-extended, into the destination register (rd).
Syntax
lwu rd, offset(rs1)
Description
A 32-bit value is fetched from memory and moved into the destination register, and the upper bits of the register are filled with zeros. The memory address is formed by adding the offset to the contents of rs1. On a machine with 32-bit registers, neither sign extension nor zero extension is necessary for a value that is already 32 bits wide; the “signed load” instruction LW would therefore do the same thing as the “unsigned load” instruction LWU, making LWU redundant. For this reason, this instruction is available only for 64-bit and 128-bit machines.
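The difference between signed and unsigned loads on RV64 can be sketched in Python as follows. The helper names are hypothetical, memory is modelled as a bytes object, and little-endian byte order is assumed.

```python
MASK64 = (1 << 64) - 1

def load(mem, addr, nbytes, signed):
    """Read nbytes from mem at addr and extend the value to a 64-bit register."""
    value = int.from_bytes(mem[addr:addr + nbytes], "little", signed=signed)
    return value & MASK64                  # two's-complement view of the register

def lw(mem, addr):
    return load(mem, addr, 4, signed=True)    # sign-extends bit 31

def lwu(mem, addr):
    return load(mem, addr, 4, signed=False)   # zero-extends

def lhu(mem, addr):
    return load(mem, addr, 2, signed=False)   # zero-extends a half-word

mem = bytes.fromhex("4d978ee8")            # the word 0xe88e974d in little-endian order
print(hex(lw(mem, 0)), hex(lwu(mem, 0)), hex(lhu(mem, 0)))
```

For a word whose bit 31 is clear, lw and lwu return the same register value.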
Usage
RISC-V provides several pseudo-instructions which are simple to understand and easy to use; the assembler translates (expands) them to their base instructions. Pseudo-instructions supported by RISC-V have the format shown as follows.
Where content of the source register is copied into the destination register, and is read as,
[Link] MV
Syntax
mv rd, rs1
Translation
addi rd, rs1, 0
where,
Usage
mv x6, x5 # x6 ←− x5
Description
Move (MV) is a simple “copy register” assembler pseudo-instruction which copies the contents of one register to another register. It translates to the add immediate (ADDI) instruction; the example above translates to addi x6, x5, 0. Assuming x5 has the value 3 and x6 is initialized to 0, x6 will have the value 3 after the move instruction.
[Link] LI
The Load Immediate (LI) instruction loads a register (rd) with an immediate value given in the instruction.
Syntax
li rd, CONSTANT
Description
The LI instruction loads a register (rd) with an integer value. With this instruction both positive
and negative values can be loaded into the register.
Usage
li x5,100 # x5 ←−100
li x5,-170 # x5 ←−-170
[Link] LA
The Load Address (LA) loads the location address of the specified SYMBOL.
Syntax
la rd, SYMBOL
Description
where SYMBOL[31:12] is the upper 20 bits of SYMBOL, and SYMBOL[11:0] is the lower 12 bits of
SYMBOL.
Usage
.data
NumElements: .byte 6
.text
la x5, NumElements # x5 ←− addr[NumElements]
As an example, ’NumElements’ SYMBOL has a location address ’10010074’. When LA is given, this
address, ’10010074’ is loaded into register x5.
[Link] SEXT.W
Sign Extend Word (SEXT.W) instruction sign extends a 32-bit value to 64-bits or 128-bits.
Syntax
sext.w rd, rs1
where,
Translation
addiw rd, rs1, 0
Description
SEXT.W is an assembler pseudo-instruction which is available only for 64-bit and 128-bit machines.
This instruction sign extends the lower 32 bits of value in rs1 to 64 or 128 bits with the result being
placed in the register rd. SEXT.W is useful when a 32-bit signed value must be extended to a larger
value on 64-bit or 128-bit machine.
Usage
sext.w x6, x5 # x6 ←− x5
Assuming register x5 holds the value 0xfda961a6e88e974d, SEXT.W sign-extends the lower 32 bits (0xe88e974d) to 0xffffffffe88e974d, which is stored in x6. As this instruction translates to ADDIW, the result is computed as x6 = sign-extend((x5 + 0)[31:0]).
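The example above can be checked with a short Python sketch of SEXT.W on a 64-bit register. The function name sext_w is hypothetical.

```python
def sext_w(rs1):
    """Model of SEXT.W on RV64: keep bits 31..0 and copy bit 31 into bits 63..32."""
    low = rs1 & 0xFFFFFFFF
    if low & 0x80000000:
        return low | 0xFFFFFFFF00000000
    return low

print(hex(sext_w(0xFDA961A6E88E974D)))
```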
[Link] NEG
Syntax
neg rd, rs1
Translation
sub rd, x0, rs1
where,
Description
The NEG instruction arithmetically negates the contents of rs1 and places the result in register rd. This instruction translates to the subtraction instruction (SUB), where the contents of rs1 are subtracted from zero.
Usage
neg x6, x5 # x6 ←− -x5
Exception
Overflow can only occur when the most negative value is negated. Overflow is ignored.
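The overflow case can be illustrated with a minimal Python model of NEG, assuming XLEN = 64. The names neg and to_signed are hypothetical helpers for this sketch.

```python
XLEN = 64                                  # assumed register width
MASK = (1 << XLEN) - 1

def to_signed(x):
    """Interpret an XLEN-bit pattern as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x

def neg(rs1):
    """Model of NEG: 0 - rs1, keeping only the low XLEN bits (overflow ignored)."""
    return (0 - rs1) & MASK

most_negative = 1 << (XLEN - 1)
print(to_signed(neg(5)))
print(neg(most_negative) == most_negative)  # negating the most negative value yields itself
```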
[Link] NEGW
Negate Word (NEGW) instruction computes the two’s complement of a 32-bit value.
Syntax
negw rd, rs1
Translation
subw rd, x0, rs1
where,
Description
Similar to instruction NEG, the NEGW is used to negate a 32-bit number stored in rs1 with the result
being stored in register rd. NEGW translates to SUBW where the 32-bit number in rs1 is subtracted
from zero.
Usage
negw x6, x5 # x6 ←− -x5
Assuming register x5 is initialized to the value 168496141, negating x5 results in -168496141 which
is stored in x6. As this instruction translates to SUBW, the negation is computed as, x6 = 0-x5.
[Link] SEQZ
Set If Equal to Zero (SEQZ) instruction provides an indication if a register’s content is zero.
Syntax
seqz rd, rs1
Translation
sltiu rd, rs1, 1
where,
Description
RISC-V provides a simple assembler pseudo-instruction, SEQZ, to check whether the contents of register rs1 are zero. The result is a single-bit value written to rd: 1 if the register content is zero, and 0 if it is not. SEQZ performs an unsigned comparison against 1. Since the comparison is unsigned, the only value less than 1 is 0. Hence, if the comparison holds true, register rs1 must contain 0.
Usage
Assuming register x5 contains 0, SEQZ instruction writes value 1 into register x6.
[Link] SNEZ
The Set If Not Equal to Zero (SNEZ) instruction indicates whether a register contains a non-zero value.
Syntax
snez rd, rs1
Translation
sltu rd, x0, rs1
where,
Description
SNEZ is an assembler pseudo-instruction that checks whether the contents of rs1 are non-zero. It sets rd to 1 if rs1 holds a non-zero value and sets rd to 0 otherwise. It is implemented with its base instruction SLTU as an unsigned comparison that tests whether 0 is less than rs1. Since the comparison is unsigned, the only value that is not greater than 0 is 0 itself. Therefore, if the less-than condition holds, the value in rs1 must not be 0.
Usage
Assuming rs1 (x5) is initialized to the value 5, which is greater than 0, the value 1 is written into rd (x6).
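The reasoning behind SEQZ and SNEZ can be sketched in Python by modelling their base instructions. All function names here are hypothetical, and XLEN = 64 is assumed.

```python
XLEN = 64                                  # assumed register width
MASK = (1 << XLEN) - 1

def sltu(rs1, rs2):
    """Unsigned less-than: 1 if rs1 < rs2 when both are read as unsigned."""
    return 1 if (rs1 & MASK) < (rs2 & MASK) else 0

def sltiu(rs1, imm12):
    """Sign-extend the 12-bit immediate, then compare as unsigned."""
    imm = imm12 & 0xFFF
    if imm & 0x800:
        imm |= MASK & ~0xFFF
    return sltu(rs1, imm)

def seqz(rs1):
    return sltiu(rs1, 1)                   # only 0 is below 1 when unsigned

def snez(rs1):
    return sltu(0, rs1)                    # 0 is below every non-zero unsigned value

print(seqz(0), seqz(5), snez(0), snez(5))
```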
[Link] SLTZ
Set If Less Than Zero (SLTZ) is a signed instruction which examines whether a register’s content is less than zero and indicates the result accordingly.
Syntax
sltz rd, rs1
Translation
slt rd, rs1, x0
where,
Description
SLTZ is a signed assembler pseudo-instruction which translates to SLT and examines whether the value in register rs1 is less than zero. If the register value is less than zero, the value 1 is stored in register rd. Otherwise, the value 0 is stored.
Usage
Assuming rs1 (x5) is initialized with the value -2, which is less than 0, the value 1 is written into rd (x6).
[Link] SGTZ
The Set If Greater Than Zero (SGTZ) instruction examines whether a register contains a value greater than zero and indicates the result accordingly.
Syntax
sgtz rd, rs1
Translation
slt rd, x0, rs1
where,
Description
SGTZ is a signed pseudo-assembler instruction which examines if the value in register rs1 is greater
than zero. If found true, value 1 is stored to register (rd) or value 0 is stored otherwise.
Usage
Assuming rs1 (x5) is initialized to 9, which is greater than 0, the value 1 is stored in rd (x6).
3 Bitwise Instructions
3.1 RV 32I
RV 32I deals with the 32-bit instructions that are used for bit manipulation. The instructions are broadly classified as register-register and immediate instructions.
Register operations take both operands from registers. The operation is performed on the values in the source registers and the result is stored in the destination register (rd). The source and destination registers can be any of the base integer registers; x0 always reads as zero, and a result written to x0 is discarded. On RV32, 32 bits of result are written to the destination register.
[Link] SLL
Shift Logical Left (SLL) performs a logical left shift on the value in register (rs1) by the shift amount held in register (rs2) and stores the result in register (rd).
Syntax
sll rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
A SLL of one position moves each bit to the left by one. The low-order bit (the right-most bit) is
replaced by a zero bit and the high-order bit (the left-most bit) is discarded.
Usage
li x5, 4 # x5 ←− 4
li x3, 2 # x3 ←− 2
sll x1, x5, x3 # x1 ←− x5 << x3
[Link] SRL
Shift Logically Right (SRL) performs a logical right shift on the value in register (rs1) by the shift amount held in register (rs2) and stores the result in register (rd).
Syntax
srl rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
An SRL of one position moves each bit to the right by one. The high-order bit (the left-most bit) is replaced by a zero bit and the low-order bit (the right-most bit) is discarded.
Usage
li x5, 4 # x5 ←− 4
li x3, 2 # x3 ←− 2
srl x1, x5, x3 # x1 ←− x5 >> x3
[Link] SRA
Shift Right Arithmetic (SRA) performs an arithmetic right shift on the value in register (rs1) by the shift amount held in register (rs2) and stores the result in register (rd).
Syntax
sra rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
The SRA instruction performs an arithmetic shift right by 0 to 31 places on RV32 (the shift amount is taken from the low 5 bits of rs2). The vacated bits at the most
significant end are filled with zeros if the original value (the source operand) was positive. The
vacated bits are filled with ones if the original value was negative. This is known as “sign extending”
because the most significant bit of the original value is the sign bit for 2’s complement numbers,
i.e. 0 for positive and 1 for negative numbers. Arithmetic shifting therefore preserves the sign of
numbers.
Usage
li x5, 4 # x5 ←− 4
li x3, 2 # x3 ←− 2
sra x1, x5, x3 # x1 ←− x5 >> x3
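The three register shifts can be compared with a small Python model of RV32, where only the low 5 bits of rs2 are used as the shift amount. The function names are hypothetical and serve only this sketch.

```python
MASK32 = 0xFFFFFFFF

def sll(rs1, rs2):
    """Logical left shift: zeros enter on the right, high bits are discarded."""
    return (rs1 << (rs2 & 31)) & MASK32

def srl(rs1, rs2):
    """Logical right shift: zeros enter on the left."""
    return (rs1 & MASK32) >> (rs2 & 31)

def sra(rs1, rs2):
    """Arithmetic right shift: copies of the sign bit enter on the left."""
    value = rs1 & MASK32
    if value & 0x80000000:
        value -= 1 << 32                   # reinterpret as a negative number
    return (value >> (rs2 & 31)) & MASK32

print(sll(4, 2), srl(4, 2), sra(4, 2))
print(hex(srl(0xFFFFFFF0, 2)), hex(sra(0xFFFFFFF0, 2)))
```

For a positive value such as 4, SRL and SRA give the same result; they differ only when bit 31 of the source is set.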
[Link] OR
The OR instruction performs a bit-wise logical OR operation between the contents of register (rs1) and the contents of register (rs2) and stores the result in register (rd).
Syntax
or rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
A bit-wise OR is a binary operation that takes two bit patterns of equal length and performs the
logical inclusive OR operation on each pair of corresponding bits.
Usage
[Link] XOR
XOR performs a bit-wise exclusive-OR operation on the source register operands and stores the result in register (rd).
Syntax
xor rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
A bit-wise XOR is a binary operation that takes two bit patterns of equal length and performs the logical exclusive OR operation on each pair of corresponding bits.
Usage
[Link] NOT
Syntax
not rd, rs1
Translation
xori rd, rs1, -1
where,
Description
The NOT instruction flips each bit of a register (a bit-wise complement). This instruction translates to the exclusive-OR immediate instruction (XORI) with an immediate value of -1. The result is loaded into the destination register (rd).
Usage
not x6, x5 # x6 ←− ∼ x5
Assuming register x5 (rs1) is initialized to the value 1, applying the NOT instruction to x5 XORs 1 with -1 (since XORI is the base instruction for NOT), resulting in -2 (stored in x6). If x5 is instead initialized to the value -1, applying NOT to it results in the value 0.
[Link] SLT
Set Less Than (SLT) performs a signed comparison between (rs1) and (rs2) and stores the result in (rd).
Syntax
slt rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
SLT performs a signed comparison, writing 1 to rd if rs1 < rs2, and 0 otherwise.
Usage
li x5, 3 # x5 ←− 3
li x3, 5 # x3 ←− 5
slt x1, x5, x3 # x1 ←− x5 < x3
[Link] SLTU
Set Less Than Unsigned (SLTU) performs an unsigned comparison between (rs1) and (rs2) and stores the result in (rd).
Syntax
sltu rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
SLTU performs an unsigned comparison, writing 1 to rd if rs1 < rs2, and 0 otherwise. As a special case, when rs1 is x0, SLTU sets rd to 1 if rs2 is not equal to zero, and otherwise sets rd to zero.
Usage
li x5, 3 # x5 ←− 3
li x3, 5 # x3 ←− 5
sltu x1, x5, x3 # x1 ←− x5 < x3 (unsigned)
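The difference between the signed and unsigned comparisons shows up only when the most significant bit of an operand is set. A minimal Python sketch, assuming XLEN = 32 and using hypothetical helper names:

```python
XLEN = 32                                  # assumed register width
MASK = (1 << XLEN) - 1

def to_signed(x):
    """Interpret an XLEN-bit pattern as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x

def slt(rs1, rs2):
    return 1 if to_signed(rs1) < to_signed(rs2) else 0

def sltu(rs1, rs2):
    return 1 if (rs1 & MASK) < (rs2 & MASK) else 0

minus_one = 0xFFFFFFFF                     # -1 as a signed value, 4294967295 as unsigned
print(slt(3, 5), sltu(3, 5))
print(slt(minus_one, 1), sltu(minus_one, 1))
```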
Any instruction which contains an operand that is directly encoded as part of the instruction is called an immediate instruction, and that operand is called an immediate operand. This section covers shift and logical operations with immediate operands as part of the instruction.
[Link] SLLI
Shift Logically Left Immediate (SLLI) performs a logical left shift on the value in register (rs1) by the shift amount given by the immediate (imm) and stores the result in register (rd).
Syntax
slli rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm immediate data
Description
A SLLI of one position moves each bit to the left by one. The low-order bit (the right-most bit) is
replaced by a zero bit and the high-order bit (the left-most bit) is discarded.
Usage
slli x1, x1, 1 # x1 ←− x1<<1
[Link] SRLI
Shift Logically Right Immediate (SRLI) performs a logical right shift on the value in register (rs1) by the shift amount given by the immediate (imm) and stores the result in register (rd).
Syntax
srli rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm immediate data
Description
A Shift Right Logical Immediate (SRLI) of one position moves each bit to the right by one. The most significant bit is replaced by a zero bit and the least significant bit is discarded.
Usage
[Link] SRAI
Shift Right Arithmetic Immediate (SRAI) performs an arithmetic right shift on the value in register (rs1) by the shift amount given by the immediate (imm) and stores the result in register (rd).
Syntax
srai rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm Immediate data
Description
SRAI is an arithmetic shift right of a number by N places, where N is the immediate. The vacated bits at the most significant end are filled with the value of the sign bit (0 for a positive number and 1 for a negative number). This is known as “sign extending”. The most significant bit of the original value is the sign bit for 2’s complement numbers.
Usage
[Link] ANDI
AND Immediate (ANDI) performs a bit-wise AND operation between the contents of register (rs1) and the immediate data (imm) and stores the result in register (rd).
Syntax
andi rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm immediate data
Description
A bit-wise ANDI is a binary operation that takes two bit patterns of equal length, the contents of rs1 and the sign-extended 12-bit immediate, and performs the logical AND operation on each pair of corresponding bits. The source and destination registers can be any of the base integer registers; x0 always reads as zero, and a result written to x0 is discarded. On RV32, 32 bits of result are written to the destination register.
Usage
[Link] ORI
OR Immediate (ORI) performs a bit-wise OR operation between the contents of register (rs1) and the immediate data (imm) and stores the result in register (rd).
Syntax
ori rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm Immediate data
Description
A bitwise ORI is a binary operation that takes two bit patterns of equal length and performs the
logical inclusive OR operation on each pair of corresponding bits.
Usage
[Link] XORI
Exclusive-OR Immediate (XORI) performs a bit-wise exclusive-OR operation between the contents of register (rs1) and the immediate data (imm) and stores the result in register (rd).
Syntax
xori rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm Immediate data
Description
A bit-wise XORI is a binary operation that takes two bit patterns of equal length and performs the logical exclusive OR operation on each pair of corresponding bits.
Usage
[Link] SLTI
Set Less Than Immediate (SLTI) compares the contents of register (rs1) with the immediate data (imm) and sets the value in register (rd) accordingly.
Syntax
slti rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm Immediate data
Description
SLTI is a signed comparison between the contents of register rs1 and the sign-extended immediate value. If the value in the register is less than the immediate value, the value 1 is stored in the destination register; otherwise, the value 0 is stored in the destination register.
Usage
[Link] SLTIU
Set Less Than Immediate Unsigned (SLTIU) performs an unsigned comparison between the contents of register (rs1) and the immediate data (imm) and sets the value in register (rd) accordingly.
Syntax
sltiu rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm Immediate data
Description
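SLTIU compares the contents of register rs1 with the immediate value using an unsigned comparison. The 12-bit immediate is first sign-extended and the result is then treated as an unsigned number. If the value in the register is less than the immediate value, the value 1 is stored in the destination register; otherwise, the value 0 is stored in the destination register.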
Usage
3.2 RV 64I
RV 64I deals with the 64-bit instructions that are used for bit manipulation. The instructions are broadly classified as register-register and immediate instructions.
The RV64I register-register operations take both operands from 64-bit registers. The operation is performed on the values in the source registers and the result is stored in a destination register (rd). The source and destination registers can be any of the base integer registers; x0 always reads as zero.
[Link] SLLW
Shift Left Logical Word (SLLW) performs a logical left shift on the lower 32 bits of the value in register (rs1) by the shift amount held in register (rs2) and stores the result in register (rd).
Syntax
sllw rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
An SLLW of one position moves each bit of the 32-bit word to the left by one. The low-order bit (the right-most bit) is replaced by a zero bit and the high-order bit of the word (bit 31) is discarded. The 32-bit result is sign-extended to 64 bits.
Usage
li x3,5 # x3 ←− 5
li x1,3 # x1 ←− 3
sllw x1, x1, x3 # x1 ←− x1<<x3
[Link] SRLW
Shift Right Logically Word (SRLW) performs a logical right shift on the lower 32 bits of the value in register (rs1) by the shift amount held in register (rs2) and stores the result in register (rd).
Syntax
srlw rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
An SRLW of one position moves each bit of the 32-bit word to the right by one. The high-order bit of the word (bit 31) is replaced by a zero bit and the low-order bit (the right-most bit) is discarded. The 32-bit result is sign-extended to 64 bits.
Usage
li x1, 3 # x1 ←− 3
li x3, 5 # x3 ←− 5
srlw x1, x1, x3 # x1 ←− x1>>x3
[Link] SRAW
Shift Right Arithmetic Word (SRAW) performs an arithmetic right shift on the lower 32 bits of the value in register (rs1) by the shift amount held in register (rs2) and stores the result in register (rd).
Syntax
sraw rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
SRAW is an arithmetic shift right of a word by N places, where N is taken from the low 5 bits of rs2. The vacated bits at the most significant end are filled with the value of the sign bit (0 for a positive number and 1 for a negative number). This is known as “sign extending”. The most significant bit of the original value is the sign bit for 2’s complement numbers.
Usage
li x1, 3 # x1 ←− 3
li x3, 5 # x3 ←− 5
sraw x1, x1, x3 # x1 ←− x1>>x3
[Link] SRLIW
Shift Right Logical Immediate Word (SRLIW) performs a logical right shift on the lower 32 bits of the value in register (rs1) by the shift amount given by the immediate data (imm) and stores the result in register (rd).
Syntax
srliw rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm immediate data
Description
SRLIW shifts the lower 32 bits of rs1 to the right by the number of positions given by the immediate. For each position, the high-order bit of the word (bit 31) is replaced by a zero bit and the low-order bit (the right-most bit) is discarded. The 32-bit result is sign-extended to 64 bits.
Usage
li x1,3 # x1 ←− 3
srliw x1, x1, 5 # x1 ←− x1>>5
[Link] SRAIW
Shift Right Arithmetic Immediate Word (SRAIW) performs an arithmetic right shift on the lower 32 bits of the value in register (rs1) by the shift amount given by the immediate (imm) and stores the result in register (rd).
Syntax
sraiw rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm immediate data
Description
SRAIW is an arithmetic shift right of the lower 32 bits of rs1 by an immediate amount of 0 to 31 places. The vacated bits at the most significant end are filled with zeros if the original 32-bit value (the source operand) was positive. The vacated bits are filled with ones if the original value was negative. This is known as “sign extending”
because the most significant bit of the original value is the sign bit for 2’s complement numbers,
i.e. 0 for positive and 1 for negative numbers. Arithmetic shifting therefore preserves the sign of
numbers.
Usage
li x1, 3 # x1 ←− 3
sraiw x1, x1, 5 # x1 ←− x1>>5
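The word shifts of this section can be sketched in Python for a 64-bit register. Each one operates on the lower 32 bits only and sign-extends the 32-bit result. The function names are hypothetical; the immediate forms (SRLIW, SRAIW) behave the same way with the shift amount taken from the immediate instead of rs2.

```python
MASK32 = 0xFFFFFFFF

def sext32(x):
    """Sign-extend a 32-bit result to a 64-bit register value."""
    x &= MASK32
    return x | 0xFFFFFFFF00000000 if x & 0x80000000 else x

def sllw(rs1, rs2):
    return sext32((rs1 & MASK32) << (rs2 & 31))

def srlw(rs1, rs2):
    return sext32((rs1 & MASK32) >> (rs2 & 31))

def sraw(rs1, rs2):
    value = rs1 & MASK32
    if value & 0x80000000:
        value -= 1 << 32                   # reinterpret the word as negative
    return sext32(value >> (rs2 & 31))

print(sllw(3, 5), srlw(3, 5), sraw(3, 5))
print(hex(sllw(1, 31)), hex(sraw(0x80000000, 31)))
```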
4 Arithmetic Instructions
4.1 RV 32I
RV 32I deals with the 32-bit instructions that are used for arithmetic operations. The source and destination registers can be any of the base integer registers; x0 always reads as zero, and a result written to x0 is discarded. The instructions are broadly classified as register-register and immediate instructions.
Register-to-register instructions take both operands from registers; the contents of the source registers are the operands.
[Link] ADD
Addition (ADD) adds the contents of two registers and stores the result in another register.
Syntax
add rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
The ADD instruction adds the contents of the two registers rs1 and rs2 and stores the resulting value in register rd. The source and destination registers can be any of the base integer registers; x0 always reads as zero, and a result written to x0 is discarded. Overflows are ignored and the lower XLEN bits of the result (32 bits on RV32) are written to the destination register.
Usage
li x2, 3 # x2 ←− 3
li x3, 4 # x3 ←− 4
add x1, x2, x3 # x1 ←− x2 + x3
Assuming rs1 (x2) and rs2 (x3) contain the values 3 and 4 respectively, the addition results in the value 7, which is stored in rd (x1).
[Link] SUB
Subtraction (SUB) subtracts contents of one register from another and stores the result in another
register.
Syntax
sub rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
The SUB instruction subtracts the contents of the source register rs2 from rs1 and stores the value in register rd. The source and destination registers can be any of the base integer registers; x0 always reads as zero, and a result written to x0 is discarded. Overflows as well as borrows are ignored and the lower XLEN bits of the result (32 bits on RV32) are written to the destination register.
Usage
li x2, 4 # x2 ←− 4
li x3, 3 # x3 ←− 3
sub x1, x2, x3 # x1 ←− x2 - x3
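The statement that overflows and borrows are ignored can be illustrated with a minimal Python model, assuming XLEN = 32. The function names add and sub are hypothetical stand-ins for the instructions.

```python
XLEN = 32                                  # assumed register width
MASK = (1 << XLEN) - 1

def add(rs1, rs2):
    """Model of ADD: keep only the low XLEN bits of the sum."""
    return (rs1 + rs2) & MASK

def sub(rs1, rs2):
    """Model of SUB: keep only the low XLEN bits of the difference."""
    return (rs1 - rs2) & MASK

print(add(3, 4), sub(4, 3))
print(hex(add(0xFFFFFFFF, 1)))             # the carry out of bit 31 is dropped
print(hex(sub(3, 4)))                      # -1 in two's complement
```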
[Link] MUL
Multiplication (MUL) calculates the product of the multiplier in source register 1 (rs1) and the multiplicand in source register 2 (rs2), and stores the resulting product in the destination register (rd).
Syntax
mul rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
MUL calculates the product of two XLEN-bit operands in the source registers 1 and 2 (rs1 , rs2 ). This
instruction stores the less significant part of the result in the destination register and any overflow
is ignored.
Usage
[Link] MULH
Multiply Signed and return upper bits (MULH) calculates the product of the signed values in source registers (rs1) and (rs2) and stores the upper bits of the result in the specified destination register (rd).
Syntax
mulh rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
MULH calculates the product of signed multiplier and signed multiplicand (present in the two source
registers specified respectively), and places the upper XLEN bits of the full 2*XLEN product, into
the destination register. MULH has to be used with MUL to get the complete 2*XLEN bits result.
Usage
li x1,-80 # x1 ←− -80
li x5,20 # x5 ←− 20
mulh x5, x5, x1 # x5 ←− High Bits[x5*x1]
[Link] MULHU
Multiply Unsigned and return upper bits (MULHU) calculates the product of the two unsigned values in source registers rs1 and rs2. The upper bits of the resulting value are placed in the specified destination register (rd).
Syntax
mulhu rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
MULHU multiplies two unsigned operands in the source registers and the most significant part of
result is stored in the destination register.
Usage
li x1,-80 # x1 ←− -80
li x5,20 # x5 ←− 20
mulhu x5, x5, x1 # x5 ←− High Bits [x5*x1]
[Link] MULHSU
Multiply Signed-Unsigned and return upper bits (MULHSU) calculates the product of a signed value in source register rs1 and an unsigned value in source register rs2, and stores the upper bits of the resulting product in the destination register rd.
Syntax
mulhsu rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
MULHSU computes the product of the signed multiplier in rs1 and the unsigned multiplicand in rs2. The most significant part (the upper XLEN bits) of the full 2*XLEN product is stored in the specified destination register. The resulting value is a signed value.
Usage
li x1,-80 # x1 ←− -80
li x5,20 # x5 ←− 20
mulhsu x5, x5, x1 # x5 ←− High Bits[x5*x1]
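The three usage examples above operate on 20 and -80. A Python sketch, assuming XLEN = 64 and using hypothetical function names, shows how the signedness of the operands changes the upper half of the product:

```python
XLEN = 64                                  # assumed register width
MASK = (1 << XLEN) - 1

def to_signed(x):
    """Interpret an XLEN-bit pattern as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x

def mul(rs1, rs2):
    return (rs1 * rs2) & MASK                                   # lower XLEN bits

def mulh(rs1, rs2):
    return ((to_signed(rs1) * to_signed(rs2)) >> XLEN) & MASK   # signed x signed

def mulhu(rs1, rs2):
    return ((rs1 & MASK) * (rs2 & MASK)) >> XLEN                # unsigned x unsigned

def mulhsu(rs1, rs2):
    return ((to_signed(rs1) * (rs2 & MASK)) >> XLEN) & MASK     # signed x unsigned

x1 = -80 & MASK                            # register pattern holding -80
x5 = 20
print(hex(mulh(x5, x1)), mulhu(x5, x1), mulhsu(x5, x1))
```

With MULH the product is -1600, so the upper half is all ones. With MULHU and MULHSU the pattern in x1 is read as a very large unsigned number, so the upper half is a small positive value.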
[Link] DIV
Division (DIV) divides the value in source register (rs1) by the value in source register (rs2) and stores the quotient in register (rd).
Syntax
div rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
DIV divides the value in rs1 by the value in rs2 and stores the quotient in the destination register. Both operands and the result are signed values, and the quotient is rounded towards zero.
Usage
[Link] DIVU
Division Unsigned (DIVU) performs unsigned division of the value in source register (rs1) by the value in source register (rs2) and stores the quotient in the destination register (rd).
Syntax
divu rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
DIVU does the division of unsigned operands in source registers and stores quotient in the destination
register. Both operands and the result are unsigned values.
Usage
[Link] REM
Remainder (REM) divides the value in source register (rs1) by the value in source register (rs2) and stores the remainder in register (rd).
Syntax
rem rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
REM does the signed division of operands in source registers and stores the remainder in the desti-
nation register. Both operands and the result are signed values.
Usage
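NOTE:
Sometimes a programmer needs both the quotient and the remainder. In such cases it is recommended to perform DIV first and REM later, using the same source operands, which allows some implementations to fuse the two into a single divide operation.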
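The signed division and remainder behaviour can be sketched in Python, assuming XLEN = 64. The function names are hypothetical. The special cases follow the RISC-V M extension: division by zero does not trap (the quotient is all ones and the remainder is the dividend), and dividing the most negative value by -1 returns the dividend as quotient with remainder 0.

```python
XLEN = 64                                  # assumed register width
MASK = (1 << XLEN) - 1
MOST_NEGATIVE = -(1 << (XLEN - 1))

def to_signed(x):
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x

def div(rs1, rs2):
    a, b = to_signed(rs1), to_signed(rs2)
    if b == 0:
        return MASK                        # quotient is -1 (all ones)
    if a == MOST_NEGATIVE and b == -1:
        return a & MASK                    # overflow: quotient equals the dividend
    q = abs(a) // abs(b)                   # round towards zero, not towards -infinity
    return (-q if (a < 0) != (b < 0) else q) & MASK

def rem(rs1, rs2):
    a, b = to_signed(rs1), to_signed(rs2)
    if b == 0:
        return a & MASK                    # remainder equals the dividend
    if a == MOST_NEGATIVE and b == -1:
        return 0
    r = abs(a) % abs(b)                    # remainder takes the sign of the dividend
    return (-r if a < 0 else r) & MASK

print(to_signed(div(-7 & MASK, 2)), to_signed(rem(-7 & MASK, 2)))
```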
[Link] LI
Load Immediate (LI) loads register rd with a value that is immediately available.
Syntax
li rd, imm
where,
rd destination register
imm Immediate data
Description
The LI instruction loads a positive or negative value that is immediately available, without going to memory. The value may be a 16-bit or a 32-bit integer.
Usage
li x5, 24 # x5 ←− 24
[Link] ADDI
Add Immediate (ADDI) adds the immediate data (imm) to the contents of the source register rs1 and stores the result in the destination register (rd).
Syntax
addi rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm Immediate data
Description
The ADDI instruction adds a sign-extended 12-bit immediate value to the contents of a source register and stores the result in the destination register. Overflows are ignored and the lower XLEN bits of the result (32 bits on RV32) are written to the destination register.
Usage
li x2,24 # x2 ←− 24
addi x1, x2,64 # x1 ←− x2 + 64
4.2 RV 64I
RV 64I deals with the 64 bit integer instructions that are used for arithmetic operations. The
instructions are broadly classified as register-register and immediate instructions.
The register operations involve both the operands as registers. The operation is performed on the
value in the register and result is stored in destination register (rd).
[Link] ADDW
Add Word (ADDW) adds the lower 32 bits of the source registers (rs1, rs2) and stores the sign-extended 32-bit result in the destination register (rd).
Syntax
addw rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
The ADDW instruction adds the lower 32 bits of the two source registers and stores the value in the destination register. Overflows are ignored: the lower 32 bits of the result are sign-extended to 64 bits and stored in the destination register.
Usage
[Link] SUBW
Subtract Word (SUBW) subtracts the lower 32 bits of source register (rs2) from the lower 32 bits of source register (rs1) and stores the sign-extended 32-bit result in the destination register (rd).
Syntax
subw rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
The SUBW instruction subtracts the lower 32 bits of the source register rs2 from the lower 32 bits of rs1 and stores the value in the destination register (rd). Overflows as well as borrows are ignored: the lower 32 bits of the result are sign-extended to 64 bits and written to the destination register.
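The word arithmetic described above can be sketched in Python for a 64-bit register. The helper names are hypothetical; the point of the sketch is that only the lower 32 bits of the operands matter and that the 32-bit result is sign-extended.

```python
MASK32 = 0xFFFFFFFF

def sext32(x):
    """Sign-extend a 32-bit result to a 64-bit register value."""
    x &= MASK32
    return x | 0xFFFFFFFF00000000 if x & 0x80000000 else x

def addw(rs1, rs2):
    return sext32(rs1 + rs2)               # overflow beyond bit 31 is dropped

def subw(rs1, rs2):
    return sext32(rs1 - rs2)               # borrow beyond bit 31 is dropped

print(hex(addw(0x7FFFFFFF, 1)))            # wraps to the most negative 32-bit value
print(hex(subw(0, 1)))                     # -1, sign-extended to 64 bits
print(addw(0x100000003, 4))                # upper operand bits do not affect the result
```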
Usage
[Link] REMU
Remainder Unsigned (REMU) performs unsigned division of the value in source register (rs1) by the value in source register (rs2) and stores the remainder in register (rd).
Syntax
remu rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
REMU does the division of operands in source registers and stores remainder in the destination register.
Both operands and the result are unsigned values.
Usage
Note:
Sometimes a programmer needs both the quotient and the remainder. In such cases it is recommended to perform DIVU first and REMU later.
[Link] MULW
The Multiplication Word (MULW) instruction multiplies the contents of register rs1 by the contents of register rs2 and stores the result in register rd. Only the low-order 32 bits of the result are used, and they are sign-extended to the full length of the register.
Syntax
mulw rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
MULW does the multiplication of operands in source registers and stores result in the destination
register. Only the lower order 32-bits of the result are used the lower 32 bits are signed extended
to the full length of the register. This instruction is used to properly emulate 32-bit multiplication
on a 64-bit or 128-bit machine. Only the least-significant 32 bits of Reg1 and Reg2 can possibly
affect the result. If you want the upper 32-bits of the full 64-bit result use the MUL instruction on
a 64-bit machine.
Usage
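A minimal sketch for a 64-bit machine (values chosen only to make the truncation visible), contrasting MULW with MUL:

```asm
    li   x5, 0x10000      # x5 = 65536
    li   x6, 0x10000      # x6 = 65536
    mulw x7, x5, x6       # product is 0x100000000, lower 32 bits are 0: x7 = 0
    mul  x8, x5, x6       # x8 = 0x100000000 (full product on RV64)
```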
DIVW
Divide Word (DIVW) divides the value in source register (rs1) by the value in source register (rs2)
and stores the quotient in the (rd) register.
Syntax
divw rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
DIVW divides rs1 by rs2 and stores the quotient in the destination register. Both operands and the
result are signed values. Only the low-order 32 bits of the operands are used and the 32-bit result
is sign-extended to fill the destination register.
Usage
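A minimal sketch for a 64-bit machine (illustrative values), showing that signed division rounds toward zero:

```asm
    li   x5, -20          # dividend
    li   x6, 3            # divisor
    divw x7, x5, x6       # x7 = -6
```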
DIVUW
Divide Unsigned Word (DIVUW) divides the value in source register (rs1) by the value in source
register (rs2) and stores the quotient in the (rd) register.
Syntax
divuw rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
DIVUW divides rs1 by rs2 and stores the quotient in the destination register. Both operands are
treated as unsigned values. Only the low-order 32 bits of the operands are used and the 32-bit
result is sign-extended to fill the destination register.
Usage
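A minimal sketch for a 64-bit machine (illustrative values). The low 32 bits of x5 are read as the unsigned value 4294967295:

```asm
    li    x5, -1          # lower 32 bits = 0xffffffff
    li    x6, 2           # divisor
    divuw x7, x5, x6      # x7 = 0x7fffffff (2147483647)
```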
REMW
Remainder Word (REMW) divides the value in source register (rs1) by the value in source register
(rs2) and stores the remainder in the (rd) register.
Syntax
remw rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
REMW divides rs1 by rs2 and stores the remainder in the destination register. Both operands and the
result are signed values. Only the low-order 32 bits of the operands are used and the 32-bit result
is sign-extended to fill the destination register.
Usage
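A minimal sketch for a 64-bit machine (illustrative values). The sign of the remainder follows the dividend:

```asm
    li   x5, -20          # dividend
    li   x6, 3            # divisor
    divw x7, x5, x6       # x7 = -6 (quotient)
    remw x8, x5, x6       # x8 = -2 (remainder)
```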
NOTE:
Sometimes a programmer might need both the quotient and the remainder. In such cases it is
recommended to perform DIVW first and REMW later, using the same source operands.
REMUW
Remainder Unsigned Word (REMUW) divides the value in source register (rs1) by the value in source
register (rs2) and stores the remainder in the (rd) register.
Syntax
remuw rd, rs1, rs2
where,
rd destination register
rs1 source register 1
rs2 source register 2
Description
REMUW divides rs1 by rs2 and stores the remainder in the destination register. Both operands are
treated as unsigned values. Only the least significant 32 bits of the operands are used and the
32-bit result is sign-extended to fill the destination register.
Usage
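A minimal sketch for a 64-bit machine (illustrative values):

```asm
    li    x5, -1          # lower 32 bits = 0xffffffff (4294967295 unsigned)
    li    x6, 16          # divisor
    remuw x7, x5, x6      # x7 = 4294967295 % 16 = 15
```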
NOTE:
Sometimes a programmer might need both the quotient and the remainder. In such cases it is
recommended to perform DIVUW first and REMUW later, using the same source operands.
Word instructions that take an immediate (constant) operand also carry the “W” suffix, which
specifies that the operation is performed on 32-bit values.
ADDIW
Add Immediate Word (ADDIW) adds the immediate value imm to the contents of the source register rs1
and stores the result in the destination register (rd).
Syntax
addiw rd, rs1, imm
where,
rd destination register
rs1 source register 1
imm immediate data (12-bit signed value)
Description
The ADDIW instruction adds the sign-extended 12-bit immediate to the source register and stores the
value in the destination register. This instruction is only present in 64-bit and 128-bit machines.
The operation is performed using 32-bit arithmetic. Overflow is ignored; the result is truncated to
32 bits, sign-extended to 64 or 128 bits and placed in the destination register.
Usage
li x9,456 # x9 ←− 456
addiw x4, x9,123 # x4 ←− x9 + 123
Chapter 5
Control Transfer Instructions
5.1 Branch Instructions
A branch instruction in a program causes the system to execute a different instruction sequence,
making the system deviate from its normal course of action of executing instructions in sequence.
Branches are useful for implementing logical constructs, since each RISC-V branch instruction
performs both the comparison and the dependent branch in a single instruction.
BEQ
Branch If Equal (BEQ) compares the contents of source register rs1 with source register rs2; if
they are equal, control is transferred to the specified label.
Syntax
beq rs1, rs2, label
where,
rs1 source register 1
rs2 source register 2
label branch target
Description
The BEQ instruction compares the contents of (rs1) with the contents of (rs2). If they are equal,
control jumps to the target. The target address is given as a PC-relative offset. More precisely,
the encoded offset is sign-extended, multiplied by 2, and added to the value of the PC. The value of
the PC used is the address of the branch instruction itself. The offset is multiplied by 2, since
all instructions must be half word aligned.
Usage
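A minimal sketch, with a hypothetical label `equal` and illustrative values:

```asm
    li   x5, 10           # x5 = 10
    li   x6, 10           # x6 = 10
    li   x7, 0            # x7 = 0
    beq  x5, x6, equal    # x5 == x6, branch is taken
    li   x7, 1            # skipped
equal:
    addi x7, x7, 2        # x7 = 2
```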
BNE
Branch If Not Equal (BNE) compares the contents of source register rs1 with source register rs2;
if they are not equal, control is transferred to the specified label.
Syntax
bne rs1, rs2, label
where,
rs1 source register 1
rs2 source register 2
label branch target
Description
The BNE instruction compares the contents of (rs1) with the contents of (rs2). If they are not
equal, control jumps to the target. The target address is given as a PC-relative offset.
Usage
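A minimal sketch of a countdown loop, with a hypothetical label `loop` and an illustrative iteration count:

```asm
    li   x5, 3            # loop counter
    li   x6, 0            # number of iterations executed
loop:
    addi x6, x6, 1        # example operation
    addi x5, x5, -1       # decrement counter
    bne  x5, x0, loop     # repeat while x5 != 0
                          # falls through with x6 = 3
```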
BLT
Branch If Less Than (BLT) compares the contents of source register rs1 with the contents of source
register rs2. If (rs1) is less than (rs2), control is transferred to the specified label.
Syntax
blt rs1, rs2, label
where,
rs1 source register 1
rs2 source register 2
label branch target
Description
The BLT instruction compares the contents of (rs1) with the contents of (rs2). If the contents of
(rs1) are less than those of (rs2) (signed comparison), control jumps to the target. The target
address is given as a PC-relative offset.
Usage
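A minimal sketch with a hypothetical label `less`; the operands are compared as signed values:

```asm
    li   x5, -1           # x5 = -1
    li   x6, 1            # x6 = 1
    li   x7, 0            # x7 = 0
    blt  x5, x6, less     # signed: -1 < 1, branch is taken
    li   x7, 1            # skipped
less:
    addi x7, x7, 2        # x7 = 2
```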
BLTU
Branch If Less Than Unsigned (BLTU) compares the contents of source register rs1 with the contents
of source register rs2. If (rs1) is less than (rs2), control is transferred to the specified label.
Syntax
bltu rs1, rs2, label
where,
rs1 source register 1
rs2 source register 2
label branch target
Description
The BLTU instruction compares the contents of (rs1) with the contents of (rs2). If the contents of
(rs1) are less than those of (rs2) (unsigned comparison), control jumps to the target. The target
address is given as a PC-relative offset.
Usage
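A minimal sketch with a hypothetical label `done`. The same bit patterns that make a signed less-than comparison true give the opposite result when compared as unsigned values:

```asm
    li   x5, -1           # all bits set: the largest unsigned value
    li   x6, 1            # x6 = 1
    li   x7, 0            # x7 = 0
    bltu x5, x6, done     # unsigned: x5 is not less than x6, branch not taken
    li   x7, 1            # executed, x7 = 1
done:
```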
BGE
Branch If Greater Than or Equal, signed (BGE) compares the contents of source register rs1 with the
contents of source register rs2. If (rs1) is greater than or equal to (rs2), control is transferred
to the specified label.
Syntax
bge rs1, rs2, label
where,
rs1 source register 1
rs2 source register 2
label branch target
Description
The BGE instruction compares the contents of (rs1) with the contents of (rs2). If the contents of
(rs1) are greater than or equal to the contents of (rs2) (signed comparison), control jumps to the
specified location. The target address is given as a PC-relative offset.
Usage
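A minimal sketch with a hypothetical label `ge`, using equal operands to show that equality also satisfies the condition:

```asm
    li   x5, 7            # x5 = 7
    li   x6, 7            # x6 = 7
    li   x7, 0            # x7 = 0
    bge  x5, x6, ge       # 7 >= 7, branch is taken
    li   x7, 1            # skipped
ge:
    addi x7, x7, 2        # x7 = 2
```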
BGEU
Branch If Greater Than or Equal, Unsigned (BGEU) compares the contents of source register rs1 with
the contents of source register rs2. If rs1 is greater than or equal to rs2, control is transferred
to the specified label.
Syntax
bgeu rs1, rs2, label
where,
rs1 source register 1
rs2 source register 2
label branch target
Description
The BGEU instruction compares the contents of (rs1) with the contents of (rs2). If the contents of
(rs1) are greater than or equal to those of (rs2) (unsigned comparison), control jumps to the
target. The target address is given as a PC-relative offset.
Usage
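A minimal sketch with a hypothetical label `geu`; the operands are compared as unsigned values:

```asm
    li   x5, -1           # all bits set: the largest unsigned value
    li   x6, 1            # x6 = 1
    li   x7, 0            # x7 = 0
    bgeu x5, x6, geu      # unsigned: x5 >= x6, branch is taken
    li   x7, 1            # skipped
geu:
    addi x7, x7, 2        # x7 = 2
```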
The branching instructions in this section are pseudo-instructions provided for convenience; the
assembler translates each of them into one of the base instructions.
BEQZ
Branch if Equal to Zero (BEQZ) instruction jumps to a specified location in the program if the
contents of a register are equal to zero.
Syntax
beqz rs1, label
Translation
beq rs1, x0, label
where,
rs1 source register
label branch target
Description
BEQZ translates to beq rs1, x0, label. As the expansion reveals, the contents of (rs1) are compared
with the zero register (x0) and the program counter branches to the specified label if they are
equal to zero.
Usage
li x6, 0 # x6 = 0
loop: addi x5, x5, 100 # Example operation
beqz x6, loop # x6 = 0 branch to loop
Assume rs1 (x6) is initialized to 0 and there is an example operation at the specified label
(loop). BEQZ on register rs1 (x6) will move the program counter to the specified label since the
contents of rs1 (x6) are indeed 0.
BNEZ
Branch if Not Equal to Zero (BNEZ) jumps to a specified location in the program if the contents of
a register are not equal to zero.
Syntax
bnez rs1, label
Translation
bne rs1, x0, label
where,
rs1 source register
label branch target
Description
The BNEZ instruction translates to BNE. As the translation reveals, the contents of rs1 are compared
with the zero register (x0) and the program branches to the specified label if the contents of the
rs1 register are not equal to zero.
Usage
li x6, 50 # x6 = 50
loop: addi x5, x6, 100 # Example operation
bnez x6, loop # x6 ≠ 0 jump to loop
Assume rs1 (x6) is initialized to 50 and there is an example operation at the specified label
(loop). BNEZ on register rs1 (x6) will move the program counter to the specified label since the
contents of rs1 (x6) are indeed not equal to 0.
BLEZ
Branch if Less Than or Equal to Zero (BLEZ) branches to the specified location if the contents of
a register are less than or equal to zero.
Syntax
blez rs1, label
Translation
bge x0, rs1, label
where,
rs1 source register
label branch target
Description
BLEZ expands to BGE. It is a signed comparison instruction which moves the program counter to the
specified location if the value in rs1 is less than or equal to 0.
Usage
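A minimal sketch, assuming x6 starts at -50 and using a hypothetical label `loop`. Unlike a register that never changes, x6 is incremented here so that the loop eventually exits:

```asm
    li   x6, -50          # x6 = -50
loop:
    addi x6, x6, 10       # example operation: x6 = x6 + 10
    blez x6, loop         # x6 <= 0, branch to loop
                          # falls through once x6 becomes positive
```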
Assuming rs1 (x6) is initialized to -50, BLEZ moves the program counter to label (loop) since the
condition that rs1 (x6) should be either less than or equal to 0 is met.
BGEZ
Branch if Greater Than or Equal to Zero (BGEZ) checks if register rs1 is greater than or equal
to zero. If the condition is met, the program counter branches to the specified label.
Syntax
bgez rs1, label
Translation
bge rs1, x0, label
where,
rs1 source register
label branch target
Description
BGEZ expands to BGE. This instruction checks whether the contents of rs1 are greater than or equal
to zero (x0), using a signed comparison. If the condition is met, the program counter branches to
the specified label.
Usage
li x6, 50 # x6 = 50
loop: addi x5, x6, 100 # Example operation
bgez x6, loop # x6 ≥ 0 jump to loop
Assuming that rs1 (x6) is initialized to a value 50, BGEZ instruction shifts the program counter to
label (loop) since the condition, rs1 (x6) must be greater than or equal to 0, is satisfied.
BLTZ
Branch if Less Than Zero (BLTZ) moves the program counter to a specified location if the value
in a register is less than zero.
Syntax
bltz rs1, label
Translation
blt rs1, x0, label
where,
rs1 source register
label branch target
Description
BLTZ is a signed comparison instruction with its base instruction being BLT. The value in rs1 is
compared with x0, and the program counter moves to the specified location if the contents of rs1
are less than 0.
Usage
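A minimal sketch, assuming x6 starts at -20 and using a hypothetical label `loop`. The register is incremented inside the loop so that the branch eventually falls through:

```asm
    li   x6, -20          # x6 = -20
loop:
    addi x6, x6, 5        # example operation: x6 = x6 + 5
    bltz x6, loop         # x6 < 0, branch to loop
                          # falls through once x6 reaches 0
```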
Assuming rs1 (x6) is initialized to -20, BLTZ shifts the program counter to label (loop) since the
contents of rs1 (x6) is indeed less than 0. The program then executes the instructions within the
label (loop).
BGTZ
Branch if Greater Than Zero (BGTZ) moves the program counter to a specified location if the
contents of a register are found to be greater than zero.
Syntax
bgtz rs1, label
Translation
blt x0, rs1, label
where,
rs1 source register
label branch target
Description
BGTZ is a signed comparison instruction which translates to its base instruction BLT. If the
contents of rs1 are greater than x0, the program counter moves and execution continues with the
instructions at the location specified.
Usage
li x6, 5 # x6 = 5
loop: addi x5, x6, 100 # Example instruction
bgtz x6, loop # x6 > 0 jump to label
Assuming that rs1 (x6) is initialized to value 5, the BGTZ instruction shifts the program counter to
label (loop), since rs1 (x6) is greater than 0. Program execution continues with what label (loop)
contains.
BGT
Branch if Greater Than (BGT) instruction moves the program counter to the specified location if
the value in one register is greater than that of another.
Syntax
bgt rs1, rs2, label
Translation
blt rs2, rs1, label
where,
rs1 source register 1
rs2 source register 2
label branch target
Description
BGT is a signed comparison instruction which translates to BLT with the operands swapped. The
instruction examines whether the contents of rs2 are less than the contents of register rs1. If the
condition is satisfied, the program counter branches to the location specified.
Usage
li x5, 30 # x5 = 30
li x6, -25 # x6 = −25
loop: addi x7, x6, 100 # Example instruction
bgt x5, x6, loop # x6 < x5 jump to loop
Assume rs1 (x5) is initialized to 30 and rs2 (x6) is initialized to -25. Since the condition that
rs2 (x6) should be less than rs1 (x5) is true (BGT translates to BLT), the program branches to
label (loop) and continues execution.
BLE
Branch if Less Than or Equal (BLE) instruction moves the program counter to the specified
location if the value in one register is less than or equal to that of another.
Syntax
ble rs1, rs2, label
Translation
bge rs2, rs1, label
where,
rs1 source register 1
rs2 source register 2
label branch target
Description
BLE is a signed comparison instruction which examines whether the contents of rs1 are less than or
equal to the contents of register rs2. If the condition is satisfied, the program counter branches
to the location specified.
Usage
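A minimal sketch, assuming x5 starts at -25 and x6 holds 30, with a hypothetical label `loop`. x5 grows on each pass, so the loop exits once it exceeds x6:

```asm
    li   x5, -25          # x5 = -25
    li   x6, 30           # x6 = 30
loop:
    addi x5, x5, 20       # example operation: x5 = x5 + 20
    ble  x5, x6, loop     # x5 <= x6, branch to loop
                          # falls through once x5 is greater than x6
```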
Assume rs1 (x5) is initialized to -25 and rs2 (x6) is initialized to 30, the program branches to the
specified label (loop) since rs1 (x5) is less than rs2 (x6).
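BGTU
Branch if Greater Than, Unsigned (BGTU) instruction moves the program counter to the specified
location if the value in one register is greater than that of another, comparing both as unsigned.
Syntax
bgtu rs1, rs2, label
Translation
bltu rs2, rs1, label
where,
rs1 source register 1
rs2 source register 2
label branch target
Description
BGTU is an unsigned comparison instruction which examines whether the contents of rs1 are greater
than those of rs2. If the condition is satisfied, the program counter moves to the specified
location and execution continues from there.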
Usage
li x6, 50 # x6 = 50
li x7, 10 # x7 = 10
loop: bgtu x6, x7, loop # x6 > x7 Jump to loop
Assume rs1 (x6) is initialized to 50 and rs2 (x7) is initialized to 10. The program shifts to the
specified label (loop) as rs1 is greater than rs2 .
BLEU
Branch if Less Than or Equal, Unsigned (BLEU) instruction examines whether the value of one
register is less than or equal to that of another and the program counter moves accordingly.
Syntax
bleu rs1, rs2, label
Translation
bgeu rs2, rs1, label
where,
rs1 source register 1
rs2 source register 2
label branch target
Description
BLEU is an unsigned comparison instruction which examines whether the contents of rs1 are less than
or equal to those of rs2. If the condition is satisfied, the program counter branches to the
specified label.
Usage
li x6, 20 # x6 = 20
li x7, 25 # x7 = 25
loop: addi x5, x7, 100 # Example instruction
bleu x6, x7, loop # x6 ≤ x7 Jump to loop
Assume rs1 (x6) is initialized to 20 and rs2 (x7) is initialized to 25. Since rs1 (x6) is less than
rs2 (x7), the BLEU instruction branches to the specified label (loop).
RET
Return from Subroutine (RET) is a pseudo-instruction used at the end of a subroutine to return to
its caller.
Syntax
label: ret
where,
label sub-routine
Description
RET translates to jalr x0, 0(ra). This instruction jumps to the address in ra, but does not save a
return address. Provided ra still holds the return address saved by the call, execution continues
from the instruction following the call.
Usage
li x6, 50
li x7, 20
addi x5, x7, 100
ret # Return back to caller
Unconditional jump instructions transfer control to the specified address without testing a
condition.
JAL
Syntax
jal rd, offset
where,
rd destination register
offset offset value
Description
The JAL instruction is used to call a subroutine (i.e., function). The return address (i.e., the
address of the instruction following the JAL) is saved in the destination register. The target
address is given as a PC-relative offset. More precisely, the encoded offset is sign-extended,
multiplied by 2, and added to the value of the PC. The value of the PC used is the address of the
JAL instruction itself. The offset is multiplied by 2, since all instructions must be half word
aligned.
Usage
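A minimal sketch of a subroutine call, with hypothetical labels `twice` and `done`. The assembler computes the offset from the label:

```asm
    li   a0, 5            # argument
    jal  ra, twice        # call: ra = address of the next instruction
    mv   a1, a0           # executed after the return, a1 = 10
    j    done
twice:
    add  a0, a0, a0       # a0 = 2 * a0
    ret                   # jalr x0, 0(ra)
done:
```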
JALR
Jump and Link Register (JALR) is used to invoke a subroutine call (i.e., function/method/procedure)
whose address is held in a register.
Syntax
jalr rd, offset(rs1)
where,
rd destination register
rs1 source register holding the base address
offset offset value
Description
The JALR instruction is used to call a subroutine (i.e., function). The return address (i.e., the
address of the instruction following the JALR) is saved in the destination register. The target
address is not PC-relative: the 12-bit offset is sign-extended and added to the value of rs1, and
the least significant bit of the sum is cleared. The offset is not multiplied by 2.
Usage
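A minimal sketch of a call through a register, with hypothetical labels `twice` and `done`:

```asm
    la   t0, twice        # t0 = address of the subroutine
    li   a0, 5            # argument
    jalr ra, 0(t0)        # call: ra = address of the next instruction
    mv   a1, a0           # executed after the return, a1 = 10
    j    done
twice:
    add  a0, a0, a0       # a0 = 2 * a0
    ret                   # jalr x0, 0(ra)
done:
```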
J
Jump (J) is a pseudo-instruction which uses Jump and Link (JAL) with the destination register set
to x0, so that the return address is discarded.
Syntax
j label
where,
j Jump
label A string that points to an instruction
Description
J is a plain unconditional jump (UJ-type) instruction used to jump to a label within the range of
JAL (±1 MiB from the current pc). This instruction translates to jal x0, label; because writes to
x0 are ignored, the return address is discarded.
Usage
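A minimal sketch with a hypothetical label `skip`:

```asm
    li   x5, 1            # x5 = 1
    j    skip             # unconditional jump, no return address saved
    li   x5, 2            # never executed
skip:
    addi x6, x5, 0        # x6 = 1
```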
JR
Jump Register (JR) is a pseudo-instruction which translates to Jump and Link Register (JALR). It
jumps to the address held in a general purpose register (GPR) and does not save a return address.
Syntax
jr rs1
where,
jr Jump Register
rs1 register holding the target address
Description
JR is translated to jalr rd, rs1, imm where rd is the zero register, rs1 contains the target
address and imm is given the value 0. Because the rd field is set to zero, the instruction jumps to
the address in rs1 but does not save a return address.
Usage
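A minimal sketch with a hypothetical label `target`:

```asm
    la   t0, target       # t0 = address to jump to
    jr   t0               # jalr x0, 0(t0)
    li   x5, 2            # never executed
target:
    li   x5, 1            # x5 = 1
```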
SYSTEM instructions are used to access system functionality that might require privileged access
and are encoded using the I-type instruction format. These can be divided into two main classes:
those that atomically read-modify-write control and status registers (CSRs), and all other potentially
privileged instructions. CSR instructions are described in this
5.3.1 ECALL
Environment Call (ECALL) instruction is used to implement system calls. Also, ECALL is used
to transfer control from lower privilege level to higher privilege level.
Syntax
ecall
Description
The ECALL instruction is used to implement system calls. System calls are subroutine calls made
from lower privilege code to higher privilege code. The execution happens at the higher privilege
level and the result is given back to the lower privilege code. Once the desired operation is over,
control returns to the lower privilege level. Generally, if an operation needs to be done at a
higher privilege level, ECALL is used. For example, the library implementations of FILE operations
in a Unix operating system use ECALL. On execution of ECALL, one of the following exceptions
arises, depending on the privilege mode in which it was executed:
• Environment call from U-mode
• Environment call from S-mode
• Environment call from M-mode
As described in the section “mcause”, each of the above exceptions has a dedicated exception code.
The trap handler at the higher privilege level handles the exception and redirects the call to the
corresponding subroutine. The arguments are passed through the argument registers (ai) and the
result is saved in a Saved register (si).
Usage
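A minimal sketch. The argument value is hypothetical, and the meaning of the argument and the register that carries the result depend on the convention implemented by the trap handler:

```asm
    li    a0, 1           # hypothetical argument for the handler
    ecall                 # trap to the higher privilege level
                          # execution resumes here after the handler returns
```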
5.3.2 EBREAK
Environment Break (EBREAK) is an assembly instruction that is used to stop normal execution and
hand control to a debugging environment.
Syntax
ebreak
Description
The EBREAK instruction is used to invoke a debugger, by causing a “Breakpoint” exception. Typically
the debugging software will insert this instruction at various places in the application code sequence,
in order to gain control from an executing program.
Usage
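A minimal sketch. What happens after the breakpoint depends on the debugger or trap handler that is installed:

```asm
    li   x5, 1            # x5 = 1
    ebreak                # raises a breakpoint exception (exception code 3)
    li   x5, 2            # runs only if the debugger or handler resumes here
```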
5.3.3 WFI
Wait For Interrupt (WFI) instruction causes the processor to suspend instruction execution. The
processor will wake up when an asynchronous interrupt occurs and resumes execution.
Syntax
WFI
Description
If an enabled interrupt wakes the processor, the trap handler is invoked. Upon return to the code
sequence containing the WFI instruction, the instruction following the WFI is executed.
5.3.4 NOP
The No Operation (NOP) instruction executes silently. It does not change registers, memory or
processor status. Only the program counter is advanced.
Syntax
nop
Description
NOP is a pseudo-instruction that expands to addi x0, x0, 0. The x0 register is a read-only register
holding the value zero. Anything written to the x0 register is discarded. The NOP instruction does
not change any architecturally visible state, except for advancing the pc and incrementing any
applicable performance counters. As RISC-V has no arithmetic flags (i.e., carry, overflow, zero,
sign flags), any arithmetic operation whose destination register is x0 ends up as a no operation
instruction regardless of the source registers.
Usage
nop # pc ←− pc + 4 (pc + 2 if the assembler emits the compressed c.nop)
A trap is a specific scenario caused by an exceptional condition or an interrupt. In RISC-V, the
term trap refers to the transfer of control to a trap handler, caused either by an exception or by
an interrupt. An exception is an unusual condition occurring at run time, associated with an
instruction in the current RISC-V hart. An exception disrupts the normal flow of instruction
execution. Exceptions are synchronous with instruction execution. Interrupts are the other form of
trap, where the origin of the interrupt is a timer, software or a peripheral. An interrupt is a
scenario designed to service a specific external input. All traps can be handled or ignored; it is
up to the software to decide. A “trap handler” is a subroutine that handles the trap in software.
The way of handling a trap is left to the software designer and varies from one type of trap to
another.
6.1 Exceptions
Exceptions are synchronous and always tied to an assembly instruction. An exception can
arise at any stage of execution of an instruction. For example, during the instruction decode stage,
the hardware may detect a bad opcode field. This will trigger an “illegal instruction” exception.
When an exception happens, the hardware sets the mcause register with the corresponding exception
code. The pc is set to the trap handler base address. The exception code helps to identify the type of
exception. The possible exceptions in RISC-V are listed in Table
• Illegal instruction
• Environment call
• Break point
An illegal instruction exception occurs when the program tries to execute an illegal instruction.
For example, trying to write to a read-only CSR generates an illegal instruction exception.
Example:
li t0, 8 # t0 ←− 8
csrrs x0, mhartid, t0 # Attempt to write to a read-only CSR, generates exception
An instruction address misaligned exception occurs when the program executes an unconditional jump
or takes a branch whose target address is not 4-byte aligned. For example, executing a program with
a start address of 0x80000001 generates an instruction address misaligned exception on an
unconditional jump.
Note:
Instruction address misaligned exceptions are not possible on machines that support extensions with
16-bit aligned instructions, such as the compressed instruction-set extension, C.
Example:
# start address set to 0x80000001 (start not aligned to 4 byte boundary).
A load address misaligned exception occurs when the program executes a load instruction that
accesses data at a misaligned address, for example an address that is not 4-byte aligned for a word
load. Accessing a data section that has not been properly aligned can cause this exception.
Example:
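A minimal sketch for a 64-bit machine, using a hypothetical data label `value`. Whether the second load traps is implementation-dependent, since some implementations handle misaligned accesses in hardware:

```asm
    .data
    .align 3              # align to an 8-byte boundary
value:
    .dword 0x1122334455667788
    .text
    la   t0, value        # t0 = 8-byte aligned address
    ld   t1, 0(t0)        # aligned doubleword load
    ld   t2, 1(t0)        # misaligned address: may raise a load address
                          # misaligned exception (exception code 4)
```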
A store address misaligned exception occurs when the program executes a store instruction to a
misaligned address, for example an address that is not 4-byte aligned for a word store. Storing
data into a data section without proper alignment can cause this exception.
Example:
An instruction access fault exception occurs when the program tries to fetch an instruction from an
invalid memory location. For example, executing an unconditional jump to a memory location which is
outside the bounds of the physical memory.
Example:
data1:
.word 100
.word 99
In the above case, data1 holds data values, aligned at a word boundary. Now we jump to the location
one byte before data1. When the ‘jalr’ is executed, an instruction access fault happens. The jump
should have been made to a 4-byte aligned address.
A load access fault exception occurs when the program attempts a load from an invalid memory
location. For example, trying to load from an address which is beyond the bounds of memory or not
accessible. Certain registers are 32 bits in size; a 64-bit load operation on them might raise an
error.
Example:
start:
la x15, start # x15 ←− Address ( start)
ld x16, -16 ( x15 ) # x16 ←− Content(x15-16) -Exception generated
A store access fault exception occurs when the program attempts a store to an invalid memory
location. For example, trying to store to an address which is beyond the bounds of memory or not
accessible.
Example:
start:
la x15, start # x15 ←− Address ( start)
sd x16, -16 ( x15 ) # x16 −→ Content(x15-16) -Exception generated
A breakpoint exception occurs when the program executes a breakpoint set in the program to enter
debug mode.
An environment call exception occurs when the program executes a system call. The system call is
realized in RISC-V using the ecall instruction. The ecall instruction can also be used to switch
from lower privilege modes to higher privilege modes. An example ecall instruction is demonstrated
below.
Example:
Once an exception happens, the processor stops execution and passes control to the trap handler.
In between, the processor privilege is set to Machine mode and the processor sets the mcause
register with the exception code. The mepc is set with the pc of the instruction that caused the
exception. All exceptions come to the Machine Mode trap handler first. This applies to exceptions
that arise from any privilege level. The Machine Mode trap handler executes in Machine Mode. In the
trap handler, the context of the registers is first saved on the stack. Then the trap is serviced.
After this, the saved context is restored from the stack. This way, the trap is handled without
disturbing the execution flow.
Now, a question may arise on how the hardware jumps to the trap handler. This is established by
setting the mtvec register with the trap handler’s physical address. Usually the value in mtvec is
called the “Trap entry”.
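The sequence above can be sketched as follows for a 64-bit machine running in Machine mode. This is a minimal illustration, not a complete handler: the labels `_start`, `done` and `trap_entry` are hypothetical, the stack pointer is assumed to be already set up, only two registers are saved, and the handler simply skips the trapping instruction, which is assumed to be 4 bytes long (as ecall is):

```asm
_start:
    la    t0, trap_entry
    csrw  mtvec, t0       # trap entry, low two bits 00 (single handler)
    ecall                 # raise an exception to exercise the handler
done:
    j     done            # execution resumes here after mret

    .align 2              # handler must start on a 4-byte boundary
trap_entry:
    addi  sp, sp, -16     # save context on the stack
    sd    t0, 0(sp)
    sd    t1, 8(sp)
    csrr  t0, mcause      # exception code identifies the cause
    csrr  t1, mepc        # pc of the instruction that trapped
    addi  t1, t1, 4       # service: skip the trapping instruction
    csrw  mepc, t1
    ld    t0, 0(sp)       # restore context
    ld    t1, 8(sp)
    addi  sp, sp, 16
    mret                  # return to the address in mepc
```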
In some cases, we may not want to handle the exception in Machine Mode; we might want to handle it
in Supervisor Mode or even User Mode. As such, there is a facility to “delegate” some or all
exceptions to the lower privilege levels. These things will be seen in PART II.
[Figure: trap handling flow. Start, system init, user application running (loop). On a trap event
the hart enters the trapped state (priv ← M, mepc ← pc, mie ← 0, pc ← mtvec base address) and
control is transferred to the trap entry. The trap handler passes control either to the interrupt
handler or to the exception handler, which walks the list of exceptions (exception 0: instruction
address misaligned, exception 1: instruction access fault, ..., Store/AMO page fault; otherwise
unknown exception). Control returns with MRET.]
The trap handler must begin on a word aligned (4-byte) address boundary, because the base address
held in the mtvec CSR always has “00” as its least significant two bits. The RISC-V spec uses the
last two bits of mtvec as a mode field, as follows.
• If the last two bits are “00”, then it means the CSR contains the address of a single trap
handler.
• If the last two bits are “01”, then it means there is a collection of trap handlers, one for
each type of asynchronous interrupt (Vectored Trap handler).
• The remaining bit patterns “10” and “11” are reserved.
Things to remember:
• The MEPC is set to the address of the instruction that was executing when the system trapped.
• The PC is set to the MTVEC value. In case of vectored trap handling, for an interrupt the PC is
set to mtvec base address + 4 x (mcause).
The exception handling mechanism uses four or five registers to capture all the information about a
trap. Those registers are CSRs. A separate set of registers is made available for each privilege
level. The mstatus register holds the trap related information as bit fields. The mepc register
holds the address of the instruction that was executing when the exception happened. The mtvec
register has the base address of the trap handler. It is usually referred to as the entry point of
the trap. The mcause register holds the code identifying the cause of the trap.
6.2.2 MSTATUS
Machine Status Register (MSTATUS) is used, among other things, to enable/disable interrupts. The
mstatus register has many more bits; only the bits used with respect to a trap are shown here.
Description
[Figure: MSTATUS register layout, bits 63 to 0]
MSTATUS contains a number of fields that can be read and updated. By modifying these fields,
the software can do things like enable/disable interrupts and change the virtual memory model.
[Figure: MSTATUS register layout, bits 31 to 0, with field widths]
We use MSTATUS register while handling exceptions to read and set the MPP and SPP bits
based on the requirement to switch privilege modes. This will be discussed in PART II.
Example:
li t0,0x800
csrrs zero, mstatus, t0 # Set bit 11 of mstatus (low bit of the MPP field)
6.2.3 MRET
We were discussing earlier that mtvec register helps the hardware to locate the base address of
the Trap handler. If there is an entry to a Trap, there should also be an exit. In the following
section, we will be dealing with this part exactly.
Machine Mode Trap Handler Return (MRET) is used to return from a trap handler that is
executing in the Machine Mode.
Syntax
MRET
Description
Once the trap is serviced and the saved context is restored, the mret instruction can be executed.
This instruction tells the processor to pass control back to the address in the mepc register. In
case of an exception originating from a lower privilege level, the MRET instruction transfers
control back to that privilege level. The MPP field of the status register is consulted to
determine which mode to return to (either m, s, or u). The return is effected by copying the saved
program counter from mepc to the Program Counter (pc).
Exceptions
6.3.1 Stack
A stack is an abstract data structure used to implement function calls in a program; it holds
data temporarily during a function call. Being a linear data structure, a stack grows and
shrinks during calls to functions and is based on the last-in-first-out (LIFO) concept. The
implementation of a stack on an architecture is entirely at the software designer’s disposal.
The limited number of registers in an architecture restricts the number of variables that can
be held in registers by a program. A stack serves the purpose of holding data temporarily during
function calls. It is specifically used to store variables when a function or procedure call is made.
A stack is famously used for “UNDO”, i.e., holding the history of an activity. For example,
before switching over to a function, a stack is called upon to store the contents of the necessary
registers, as they may be modified during the execution of the function. After the function is
executed, all registers can be restored to their values prior to the function call. This action
of store and retrieval is called “PUSH and POP”. Some architectures support the use of
“PUSH” and “POP” keywords, while others use “LOAD” and “STORE” instructions to do the
same.
A program that implements a stack sets aside a certain portion of the memory for its use. A
register called the “Stack Pointer” holds the address of the most recently pushed item, i.e. the
top of the stack. A program’s stack is generally not a hardware structure, but the Stack Pointer,
which points to the current area, is a CPU register. In RISC-V the standard calling convention
keeps the stack pointer 16-byte aligned.
• Allocate space for the stack by decrementing the sp by the number of locations required
multiplied by XLEN bytes. This will allocate memory for the stack temporarily in memory.
• PUSH data onto the stack. This essentially writes the register values to the stack.
* sd x1, 1*XLEN(sp)
* sd x2, 2*XLEN(sp)
* sd x4, 3*XLEN(sp)
• POP data from the stack. This essentially restores the register values back from the stack.
* ld x1, 1*XLEN(sp)
* ld x2, 2*XLEN(sp)
* ld x4, 3*XLEN(sp)
• To free the stack, increment sp by the same number of locations used earlier (‘n locations’
multiplied by XLEN bytes). This will reset the stack pointer to the bottom of the caller
stack.
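The steps above can be sketched with concrete numbers for a 64-bit machine, where each register occupies 8 bytes. The choice of registers (ra, s0, s1) is only an illustration, and the frame is rounded up to 32 bytes to keep sp 16-byte aligned:

```asm
    addi sp, sp, -32      # allocate: 3 registers x 8 bytes, rounded up to 32
    sd   ra, 0(sp)        # PUSH ra
    sd   s0, 8(sp)        # PUSH s0
    sd   s1, 16(sp)       # PUSH s1
                          # ... body of the function, free to modify ra, s0, s1 ...
    ld   ra, 0(sp)        # POP ra
    ld   s0, 8(sp)        # POP s0
    ld   s1, 16(sp)       # POP s1
    addi sp, sp, 32       # free: sp returns to its value on entry
```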
Interrupts are asynchronous events triggered by an external source. The processor may
process or ignore interrupts. Interrupts can be both software and hardware. In RISC-V,
interrupts are classified into timer, software and external interrupts. The external interrupts are
also called global interrupts. Timer interrupts are handled in the core. Software interrupts
are internal to the processor, and external interrupts are handled by the PLIC module. In this
chapter, we are going to look at handling Timer and External interrupts in RISC-V.
A “timer interrupt” is raised when a separate timer circuit indicates that a predetermined
interval has ended. The timer subsystem interrupts the currently executing code. Timer
interrupts are typically handled by the OS, which uses them to implement time-sliced multithreading.
The mtime register is a synchronous counter. It starts running when the processor is
powered on and provides the current real time in ticks.
The mtimecmp register stores the time at which the next timer interrupt should happen.
The value of mtimecmp is compared with the mtime register. When the mtime value becomes
greater than or equal to mtimecmp, a timer interrupt is raised. Both mtime and mtimecmp are
64-bit memory-mapped registers.
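Programming the next timer interrupt can be sketched as below for an RV64 target. The addresses of mtime and mtimecmp are platform specific; the values of MTIME_ADDR and MTIMECMP_ADDR, the INTERVAL constant and the label set_timer are all assumptions chosen for illustration and must be replaced with the values of the actual SoC.

```asm
    .equ MTIME_ADDR,    0x0200BFF8   # assumed address of mtime (platform specific)
    .equ MTIMECMP_ADDR, 0x02004000   # assumed address of mtimecmp (platform specific)
    .equ INTERVAL,      10000        # hypothetical interval in timer ticks

set_timer:
    li   t0, MTIME_ADDR
    ld   t1, 0(t0)         # t1 = current value of mtime
    li   t2, INTERVAL
    add  t1, t1, t2        # t1 = mtime + INTERVAL
    li   t0, MTIMECMP_ADDR
    sd   t1, 0(t0)         # the timer interrupt becomes pending once mtime reaches t1
    ret
```

A timer interrupt handler would typically call such a routine again to schedule the following interrupt.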
[Figure: flowchart of timer interrupt handling. Start → configure timer interval and enable
interrupt → user application running (loop) → trap event → handle trap → timer interrupt?
If yes: timer interrupt handler, write mtimecmp register. If no: other interrupt handler.]
Each of the Timer, Software, and External interrupts can be enabled individually. Globally,
all interrupts can be enabled or disabled using the MIE bit in the MSTATUS register. The MTIE,
MSIE, and MEIE bits in the mie CSR enable or disable Timer, Software, and External interrupts
individually. When an interrupt is taken, hardware copies the current MIE bit into the MPIE bit to
preserve the interrupt-enable state and then clears MIE to 0. This way further interrupts are
blocked while the handler runs, and the previous state is restored by MRET.
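As a minimal sketch of the enable bits described above, the following sequence turns on the machine timer interrupt and then the global machine interrupt enable. It assumes the code runs in machine mode; the register t0 is just a scratch register chosen for illustration.

```asm
    li   t0, 0x80          # bit 7 of mie is MTIE (machine timer interrupt enable)
    csrs mie, t0           # set MTIE, leave the other mie bits unchanged
    li   t0, 0x8           # bit 3 of mstatus is MIE (global machine interrupt enable)
    csrs mstatus, t0       # set MIE, leave the other mstatus bits unchanged
```

Setting the individual enable first and the global enable last avoids taking an interrupt before the configuration is complete.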
An “External Interrupt” comes from outside the processor, and the precise nature of its cause
depends on the application. Such interrupts are asynchronous and are generated by external
sources through the hardware, and may be serviced by the processor. For example, a RISC-V
processor used in an embedded process control system might receive external interrupts from
various sensors that require appropriate action to be taken. These interrupts are handled
by the Platform Level Interrupt Controller (PLIC). The sources of interrupts for the PLIC are the
devices connected to the SoC (IO, UART, SPI, etc.). As per the RISC-V specification these
are termed global interrupt sources, each of which is prioritised and routed by the PLIC to the core.
For more detailed information on the PLIC, kindly refer to the PLIC document
provided in the link: [Link]
A “software interrupt” is raised by setting the software-interrupt pending bit of the target core; on
many platforms this is done by writing a memory-mapped register. This can be useful
in a multi-core chip where a thread running on one core needs to send an interrupt signal to
another core.
Some traps are “maskable” and others are “non-maskable”. A maskable interrupt can either be
handled, or can be ignored, or can be passed from a higher privilege level to a lower privilege
level.
8 Assembler Directives
8.1 Object File section
Object files contain instructions and data. The instructions and data are stored in appropriate
sections according to their use.
8.1.1 .TEXT
Syntax
Description
This portion of the object file or virtual address space is also known as the code segment or
simply the text segment of the program. It contains executable instructions, which are not
meant to be modified at run-time. On systems that enforce this protection, any attempt to store
into the .TEXT section produces a “Segmentation” error and the program is terminated
immediately. The code segment can contain constants in addition to instructions.
Usage
.text
li x5, 100
addi x5, x0, 100
8.1.2 .DATA
A read-write portion of the object file which contains data for the variables of the program.
Syntax
.section .data or .data
Variables
Description
The .DATA section contains initialized static variables, that is, global variables and static local variables.
Usage
.data
.word 1
helloworld: .ascii "Hello World!"
8.1.3 .RODATA
Syntax
.section .rodata or .rodata
data
Description
This section holds read-only data for the program, although the read-only property is not always enforced.
Usage
.rodata
mydata: .asciz "Hello World!"
8.1.4 .BSS
The .BSS section (historically “Block Started by Symbol”) is a read-write section containing uninitialized data.
Syntax
where,
Description
The .BSS directive is used for local common variable storage. When the program starts running,
all the contents of this section are zeroed bytes. Since the section starts out containing zeroed
bytes, there is no need to store explicit zero bytes in the object file. The .BSS section was
invented to eliminate those explicit zeros from object files. In the program the .BSS section
follows the data section.
Usage
.bss label1, 8, 4
8.1.5 .COMM
The Common (.COMM) directive emits a common object to the .BSS section. It declares a common
symbol named symbol.
Syntax
where,
Description
The .COMM directive declares a common symbol named symbol. When linking, a common symbol in
one object file may be merged with a defined or common symbol of the same name in another
object file. The size of an object in the .BSS section is set by the .COMM directive.
Usage
.comm label1, 8
8.1.6 .COMMON
Syntax
where,
Description
The .COMMON directive declares a common symbol named symbol. When linking, a common symbol in
one object file may be merged with a defined or common symbol of the same name in another
object file. This directive behaves somewhat like the .comm directive, but the syntax is different.
Usage
.common label1, 8
8.1.7 .SECTION
Section (.SECTION) directive assembles the following code into a section named "name".
Syntax
.section name
where,
Description
The .SECTION directive is only supported for targets that support arbitrarily named sections, on
"[Link]" targets.
Usage
.section A
8.1.9 .OPTION
The .OPTION directive has a statically defined list of arguments with RISC-V options.
Syntax
.option argument
where,
Description
The .OPTION directive modifies RISC-V specific assembler options inline with the assembly
code. This is used when particular instruction sequences must be assembled with a specific set
of options.
Usage
.option push
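The usage line above only saves the current options. A fuller sketch of how a push is usually paired with a pop is given below. It assumes the GNU assembler and uses the norvc argument (disable compressed encodings) purely as an example of an option that might be changed temporarily.

```asm
    .option push          # save the current assembler options
    .option norvc         # assumption for illustration: disable compressed (RVC) encodings
    addi a0, a0, 1        # assembled as a full 4-byte instruction
    .option pop           # restore the options saved by .option push
```

Code after .option pop is assembled with whatever options were in effect before the push.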
8.1.10 .FILE
The .FILE directive starts a new logical file.
Syntax
.file string
where,
Description
In general, the filename given to the .FILE directive is recognized whether or not it is surrounded by
quotes. To specify an empty file name, however, the quotes must be given.
Usage
.file Hello
8.1.11 .IDENT
Syntax
.ident "string"
where,
Description
The .IDENT directive is used by some assemblers to place tags in object files. The behavior of
this directive varies depending on the target: on some targets the assembler simply accepts
the directive for source-file compatibility with such assemblers but does not actually emit
anything for it, while on others the tag is placed in the object file.
Usage
8.1.12 .SIZE
Syntax
Description
Usage
memcpy:
mv x4, x5 # x4 <- x5
beqz x7, 1f # if x7 = 0, goto 1f (the next local label 1)
1: addi t1, t1, 1 # t1 <- t1 + 1
addi t2, t2, -1 # t2 <- t2 - 1
.size memcpy, .-memcpy
[Link] .TYPE
Syntax
where,
Description
The .TYPE directive allows you to tell the assembler what type a symbol is.
Usage
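A minimal sketch of how .TYPE is commonly combined with .GLOBAL and .SIZE in GNU assembler syntax is shown below. The symbol name my_func is hypothetical and exists only for illustration.

```asm
    .text
    .global my_func             # hypothetical symbol, made visible to the linker
    .type   my_func, @function  # tell the assembler that my_func is a function
my_func:
    ret
    .size   my_func, .-my_func  # record the size of the function in bytes
```

For data symbols the type @object would be used in place of @function.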
[Link] .GLOBAL
Syntax
where,
Description
Usually, a defined symbol is visible only within the partial program (source file) in which it is
defined. With the .GLOBAL directive its value is made available to other partial programs that
are linked with it.
Usage
i: .word 5
.global i # Variable i is made global
[Link] .LOCAL
Syntax
.local symbol
where,
Description
The .LOCAL directive marks each symbol in the comma separated list of names as a local
symbol, so that it will not be externally visible. If the symbols do not already exist, they will
be created.
Usage
i: .word 5
.local i # Variable i is made local
[Link] .EQU
Syntax
where,
Description
The .EQU directive has two operands separated by a comma. Wherever the first operand
appears in the program, the assembler replaces it with the second operand. The substitution
takes place only while assembling the code. Once the symbol is defined, its value cannot be
changed in the remaining part of the source code.
Usage
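As a small sketch of the substitution described above, the following fragment defines a constant and uses it as an immediate. The name BUFFER_SIZE and its value are hypothetical and chosen only for illustration.

```asm
    .equ BUFFER_SIZE, 64      # hypothetical constant: BUFFER_SIZE stands for 64
    li   t0, BUFFER_SIZE      # assembled exactly as: li t0, 64
```

No memory is reserved for BUFFER_SIZE; the symbol exists only at assembly time.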
[Link] .ALIGN
The .ALIGN directive aligns the next instruction or data item to a given boundary.
Syntax
.align size
where,
Description
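The .ALIGN directive advances the location counter to the desired alignment. With the GNU assembler on RISC-V targets the operand is treated as a power of two, so .align 2 aligns to a 4-byte boundary.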
Usage
[Link] .BALIGN
Syntax
.balign size
where,
Description
Usage
[Link] .P2ALIGN
The .P2ALIGN directive aligns to a power-of-two byte boundary, inserting padding as needed. Alias for
.ALIGN directive.
Syntax
.p2align size
where,
Description
The .P2ALIGN directive pads the location counter to a particular storage boundary. The
operand is the exponent of a power of 2, so the alignment is 2^size bytes.
Usage
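The difference between the byte-count form (.BALIGN) and the power-of-two form (.P2ALIGN) can be sketched as follows. The data values are arbitrary and serve only to move the location counter.

```asm
    .data
    .byte    0x01                 # one byte, so the location counter is now unaligned
    .balign  4                    # pad to the next 4-byte boundary
    .word    0x11223344           # placed on a 4-byte boundary
    .byte    0x02                 # one more byte, unaligned again
    .p2align 3                    # pad to the next 2^3 = 8-byte boundary
    .dword   0x1122334455667788   # placed on an 8-byte boundary
```

Here .balign 4 and .p2align 2 would request the same alignment.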
Assembler directives are instructions to the assembler to perform various bookkeeping tasks,
storage reservation, and other control functions.
[Link] .2BYTE
Syntax
.2byte value
where,
Description
The .2BYTE directive emits the specified value as a 2-byte (16-bit) integer without alignment. It
can also store multiple comma-separated values. The operands specified can be decimal, hex,
binary, or character constants, but not labels.
Usage
.2byte 0x1000
[Link] .4BYTE
Syntax
.4byte value
where,
Description
The .4BYTE directive emits the specified value as a 4-byte (32-bit) integer without alignment. It
can also store multiple comma-separated values. The operands specified can be decimal, hex,
binary, or character constants, but not labels.
Usage
.4byte 0x1000000
[Link] .8BYTE
Syntax
.8byte value
where,
Description
The .8BYTE directive emits the specified value as an 8-byte (64-bit) integer without alignment. It
can also store multiple comma-separated values. The operands specified can be decimal, hex,
binary, or character constants, but not labels.
Usage
.8byte 0x1000000000000000
[Link] .HALF
The .HALF directive emits naturally aligned 2-byte (16-bit) comma-separated words.
Syntax
.half value
where,
Description
The .HALF directive emits the specified value as a 2-byte (16-bit) aligned integer. It
can also store multiple comma-separated values. The operands specified can be decimal, hex,
binary, or character constants, but not labels.
Usage
.half 0x1000
[Link] .WORD
The .WORD directive emits naturally aligned 4-byte (32-bit) comma-separated words.
Syntax
.word value
where,
Description
The .WORD directive emits the specified value as a 4-byte (32-bit) aligned integer. It can
also store multiple comma-separated values and the operands specified can be decimal, hex,
binary, or character constants, but not labels.
Usage
.word 0x1000000
[Link] .DWORD
The Double Word (.DWORD) directive emits naturally aligned 8-byte (64-bit) comma-separated
words.
Syntax
.dword value
where,
Description
The .DWORD directive creates a double word constant. It can also store multiple comma
separated values. The operands specified can be decimal, hex, binary, or character constants,
but not labels.
Usage
.dword 0x7000000000000000
[Link] .BYTE
Syntax
.byte value
where,
Description
The .BYTE directive emits the specified value as a 1-byte (8-bit) integer. It
can also store multiple comma-separated values. The operands specified can be decimal, hex,
binary, or character constants, but not labels.
Usage
.byte 0x10
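The data directives of this section can be combined in one data section, as in the sketch below. The label names are hypothetical; the values reuse those from the usage lines above, with extra comma-separated operands added to show the multi-value form.

```asm
    .data
bytes:  .byte  0x10, 0x20, 'A'       # three 1-byte values (hex and character constants)
halves: .half  0x1000, 0x2000        # two 2-byte values
words:  .word  0x1000000             # one 4-byte value
dwords: .dword 0x7000000000000000    # one 8-byte value
```

Each label marks the address of the first value emitted by the directive that follows it.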
8.3.1 .ASCIZ
The ASCIZ (.ASCIZ) directive is similar to the .ascii directive and emits the string specified
within double quotes.
Syntax
.asciz "string"
where,
Description
The .ASCIZ directive is like the .ascii directive, but each string is followed by a zero byte.
The “z” in .ASCIZ stands for zero. For this directive, the assembler increments the location
counter by the length of the string, including the null character at the end. This directive is
convenient for text strings that must be null-terminated.
Usage
8.3.2 .STRING
Syntax
.string "String"
where,
Description
For the .STRING directive, the assembler increments the location counter by the length of the
string, including the null character at the end.
Usage
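The three string directives can be compared side by side, as in the sketch below. The label names and the string "Hi" are hypothetical and chosen only to keep the byte counts easy to follow.

```asm
    .section .rodata
msg1: .ascii  "Hi"        # 2 bytes: 'H', 'i' (no terminating zero)
msg2: .asciz  "Hi"        # 3 bytes: 'H', 'i', 0
msg3: .string "Hi"        # 3 bytes: 'H', 'i', 0 (same as .asciz)
```

Routines that expect null-terminated strings need the .asciz or .string form.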
8.3.3 .INCBIN
The Include Binary (.INCBIN) directive emits the included file as a binary sequence of octets.
Syntax
.incbin "file"
where,
Description
The .INCBIN directive takes any file and includes it within the file being assembled. The file
is included as it is, without being assembled.
Usage
This instruction includes the file “hello.c” into the file “File. ”.
8.3.4 .ZERO
Syntax
.zero integer
where,
Description
The .ZERO directive reserves a block of memory of the given size and initializes it to zero; such a
block can serve, for example, as an input buffer.
Usage
.zero 100
This directive reserves 100 bytes of memory and stores zeros in them.
9 Example Programs and Practice exercises
9.1 Important Prerequisites
1. The necessary files to compile and simulate ASM programs in the spike environment are
hosted inside the spiking folder. Do the following in a terminal:
(a) cd $HOME
(b) git clone [Link]
(a) cd spiking
4. Debugging, loading and executing an ASM program. Open three separate terminals,
ensuring each is within the spiking folder. Run the following commands individually in
each terminal.
Note: All programs illustrated here have been tested on the spike simulator with a BRAM-
memory starting address set to 0x10010000.
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
li t0, 0xFF # Load an 8-bit number to t0
li t1, 0xFFFF # Load a 16-bit number to t1
li t2, 0xFFFFFFFF # Load a 32-bit number to t2
li t3, 0x7FFFFFFFFFFFFFFF # Load a 64-bit number to t3
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, 0x4A # Load register t0 with a value
mv t1, t0 # Copy contents of register t0 to register t1
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, 0x10011000 # Load register t0 with an address
li t1, 0x71 # Load register t1 with a 1-Byte value
sb t1, 0(t0) # Store the byte in t1 into first byte slot of
# address specified in t0
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, 0x10011000 # Load register t0 with an address
li t1, 0x7971 # Load register t1 with a 2-Byte (half-word)
# value
sh t1, 0(t0) # Store the half-word in t1 to the first
# half-word slot of address specified in t0
li t1, 0x7B7A # Load register t1 with another 2-Byte
# (half-word) value
sh t1, 2(t0) # Store the half-word in t1 to the second
# half-word slot of address specified in t0
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, 0x10011000 # Load register t0 with an address
li t1, 0x7B7A7971 # Load register t1 with a 4-Byte (1 word) value
sw t1, 0(t0) # Store the word in t1 to the first-word slot of
# address specified in t0
li t1, 0x7F7E7D7C # Load register t1 with another 4-Byte (1-word)
# value
sw t1, 4(t0) # Store the word in t1 to the second word slot
# of address specified in t0
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, 0x10011000 # Load register t0 with an address
li t1, 0x7F7E7D7C7B7A7971 # Load register t1 with double word
# (8-bytes = 2 words) value
sd t1, 0(t0) # Store the double word in t1 to
# address specified in t0
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li sp, 0x10012000 # Setting the stack pointer register to an
# address
li t0, 0x7776757473727170 # Load a 64-bit (8-bytes) value to register t0
li t1, 0x7F7E7D7C7B7A7978 # Load a 64-bit (8-bytes) value to register t1
[Link] Addition - Illustrating the addition operation between contents of two registers and
contents of a register with an immediate value
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
li t0, 0x1A352A9C # Loading register t0 with a value
li t1, 0x1B2D4C6A # Loading register t1 with a value
addi t2, t0, 0x1CB # Add t0 with an immediate value
add t2, t0, t1 # Add -- t0 with t1 and place the result in t2
addw t3, t0, t1 # Add -- t0 with t1 and place the 32-bit result
# in t3
[Link] Subtraction - Illustrating the subtraction operation between contents of two registers
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
li t0, 0x1A03533A12054021 # Load register t0 with a value
li t1, 0x3B14875C35286142 # Load register t1 with a value
sub t2, t1, t0 # Subtract t0 from t1 and place the result in t2
subw t3, t1, t0 # Subtract t0 from t1 and place the 32-bit
# result in t3
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
andi t4, t4, 0 # Clear register t4
andi t5, t5, 0 # Clear register t5
li t0, -43 # Load register t0 with a negative value
li t1, 187 # Load register t1 with a positive value
mulh t3, t0, t1 # Signed Multiplication of t0 with t1 and place
# the most significant half of the result in t3
mul t2, t0, t1 # Multiplication of t0 with t1 and place the
# lower half of the result in t2
mulhu t4, t0, t1 # Unsigned Multiplication of t0 with t1 and
# place the most significant half of the result
# in t4
mulw t5, t0, t1 # Multiply-word, multiply t0 with t1 and place
# the result in t5
[Link] Division - Illustrating different division operations between contents of two registers
and procuring the quotient of the division operation into a register
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
andi t4, t4, 0 # Clear register t4
andi t5, t5, 0 # Clear register t5
li t0, -2516 # Load register t0 with a negative value
li t1, 74 # Load register t1 with a positive value
div t2, t0, t1 # Divide t0 by t1 and place quotient in t2
li t3, 1332 # Load register t3 with a positive value
li t4, 18 # Load register t4 with a positive value
divu t5, t3, t4 # Unsigned division of t3 by t4 and place
# quotient in t5
[Link] Remainder - Illustrating different division operations between contents of two
registers and procuring the remainder of the division operation into a register
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
andi t4, t4, 0 # Clear register t4
[Link] ANDI
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
[Link] AND
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
[Link] ORI
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
[Link] OR
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
[Link] X-ORI
start:
andi t0, t0, 0 # Clear register t0
[Link] X-OR
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
[Link] NOT
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, 0xFFFFFFFFFFFFFFD3 # Load t0 register with a number
not t1, t0 # Logical NOT operation on the
# contents of t0, result is placed in
# register t1
If statement
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
li t0, -2 # Load t0 register with a negative
# value
slt t1, t0, x0 # Set t1 to 1 if t0 is less than 0
j Endif # Short jump to end of statement
If-Else statement
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
li t0, 2 # Load t0 with a number
li t3, -2 # Load t3 with a number
slt t1, t0, x0 # Set t1 to 1 if t0<0
beq t1, x0, Else # If t1=0, goto "Else" statement
j Endif # End If statement
Else: sgt t2, t3, x0 # Else statement, t2=1 if t3>0
Endif: j Endif # End of If-Else conditional
# statements
If-ElseIf-Else statement
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
andi t4, t4, 0 # Clear register t4
andi t5, t5, 0 # Clear register t5
li t0, 2 # Load t0 with a positive value
li t3, -2 # Load t3 with a negative value
slt t1, t0, x0 # Set t1 to 1 if t0 < 0
beq t1, x0, ElseIf # Goto ElseIf statement if t1 = 0
j Endif # End If statement
ElseIf: sgt t4, t3, x0 # Set t4 to 1 if t3 > 0
beq t4, x0, Else # Goto Else statement if t4 = 0
j Endif # End "Else" statement
Else: seqz t5, t4 # Set t5 to 1 if t4 = 0
Endif: j Endif # End of If-ElseIf-Else conditional
# statements
start:
andi t0, t0, 0 # Clear register t0
andi t1, t1, 0 # Clear register t1
andi t2, t2, 0 # Clear register t2
andi t3, t3, 0 # Clear register t3
andi t4, t4, 0 # Clear register t4
li t0, 100 # Load t0 with a value
li t1, 200 # Load t1 with a value
If: beq t0, t1, Else # Goto Else if t0 = t1
While Loop
start:
andi t0, t0, 0 # Clearing contents of register t0
# Functions as index "i" for the loop
andi t1, t1, 0 # Clearing contents of register t1
# Holds value to compare index with
andi t2, t2, 0 # Clearing contents of register t2
# Functions as variable "sum"
li t1, 100 # Load t1 with value 100
loop: add t2, t2, t0 # Sum = Sum+i
addi t0, t0, 1 # Increment index "i"
blt t0, t1, loop # Iterate if t0<t1
End: j End # End of WHILE loop
For Loop
start:
andi t0, t0, 0 # Clear register t0
# Functions as index "i" for the loop
andi t1, t1, 0 # Clearing contents of register t1
# Functions as variable "sum"
Switch Case
start:
# Clearing/Initializing contents of
# five registers to 0
mv a0, x0
mv a4, x0
mv a5, x0
mv a7, x0
mv t3, x0
9.2.5 Exercises
[Link] A Program to find the number of even and odd elements in an array
start:
.data # Data for the program
Array: .byte 12,19,45,69,98,23 # Array of even and odd numbers
.text # Code section of the program
andi t0, t0, 0 # Even number count
andi t1, t1, 0 # Odd number count
andi t2, t2, 0 # Holds the address and elements of the Array
andi t3, t3, 0 # For loop index i
andi t4, t4, 0 # Holds size of Array
andi t5, t5, 0 # Holds value to divide Array numbers with,
# to determine even or odd
li t4, 6 # Size of array
li t5, 2 # Value to divide array elements with
FOR_loop: bge t3, t4, END # Condition to control loop iterations
la t2, Array # Load address of Array
add t2, t2, t3 # Increment Array index
lb t2, 0(t2) # Load an element from the Array
rem t2, t2, t5 # Divide the Array element by t5 and store
# remainder in t2
bnez t2, ELSE # Non-zero remainder: the element is odd
addi t0, t0, 1 # Increment even number count
addi t3, t3, 1 # Increment index i
j FOR_loop # Iterate FOR loop
ELSE:
addi t1, t1, 1 # Increment odd number count
addi t3, t3, 1 # Increment index i
j FOR_loop # Iterate FOR loop
END: j END # End of program
start:
andi t0, t0, 0 # Even number count
andi t1, t1, 0 # Odd number count
andi t2, t2, 0 # Holds the address and elements of the
# Array
andi t3, t3, 0 # For loop index i
andi t4, t4, 0 # Holds size of Array
andi t5, t5, 0 # Holds value to divide Array numbers with,
# to determine even or odd
FOR_loop: bge t3, t4, END # Condition to control loop iterations
la t2, Array # Load address of Array
add t2, t2, t3 # Increment Array index
lb t2, 0(t2) # Load an element from the Array
andi t2, t2, 1 # Mask t2 with 1 to check whether LSB is 1
# or not
[Link] Program to find the Fibonacci series for a specified range, without recursion
start:
andi t0, t0, 0 # Will hold address for an array
andi t1, t1, 0 # Number of elements in the series
andi t2, t2, 0 # First number in the series
andi t3, t3, 0 # Second number in the series
andi t4, t4, 0 # Third number in the series
andi t5, t5, 0 # Variable to control loop
li t0, 0x10010 # Setting an address to store
# elements in array
li t1, 7 # Number terms required in the series
li t2, 0 # Load first element in the series
li t3, 1 # Load second element in the series
li t5, 1 # Initializing loop index
sb t2, 0(t0)
sb t3, 1(t0)
start:
.data # Data section of bubble-sort program
Array: .byte 6,7,3,2,9,8 # Array of unsorted data
Arraysize: .byte 6 # Defining size of array
.text # Commands section of the program
andi t0, t0, 0 # Clear contents of register t0; Holds
# array location
andi t1, t1, 0 # Clear contents of register t1; Holds
# index of inner FOR loop
andi t3, t3, 0 # Clear contents of register t3; Holds
# content of current array location
andi t4, t4, 0 # Clear contents of register t4
andi t5, t5, 0 # Clear contents of register t5; Holds
# content of adjacent array location
andi t6, t6, 0 # Clear contents of register t6; Acts as
# temporary variable during swaps
la t0, Array # Load address where unsorted Array is
# stored
la t1, Arraysize # Load address where size of array is
# stored
lb t1, 0(t1) # Load the size of the array
addi t1, t1, -1 # Number of swaps to be made
andi x1, x1, 0 # Clear contents of x1
outerloop: # Outer FOR loop
bge x0, t1, outerend # Jump to end if t1=0
andi t2, t2, 0 # Clear contents of register t2
innerloop: # Inner FOR loop
bge t2, t1, innerend # Jump to end of inner FOR loop if t2=t1
lb t3, 0(t0) # Load the first number from unsorted array
# to t3
lb t5, 1(t0) # Load the second number from unsorted
# array to t5
start:
andi t0, t0, 0 # Address of array to be sorted
andi t1, t1, 0 # Number of elements in array
andi t2, t2, 0 # Variable to hold minimum value
# during comparison with array elements
andi t3, t3, 0 # Position of minimum value in array
andi t4, t4, 0 # Temporary variable
andi t5, t5, 0 # Outer FOR loop Counter i
andi t6, t6, 0 # Inner FOR loop counter j
INNER_FOR_LOOP: bgt t6, t1, SWAP # GoTo swap, if condition true
start:
# Initializing registers
mv t0, x0
mv t1, x0
mv t2, x0
mv t3, x0
mv t4, x0
mv t5, x0
mv t6, x0
For_Loop: la t0, nums_size # Load t0 with address of unsorted array size
lw t1, 0(t0) # Load t1 with value in 0 offset of
# t0
lw t2, 4(t0) # Load t2 with value in 4 offset of
# t0
addiw t1, t1, 4 # Add a constant value to t1
sw t1, 0(t0) # Store t1 value to t0
# With an offset 0 of t0
bgt t1, t2, End # GoTo End if t1 value > t2 value
la t2, nums # Load array address to t2
addw t2, t2, t1 # Add t1 with t2 and store answer in
# t2
start:
.data
Array: .byte 1,2,3,4,5,6,7,8,9,10
.text
andi t0, t0, 0 # Holds sorted Array
andi t1, t1, 0 # Holds the ’low’ value
ELSE:
add t3, t1, t2
div t3, t3, t6
la t0, Array
add t0, t0, t3
lb t0, 0(t0)
find_key_if:
bne t4, t0, find_key_if_else
j END
a. Without Recursion
start:
la x5, data1 # Load data section address to x5
lwu a0, 0(x5) # Load a0 with number "n" to
# calculate its factorial
addi a4, x0, 1 # Initialize a4 to 1, a4 will keep
# track of the calculated factorial
addi a5, x0, 1 # Initialize "index" a5 to 1, used in
# FOR loop
FOR_LOOP: bgt a5, a0, End # GoTo "End" if "index" greater than
# "n"
mul a4, a4, a5 # Multiply a4 and a5, store answer in
# a4
addi a5, a5, 1 # Increment "index" by 1
j FOR_LOOP # Iterate
b. With Recursion
start:
la x5, data1 # Load data section address to x5
lwu sp, 0(x5) # Set sp to the address stored in the
# first 4 bytes at the location in x5
# Initializing four registers to zero
mv a0, x0
mv a4, x0
mv a5, x0
mv a7, x0
lw a0, 4(x5) # Load a0 with data from second 4
# bytes of x5
jal ra, fact # Call fact: jump to it and store the
# return address in ra
mv a7, a0 # Move answer from a0 to a7
sw a7, 8(x5) # Store answer in third 4 byte slot
# of address present in x5
ebreak #
j start # Loop back to start
fact:
addi sp, sp, -32 # Allocate 4 locations each of size 2
# words
sd ra, 24(sp) # Store return address(ra) to
# Memory[24+sp]
sd s0, 16(sp) # Store contents of s0 to
# Memory[16+sp]
addi s0, sp, 32 # Making s0 as frame pointer
mv a5, a0 # Move a0 contents to a5
sw a5, -20(s0) # Store a copy of a5 onto stack at
# location = Memory[s0-20]
beqz a5, J1 # Branch to label J1 if a5 is 0
addiw a5, a5, -1 # Decrement a5 by 1
mv a0, a5 # Move a5 to a0
jal ra, fact # Recursive call: jump to fact and
# store the return address in ra
mv a4, a0 # Move a0 temporarily to a4
lw a5, -20(s0) # Load a5 with data in Memory[s0-20]
mul a5, a5, a4 # Multiply a5 and a4, store answer in
# a5
mv a0, a5 # Move a5 to a0, as return value
ld ra, 24(sp) # Move up the stack, update return
# address(ra) with address stored in
# Memory[24+sp]
ld s0, 16(sp) # Update frame pointer
addi sp, sp, 32 # Reduce stack height
ret # Return to caller
J1:
addi a0, x0, 1 # Initialize a0 to 1
# Prepare to pop values from
# stack, update respective registers
# accordingly and reduce stack height
ld ra, 24(sp)
ld s0, 16(sp)
addi sp, sp, 32
ret # Return to caller
start:
# Shift right arithmetic immediate -
# Shifting X0 right by 1 bit and store it
# to x17
srai x17, x0,1
srai x12, x0,1
srai x10, x0,1
srai x15, x0,1
srai x6, x0,1
# Adding constant to source register and
# saving it in destination register
addi x10, x10, 1
addi x12, x10, 13
start:
# Shift right arithmetic immediate -
# Shifting X0 right by 1 bit and store it
# to x17
srai x17, x0,1
srai x12, x0,1
srai x10, x0,1
srai x15, x0,1
srai x6, x0,1
# Adding constant to source register and
# saving it in destination register
addi x10, x10, 1
addi x12, x10, 13
addi x17, x10, 64
# Loading constants from data section
la x15, data1 # Store data1 location to x15
addi x17,x0, 0x10 # Comparing register for end of loop
addi x14,x0, 0x0 # Index
# Instruction to cause load access fault
la x13, start
ld x16,-16 (x13)
loop: lw x16, 0(x15) # Load value from x15 pointing location to
# x16 register
addi x15, x15, 0x04 # GoTo next location
start:
# Shift right arithmetic immediate -
# Shifting X0 right by 1 bit and store it
# to x17
srai x17, x0,1
srai x12, x0,1
srai x10, x0,1
srai x15, x0,1
srai x6, x0,1
# Adding constant to source register and
# saving it in destination register
addi x10, x10, 1
addi x12, x10, 13
addi x17, x10, 64
# Loading constants from data section
la x15, data1 # Store data1 location to x15
addi x17,x0, 0x10 # Comparing register for end of loop
addi x14,x0, 0x0 # Index
loop: lw x16, 0(x15) # Load value from x15 pointing location to
# x16 register
addi x15, x15, 0x04 # GoTo next location
addi x14, x14, 0x04
bne x14,x17,loop # Check for equality
sw x17, 0x60(x15) # Store x17 value to x15+0x60 location
lw x12, 0x60(x15) # Load x15+0x60 location value to x12
bnez x10, start # GoTo start of the program if x10 value is
# not NULL
start:
# Shift right arithmetic immediate -
# Shifting X0 right by 1 bit and store it
# to x17
srai x17, x0,1
srai x12, x0,1
srai x10, x0,1
srai x15, x0,1
srai x6, x0,1
# Adding constant to source register and
# saving it in destination register
addi x10, x10, 1
addi x12, x10, 13
addi x17, x10, 64
# Loading constants from data section
la x15, data1 # Store data1 location to x15
addi x17,x0, 0x10 # Comparing register for end of loop
addi x14,x0, 0x0 # Index
# Instruction to cause store access fault
la x13, start
sd x17,-16 (x13)
loop: lw x16, 0(x15) # Load value from x15 pointing location to
# x16 register
addi x15, x15, 0x04 # GoTo next location
addi x14, x14, 0x04
bne x14,x17,loop # Check for equality
sw x17, 0x60(x15) # Store x17 value to x15+0x60 location
lw x12, 0x60(x15) # Load x15+0x60 location value to x12
bnez x10, start # GoTo start of the program if x10 value is
# not NULL
start:
# Shift right arithmetic immediate -
# Shifting X0 right by 1 bit and store it
# to x17
srai x17, x0,1
srai x12, x0,1
srai x10, x0,1
srai x15, x0,1
[Link] PLIC: A simple code to illustrate the working of PLIC with UART as the peripheral
interrupt: li t0, 8 # t0 <- 8
csrrs x0, mstatus, t0 # mstatus <- mstatus | t0 (set MIE)
li t0, 0x800 # t0 <- 0x800
csrrs x0, mie, t0 # mie <- mie | t0 (set MEIE)
csrr s8, mstatus # s8 <- mstatus
andi t1, s8, 8 # t1 <- (s8 & 8)
bnez t1, uart_base_addr # If t1 != 0, GoTo uart_base_addr
begin:
andi t5, t5, 0 # Clear t5
# t5 <- (t5 & 0)
andi t6, t6, 0 # Clear t6
# t6 <- (t6 & 0)
addi t5, t5, 96 # t5 <- (t5+96)
andi t4, t4, 0 # Clear t4
# t4 <- (t4 & 0)
addi t4, t4, 2 # t4 <- (t4+2)
.p2align 2
trap_handler: li s3, 0x0c010010
csrr t0, mcause
li t3, 0x10010000
and t0,t0,t3
beqz t0, exception_handler
beq t0, t3, interrupt_handler
1: ret
.p2align 2
exception_handler: csrr t0, mcause
la t1, data1
lw t2, 0(t1)
addi t2, t2, 4
sw t2, 0(t1)
add t1, t1, t2
sw t0, 0(t1)
j 1b
sd x9, 9*8(sp)
sd x10, 10*8(sp)
sd x11, 11*8(sp)
sd x12, 12*8(sp)
sd x13, 13*8(sp)
sd x14, 14*8(sp)
sd x15, 15*8(sp)
sd x16, 16*8(sp)
sd x17, 17*8(sp)
sd x18, 18*8(sp)
sd x19, 19*8(sp)
sd x20, 20*8(sp)
sd x21, 21*8(sp)
sd x22, 22*8(sp)
sd x23, 23*8(sp)
sd x24, 24*8(sp)
sd x25, 25*8(sp)
sd x26, 26*8(sp)
sd x27, 27*8(sp)
sd x28, 28*8(sp)
sd x29, 29*8(sp)
sd x30, 30*8(sp)
sd x31, 31*8(sp)
jal trap_handler # Return here after handling
# trap
ld x1, 1*8(sp)
ld x2, 2*8(sp)
ld x3, 3*8(sp)
ld x4, 4*8(sp)
ld x5, 5*8(sp)
ld x6, 6*8(sp)
ld x7, 7*8(sp)
ld x8, 8*8(sp)
ld x9, 9*8(sp)
ld x10, 10*8(sp)
ld x11, 11*8(sp)
ld x12, 12*8(sp)
ld x13, 13*8(sp)
ld x14, 14*8(sp)
ld x15, 15*8(sp)
ld x16, 16*8(sp)
ld x17, 17*8(sp)
ld x18, 18*8(sp)
ld x19, 19*8(sp)
ld x20, 20*8(sp)
ld x21, 21*8(sp)
ld x22, 22*8(sp)
ld x23, 23*8(sp)
ld x24, 24*8(sp)
ld x25, 25*8(sp)
ld x26, 26*8(sp)
ld x27, 27*8(sp)
ld x28, 28*8(sp)
ld x29, 29*8(sp)
ld x30, 30*8(sp)
ld x31, 31*8(sp)
mret
.p2align 0x2
.section .data
data1:
.word 0
.word 0
.word 0
.word 0