EE282: Computer Systems Architecture Problem Set
Stanford University Spring 2025 SOLUTION
Problem 4: Synchronizations & Memory Consistency Model
(12 points)
Exercise 1 (4 points) Hardware Synchronization Primitives
Adapted from Shared-Memory Synchronization by Michael L. Scott
The figure below shows a shared linked-list stack that uses CAS to synchronize updates to the top of the stack.
Familiarize yourself with the two operations shown.
Figure 4: Linked-list stack
Also observe that this snippet is prone to the well-known ABA problem. Suppose thread 1 and thread
2 share the stack and the variable top. Initially top = &A, pointing to A, as shown in step (a) of the
figure below.
Figure 5: Linked-list stack ABA issue.
Suppose thread 1 executes pop(&top) and completes line 6 but has not yet started line 7. Now suppose
thread 2 executes and completes pop(&top), followed by push(&top,&B) and push(&top,&A).
At this point thread 1 resumes and its CAS succeeds, because top again holds &A, leaving the stack
in a broken state, as shown in step (c) of the figure above.
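The interleaving above can be replayed deterministically in a small single-threaded model. This is only a sketch: the Node and Stack classes, the aba_demo helper, and the extra node C underneath A are assumptions made for illustration, and the "threads" are simply steps executed in the order described.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


class Stack:
    """Toy model of the stack; `top` stands in for the shared variable."""

    def __init__(self):
        self.top = None

    def cas(self, old, new):
        # Value-based CAS: succeeds whenever top currently equals `old`,
        # no matter how many times top changed in between.
        if self.top is old:
            self.top = new
            return True
        return False

    def push(self, node):
        while True:
            node.next = self.top
            if self.cas(node.next, node):
                return

    def pop(self):
        while True:
            old = self.top
            if old is None:
                return None
            if self.cas(old, old.next):
                return old

    def contents(self):
        names, node = [], self.top
        while node is not None:
            names.append(node.name)
            node = node.next
        return names


def aba_demo():
    a, b, c = Node("A"), Node("B"), Node("C")  # C is an assumed extra node
    stack = Stack()
    stack.push(c)
    stack.push(a)                 # top -> A -> C

    # Thread 1 starts pop: it reads old and new, then stalls before its CAS.
    old = stack.top               # A
    new = old.next                # C

    # Thread 2 runs pop, push(B), push(A) to completion.
    popped = stack.pop()          # removes A
    stack.push(b)                 # top -> B -> C
    stack.push(popped)            # top -> A -> B -> C
    before = stack.contents()

    # Thread 1 resumes: top is A again, so the stale CAS succeeds.
    succeeded = stack.cas(old, new)
    return before, succeeded, stack.contents()
```

In this model the stale CAS succeeds and the stack goes from A, B, C to just C, so node B is silently lost.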
a) (2 points) Explain why the issue can be resolved if one uses load-reserved/store-conditional
(LR/SC) to emulate the compare-and-swap (CAS) instruction.
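Solution LR/SC detects any intervening write to the reserved address, whereas CAS only compares
the current value with the expected one. In the scenario above, thread 2's writes to top invalidate
thread 1's reservation, so thread 1's SC fails even though top again holds &A. Routines built
directly on these instructions are therefore immune to the ABA problem.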
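The difference between address-based and value-based atomicity can be sketched with a toy memory cell. The ReservedCell class and lrsc_demo function are hypothetical names, and the model assumes that every store to the cell invalidates all outstanding reservations; real hardware tracks reservations at some implementation-defined granularity.

```python
class ReservedCell:
    """Hypothetical one-word memory cell with LR/SC-style reservations."""

    def __init__(self, value):
        self.value = value
        self._reserved = set()       # ids of threads holding a valid reservation

    def load_reserved(self, tid):
        self._reserved.add(tid)
        return self.value

    def store_conditional(self, tid, value):
        # Returns 0 on success and 1 on failure, as in the problem statement.
        if tid not in self._reserved:
            return 1
        self.value = value
        self._reserved.clear()       # a successful store kills every reservation
        return 0


def lrsc_demo():
    top = ReservedCell("A")

    # Thread 1 begins pop: it reads A and reserves top, then stalls.
    old = top.load_reserved(tid=1)

    # Thread 2 completes pop (top = C), push B (top = B), push A (top = A).
    for new_top in ("C", "B", "A"):
        top.load_reserved(tid=2)
        assert top.store_conditional(tid=2, value=new_top) == 0

    # Thread 1 resumes: the value matches, but its reservation is gone.
    status = top.store_conditional(tid=1, value="C")
    return old, top.value, status
```

Here thread 1's store-conditional reports failure (1) and top still holds A, so thread 1 must retry with fresh values instead of installing a stale pointer.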
b) (2 points) Implement the CAS operation using an LR/SC pair. Suppose CAS takes as input
an address in register x11, an old value in x12, and a new value in x13, and writes its result
to x10 (0 on success, 1 on failure). LR <rd> <addr> loads the value at addr into rd and
reserves the address addr. SC <rd> <val> <addr> tries to store the value val to addr; it
writes 0 to rd if the reservation is still valid and the store succeeds, and 1 otherwise.
Any assembly instructions defined in RISC-V can be used, but one will likely find these
constructs handy: LR/SC, branches, JAL <rd>, label (rd = x0 makes this an unconditional jump),
and LI <rd> <val>.
Solution
Either of the two solutions below is acceptable, owing to the ambiguity of the question, as long
as the interpretation is clearly stated.
(a) Implement CAS using LR/SC (as an exercise showing that different atomic operations
can be implemented using different hardware primitives), although this does not fix the
ABA problem.
compare_and_swap:
    LR  x5, x11          # x5 can be any other register
    BNE x5, x12, fail    # current value != old value: report failure
    SC  x10, x13, x11    # SC returns 0 on success, 1 on failure
    JAL x0, ret          # a pseudo "JMP label" or "JR" is also acceptable
fail:
    LI  x10, 1
ret:
(b) Modify the code snippet to use load-reserved/store-conditional and thereby avoid the ABA
problem (as a demonstration that LR/SC can resolve the ABA problem).
# Pseudocode/assembly:
# Any assembly or code snippet is acceptable as long as its LR/SC pair targets
# the top variable and the new value to be stored is old->next or null.

pop:
    li   t0, (top)          # t0 := address of the top variable
loop:
    lr   t3, 0(t0)          # old (in t3) := *top, and reserve the address of top
    beq  zero, t3, ret      # if old == null, return null
    li   t1, (old->next)    # new (in t1) := old->next
    sc   t2, t1, 0(t0)      # t2 is the destination register (0 on success)
                            # new in t1 is the value to be stored upon success
                            # t0 is the register holding the address
    bne  zero, t2, loop     # if SC failed (nonzero in t2), retry
ret:
    mv   a0, t3             # move old from t3 to a0 (return register)
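The remark under interpretation (a), that a CAS emulated with LR/SC does not fix the ABA problem, can be illustrated with a small model. The Cell class and both functions are hypothetical, and the cell tracks a single reservation for simplicity. The key point is that the LR sits inside compare_and_swap, so it only executes after thread 2 has finished its updates and the reservation it takes is never disturbed.

```python
class Cell:
    """Hypothetical one-word memory with a single reservation flag."""

    def __init__(self, value):
        self.value = value
        self.reserved = False

    def lr(self):
        self.reserved = True
        return self.value

    def store(self, value):
        # Any store to the cell invalidates the outstanding reservation.
        self.value = value
        self.reserved = False

    def sc(self, value):
        if not self.reserved:
            return 1
        self.store(value)
        return 0


def compare_and_swap(cell, old, new):
    """Mirrors the assembly of interpretation (a): 0 on success, 1 on failure."""
    if cell.lr() != old:
        return 1
    return cell.sc(new)


def emulated_cas_aba():
    top = Cell("A")

    # Thread 1 reads top with an ordinary load and plans to install C.
    old = top.value

    # Thread 2 completes pop (top = C), push B (top = B), push A (top = A).
    for new_top in ("C", "B", "A"):
        top.store(new_top)

    # Thread 1 resumes: LR and SC both run now, so nothing intervenes.
    return compare_and_swap(top, old, "C"), top.value
```

In this model the emulated CAS returns 0 (success) and top becomes C, reproducing the ABA failure; only when the LR replaces the original read of top, as in interpretation (b), do thread 2's writes invalidate the reservation.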
Exercise 2 (8 points) Memory Consistency Models
Consider the following code segments running on two processors. Assume A and B are initially 0.
// P1                 // P2
register1 = A;        register2 = B;
B = 1;                A = 1;
a) (2 points) Assume the segments are executed on processors that adhere to sequential
consistency (SC). How many different outcomes are possible for register1 and register2 at the
end of the segments? List each of them.
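Solution
Three outcomes are possible. The outcome register1 = 1, register2 = 1 cannot occur: each
processor's load precedes its store in program order, so the first operation in any global
order is a load that reads the initial value 0.
register1 = 0, register2 = 0.
register1 = 1, register2 = 0.
register1 = 0, register2 = 1.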
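The SC outcomes can be checked by brute force: enumerate every interleaving of the two segments that keeps each processor's program order and record the final register values. This is a minimal sketch; the tuple encoding of operations and the helper names are assumptions for illustration.

```python
# Each operation is (kind, variable, argument):
#   ("load", X, reg)   means reg = X
#   ("store", X, val)  means X = val
P1 = [("load", "A", "register1"), ("store", "B", 1)]
P2 = [("load", "B", "register2"), ("store", "A", 1)]


def interleavings(xs, ys):
    """Yield every merge of xs and ys that preserves the order within each."""
    if not xs or not ys:
        yield list(xs) + list(ys)
        return
    for rest in interleavings(xs[1:], ys):
        yield [xs[0]] + rest
    for rest in interleavings(xs, ys[1:]):
        yield [ys[0]] + rest


def run(schedule):
    """Execute one global order against a single shared memory."""
    mem = {"A": 0, "B": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "load":
            regs[arg] = mem[var]
        else:
            mem[var] = arg
    return regs["register1"], regs["register2"]


def sc_outcomes():
    return {run(schedule) for schedule in interleavings(P1, P2)}
```

There are six interleavings, and sc_outcomes() collapses them to the set {(0, 0), (1, 0), (0, 1)}, written as (register1, register2).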
b) (2 points) Repeat part (a), assuming the processors adhere to total store order (TSO).
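Solution
TSO relaxes only the order from a store to a later load. Each processor here performs a load
followed by a store, so nothing can be reordered and the outcomes are the same as under SC.
register1 = 0, register2 = 0.
register1 = 1, register2 = 0.
register1 = 0, register2 = 1.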
c) (2 points) Repeat part (a), assuming the processors adhere to release consistency (RC).
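Solution
RC does not order ordinary accesses to different addresses, so each processor's store may be
performed before its load. This adds the outcome register1 = 1, register2 = 1.
register1 = 0, register2 = 0.
register1 = 0, register2 = 1.
register1 = 1, register2 = 0.
register1 = 1, register2 = 1.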
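The SC, TSO, and RC answers can be compared with one small enumerator. This is a simplified sketch, not a faithful model of any real machine: each consistency model is reduced to a rule saying which pairs of operations in one thread may be reordered (TSO: a store followed by a load to a different address; RC: any two ordinary accesses to different addresses), and all names are assumptions for illustration.

```python
from itertools import permutations

P1 = [("load", "A", "register1"), ("store", "B", 1)]
P2 = [("load", "B", "register2"), ("store", "A", 1)]


def may_reorder(model, first, second):
    """May `second` be performed before `first`, which precedes it in program order?"""
    if first[1] == second[1]:
        return False                     # same address: kept in order here
    if model == "SC":
        return False
    if model == "TSO":
        return first[0] == "store" and second[0] == "load"
    if model == "RC":
        return True                      # ordinary accesses, no acquire/release
    raise ValueError(model)


def thread_orders(model, prog):
    """All execution orders of one thread that the model permits."""
    orders = []
    for perm in permutations(range(len(prog))):
        # Every pair inverted relative to program order must be reorderable.
        legal = all(
            may_reorder(model, prog[perm[j]], prog[perm[i]])
            for i in range(len(perm))
            for j in range(i + 1, len(perm))
            if perm[j] < perm[i]
        )
        if legal:
            orders.append([prog[k] for k in perm])
    return orders


def interleavings(xs, ys):
    if not xs or not ys:
        yield list(xs) + list(ys)
        return
    for rest in interleavings(xs[1:], ys):
        yield [xs[0]] + rest
    for rest in interleavings(xs, ys[1:]):
        yield [ys[0]] + rest


def run(schedule):
    mem = {"A": 0, "B": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "load":
            regs[arg] = mem[var]
        else:
            mem[var] = arg
    return regs["register1"], regs["register2"]


def outcomes(model):
    return {
        run(schedule)
        for o1 in thread_orders(model, P1)
        for o2 in thread_orders(model, P2)
        for schedule in interleavings(o1, o2)
    }
```

Under these assumptions outcomes("SC") and outcomes("TSO") give the same three pairs, while outcomes("RC") also contains (1, 1), which arises when both stores are performed before both loads.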
d) (2 points) Recall from lecture that a FENCE can be used to explicitly enforce program order
that may be relaxed by default under a processor’s memory consistency model. Add the
minimal number of FENCEs to the program such that it has only SC outcomes when
executed on processors implementing RC. Assume this FENCE is a full fence, which ensures
that all memory operations before the fence complete before any memory operation after
the fence.
Solution
A FENCE placed between the read and the write in each thread enforces program order in both threads.
// P1                 // P2
register1 = A;        register2 = B;
fence;                fence;
B = 1;                A = 1;
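The effect of the fences, and the reason both are needed, can be checked with a simplified enumerator. The model is an assumption made for illustration: under RC the ordinary accesses of one thread (all to different addresses here) may be performed in any order, except that no access may cross a full fence.

```python
from itertools import permutations

FENCE = ("fence", None, None)
LOAD_A = ("load", "A", "register1")
STORE_B = ("store", "B", 1)
LOAD_B = ("load", "B", "register2")
STORE_A = ("store", "A", 1)


def rc_orders(prog):
    """Execution orders of one thread in which nothing crosses a full fence."""
    orders = []
    for perm in permutations(range(len(prog))):
        slot = {k: i for i, k in enumerate(perm)}   # program index -> execution slot
        legal = all(
            all(slot[k] < slot[f] for k in range(f))
            and all(slot[k] > slot[f] for k in range(f + 1, len(prog)))
            for f, op in enumerate(prog)
            if op[0] == "fence"
        )
        if legal:
            orders.append([prog[k] for k in perm])
    return orders


def interleavings(xs, ys):
    if not xs or not ys:
        yield list(xs) + list(ys)
        return
    for rest in interleavings(xs[1:], ys):
        yield [xs[0]] + rest
    for rest in interleavings(xs, ys[1:]):
        yield [ys[0]] + rest


def run(schedule):
    mem = {"A": 0, "B": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "load":
            regs[arg] = mem[var]
        elif kind == "store":
            mem[var] = arg                          # fences touch no memory
    return regs["register1"], regs["register2"]


def outcomes(p1, p2):
    return {
        run(schedule)
        for o1 in rc_orders(p1)
        for o2 in rc_orders(p2)
        for schedule in interleavings(o1, o2)
    }
```

With a fence in both threads the model yields only the three SC outcomes. With a fence in P1 only, the outcome (1, 1) is still reachable because P2's store can be performed before its load, which is why one fence per thread is the minimum.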