Universality of Consensus
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Turing Computability
[Figure: Turing machine tape, 0 1 1 0 1 0]
A mathematical model of computation
Computable = computable on a Turing machine
Art of Multiprocessor Programming 2
Shared-Memory Computability
[Figure: processors, each with a cache, connected to shared memory]
Model of asynchronous concurrent computation
Computable = wait-free/lock-free computable on a multiprocessor
Consensus Hierarchy
Consensus number 1: read/write registers, snapshots, ...
Consensus number 2: getAndSet, getAndIncrement, ...
...
Consensus number ∞: compareAndSet, ...
Who Implements Whom?
Consensus number 1: read/write registers, snapshots, ...
Consensus number 2: getAndSet, getAndIncrement, ...
Consensus number ∞: compareAndSet, ...
Can an object at a lower level implement one at a higher level? No, no, and no.
Hypothesis
Consensus number 1: read/write registers, snapshots, ...
Consensus number 2: getAndSet, getAndIncrement, ...
Consensus number ∞: compareAndSet, ...
Can an object at a higher level implement every object at a lower level? Yes?
Theorem: Universality
Consensus is universal
From n-thread consensus, build a
  wait-free,
  linearizable,
  n-threaded implementation
  of any sequentially specified object
Proof Outline
A universal construction
  From n-consensus objects and atomic registers
  Builds any wait-free linearizable object
Not a practical construction
  But we know where to start looking
Like a Turing Machine
This construction
  Illustrates what needs to be done
  Is optimization fodder
Correctness, not efficiency
  Why does it work? (asks the scientist)
  How does it work? (asks the engineer)
  Would you like fries with that? (asks the liberal arts major)
A Generic Sequential Object
public interface SeqObject {
  public abstract Response apply(Invoc invoc);
}
Invocation examples: Push:5, Pop:null
Invocation
public class Invoc {
  public String method;  // method name
  public Object[] args;  // arguments
}
A Generic Sequential Object
apply returns a Response, for example OK or 4
Response
public class Response {
  public Object value;  // return value
}
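As a concrete illustration of the generic interface, here is a minimal sketch of a sequential stack driven only through apply. The class name SeqStack, the method names "push" and "pop", and the "OK" response are assumptions made for this example; Invoc, Response, and SeqObject are redeclared (with convenience constructors) so the block is self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-ins mirroring the slide classes, with constructors added for brevity.
class Invoc {
    final String method;
    final Object[] args;
    Invoc(String method, Object... args) { this.method = method; this.args = args; }
}

class Response {
    final Object value;
    Response(Object value) { this.value = value; }
}

interface SeqObject {
    Response apply(Invoc invoc);
}

// Hypothetical sequential stack: no synchronization, every call goes through apply().
class SeqStack implements SeqObject {
    private final Deque<Object> items = new ArrayDeque<>();

    @Override
    public Response apply(Invoc invoc) {
        switch (invoc.method) {
            case "push":
                items.push(invoc.args[0]);
                return new Response("OK");
            case "pop":
                // pollFirst returns null when the stack is empty
                return new Response(items.pollFirst());
            default:
                throw new IllegalArgumentException("unknown method: " + invoc.method);
        }
    }
}
```

Calling apply(new Invoc("push", 5)) followed by apply(new Invoc("pop")) yields the responses OK and 5, matching the Push:5 style of invocation shown on the slides.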
Universal Concurrent Object
public interface SeqObject {
  public abstract Response apply(Invoc invoc);
}
A universal concurrent object is linearizable to the generic sequential object
Start with Lock-Free Universal Construction
First lock-free: infinitely often, some method call finishes
Then wait-free: each method call finishes in a finite number of steps
Naïve Idea
Consensus object stores a reference to a cell holding the current state
Each thread creates a new cell, computes the outcome, and tries to switch the pointer to its outcome
Sadly, no: a consensus object can be used only once
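To see why a single consensus object cannot track a changing state, consider this minimal sketch of a one-shot consensus object. It is built on java.util.concurrent.atomic.AtomicReference, which is an assumption of this example (the slides treat the consensus object abstractly), and it assumes proposals are never null. The class name CasConsensus is hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// One-shot consensus: the first proposal wins; every later call returns that same value.
class CasConsensus<T> {
    private final AtomicReference<T> decision = new AtomicReference<>(null);

    T decide(T value) {
        // Only the first compareAndSet from null can succeed.
        decision.compareAndSet(null, value);
        return decision.get();
    }
}
```

After the first decide, the object is frozen: a second proposal is ignored, so the same object cannot be used to switch the state pointer again.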
[Figure: concurrent object with pending enq and deq calls; consensus decides which call to apply at head next]
Queue-based consensus
public T decide(T value) {
  propose(value);
  Ball ball = queue.deq();
  if (ball == Ball.RED)
    return proposed[i];
  else
    return proposed[1-i];
}
Once is not Enough?
Solved one-shot 2-consensus
Not clear how to reuse or reread
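The queue-based protocol can be sketched as a self-contained class. This is only an illustration under stated assumptions: java.util.concurrent.ConcurrentLinkedQueue stands in for the queue the protocol assumes, the thread index i (0 or 1) is passed explicitly instead of being looked up, each thread calls decide at most once, and the names QueueConsensus and Ball are hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Two-thread, one-shot consensus from a queue preloaded with a RED then a BLACK ball.
class QueueConsensus<T> {
    private enum Ball { RED, BLACK }

    private final Queue<Ball> queue = new ConcurrentLinkedQueue<>();
    private final AtomicReferenceArray<T> proposed = new AtomicReferenceArray<>(2);

    QueueConsensus() {
        queue.add(Ball.RED);
        queue.add(Ball.BLACK);
    }

    // i is the caller's index, 0 or 1.
    T decide(int i, T value) {
        proposed.set(i, value);        // publish my proposal first
        Ball ball = queue.poll();      // then dequeue a ball
        // Whoever dequeues RED was first and wins; the other adopts the winner's value.
        return (ball == Ball.RED) ? value : proposed.get(1 - i);
    }
}
```

Once both balls are gone the object cannot decide anything further, which is the "once is not enough" problem the slide raises.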
Improved Idea: Linked-List Representation
Each node contains a fresh consensus object used to decide on the next operation
[Figure: linked list of method calls (enq, enq, enq, deq) starting at tail]
Universal Construction
Object represented as
  Initial object state
  A log: a linked list of the method calls
New method call
  Find end of list
  Atomically append call
  Compute response by replaying log
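The "compute response by replaying log" step can be illustrated in isolation. The sketch below assumes a FIFO queue as the sequential object and encodes each logged call as a String array such as {"enq", "a"} or {"deq"}; the class LogReplay and this encoding are assumptions of the example, not part of the slides.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

class LogReplay {
    // Replays the logged calls on a fresh queue and returns the response of the last call.
    static Object replay(List<String[]> log) {
        Queue<String> state = new ArrayDeque<>();   // initial object state: empty queue
        Object response = null;
        for (String[] call : log) {
            if (call[0].equals("enq")) {
                state.add(call[1]);
                response = "OK";
            } else {                                // "deq"
                response = state.poll();            // null if the queue is empty
            }
        }
        return response;
    }
}
```

Because every thread replays the same log in the same order, every thread computes the same responses without sharing the object's state.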
Basic Idea
Use a one-time consensus object to decide the next pointer
All threads update the actual next pointer based on the decision
  OK because they all write the same value
Challenges
  Lock-free means we need to worry about what happens if a thread stops in the middle
Basic Data Structures
public class Node implements java.lang.Comparable {  // standard interface for classes whose objects are totally ordered
  public Invoc invoc;                 // the invocation
  public Consensus<Node> decideNext;  // decides on the next node (next method applied to the object)
  public Node next;                   // traversable pointer to the next node
                                      // (needed because you cannot repeatedly read a consensus object)
  public int seq;                     // sequence number
  public Node(Invoc invoc) {          // create a new node for this method invocation
    this.invoc = invoc;
    decideNext = new Consensus<Node>();
    seq = 0;
  }
  // compareTo not shown
}
Universal Object
[Figure: list of nodes from tail to head; each node holds seq number, invoc, decideNext (consensus object), and next; head is a pointer to the cell with the highest seq number]
All threads repeatedly modify head... back to where we started?
The Solution
Make head an array
  Thread i updates location i
  Each entry is a pointer to a node at the front of the list
Threads find the head by taking the max of the nodes pointed to by the head array
[Figure: head array whose entries point into the list that starts at tail]
Universal Object
public class Universal {
  private Node[] head;                  // array of head pointers
  private Node tail = new Node(null);   // sentinel node, carries no invocation
  public Universal(int n) {
    tail.seq = 1;                       // tail has sequence number 1
    head = new Node[n];
    for (int j = 0; j < n; j++) {
      head[j] = tail;                   // initially every head entry points to tail
    }
  }
  ...
Find Max Head Value
public static Node max(Node[] array) {
  Node max = array[0];
  for (int i = 1; i < array.length; i++)  // traverse the array
    if (max.seq < array[i].seq)           // compare the seq nums of the nodes pointed to by the array
      max = array[i];
  return max;                             // node with maximal sequence number
}
Universal Application Part I
public Response apply(Invoc invoc) {     // takes an invocation as input, returns the appropriate response
  int i = ThreadID.get();                // my id
  Node prefer = new Node(invoc);         // my method call
  while (prefer.seq == 0) {              // as long as I have not been threaded into the list
    Node before = Node.max(head);        // head of list to which we will try to append
    Node after = before.decideNext.decide(prefer);  // propose next node
    before.next = after;                 // set next field to consensus winner
    after.seq = before.seq + 1;          // set seq number (indicating node was appended)
    head[i] = after;                     // add to head array so new head will be found
  }
  ...
Part II: Compute Response
[Figure: list tail, enq( ), enq( ), deq(); Red's method call is the last node, whose next is null; the calls are applied to a private copy of the object and the result is returned]
Universal Application Part II
  ...
  // compute my response by sequentially applying the method calls in the list to a private copy
  SeqObject myObject = new SeqObject();     // start with a copy of the sequential object
  Node current = tail.next;                 // method calls are appended after tail
  while (current != prefer) {               // until my method call is reached
    myObject.apply(current.invoc);          // apply the current node's method
    current = current.next;
  }
  return myObject.apply(current.invoc);     // return result after my method call is applied
}
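Putting Parts I and II together, the following is a runnable sketch of the lock-free construction, not the authors' code. It makes several simplifying assumptions: the object state is a single int, each method call is an IntUnaryOperator that maps the old state to the new state, and the new state doubles as the response; the consensus object is a compareAndSet on an AtomicReference; the thread index i is passed explicitly; and volatile fields plus AtomicReferenceArray are used so writes are visible across threads. The names LFNode and LockFreeUniversal are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.IntUnaryOperator;

class LFNode {
    final IntUnaryOperator invoc;                        // the method call
    final AtomicReference<LFNode> decideNext = new AtomicReference<>();
    volatile LFNode next;                                // traversable link
    volatile int seq;                                    // 0 means "not yet appended"

    LFNode(IntUnaryOperator invoc) { this.invoc = invoc; }

    // One-shot consensus on the successor: the first proposal wins.
    LFNode decide(LFNode prefer) {
        decideNext.compareAndSet(null, prefer);
        return decideNext.get();
    }
}

class LockFreeUniversal {
    private final AtomicReferenceArray<LFNode> head;
    private final LFNode tail = new LFNode(null);        // sentinel
    private final int initialState;

    LockFreeUniversal(int n, int initialState) {
        this.initialState = initialState;
        tail.seq = 1;
        head = new AtomicReferenceArray<>(n);
        for (int j = 0; j < n; j++) head.set(j, tail);
    }

    private LFNode maxHead() {
        LFNode max = head.get(0);
        for (int j = 1; j < head.length(); j++) {
            LFNode h = head.get(j);
            if (max.seq < h.seq) max = h;
        }
        return max;
    }

    // i is the caller's thread index in [0, n).
    int apply(int i, IntUnaryOperator invoc) {
        LFNode prefer = new LFNode(invoc);
        while (prefer.seq == 0) {                        // Part I: append my node
            LFNode before = maxHead();
            LFNode after = before.decide(prefer);
            before.next = after;
            after.seq = before.seq + 1;
            head.set(i, after);
        }
        int state = initialState;                        // Part II: replay the log
        LFNode current = tail.next;
        while (current != prefer) {
            state = current.invoc.applyAsInt(state);
            current = current.next;
        }
        return prefer.invoc.applyAsInt(state);
    }
}
```

For example, on a LockFreeUniversal(2, 0), apply(0, x -> x + 5) followed by apply(1, x -> x * 3) returns 5 and then 15. Each call replays the whole log, which is why the slides stress correctness rather than efficiency.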
Correctness
List defines linearized sequential history
Thread returns its response based on list order
Lock-freedom
Lock-free because
  A thread moves forward in the list
  It can repeatedly fail to win consensus on the real head only if another thread succeeds
  The consensus winner adds its node and completes within a finite number of steps
Wait-free Construction
Lock-free construction + announce array
  Each thread stores (a pointer to) its node in announce
If a thread doesn't append its node
  Another thread will see it in the array and help append it
Helping
Announcing my intention
  Guarantees progress
  Even if the scheduler hates me
  My method call will complete
Makes protocol wait-free
  Otherwise starvation possible
Wait-free Construction
[Figure: announce array, where entry i is a pointer to the cell thread i wants to append, alongside the head array and the list starting at tail]
The Announce Array
public class Universal {
  private Node[] announce;              // announce array
  private Node[] head;
  private Node tail = new Node(null);   // sentinel node, carries no invocation
  public Universal(int n) {
    tail.seq = 1;
    announce = new Node[n];
    head = new Node[n];
    for (int j = 0; j < n; j++) {
      head[j] = tail;
      announce[j] = tail;               // all entries initially point to tail
    }
  }
  ...
A Cry For Help
public Response apply(Invoc invoc) {
  int i = ThreadID.get();
  announce[i] = new Node(invoc);    // announce new method call, asking help from others
  head[i] = Node.max(head);         // look for end of list
  while (announce[i].seq == 0) {    // main loop: while node not appended (either by me or by a helper)
    ...
  }
  ...
Main Loop
Non-zero sequence number means success
Thread keeps helping append nodes
  Until its own node is appended
Main Loop
while (announce[i].seq == 0) {                    // keep trying until my cell gets a sequence number
  Node before = head[i];                          // possible end of list
  Node help = announce[(before.seq + 1) % n];     // whom do I help?
  if (help.seq == 0)
    prefer = help;
  else
    prefer = announce[i];
  ...
Altruism
Choose a thread to help
If that thread needs help
  Try to append its node
  Otherwise append your own
Worst case
  Everyone tries to help same pitiful loser
  Someone succeeds
Help!
When last node in list has sequence number k
All threads check
  Whether thread (k+1) mod n wants help
  If so, try to append her node first
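The choice of whom to help is plain modular arithmetic, but operator precedence matters when it is written in Java: % binds tighter than +, so the index has to be parenthesized as (k + 1) % n. The small sketch below contrasts the two forms; the class and method names are hypothetical.

```java
class HelpIndex {
    // Thread whose announced node gets priority when the last node has sequence number k.
    static int helpee(int k, int n) {
        return (k + 1) % n;
    }

    // Without parentheses, 1 % n is evaluated first, so the result can exceed n - 1.
    static int unparenthesized(int k, int n) {
        return k + 1 % n;
    }
}
```

With n = 4 threads and k = 7, helpee gives 0, while the unparenthesized form gives 8, which would index past the end of a 4-entry announce array.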
Help!
First time after thread k+1 announces
  No guarantees
After n more nodes appended
  Everyone sees that thread k+1 wants help
  Everyone tries to append that node
  Someone succeeds
Sliding Window Lemma
After thread A announces its node
No more than n other calls
  Can start and finish
  Without appending A's node
Helping
[Figure: announce array, head array, and list from tail; thread 4 announces "Help me!"; the head nodes have seq N+2 and N+3, and max head + 1 = N+4, so all see and help append thread 4's node]
The Sliding Help Window
[Figure: announce array with entries for threads 3 and 4; at the head nodes with seq N+2 and N+3 the threads "Help 3" and then "Help 4"]
Sliding Help Window
while (announce[i].seq == 0) {
  Node before = head[i];
  Node help = announce[(before.seq + 1) % n];   // in each main loop iteration pick another thread to help
  if (help.seq == 0)
    prefer = help;                               // help if help required,
  else
    prefer = announce[i];                        // but otherwise it's all about me!
  ...
Rest is Same as Lock-free
while (announce[i].seq == 0) {
  ...
  Node after = before.decideNext.decide(prefer);  // decide next node to be appended
  before.next = after;                            // update next based on decision
  after.seq = before.seq + 1;                     // tell world that node is appended
  head[i] = after;
}
Finishing the Job
Once a thread's node is linked
  The rest is again the same as in the lock-free algorithm
  Compute the result by sequentially applying the method calls in the list to a private copy of the object, starting from the initial state
Then Same Part II
  ...
  // compute my response
  SeqObject myObject = new SeqObject();
  Node current = tail.next;
  while (current != announce[i]) {
    myObject.apply(current.invoc);
    current = current.next;
  }
  return myObject.apply(current.invoc);   // return result after applying my method
}
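The wait-free variant can be sketched the same way as a runnable whole. As before this is an illustration under assumptions rather than the authors' code: the state is an int, each call is an IntUnaryOperator whose result is both the new state and the response, consensus is a compareAndSet on an AtomicReference, the thread index i is passed explicitly, and the names WFNode and WaitFreeUniversal are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.IntUnaryOperator;

class WFNode {
    final IntUnaryOperator invoc;
    final AtomicReference<WFNode> decideNext = new AtomicReference<>();
    volatile WFNode next;
    volatile int seq;                                    // 0 means "not yet appended"

    WFNode(IntUnaryOperator invoc) { this.invoc = invoc; }

    WFNode decide(WFNode prefer) {                       // one-shot consensus via CAS
        decideNext.compareAndSet(null, prefer);
        return decideNext.get();
    }
}

class WaitFreeUniversal {
    private final int n;
    private final int initialState;
    private final AtomicReferenceArray<WFNode> announce;
    private final AtomicReferenceArray<WFNode> head;
    private final WFNode tail = new WFNode(null);        // sentinel

    WaitFreeUniversal(int n, int initialState) {
        this.n = n;
        this.initialState = initialState;
        tail.seq = 1;
        announce = new AtomicReferenceArray<>(n);
        head = new AtomicReferenceArray<>(n);
        for (int j = 0; j < n; j++) { head.set(j, tail); announce.set(j, tail); }
    }

    private WFNode maxHead() {
        WFNode max = head.get(0);
        for (int j = 1; j < n; j++) {
            WFNode h = head.get(j);
            if (max.seq < h.seq) max = h;
        }
        return max;
    }

    // i is the caller's thread index in [0, n).
    int apply(int i, IntUnaryOperator invoc) {
        WFNode mine = new WFNode(invoc);
        announce.set(i, mine);                           // cry for help
        head.set(i, maxHead());
        while (mine.seq == 0) {
            WFNode before = head.get(i);
            WFNode help = announce.get((before.seq + 1) % n);
            WFNode prefer = (help.seq == 0) ? help : mine;   // help first, else my own node
            WFNode after = before.decide(prefer);
            before.next = after;
            after.seq = before.seq + 1;
            head.set(i, after);
        }
        int state = initialState;                        // replay the log up to my node
        WFNode current = tail.next;
        while (current != mine) {
            state = current.invoc.applyAsInt(state);
            current = current.next;
        }
        return mine.invoc.applyAsInt(state);
    }
}
```

The only differences from the lock-free sketch are the announce array and the choice of prefer inside the loop; appending and replaying are unchanged.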
Shared-Memory Computability
[Figure: universal object]
Wait-free/lock-free computable = solving n-consensus
Swap (getAndSet) not Universal
public class RMWRegister {
  private int value;
  public int getAndSet(int update) {   // executed atomically
    int prior = this.value;
    this.value = update;
    return prior;
  }
}
Consensus number 2
Not universal for 3 or more threads
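That getAndSet has consensus number at least 2 can be seen from a short two-thread protocol. The sketch below uses java.util.concurrent.atomic.AtomicInteger.getAndSet as the swap primitive; the class name SwapConsensus, the EMPTY marker, and passing the thread index explicitly are assumptions of this example, and each of the two threads is assumed to call decide at most once.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Two-thread, one-shot consensus from a single getAndSet register.
class SwapConsensus<T> {
    private static final int EMPTY = -1;
    private final AtomicInteger register = new AtomicInteger(EMPTY);
    private final AtomicReferenceArray<T> proposed = new AtomicReferenceArray<>(2);

    // i is the caller's index, 0 or 1.
    T decide(int i, T value) {
        proposed.set(i, value);              // publish my proposal first
        int prior = register.getAndSet(i);   // then swap my index into the register
        // Seeing EMPTY means I swapped first and win; otherwise adopt the other thread's value.
        return (prior == EMPTY) ? value : proposed.get(1 - i);
    }
}
```

With a third thread the loser can no longer tell which of the other two swapped first, which is the intuition behind the limit of 2.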
CompareAndSet is Universal
public class RMWRegister {
  private int value;
  public boolean compareAndSet(int expected, int update) {  // executed atomically
    if (this.value == expected) {
      this.value = update;
      return true;
    }
    return false;
  }
}
Consensus number ∞
Universal for any number of threads
On Older Architectures
IBM 360
  testAndSet (getAndSet)
NYU UltraComputer
  getAndAdd (fetchAndAdd)
Neither universal
  Except for 2 threads
On Newer Architectures
Intel x86, Itanium, SPARC
  compareAndSet (CAS, CMPXCHG)
Alpha AXP, PowerPC
  Load-locked/store-conditional
All universal
  For any number of threads
Trend is clear
Practical Implications
Any architecture that does not provide a universal primitive has inherent limitations
  You cannot avoid locking for concurrent data structures
Shared-Memory Computability
[Figure: universal object]
Wait-free/lock-free computable = threads with methods that solve n-consensus
Principles to Practice
We saw how to define concurrent objects
We discussed the computational power of our machine instructions
Let's use these foundations as a basis for understanding the real world
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
You are free:
  to Share: to copy, distribute and transmit the work
  to Remix: to adapt the work
Under the following conditions:
  Attribution. You must attribute the work to "The Art of Multiprocessor Programming" (but not in any way that suggests that the authors endorse you or your use of the work).
  Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to [Link]
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.