A race condition happens when two or more threads access and change the same resource at the same time without coordination. As a result, updates can be lost or the resource can end up in an unexpected state.
How Does a Race Condition Work?
The image below shows two threads (T1, T2) that are both supposed to increment the numerical value of resource A. The situation is a race condition because the timing of each thread's reads and writes leaves A with the wrong value.
There are two major problem areas:
- In the beginning, both threads read A with a value of 0. Both increment it by 1 and save the new value, but the resulting value of A is 1 instead of 2.
- T1 received a bit more CPU time and used it more efficiently, so it was able to increment A twice, to 2 and then 3, saving each result to A. T2 was slower: it incremented the value it had read only once and saved its result, which set A back to 2.
These are just two examples of the discrepancies that can occur with only two threads and a primitive resource A.
If A were an object, T1 might use it and then destroy it by setting it to null just as T2 was about to use it.
You can easily imagine countless other cases once you mix in more threads, more resources, and varying delays between operations.
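To make the two scenarios concrete, the sketch below replays them step by step in a single thread, so the outcome is the same on every run. The class and method names (`LostUpdateTrace`, `replayFirstScenario`, `replaySecondScenario`) are hypothetical, and the exact ordering of steps is an assumption drawn from the description above.

```java
class LostUpdateTrace {
    // Scenario 1: both threads read 0, then both write 0 + 1
    static int replayFirstScenario() {
        int a = 0;
        int t1Read = a;      // T1 reads 0
        int t2Read = a;      // T2 reads 0
        a = t1Read + 1;      // T1 saves 1
        a = t2Read + 1;      // T2 saves 1, overwriting T1's update
        return a;
    }

    // Scenario 2: T2 reads early, T1 increments twice, then T2 saves its stale result
    static int replaySecondScenario() {
        int a = 1;
        int t2Read = a;      // T2 reads 1 and is then delayed
        a = a + 1;           // T1 increments to 2
        a = a + 1;           // T1 increments to 3
        a = t2Read + 1;      // T2 saves 2, discarding both of T1's updates
        return a;
    }

    public static void main(String[] args) {
        System.out.println("Scenario 1 leaves A = " + replayFirstScenario());  // prints 1
        System.out.println("Scenario 2 leaves A = " + replaySecondScenario()); // prints 2
    }
}
```

In both replays, every thread did its work correctly in isolation; only the order of reads and writes caused the loss.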
Race Condition in Code
Here's the race condition written in code:
package scratch.racecondition;

import java.util.Random;

public class RaceCondition {
    public static void main(String[] args) throws InterruptedException {
        // One Incrementer instance is executed by all threads, so its counter
        // field is the shared resource. Integer is immutable, so a separate
        // Integer variable kept here in main would never see the updates.
        Incrementer incrementer = new Incrementer(0);
        // Number of parallel threads that will update the counter
        int parallelThreads = 10;
        // An array to keep all the threads
        Thread[] threads = new Thread[parallelThreads];
        // Create all threads
        for (int i = 0; i < parallelThreads; i++) {
            threads[i] = new Thread(incrementer);
        }
        // Start all threads with some random delay
        for (Thread thread : threads) {
            Thread.sleep(100 + new Random().nextInt(100));
            thread.start();
        }
        // Ensure all threads finish
        for (Thread thread : threads) {
            thread.join();
        }
        // Read the shared field; without a race this would equal parallelThreads
        System.out.println("Done. Final counter: " + incrementer.counter);
    }
}

class Incrementer implements Runnable {
    // The shared resource to increment
    Integer counter;

    public Incrementer(Integer counter) {
        this.counter = counter;
    }

    @Override
    public void run() {
        // Read current value from the shared resource
        Integer counterValue = counter;
        try {
            // Sleep symbolizes some work that the thread should be doing with some randomness
            Thread.sleep(100 + new Random().nextInt(100));
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // Increase counter based on the value read earlier, which may be stale by now
        counter = counterValue + 1;
        System.out.println(Thread.currentThread().getName() + " increments value: " + counterValue + " -> " + counter);
    }
}
Does a Race Condition Create Problems?
Race conditions can cause several serious issues.
Data loss
The thread that writes last typically overwrites the results of all the others. The example above uses a simple integer resource, but what if it were a user object or a bank transaction?
Hard to detect
There are typically no errors in logs or in the system health dashboards, so investigation most often starts with a ticket from a user (most likely after several tickets) that the system is missing data or is in the wrong state. A reconciliation process can sometimes reveal discrepancies, but reconciliation is rare in production environments.
Randomness
In a complicated system, a race condition takes effect only when timing and events align in a perfect storm. That alignment is rare, so failures appear random and are difficult to reproduce.
How to Debug and Prevent a Race Condition
Race conditions are among the hardest problems to troubleshoot because of their temporal nature. To reproduce one, a developer has to replicate the exact conditions with the exact timing of operations.
Steps to Debug
- Add debugging logs to strategic places of the system so that the next time a ticket appears, a developer has more information to work with. Most likely, this step will be repeated several times to narrow down the code surface to investigate.
- Carefully read the code with specific attention to resources affected by data corruption.
- Add sleep delays to replicate the temporal aspect of the problem.
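Sleep delays only make the problematic timing more likely. As a sketch of a stricter variant of the last step, the example below swaps the sleep for a `java.util.concurrent.CyclicBarrier`, which forces both threads to finish reading before either one writes. This is a substituted technique rather than the approach described above, and `ForcedInterleaving` and `reproduce` are hypothetical names.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class ForcedInterleaving {
    // Shared resource, deliberately left unprotected so the bug can show up
    static int counter;

    // Runs two increments whose reads are forced to happen before either write
    static int reproduce() {
        counter = 0;
        CyclicBarrier bothHaveRead = new CyclicBarrier(2);
        Runnable increment = () -> {
            int seen = counter; // read the shared value
            try {
                bothHaveRead.await(); // pause until the other thread has read too
            } catch (InterruptedException | BrokenBarrierException e) {
                throw new IllegalStateException(e);
            }
            counter = seen + 1; // write a result based on the stale read
        };
        Thread t1 = new Thread(increment, "T1");
        Thread t2 = new Thread(increment, "T2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        // Two increments ran, yet one update is lost on every run
        System.out.println("Counter after two increments: " + reproduce());
    }
}
```

Because the barrier pins the interleaving, the lost update shows up on every run, which makes it practical to confirm a suspected race and later to verify a fix.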
Steps to Prevent
There is no proven recipe, but the following techniques can help prevent race conditions:
- Make sure thread code is independent and encapsulated within the thread, and don’t share any resources, or at least not the resource in question.
- If you must share resources, make sure the shareable resources are read-only, so no synchronization is required.
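- Try optimistic locking: read the value, compute the update, and write it back only if the value has not changed in the meantime; otherwise, retry.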
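As a minimal sketch of the optimistic approach, the counter below uses `AtomicInteger.compareAndSet` from the Java standard library: each increment writes its result only if the value is still the one it read, and retries otherwise. The class `OptimisticCounter` and the `runDemo` helper are hypothetical names chosen for illustration, and a compare-and-set loop is only one possible form of optimistic locking.

```java
import java.util.concurrent.atomic.AtomicInteger;

class OptimisticCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Increment without holding a lock: retry whenever another thread got there first
    int increment() {
        while (true) {
            int seen = value.get();
            int next = seen + 1;
            // Succeeds only if the value is still the one read above
            if (value.compareAndSet(seen, next)) {
                return next;
            }
        }
    }

    int get() {
        return value.get();
    }

    // Starts the given number of threads, each incrementing the shared counter
    static int runDemo(int threads, int incrementsPerThread) {
        OptimisticCounter counter = new OptimisticCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("Final counter: " + runDemo(10, 1000)); // prints 10000
    }
}
```

No update is lost because a write based on a stale read is rejected and repeated. For a plain counter, `AtomicInteger.incrementAndGet` gives the same guarantee in one call; the explicit loop is shown only to expose the read, compare, and retry steps.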
Summary: What is a Race Condition
- A race condition is when two or more threads access and change the same resource at the same time without coordination
- Race conditions can result in data loss
- Race conditions are hard to detect, debug, and prevent
- Race conditions are often detected after receiving several tickets with complaints from the users
Steps to Debug
- Add debugging logs
- Read the code carefully, paying attention to resources affected by data corruption
- Add sleep delays to replicate the problem
Steps to Prevent
- Make thread code independent and encapsulated
- Make shareable resources read-only
- Try optimistic locking
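The first prevention item, keeping thread code independent, can be sketched as follows: each thread counts in a local variable of its own and publishes the result to a slot that no other thread touches, and the totals are combined only after every thread has finished. `ConfinedCounters` and `countIndependently` are hypothetical names, and the per-thread array slot is an assumed design rather than something prescribed above.

```java
class ConfinedCounters {
    // Each thread counts on its own; results are merged after all threads finish
    static int countIndependently(int threads, int incrementsPerThread) {
        int[] partial = new int[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> {
                int local = 0; // visible to this thread only
                for (int j = 0; j < incrementsPerThread; j++) {
                    local++;
                }
                partial[slot] = local; // each thread writes only its own slot
            });
            workers[i].start();
        }
        int total = 0;
        for (int i = 0; i < threads; i++) {
            try {
                workers[i].join(); // after join, this thread's slot is safe to read
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += partial[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Total: " + countIndependently(10, 1000)); // prints 10000
    }
}
```

Since no two threads ever write to the same location, there is nothing to race over and no synchronization is needed beyond waiting for the threads to finish.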