Introduction
In concurrent systems, managing concurrent access to shared memory resources efficiently and safely is of paramount importance. Hazard pointers, like RCU, is a synchronization mechanism that can be used to address this issue. In this post, we will explore how hazard pointers work, provide a simple example to illustrate their usage, and compare them with another popular synchronization technique called Read-Copy-Update (RCU). Additionally, we will dive into the implementation details of hazard pointers, including the per-CPU and per-thread data structures used to maintain them.
Example with a Singly Linked List
Consider a singly linked list that is shared among multiple threads. In this example, some threads may be traversing the list while others may be adding or removing nodes. The main challenge here is to ensure that a node being accessed by a reader is not deleted or modified by a writer simultaneously.
Hazard pointers provide a mechanism to temporarily “protect” a node while it is being accessed. When a thread wants to access a node, it stores the address of the node in a hazard pointer, signaling that the node is in use. Before a writer can remove or modify a node, it must check all hazard pointers to ensure that the node is not currently being accessed by any other thread.
Here’s a step-by-step illustration of how hazard pointers work:
- Thread A starts traversing the list and wants to access node B.
- Thread A stores the address of node B in a hazard pointer, marking it as “in use.”
- Thread B attempts to remove node B from the list but first checks all hazard pointers.
- Thread B sees that node B is in use by Thread A (due to the hazard pointer) and must wait before removing it.
- Thread A finishes accessing node B and clears its hazard pointer.
- Thread B can now safely remove node B from the list.
A reader traversing a list may need to retry
This can happen if the reader steps on a list element that was freed. Let’s go through a simple example step by step to see why this might happen:
Imagine a singly-linked list with three elements: A, B, and C. We have two threads, Thread 0 and Thread 1, which read and modify the list concurrently.
- Thread 0 wants to read element B. To do this safely, it first records a hazard pointer to element B, indicating that it is about to access B and that B should not be deleted while Thread 0 is using it.
- Thread 1 wants to delete element B from the list. It removes B from the list by updating A’s next pointer to point to C instead of B. Thread 1 checks for hazard pointers referencing B and sees that Thread 0 has a hazard pointer to B. Since Thread 0 is still using B, Thread 1 cannot delete it yet.
- Thread 1 now wants to delete element C. It removes C from the list by updating B’s next pointer to be a special value called
HAZPTR_POISON
, marking the deletion. Since no hazard pointers reference C, Thread 1 can immediately delete C. - Thread 0 finishes reading element B and moves on to the next element in the list. It tries to acquire a hazard pointer for B’s successor, which is now marked as
HAZPTR_POISON
due to the deletion of element C. When Thread 0 encounters theHAZPTR_POISON
value, it knows that the list has been modified concurrently and must restart its traversal from the beginning of the list. - Thread 0 starts over, traversing the list again from the beginning. This time, it finds the updated list without elements B and C.
- Once Thread 0 is done using element B and releases its hazard pointer, Thread 1 is free to delete B as there are no longer any hazard pointers referencing it.
Comparing Hazard Pointers with RCU
Both hazard pointers and RCU are synchronization mechanisms that allow for efficient handling of shared resources. However, they differ in several key aspects:
- Read-side performance: RCU typically offers better read-side performance than hazard pointers, as it does not require retries. Hazard pointers, on the other hand, may involve read-side retries due to concurrent modifications, which can slow down the read-side performance.
- Write-side performance: While hazard pointers generally provide better write-side performance than naive reference counting, they may still be slower than RCU. This is because hazard pointers require writers to check all hazard pointers before modifying an object, which can be time-consuming when there are many of them. Furthermore, traversal of per-thread data structures to scan for hazard pointers may result in cache contention when there are lots of readers storing hazard pointers.
- Memory footprint: Hazard pointers have a minimal memory footprint, as any object not currently protected by a hazard pointer can be immediately freed. RCU, on the other hand, may require a larger memory footprint due to its deferred reclamation mechanism.
- Data structure size: Hazard pointers don’t require any additional members in the objects their pointing to. In contract, RCU typically requires an rcu_head structure representing it, and traditional reference counting mechanisms require an integer. This makes Hazard Pointers really useful for objects of small size.
- Grace period: RCU requires a grace period even when there are no readers, which can cause latency issues. With hazard pointers, objects can be freed instantly as soon as they are no longer being used by any thread.
- Reclamation of memory: Hazard pointers allow for the immediate reclamation of any object not currently protected by a hazard pointer. This means that hazard pointers can be freed per-object while not freeing others. However, with RCU, reclamation is independent of specific objects and may require a larger memory footprint due to its deferred reclamation mechanism.
Implementation of Hazard Pointers
Hazard pointers are maintained in per-CPU or per-thread data structures. This design choice allows for improved cache locality, as each thread or CPU only needs to update its own local data structure when accessing shared resources. Consequently, this reduces contention and cache coherency traffic, leading to good read-side performance. However, the write-side performance of hazard pointers may be slower than RCU, as writers need to check all hazard pointers before modifying an object. This can be time-consuming when there are many hazard pointers to check. Furthermore, traversal of per-thread data structures to scan for hazard pointers may result in cache contention when there are many readers storing hazard pointers, leading to additional overhead.
Virtual Reference Counting using Hazard Pointers
Virtual reference counting is a technique used to avoid the overhead of maintaining reference counts for each object in a concurrent system. Instead of incrementing and decrementing object reference counts, threads maintain a list of “virtual” reference counts in the form of hazard pointers.
When a thread wants to access an object, it temporarily protects the object using a hazard pointer. This marks the object as “in use” and guarantees that it won’t be freed or modified by any other thread. By counting the number of hazard pointers in the system, it is possible to determine the “value of the virtual refcount.” The absence of hazard pointers implies that the number of references to an object is zero, which means that the object can be destroyed.
Virtual reference counting eliminates the need to maintain reference counts for each object resulting in the object saving space as a separate refcount need to be stored in the object. This is very useful for small objects.
In summary
Hazard pointers could be an effective synchronization mechanism for managing shared access to objects in concurrent systems. They offer good read-side performance and may offer better memory footprint than RCU due to smaller data structure footprint and quicker memory reclaim. However, they have poor read-side performance and complexity, compared to RCU. On modern systems with ever increasing amounts of memory, RCU makes much more sense to use than HP.