fix a mismanagement of the hash table that could lead to data loss

This commit fixes a data loss bug in our hopscotch table implementation.
Removing values from the table can result in other values becoming
disconnected and lost.

Let A, B, and C be values that all hash to cell 0.
Assume the hopscotch distance factor H = 2.

   0     1     2
+-----+-----+-----+
|     |     |     |
+-----+-----+-----+

After adding A
   0     1     2
+-----+-----+-----+
|  A  |     |     |
+-----+-----+-----+
   |
   +-Neighbors =

After adding B
   0     1     2
+-----+-----+-----+
|  A  |  B  |     |
+-----+-----+-----+
   |
   +-Neighbors = 1

After adding C
   0     1     2
+-----+-----+-----+
|  A  |  B  |  C  |
+-----+-----+-----+
   |
   +-Neighbors = 1, 2

If we then remove B,
   0     1     2
+-----+-----+-----+
|  A  | [X] |  C  |
+-----+-----+-----+
   |
   +-Neighbors = 1, 2

* It is replaced with a placeholder [X].
* A's neighbor table is not updated to reflect the loss.

If we then remove A,
   0     1     2
+-----+-----+-----+
| [X] | [X] | [C] |
+-----+-----+-----+
   |
   +-Neighbors = 2

* The table is rebalanced to promote A's lowest neighbor to the primary
  cell position.
* C from cell 2 remains cell 0's neighbor.

The bug manifests if [X] the placeholder value passes the null check set
out in MAP_TABLE_VALUE_NULL; that is, the placeholder is "effectively
null".

Looking up the key that matches C will first evaluate its base cell, the
one that collided with the key in the first place. Since that is
placeholder [X], and [X] is "effectively null", the lookup stops.

C is never retrieved from the hash table.

---
The expedient solution to this bug is to update cell 0's neighbors when
B is first removed, effectively skipping the hole:

If we remove B as above,
   0     1     2
+-----+-----+-----+
|  A  | [X] |  C  |
+-----+-----+-----+
   |
   +-Neighbors = 2 <<< HERE

but clear the neighbor bit for cell 1, the promotion that happens when A
is later removed places C in cell 0.

   0     1     2
+-----+-----+-----+
|  C  | [X] | [X] |
+-----+-----+-----+
   |
   +-Neighbors =
main
Dustin L. Howett 9 years ago committed by Dustin Howett
parent 14889c540f
commit a0eec52bb8

@ -392,6 +392,22 @@ static void PREFIX(_remove)(PREFIX(_table) *table, void *key)
MAP_LOCK();
PREFIX(_table_cell) cell = PREFIX(_table_get_cell)(table, key);
if (NULL == cell) { return; }
uint32_t hash = MAP_TABLE_HASH_KEY(key);
PREFIX(_table_cell) baseCell = PREFIX(_table_lookup)(table, hash);
if (baseCell && baseCell <= cell && cell - baseCell <= 32)
{
uint32_t jump = 1 << (cell - baseCell - 1);
if ((baseCell->secondMaps & jump))
{
// If we are removing a cell stored adjacent to its base due to hash
// collision, we have to clear the base cell's neighbor bit.
// Otherwise, a later remove can move the new placeholder value to the head
// which will cause further chained lookups to fail.
baseCell->secondMaps &= ~jump;
}
}
// If the cell contains a value, set it to the placeholder and shuffle up
// everything
if (0 == cell->secondMaps)

Loading…
Cancel
Save