From a0eec52bb88f039b5f8a14603a1609caf3609691 Mon Sep 17 00:00:00 2001 From: "Dustin L. Howett" Date: Fri, 17 Feb 2017 14:42:02 -0800 Subject: [PATCH] fix a mismanagement of the hash table that could lead to data loss This commit fixes a data loss bug in our hopscotch table implementation. Removing values from the table can result in other values becoming disconnected and lost. Let A, B, and C be values that all hash to cell 0. Assume the hopscotch distance factor H = 2. 0 1 2 +-----+-----+-----+ | | | | +-----+-----+-----+ After adding A 0 1 2 +-----+-----+-----+ | A | | | +-----+-----+-----+ | +-Neighbors = After adding B 0 1 2 +-----+-----+-----+ | A | B | | +-----+-----+-----+ | +-Neighbors = 1 After adding C 0 1 2 +-----+-----+-----+ | A | B | C | +-----+-----+-----+ | +-Neighbors = 1, 2 If we then remove B, 0 1 2 +-----+-----+-----+ | A | [X] | C | +-----+-----+-----+ | +-Neighbors = 1, 2 * It is replaced with a placeholder [X]. * A's neighbor table is not updated to reflect the loss. If we then remove A, 0 1 2 +-----+-----+-----+ | [X] | [X] | [C] | +-----+-----+-----+ | +-Neighbors = 2 * The table is rebalanced to promote A's lowest neighbor to the primary cell position. * C from cell 2 remains cell 0's neighbor. The bug manifests if [X] the placeholder value passes the null check set out in MAP_TABLE_VALUE_NULL; that is, the placeholder is "effectively null". Looking up the key that matches C will first evaluate its base cell, the one that collided with the key in the first place. Since that is placeholder [X], and [X] is "effectively null", the lookup stops. C is never retrieved from the hash table. --- The expedient solution to this bug is to update cell 0's neighbors when B is first removed, effectively skipping the hole: If we remove B as above, 0 1 2 +-----+-----+-----+ | A | [X] | C | +-----+-----+-----+ | +-Neighbors = 2 <<< HERE but clear the neighbor bit for cell 1, the promotion that happens when A is later removed places C in cell 0. 0 1 2 +-----+-----+-----+ | C | [X] | [X] | +-----+-----+-----+ | +-Neighbors = --- hash_table.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/hash_table.h b/hash_table.h index aaae772..efa5cf6 100644 --- a/hash_table.h +++ b/hash_table.h @@ -392,6 +392,22 @@ static void PREFIX(_remove)(PREFIX(_table) *table, void *key) MAP_LOCK(); PREFIX(_table_cell) cell = PREFIX(_table_get_cell)(table, key); if (NULL == cell) { return; } + + uint32_t hash = MAP_TABLE_HASH_KEY(key); + PREFIX(_table_cell) baseCell = PREFIX(_table_lookup)(table, hash); + if (baseCell && baseCell <= cell && cell - baseCell <= 32) + { + uint32_t jump = 1 << (cell - baseCell - 1); + if ((baseCell->secondMaps & jump)) + { + // If we are removing a cell stored adjacent to its base due to hash + // collision, we have to clear the base cell's neighbor bit. + // Otherwise, a later remove can move the new placeholder value to the head + // which will cause further chained lookups to fail. + baseCell->secondMaps &= ~jump; + } + } + // If the cell contains a value, set it to the placeholder and shuffle up // everything if (0 == cell->secondMaps)