哈希算法题目:设计哈希映射(二次探测法实现)
字数 629 2025-10-30 23:46:49
哈希算法题目:设计哈希映射(二次探测法实现)
题目描述
设计一个哈希映射,使用开放地址法解决哈希冲突,但要求使用二次探测作为冲突解决策略。需要实现以下操作:
put(key, value): 插入键值对,如果键已存在则更新值get(key): 返回键对应的值,如果键不存在返回-1remove(key): 删除键值对
哈希表容量应可动态扩容,当负载因子超过0.75时扩容为原来的2倍。
解题过程
步骤1:理解二次探测法
二次探测是开放地址法的一种,当发生哈希冲突时,探测序列为:h(k), h(k)+1², h(k)+2², h(k)+3²...
公式:h(k,i) = (h(k) + i²) % capacity,其中i为探测次数
步骤2:设计数据结构
需要定义:
- 桶数组:存储键值对
- 容量:当前哈希表大小
- 负载因子阈值:0.75
- 特殊标记:区分空桶、已删除桶、有数据桶
class MyHashMap:
def __init__(self):
self.capacity = 8 # 初始容量
self.size = 0
self.load_factor = 0.75
self.buckets = [None] * self.capacity
self.DELETED = object() # 删除标记
步骤3:哈希函数设计
使用简单的取模哈希函数:
def _hash(self, key):
return key % self.capacity
步骤4:查找键的位置
实现一个内部方法,查找键应该插入的位置或当前所在位置:
def _find_index(self, key, for_insert=False):
index = self._hash(key)
i = 0
first_deleted = None
while True:
current_index = (index + i * i) % self.capacity
# 如果遇到空桶
if self.buckets[current_index] is None:
if for_insert and first_deleted is not None:
return first_deleted, True # 可以插入到已删除的位置
return current_index, True # 找到可插入位置
# 如果遇到已删除的桶,记录第一个删除位置
elif self.buckets[current_index] == self.DELETED:
if first_deleted is None:
first_deleted = current_index
# 如果找到键
elif self.buckets[current_index][0] == key:
return current_index, False # 找到现有位置
i += 1
# 防止无限循环(理论上容量为质数时可避免)
if i >= self.capacity:
break
return first_deleted if first_deleted is not None else -1, False
步骤5:实现put操作
def put(self, key, value):
# 检查是否需要扩容
if self.size >= self.capacity * self.load_factor:
self._resize()
index, is_new = self._find_index(key, for_insert=True)
if index == -1:
self._resize()
return self.put(key, value) # 扩容后重试
if is_new or self.buckets[index] == self.DELETED:
self.size += 1
self.buckets[index] = (key, value)
步骤6:实现get操作
def get(self, key):
index, is_new = self._find_index(key, for_insert=False)
if index != -1 and self.buckets[index] not in (None, self.DELETED):
return self.buckets[index][1]
return -1
步骤7:实现remove操作
def remove(self, key):
index, is_new = self._find_index(key, for_insert=False)
if index != -1 and self.buckets[index] not in (None, self.DELETED):
self.buckets[index] = self.DELETED
self.size -= 1
return True
return False
步骤8:实现动态扩容
def _resize(self):
old_buckets = self.buckets
self.capacity *= 2
self.buckets = [None] * self.capacity
self.size = 0
for item in old_buckets:
if item and item != self.DELETED:
self.put(item[0], item[1])
步骤9:完整代码实现
class MyHashMap:
def __init__(self):
self.capacity = 8
self.size = 0
self.load_factor = 0.75
self.buckets = [None] * self.capacity
self.DELETED = object()
def _hash(self, key):
return key % self.capacity
def _find_index(self, key, for_insert=False):
index = self._hash(key)
i = 0
first_deleted = None
while i < self.capacity:
current_index = (index + i * i) % self.capacity
if self.buckets[current_index] is None:
if for_insert and first_deleted is not None:
return first_deleted, True
return current_index, True
elif self.buckets[current_index] == self.DELETED:
if first_deleted is None:
first_deleted = current_index
elif self.buckets[current_index][0] == key:
return current_index, False
i += 1
return first_deleted if first_deleted is not None else -1, False
def put(self, key, value):
if self.size >= self.capacity * self.load_factor:
self._resize()
index, is_new = self._find_index(key, for_insert=True)
if index == -1:
self._resize()
return self.put(key, value)
if is_new or self.buckets[index] == self.DELETED:
self.size += 1
self.buckets[index] = (key, value)
def get(self, key):
index, is_new = self._find_index(key, for_insert=False)
if index != -1 and self.buckets[index] not in (None, self.DELETED):
return self.buckets[index][1]
return -1
def remove(self, key):
index, is_new = self._find_index(key, for_insert=False)
if index != -1 and self.buckets[index] not in (None, self.DELETED):
self.buckets[index] = self.DELETED
self.size -= 1
return True
return False
def _resize(self):
old_buckets = self.buckets
self.capacity *= 2
self.buckets = [None] * self.capacity
self.size = 0
for item in old_buckets:
if item and item != self.DELETED:
self.put(item[0], item[1])
关键要点总结
- 二次探测使用平方步长避免聚集现象
- 删除操作使用标记法,避免破坏探测序列
- 查找时记录第一个删除位置,提高空间利用率
- 动态扩容保证性能,负载因子控制在0.75以下
- 探测序列要覆盖所有桶位置,避免无限循环