哈希算法题目：设计哈希集合（布谷鸟哈希实现）

字数 1124 2025-11-25 09:25:51

哈希算法题目：设计哈希集合（布谷鸟哈希实现）

我将为你详细讲解如何使用布谷鸟哈希（Cuckoo Hashing）技术设计一个哈希集合。这个算法能提供高效的查找性能，并优雅地处理哈希冲突。

题目描述

设计一个哈希集合，支持以下操作：

add(key)：向集合中插入一个元素
remove(key)：从集合中删除一个元素
contains(key)：判断集合中是否包含该元素

要求使用布谷鸟哈希实现，该技术使用两个哈希表和两个哈希函数，通过"踢出"机制解决冲突。

解题过程

第一步：理解布谷鸟哈希的基本原理

布谷鸟哈希的核心思想是：

使用两个哈希表（table1, table2）和两个哈希函数（hash1, hash2）
每个元素有两个可能的存储位置：hash1(key) 或 hash2(key)
插入时，如果一个位置被占用，就"踢出"原有元素，将新元素放入
被踢出的元素会尝试去它的另一个位置，可能继续踢出其他元素

第二步：设计数据结构

class CuckooHashSet:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.table1 = [None] * capacity  # 第一个哈希表
        self.table2 = [None] * capacity  # 第二个哈希表
        self.load_factor = 0.5  # 负载因子阈值
        self.size = 0  # 当前元素数量

关键参数说明：

capacity：每个哈希表的大小
load_factor：当 size/capacity 超过此值时需要扩容
两个哈希表独立存储元素

第三步：实现哈希函数

def hash1(self, key):
    """第一个哈希函数：简单取模"""
    return hash(key) % self.capacity

def hash2(self, key):
    """第二个哈希函数：使用不同的哈希计算方式"""
    # 使用质数除法来获得不同的分布
    return (hash(key) * 2654435761) % self.capacity

哈希函数设计要点：

两个函数应该产生不同的哈希分布
使用大质数可以改善分布均匀性
实际应用中可以使用更复杂的哈希函数

第四步：实现查找操作

def contains(self, key):
    """检查元素是否存在于集合中"""
    idx1 = self.hash1(key)
    idx2 = self.hash2(key)
    
    # 在两个可能的位置查找
    return (self.table1[idx1] is not None and self.table1[idx1] == key) or \
           (self.table2[idx2] is not None and self.table2[idx2] == key)

查找过程分析：

计算元素在两个表中的可能位置
检查这两个位置是否包含目标元素
时间复杂度：O(1)，因为只检查两个固定位置

第五步：实现插入操作（核心）

def add(self, key):
    """向集合中添加元素"""
    if self.contains(key):
        return  # 元素已存在
    
    # 检查是否需要扩容
    if self.size >= self.capacity * self.load_factor:
        self._resize(2 * self.capacity)
    
    # 尝试插入，最多进行有限次踢出操作
    current = key
    table_index = 0  # 从第一个表开始
    max_evictions = 10  # 最大踢出次数，防止无限循环
    
    for _ in range(max_evictions):
        if table_index == 0:
            # 尝试插入第一个表
            pos = self.hash1(current)
            if self.table1[pos] is None:
                self.table1[pos] = current
                self.size += 1
                return
            else:
                # 踢出现有元素
                self.table1[pos], current = current, self.table1[pos]
                table_index = 1  # 切换到第二个表
        else:
            # 尝试插入第二个表
            pos = self.hash2(current)
            if self.table2[pos] is None:
                self.table2[pos] = current
                self.size += 1
                return
            else:
                # 踢出现有元素
                self.table2[pos], current = current, self.table2[pos]
                table_index = 0  # 切换回第一个表
    
    # 如果达到最大踢出次数，需要重新哈希
    self._rehash()
    self.add(current)  # 重新尝试插入

插入过程详解：

首先检查元素是否已存在
检查负载因子，决定是否需要扩容
尝试将元素插入到它的第一个位置
如果位置被占，踢出原有元素，将新元素放入
对被踢出的元素重复此过程
设置最大循环次数防止无限递归

第六步：实现删除操作

def remove(self, key):
    """从集合中删除元素"""
    idx1 = self.hash1(key)
    idx2 = self.hash2(key)
    
    if self.table1[idx1] is not None and self.table1[idx1] == key:
        self.table1[idx1] = None
        self.size -= 1
        return True
    elif self.table2[idx2] is not None and self.table2[idx2] == key:
        self.table2[idx2] = None
        self.size -= 1
        return True
    
    return False  # 元素不存在

删除过程：

计算元素的两个可能位置
检查这两个位置，如果找到就删除
时间复杂度：O(1)

第七步：实现扩容和重新哈希

def _resize(self, new_capacity):
    """扩容哈希表"""
    old_table1 = self.table1
    old_table2 = self.table2
    old_capacity = self.capacity
    
    self.capacity = new_capacity
    self.table1 = [None] * new_capacity
    self.table2 = [None] * new_capacity
    self.size = 0
    
    # 重新插入所有元素
    for item in old_table1:
        if item is not None:
            self.add(item)
    
    for item in old_table2:
        if item is not None:
            self.add(item)

def _rehash(self):
    """重新哈希所有元素"""
    all_items = []
    for item in self.table1:
        if item is not None:
            all_items.append(item)
    for item in self.table2:
        if item is not None:
            all_items.append(item)
    
    self.table1 = [None] * self.capacity
    self.table2 = [None] * self.capacity
    self.size = 0
    
    for item in all_items:
        self.add(item)

扩容和重哈希的关键点：

当负载因子过高时，扩容可以降低冲突概率
重新哈希可以打破循环依赖
选择适当的扩容因子（通常为2倍）

第八步：完整实现和测试

class CuckooHashSet:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.table1 = [None] * capacity
        self.table2 = [None] * capacity
        self.load_factor = 0.5
        self.size = 0
    
    def hash1(self, key):
        return hash(key) % self.capacity
    
    def hash2(self, key):
        return (hash(key) * 2654435761) % self.capacity
    
    def contains(self, key):
        idx1 = self.hash1(key)
        idx2 = self.hash2(key)
        return (self.table1[idx1] is not None and self.table1[idx1] == key) or \
               (self.table2[idx2] is not None and self.table2[idx2] == key)
    
    def add(self, key):
        if self.contains(key):
            return
        
        if self.size >= self.capacity * self.load_factor:
            self._resize(2 * self.capacity)
        
        current = key
        table_index = 0
        max_evictions = 10
        
        for _ in range(max_evictions):
            if table_index == 0:
                pos = self.hash1(current)
                if self.table1[pos] is None:
                    self.table1[pos] = current
                    self.size += 1
                    return
                else:
                    self.table1[pos], current = current, self.table1[pos]
                    table_index = 1
            else:
                pos = self.hash2(current)
                if self.table2[pos] is None:
                    self.table2[pos] = current
                    self.size += 1
                    return
                else:
                    self.table2[pos], current = current, self.table2[pos]
                    table_index = 0
        
        self._rehash()
        self.add(current)
    
    def remove(self, key):
        idx1 = self.hash1(key)
        idx2 = self.hash2(key)
        
        if self.table1[idx1] is not None and self.table1[idx1] == key:
            self.table1[idx1] = None
            self.size -= 1
            return True
        elif self.table2[idx2] is not None and self.table2[idx2] == key:
            self.table2[idx2] = None
            self.size -= 1
            return True
        
        return False
    
    def _resize(self, new_capacity):
        old_table1 = self.table1
        old_table2 = self.table2
        old_capacity = self.capacity
        
        self.capacity = new_capacity
        self.table1 = [None] * new_capacity
        self.table2 = [None] * new_capacity
        self.size = 0
        
        for item in old_table1:
            if item is not None:
                self.add(item)
        
        for item in old_table2:
            if item is not None:
                self.add(item)
    
    def _rehash(self):
        all_items = []
        for item in self.table1:
            if item is not None:
                all_items.append(item)
        for item in self.table2:
            if item is not None:
                all_items.append(item)
        
        self.table1 = [None] * self.capacity
        self.table2 = [None] * self.capacity
        self.size = 0
        
        for item in all_items:
            self.add(item)

算法分析

时间复杂度：

查找：O(1) - 只检查两个固定位置
插入：平均O(1)，最坏O(n)（需要重新哈希时）
删除：O(1)

空间复杂度： O(n)

优势：

最坏情况查找时间有保证
删除操作简单高效
适合对查找性能要求高的场景

局限性：

可能需要重新哈希
哈希函数选择对性能影响很大
内存使用是传统哈希表的两倍

布谷鸟哈希通过巧妙的"踢出"机制，在保持高效查找的同时优雅地处理了哈希冲突，是哈希技术中一个很有创意的解决方案。

哈希算法题目：设计哈希集合（布谷鸟哈希实现）我将为你详细讲解如何使用布谷鸟哈希（Cuckoo Hashing）技术设计一个哈希集合。这个算法能提供高效的查找性能，并优雅地处理哈希冲突。题目描述设计一个哈希集合，支持以下操作： add(key) ：向集合中插入一个元素 remove(key) ：从集合中删除一个元素 contains(key) ：判断集合中是否包含该元素要求使用布谷鸟哈希实现，该技术使用两个哈希表和两个哈希函数，通过"踢出"机制解决冲突。解题过程第一步：理解布谷鸟哈希的基本原理布谷鸟哈希的核心思想是：使用两个哈希表（table1, table2）和两个哈希函数（hash1, hash2）每个元素有两个可能的存储位置： hash1(key) 或 hash2(key) 插入时，如果一个位置被占用，就"踢出"原有元素，将新元素放入被踢出的元素会尝试去它的另一个位置，可能继续踢出其他元素第二步：设计数据结构关键参数说明： capacity ：每个哈希表的大小 load_factor ：当 size/capacity 超过此值时需要扩容两个哈希表独立存储元素第三步：实现哈希函数哈希函数设计要点：两个函数应该产生不同的哈希分布使用大质数可以改善分布均匀性实际应用中可以使用更复杂的哈希函数第四步：实现查找操作查找过程分析：计算元素在两个表中的可能位置检查这两个位置是否包含目标元素时间复杂度：O(1)，因为只检查两个固定位置第五步：实现插入操作（核心）插入过程详解：首先检查元素是否已存在检查负载因子，决定是否需要扩容尝试将元素插入到它的第一个位置如果位置被占，踢出原有元素，将新元素放入对被踢出的元素重复此过程设置最大循环次数防止无限递归第六步：实现删除操作删除过程：计算元素的两个可能位置检查这两个位置，如果找到就删除时间复杂度：O(1) 第七步：实现扩容和重新哈希扩容和重哈希的关键点：当负载因子过高时，扩容可以降低冲突概率重新哈希可以打破循环依赖选择适当的扩容因子（通常为2倍）第八步：完整实现和测试算法分析时间复杂度：查找：O(1) - 只检查两个固定位置插入：平均O(1)，最坏O(n)（需要重新哈希时）删除：O(1) 空间复杂度： O(n) 优势：最坏情况查找时间有保证删除操作简单高效适合对查找性能要求高的场景局限性：可能需要重新哈希哈希函数选择对性能影响很大内存使用是传统哈希表的两倍布谷鸟哈希通过巧妙的"踢出"机制，在保持高效查找的同时优雅地处理了哈希冲突，是哈希技术中一个很有创意的解决方案。