哈希算法题目：设计一个基于哈希的分布式实时排行榜系统（支持分数更新和Top K查询）

字数 681 2025-11-04 11:59:17

哈希算法题目：设计一个基于哈希的分布式实时排行榜系统（支持分数更新和Top K查询）

题目描述：设计一个分布式实时排行榜系统，需要支持以下操作：

addUser(userId, score) - 添加用户和初始分数
updateScore(userId, delta) - 更新用户分数（可正可负）
getTopK(k) - 获取前K名用户（按分数降序排列）
getUserRank(userId) - 获取用户排名

要求系统能够处理高并发请求，保证数据一致性，并支持大规模用户量。

解题过程：

第一步：分析需求和数据特点

需要快速根据userId查找和更新分数 → 使用哈希表存储userId到分数的映射
需要快速获取Top K和用户排名 → 需要维护有序数据结构
分布式环境下需要考虑数据分片和一致性

第二步：设计核心数据结构

// 用户分数哈希表（分布式键值存储）
userId -> score

// 排行榜数据结构（使用跳表或平衡树）
有序集合：按分数降序排列，分数相同时按userId升序排列

第三步：详细设计每个操作

addUser(userId, score)操作：

def addUser(userId, score):
    # 1. 检查用户是否已存在（哈希表查找）
    if userId in user_score_map:
        return False
    
    # 2. 原子性操作：同时更新哈希表和排行榜
    user_score_map[userId] = score
    leaderboard.add((score, userId))
    
    # 3. 分布式环境下需要加分布式锁保证一致性

updateScore(userId, delta)操作：

def updateScore(userId, delta):
    # 1. 检查用户是否存在
    if userId not in user_score_map:
        return False
    
    # 2. 获取旧分数
    old_score = user_score_map[userId]
    new_score = old_score + delta
    
    # 3. 原子性更新
    # 先删除旧的排名记录
    leaderboard.remove((old_score, userId))
    # 再插入新的排名记录
    leaderboard.add((new_score, userId))
    # 更新哈希表
    user_score_map[userId] = new_score

第四步：优化排行榜查询性能

getTopK(k)操作优化：

def getTopK(k):
    # 使用跳表或平衡树的有序特性
    # 时间复杂度：O(k)，k为返回结果数量
    
    results = []
    # 从最高分开始遍历
    current = leaderboard.first()  # 最高分节点
    
    for i in range(min(k, len(leaderboard))):
        if current is None:
            break
        score, userId = current.value
        results.append((userId, score))
        current = current.next  # 跳表的下一节点
    
    return results

getUserRank(userId)操作优化：

def getUserRank(userId):
    if userId not in user_score_map:
        return -1
    
    score = user_score_map[userId]
    # 在跳表中，可以维护每个节点的排名信息
    # 或者通过遍历计算排名
    
    rank = 1
    current = leaderboard.first()
    
    while current is not None:
        current_score, current_userId = current.value
        if current_userId == userId:
            return rank
        rank += 1
        current = current.next
    
    return -1

第五步：分布式架构设计

数据分片策略：

# 按userId分片，使用一致性哈希
def getShard(userId):
    hash_value = hash(userId) % SHARD_COUNT
    return shards[hash_value]

# 每个分片包含：
# - 该分片用户的分数哈希表
# - 该分片的局部排行榜

第六步：处理分布式Top K查询

全局Top K查询策略：

def getGlobalTopK(k):
    # 1. 从每个分片获取局部Top K
    local_tops = []
    for shard in all_shards:
        local_top = shard.getTopK(k)  # 每个分片返回自己的Top K
        local_tops.extend(local_top)
    
    # 2. 合并局部结果得到全局Top K
    # 使用堆排序或归并排序
    local_tops.sort(key=lambda x: x[1], reverse=True)
    return local_tops[:k]

第七步：性能优化考虑

缓存热点数据：缓存Top 100的查询结果
异步更新：非关键操作可以异步处理
读写分离：排行榜查询走从节点，更新走主节点
批量操作：支持批量分数更新，减少网络开销

这个设计结合了哈希表的快速查找和有序数据结构的高效排名查询，通过分布式架构支持大规模用户场景。