哈希算法题目：设计推特

字数 1251 2025-12-23 01:28:09

哈希算法题目：设计推特

我将为你详细讲解如何设计一个简化版的Twitter（推特）系统。这个题目结合了哈希表、链表、堆等多种数据结构，是系统设计类题目的经典代表。

题目描述

设计一个简化版的Twitter，用户可以发送推文，关注/取消关注其他用户，并且能够看到自己关注的人（包括自己）发布的最近10条推文。推文需要按时间顺序从新到旧显示。

系统需求分析

我们需要支持以下操作：

用户发送推文
用户关注另一个用户
用户取消关注另一个用户
获取用户新闻源（news feed）：按时间倒序显示该用户关注的所有人（包括自己）的最新10条推文

数据结构设计

第一步：核心数据结构选择

class Twitter:
    def __init__(self):
        # 用户关注列表：key=用户ID, value=该用户关注的用户集合
        self.follows = {}
        
        # 用户推文列表：key=用户ID, value=该用户发布的所有推文列表
        self.tweets = {}
        
        # 全局时间戳，确保推文按时间排序
        self.timestamp = 0

第二步：推文数据结构

每条推文需要包含内容和时间戳：

class Tweet:
    def __init__(self, tweetId: int, timestamp: int):
        self.id = tweetId        # 推文ID
        self.time = timestamp    # 发布时间戳

具体实现步骤

步骤1：用户发送推文

def postTweet(self, userId: int, tweetId: int) -> None:
    """
    用户发布一条新推文
    时间复杂度：O(1)
    """
    # 如果用户第一次发推，初始化其推文列表
    if userId not in self.tweets:
        self.tweets[userId] = []
    
    # 创建新推文对象，时间戳递增确保时序
    tweet = Tweet(tweetId, self.timestamp)
    self.timestamp += 1
    
    # 将推文添加到用户推文列表的头部（最新推文在末尾）
    self.tweets[userId].append(tweet)
    
    # 确保用户始终关注自己（这样才能在news feed中看到自己的推文）
    self.follow(userId, userId)  # 调用follow方法

关键点解释：

每个用户都有自己的推文列表
时间戳确保推文有全局顺序
用户自动关注自己，简化新闻源获取逻辑

步骤2：用户关注其他用户

def follow(self, followerId: int, followeeId: int) -> None:
    """
    用户关注另一个用户
    时间复杂度：O(1)
    """
    # 初始化关注列表
    if followerId not in self.follows:
        self.follows[followerId] = set()
    
    # 添加关注关系
    self.follows[followerId].add(followeeId)

设计思路：

使用集合（Set）存储关注列表，因为：
1. 去重：不能重复关注同一个人
2. 快速查找：O(1)时间复杂度检查是否已关注
3. 快速添加/删除

步骤3：用户取消关注

def unfollow(self, followerId: int, followeeId: int) -> None:
    """
    用户取消关注另一个用户
    注意：不能取消关注自己
    时间复杂度：O(1)
    """
    # 检查用户是否存在关注列表
    if followerId in self.follows:
        # 确保不能取消关注自己
        if followerId != followeeId:
            self.follows[followerId].discard(followeeId)

边界情况处理：

用户没有关注过任何人时的处理
不能取消关注自己（否则看不到自己的推文）

步骤4：获取新闻源（核心难点）

这是本题最复杂的部分，需要合并多个用户的推文流，并获取最新的10条。

方法一：暴力合并排序

def getNewsFeed(self, userId: int) -> List[int]:
    """
    获取用户新闻源：合并所有关注用户的推文，取最新的10条
    时间复杂度：O(N*logN)，其中N是总推文数
    """
    if userId not in self.follows:
        return []
    
    # 获取所有推文
    all_tweets = []
    for followeeId in self.follows[userId]:
        if followeeId in self.tweets:
            all_tweets.extend(self.tweets[followeeId])
    
    # 按时间戳降序排序
    all_tweets.sort(key=lambda x: x.time, reverse=True)
    
    # 取前10条的ID
    return [tweet.id for tweet in all_tweets[:10]]

问题：效率低下，每次都要合并排序所有推文。

方法二：优化版 - 使用堆（优先队列）

def getNewsFeed(self, userId: int) -> List[int]:
    """
    使用最大堆（大顶堆）优化获取最新10条推文
    时间复杂度：O(KlogN)，其中K=10，N=关注人数
    """
    if userId not in self.follows:
        return []
    
    import heapq
    
    # 最大堆：存储(-时间戳, 用户ID, 推文索引)
    # 使用负时间戳是因为Python的heapq是最小堆
    heap = []
    
    # 初始化堆：为每个关注用户添加其最新推文
    for followeeId in self.follows[userId]:
        if followeeId in self.tweets and self.tweets[followeeId]:
            tweets_list = self.tweets[followeeId]
            # 推文列表的最后一条是最新的
            last_index = len(tweets_list) - 1
            last_tweet = tweets_list[last_index]
            # 推入堆中：(-时间戳, 用户ID, 推文索引)
            heapq.heappush(heap, (-last_tweet.time, followeeId, last_index))
    
    # 收集最新的10条推文
    result = []
    while heap and len(result) < 10:
        time, followeeId, index = heapq.heappop(heap)
        tweets_list = self.tweets[followeeId]
        tweet = tweets_list[index]
        result.append(tweet.id)
        
        # 如果该用户还有更早的推文，加入堆中
        if index > 0:
            prev_tweet = tweets_list[index - 1]
            heapq.heappush(heap, (-prev_tweet.time, followeeId, index - 1))
    
    return result

堆优化详解：

为什么用堆：我们需要从多个有序列表中获取前K个最大元素，这是堆的典型应用场景
数据结构：(-时间戳, 用户ID, 推文索引)三元组
工作流程：
- 初始化：为每个被关注用户添加其最新推文到堆
- 循环：从堆中取出时间最新的推文
- 添加：将该用户的上一篇推文加入堆
- 停止：取满10条或堆为空

完整代码实现

class Tweet:
    def __init__(self, tweetId: int, timestamp: int):
        self.id = tweetId
        self.time = timestamp

class Twitter:
    def __init__(self):
        self.follows = {}  # userId -> set of followeeIds
        self.tweets = {}   # userId -> list of tweets
        self.timestamp = 0
    
    def postTweet(self, userId: int, tweetId: int) -> None:
        """发布推文"""
        if userId not in self.tweets:
            self.tweets[userId] = []
        
        tweet = Tweet(tweetId, self.timestamp)
        self.timestamp += 1
        self.tweets[userId].append(tweet)
        
        # 确保用户关注自己
        self.follow(userId, userId)
    
    def getNewsFeed(self, userId: int) -> List[int]:
        """获取新闻源 - 堆优化版"""
        import heapq
        
        if userId not in self.follows:
            return []
        
        heap = []
        
        # 为每个关注用户添加最新推文到堆
        for followeeId in self.follows[userId]:
            if followeeId in self.tweets and self.tweets[followeeId]:
                tweets = self.tweets[followeeId]
                last_idx = len(tweets) - 1
                last_tweet = tweets[last_idx]
                heapq.heappush(heap, (-last_tweet.time, followeeId, last_idx))
        
        # 获取最新的10条推文
        result = []
        while heap and len(result) < 10:
            time, followeeId, idx = heapq.heappop(heap)
            tweet = self.tweets[followeeId][idx]
            result.append(tweet.id)
            
            # 如果还有更早的推文，加入堆
            if idx > 0:
                prev_tweet = self.tweets[followeeId][idx - 1]
                heapq.heappush(heap, (-prev_tweet.time, followeeId, idx - 1))
        
        return result
    
    def follow(self, followerId: int, followeeId: int) -> None:
        """关注用户"""
        if followerId not in self.follows:
            self.follows[followerId] = set()
        self.follows[followerId].add(followeeId)
    
    def unfollow(self, followerId: int, followeeId: int) -> None:
        """取消关注"""
        if followerId in self.follows and followerId != followeeId:
            self.follows[followerId].discard(followeeId)

时间复杂度分析

操作	时间复杂度	空间复杂度
postTweet	O(1)	O(1)
follow	O(1)	O(1)
unfollow	O(1)	O(1)
getNewsFeed	O(F + 10*logF)	O(F)

其中F是用户关注的人数，10是要获取的推文数量。

扩展思考

如果推文数量巨大怎么办？
- 可以限制每个用户的推文存储数量（如只保存最近1000条）
- 使用分页加载，而不是一次性获取10条
如何支持实时更新？
- 可以考虑使用发布-订阅模式
- 当用户发推时，推送给所有粉丝的时间线
如何优化内存使用？
- 对推文内容进行压缩存储
- 使用LRU缓存最近访问的推文
分布式系统设计
- 用户数据分片存储
- 推文服务与关注关系服务分离
- 使用消息队列异步处理关注关系变更

这个设计展示了哈希表在构建复杂系统中的核心作用，通过合理的键值对组织，我们可以高效地管理用户关系和数据流。

哈希算法题目：设计推特我将为你详细讲解如何设计一个简化版的Twitter（推特）系统。这个题目结合了哈希表、链表、堆等多种数据结构，是系统设计类题目的经典代表。题目描述设计一个简化版的Twitter，用户可以发送推文，关注/取消关注其他用户，并且能够看到自己关注的人（包括自己）发布的最近10条推文。推文需要按时间顺序从新到旧显示。系统需求分析我们需要支持以下操作：用户发送推文用户关注另一个用户用户取消关注另一个用户获取用户新闻源（news feed）：按时间倒序显示该用户关注的所有人（包括自己）的最新10条推文数据结构设计第一步：核心数据结构选择第二步：推文数据结构每条推文需要包含内容和时间戳：具体实现步骤步骤1：用户发送推文关键点解释：每个用户都有自己的推文列表时间戳确保推文有全局顺序用户自动关注自己，简化新闻源获取逻辑步骤2：用户关注其他用户设计思路：使用集合（Set）存储关注列表，因为：去重：不能重复关注同一个人快速查找：O(1)时间复杂度检查是否已关注快速添加/删除步骤3：用户取消关注边界情况处理：用户没有关注过任何人时的处理不能取消关注自己（否则看不到自己的推文）步骤4：获取新闻源（核心难点）这是本题最复杂的部分，需要合并多个用户的推文流，并获取最新的10条。方法一：暴力合并排序问题：效率低下，每次都要合并排序所有推文。方法二：优化版 - 使用堆（优先队列）堆优化详解：为什么用堆：我们需要从多个有序列表中获取前K个最大元素，这是堆的典型应用场景数据结构： (-时间戳, 用户ID, 推文索引) 三元组工作流程：初始化：为每个被关注用户添加其最新推文到堆循环：从堆中取出时间最新的推文添加：将该用户的上一篇推文加入堆停止：取满10条或堆为空完整代码实现时间复杂度分析 | 操作 | 时间复杂度 | 空间复杂度 | |------|-----------|------------| | postTweet | O(1) | O(1) | | follow | O(1) | O(1) | | unfollow | O(1) | O(1) | | getNewsFeed | O(F + 10* logF) | O(F) | 其中F是用户关注的人数，10是要获取的推文数量。扩展思考如果推文数量巨大怎么办？可以限制每个用户的推文存储数量（如只保存最近1000条）使用分页加载，而不是一次性获取10条如何支持实时更新？可以考虑使用发布-订阅模式当用户发推时，推送给所有粉丝的时间线如何优化内存使用？对推文内容进行压缩存储使用LRU缓存最近访问的推文分布式系统设计用户数据分片存储推文服务与关注关系服务分离使用消息队列异步处理关注关系变更这个设计展示了哈希表在构建复杂系统中的核心作用，通过合理的键值对组织，我们可以高效地管理用户关系和数据流。