《编程之美》学习笔记——2.3寻找发贴“水王”

现在的位置: 首页 > 综合 > 正文

《编程之美》学习笔记——2.3寻找发贴“水王”

2018年03月30日 ⁄ 综合 ⁄ 共 5799字 ⁄ 字号小中大 ⁄ 评论关闭

一、问题

Tango是微软亚洲研究院的一个试验项目。研究院的员工和实习生们都很喜欢在Tango上面交流灌水。传说，Tango有一大“水王”，他不但喜欢发贴，还会回复其他ID发的每个帖子。坊间风闻该“水王”发帖数目超过了帖子总数的一半。如果你有一个当前论坛上所有帖子（包括回帖）的列表，其中帖子作者的ID也在表中，你能快速找出这个传说中的Tango水王吗？

问题分析：

输入：给入一个含有重复数据的ID数组

输出：重复次数最高的ID

约束：重复次数最高的ID个数超过总数的一半。

二、解法

版本一：排序查找

对于数据规模为n的ID序列，首先对序列根据ID进行排序，最快的排序算法时间复杂度为O(nlgn)，然后来我们根据排好序的序列对查找重复次数最多的ID，这个时间复杂度是O(n)，因此总的时间复杂度是O(nlgn+n)。

实现时使用快速排序算法（见排序专题博文）对问题的ID序列进行排序，并设计算法寻找重复次数最多的ID（包括但不仅限于重复次数大于总数1/2的ID元素）

算法C实现：

/**
 * @file find_id_by_sort_search.c
 * @brief find the id which repeats most in the sequence.
 * @author chenxilinsidney
 * @version 1.0
 * @date 2015-01-07
 */

#include <stdlib.h>
#include <stdio.h>
// #define NDEBUG
#include <assert.h>

// #define NDBG_PRINT
#include "debug_print.h"

typedef int TYPE;

#define MAX_COUNT      10000000
TYPE array[MAX_COUNT] = {0};

/**
 * @brief select the last element as a pivoit.
 * Reorder the array so that all elements with values less than the pivot
 * come before the pivot, while all elements with values greater than the pivot
 * come after it (equal values can go either way). After this partitioning, the
 * pivot is in its final position. 
 *
 * @param[in,out]  array          input and output array
 * @param[in]      index_begin    the begin index of the array(included)
 * @param[in]      index_end      the end index of the array(included)
 *
 * @return the position of the pivot(index from the array)
 */
TYPE partition(TYPE* array, TYPE index_begin, TYPE index_end)
{
    /// pick last element of the array as the pivot
    TYPE pivot = array[index_end];
    /// index of the elments that not greater than pivot
    TYPE i = index_begin - 1;
    TYPE j, temp;
    /// check array's elment one by one
    for (j = index_begin; j < index_end; j++) {
        if (array[j] <= pivot) {
            /// save the elements not greater than pivot to left index of i.
            i++;
            temp = array[j];
            array[j] = array[i];
            array[i] = temp;
        }
    }
    /// set the pivot to the right position
    array[index_end] = array[++i];
    array[i] = pivot;
    /// return the position of the pivot
    return i;
}

/**
 * @brief sort the array by Quicksort method.
 *
 * @param[in,out]  array       sequence for sort
 * @param[in]      index_begin begin index of sequence for sort(included)
 * @param[in]      index_end   end index of sequence for sort(included)
 */
void quick_sort(TYPE* array, TYPE index_begin, TYPE index_end)
{
    /// sort only under the index_begin < index_end condition
    if ( index_begin < index_end) {
        /// exchange elements to the pivot position by partition function
        TYPE index_pivot = partition(array, index_begin, index_end);
        /// sort the array before the pivot position
        quick_sort(array, index_begin, index_pivot - 1);
        /// sort the array after the pivot position
        quick_sort(array, index_pivot + 1, index_end);
    }
}

/**
 * @brief find the max repeated element from a sorted sequence
 *
 * @param[in]      array  sorted sequence
 * @param[in]      count  sequence length
 *
 * @return the max element that repeats max time
 */
TYPE search_max_repeated_element(TYPE* array, TYPE count)
{
    TYPE count_temp, value_temp;
    TYPE max_value = array[0];
    TYPE max_count = 0;
    TYPE i = 0;
    while (i < count) {
        value_temp = array[i];
        count_temp = 1;
        while (array[++i] == value_temp)
            count_temp++;
        if (count_temp >= max_count) {
            max_count = count_temp;
            max_value = value_temp;
        }
    }
    return max_value;
}

/**
 * @brief find the element that repeats max time in the unsorted array
 *
 * @param[in,out]  array  unsorted array
 * @param[in]      count  array length
 *
 * @return the element value that repeats max time
 */
TYPE find_id(TYPE* array, TYPE count)
{
    /// sort array first
    quick_sort(array, 0, count - 1);
    /// search the wanted element
    return search_max_repeated_element(array, count);
}

int main(void) {
    /// read data to array
    TYPE count = 0;
    while(count < MAX_COUNT && scanf("%u\n", array + count) == 1) {
        ++count;
    }
    /// find id
    TYPE id = find_id(array, count);
    /// output result
    printf("max repeat count element is %d.\n", id);
    return EXIT_SUCCESS;
}

分析：

在查找方式上，这里通过扫描一遍排好序的ID列表来统计各个ID出现次数，输出最大次数的ID，时间复杂度为O(n)。

更优的方式，可以考虑重复次数最高的ID个数超过总数的一半这个约束条件。如果这个约束条件成立，那么排好序的ID列表中第n/2项一定会是这个ID，这样就不用扫描ID列表，时间复杂度为O(1)，大大减低查找所花费的时间。（这里恰到好处的使用了题目中的约束条件）

但是以上两种方法都需要对ID进行排序，因此总时间复杂度至少为O(nlgn)。

版本二：采用“已知的水王发帖数超过一半”这个约束处理本题。

思路：如果每次删除两个不同的ID（不管是否包含“水王”的ID），那么，剩下的ID列表中，“水王”ID出现的次数仍然超过总数的一半。看到这一点之后，就可以通过不断重复这个过程，把ID列表中的ID总数降低（转化为更小的问题），从而得到问题的答案。（《编程之美》原话）

下面算法思想：（参考博客：

http://blog.csdn.net/rein07/article/details/6741661）

假设每个ID都有可能是水王，那么在遍历时这个水王就要遇到一种挑战，可能自己的帖子数是会增加的，也可能是遇到挑战的，帖子数要减少的。这样遍历下来，只有水王的

帖子增加的减去遇到挑战的帖子数会是大于0的。其他任何帖子假设为水王时都是禁不起挑战的。

步骤：

1. 可以假设帖子的第一个ID是次数最大的，用candidate记录，次数用nTimes记录。
2. 遍历下一个ID，如果跟candidate一样，nTimes++，否则，遇到一个挑战，则nTimes--，如果nTimes == 0，下一步就要重复第一步了。
3.遍历结束，nTimes>0的那个candidate就是水王ID，他是获胜者。

算法C实现：

TYPE find_id(TYPE* array, TYPE count)
{
    TYPE i;
    TYPE candidate, nTimes = 0;
    for (i = 0; i < count; i++) {
        if (nTimes == 0) {
            candidate = array[i];
            nTimes = 1;
        } else {
            if (candidate == array[i])
                nTimes++;
            else
                nTimes--;
        }
    }
    return candidate;
}

分析：这个算法只需要遍历一遍ID数组便可以找出“水王”，时间复杂度O(n)，并且需要的空间复杂度也很低，空间复杂度O(1)。

在这个题目中，有一个计算机科学中很普遍的思想，就是如何把一个问题转化为规模较小的若干个问题。分治、递推和贪心等都是基于这样的思路。在转化过程中，小的问题跟原问题本质上一致。这样，我们可以通过同样的方式将小问题转化为更小的问题。因此，转化过程是很重要的。像上面这个题目，我们保证了问题的解在小问题中仍然具有与原问题相同的性质：水王的ID在ID列表中的次数超过一半。转化本身计算的效率越高，转化之后问题规模缩小得越快，则整体算法的时间复杂度越低。(摘自《编程之美》)

三、拓展问题

随着Tango的发展，管理员发现，“超级水王”没有了。统计结果表明，有3个发帖很多的ID，他们的发帖数目都超过了帖子总数目N的1/4。你能从发帖ID列表中快速找出他们的ID吗？

思路：可以与前一个问题进行类比。前一问题通过每次删除两个不同的ID，剩下的ID列表中““水王”ID出现的次数仍然超过总数的一半，使问题规模降低，类比的：拓展问题我们每次删除4个两两不同的ID，剩下的ID列表中这3个超过总数1/4的ID出现的次数也将仍然超过总数的1/4，也使问题规模降低。接下来就剩下算法实现的问题了，我们需要理清的一问题是这3个ID相互之间并不做比较，他们是一伙的，为此我们需要修改上前一个问题解中if else语句的顺序。

算法C实现：

/**
 * @brief find the element that repeats max times in the unsorted array
 *
 * @param[in,out]  array  unsorted array
 * @param[in]      count  array length
 * @param[out]     count  three elements value that repeats max times
 * (times > 1 / 4 total)
 */
void find_id(TYPE* array, TYPE count, TYPE* candidate)
{
    TYPE i;
#define ELEMENT_COUNT 3
    /// iniitialize
    TYPE nTimes[ELEMENT_COUNT] = {0};
    candidate[0] = 0;
    candidate[1] = 0;
    candidate[2] = 0;
    for (i = 0; i < count; i++) {
        if (candidate[0] == array[i]) {
            nTimes[0]++;
        } else if (candidate[1] == array[i]) {
            nTimes[1]++;
        } else if (candidate[2] == array[i]) {
            nTimes[2]++;
        } else if (nTimes[0] == 0) {
            nTimes[0] = 1;
            candidate[0] = array[i];
        } else if (nTimes[1] == 0) {
            nTimes[1] = 1;
            candidate[1] = array[i];
        } else if (nTimes[2] == 0) {
            nTimes[2] = 1;
            candidate[2] = array[i];
        } else {
            nTimes[0]--;
            nTimes[1]--;
            nTimes[2]--;
        }
    }
#undef ELEMENT_COUNT
    return;
}

备注：

1.转载请注明出处：http://blog.csdn.net/chensilly8888/

2.全文源码均开源（在UBUNTU + GCC4.8.2下编译并测试通过），可下载或查看：https://github.com/chenxilinsidney/funnycprogram/tree/master/beauty_of_programming/find_id

3.有写错或写漏之处请指正，谢谢！

【上篇】《编程之美》学习笔记——2.10寻找数组中的最大值和最小值
【下篇】《编程之美》学习笔记——2.2不要被阶乘吓倒

作者: alcalde

该日志由 alcalde 于6年前发表在综合分类下，最后更新于 2018年03月30日.
转载请注明: 《编程之美》学习笔记——2.3寻找发贴“水王” | 学步园 +复制链接

抱歉!评论已关闭.

学步园

《编程之美》学习笔记——2.3寻找发贴“水王”

作者: alcalde

书签

最新文章New

本站推荐

返回首页