《编程之美》学习笔记——2.5寻找最大的K个数

现在的位置: 首页 > 综合 > 正文

RSS

《编程之美》学习笔记——2.5寻找最大的K个数

2019年07月17日 ⁄ 综合 ⁄ 共 5413字 ⁄ 字号小中大 ⁄ 评论关闭

一、问题

有很多无序的数，假定它们各不相等，从中找出最大的K个数。

问题分析：

输入：N个数；K。

输出：N个数中最大的K个数，这K个数并不需要是有序的，只需为数组中最大的K个数即可。

约束：N个数各不相等。

二、解法

解法一采用排序算法类

A.数组全排序：

可以对数组进行快速排序或堆排序(O(n*lgn))，然后获得最大的K个数(O(k))。

时间复杂度：O(n*lgn) + O(k)=O(n*lgn)(k < n)。

缺点：算法不仅对最大K个数排序，也对其他的N-K个数排序，后面的方法可以避免。

B.数组K个数排序：

可以进一步优化：由于我们只需要最大的K个数，对于其他的n-k个数的排序我们实际上是不需要进行的。因此可以修改或改用其他的排序算法，可使用选择排序和交换排序，每次排序时只针对最大的K个数，减少对其余n-k个数的排序计算量。

时间复杂度：O(n*k)。当 k < lgn 时，该算法要优于快速排序。

缺点：算法对最大的K个数排序，后面的方法可以避免。

C.数组排序分组：（利用快速排序算法思想）

继续进行优化：如果我们能够做到对最大的K个数不进行排序，剩下n-k个数也不进行排序，这时候排序算法应该是最优的，实际上我们只找出了区分K和N-K个数的数组临界点，并然K个数和n-K个数分别位于数组临界点的两端。可以考虑对快速排序算法进行改进以解决我们的问题。

时间复杂度：算法平均时间复杂度O(n*lgk)。

优化：

1.算法可以对递归进行封装，从而考虑k < 0时返回-1，k >= N时返回数组最后一个索引。

2.partition部分算法采用数组最后一个值作为pivot，将数组分成两部分，若随机选取枢纽，可达到线性期望时间O(n)。

缺点：

最后一种从快速排序演化来的数组排序分组来获得最大的K个数算法是最高效的。以上三种方法均有一个缺点：数组必须一次性存储全部的数据到内存，当数据规模较大时由於硬件本身内存不够约束此种方法无法使用。

算法步骤：（摘自《编程之美》）

在本问题中，假设 N 个数存储在数组 S 中，我们从数组 S 中随机找出一个元素 X，把数组分为两部分 Sa 和 Sb。

Sa 中的元素大于等于 X，Sb 中元素小于 X。这时，有两种可能性：

1. Sa中元素的个数小于K，Sa中所有的数和Sb中最大的K-|Sa|个元素（|Sa|指Sa中元素的个数）就是数组S中最大的K个数。

2. Sa中元素的个数大于或等于K，则需要返回Sa中最大的K个元素。

算法C实现：

1.排序采用从大到小排序以从数组得到最大的K个数，采用快速排序的partition算法根据这种排序方式实现（另一种是从小到大排序）。

3.算法关键在于找到递归的终止条件，这点和原始的快速排序是不同的。

/**
 * @file find_large_numbers.c
 * @brief find largest k numbers in a given array.
 * @author chenxilinsidney
 * @version 1.0
 * @date 2015-02-04
 */

#include <stdlib.h>
#include <stdio.h>
// #define NDEBUG
#include <assert.h>

// #define NDBG_PRINT
#include "debug_print.h"

typedef int TYPE;

#define MAX_COUNT      10000000
TYPE array[MAX_COUNT] = {0};

/**
 * @brief select the last element as a pivoit.
 * Reorder the array so that all elements with values less than the pivot
 * come before the pivot, while all elements with values less than the pivot
 * come after it (equal values can go either way). After this partitioning, the
 * pivot is in its final position. 
 *
 * @param[in,out]  array          input and output array
 * @param[in]      index_begin    the begin index of the array(included)
 * @param[in]      index_end      the end index of the array(included)
 *
 * @return the position of the pivot(index from the array)
 */
TYPE partition(TYPE* array, TYPE index_begin, TYPE index_end)
{
    /// pick last element of the array as the pivot
    TYPE pivot = array[index_end];
    /// index of the elments that not greater than pivot
    TYPE i = index_begin - 1;
    TYPE j, temp;
    /// check array's elment one by one
    for (j = index_begin; j < index_end; j++) {
        if (array[j] >= pivot) {
            /// save the elements not less than pivot to left index of i.
            i++;
            temp = array[j];
            array[j] = array[i];
            array[i] = temp;
        }
    }
    /// set the pivot to the right position
    array[index_end] = array[++i];
    array[i] = pivot;
    /// return the position of the pivot
    return i;
}

/**
 * @brief find the last index of the input array for largets N numbers.
 * the numbers that index is before the last index is the largest N numbers.
 *
 * @param[in,out]  array          input and output array
 * @param[in]      index_begin    the begin index of the array(included)
 * @param[in]      index_end      the end index of the array(included)
 * @param[in]      count          the count of the largest numbers.
 */
TYPE find_quick_sort(TYPE* array, TYPE index_begin, TYPE index_end, TYPE count)
{
    assert(count > 0);
    if (index_begin < index_end) {
        /// get pivot by partition
        TYPE index_pivot = partition(array, index_begin, index_end);
        TYPE first_array_count = index_pivot - index_begin + 1;
        DEBUG_PRINT_VALUE("%d", count);
        DEBUG_PRINT_VALUE("%d", index_begin);
        DEBUG_PRINT_VALUE("%d", index_end);
        DEBUG_PRINT_VALUE("%d", index_pivot);
        if (first_array_count < count)
            /// find the other numbers in second part of the array
            return find_quick_sort(array, index_pivot + 1, index_end,
                    count - first_array_count);
        else if (first_array_count > count)
            /// still find N numbers in first part of the array
            return find_quick_sort(array, index_begin, index_pivot - 1,
                    count);
        else
            /// just find N numbers
            return index_pivot;
    } else {
        return index_begin;
    }
}

int main(void) {
    /// read data to array
    TYPE count = 0;
    while(count < MAX_COUNT && scanf("%u\n", array + count) == 1) {
        ++count;
    }
    /// find largest N numbers
    TYPE N = 15;
    TYPE i, last_index;
    if ((last_index = find_quick_sort(array, 0, count - 1, N)) >= 0) {
        printf("get the largest %d numbers:\n", N);
        DEBUG_PRINT_VALUE("%d", last_index);
        for (i = 0; i <= last_index; i++) {
            printf("%d ", array[i]);
        }
    } else {
        printf("can not get the largest N numbers.\n");
    }
    return EXIT_SUCCESS;
}

D. 大小为K的小根堆：（堆排序算法思想）

维护一个大小为K的小根堆，此时堆顶元素是这K个元素中最小的一个元素，即第K个元素。遍历一次所有数据后得到的这个堆的K个元素即这个序列中最大的K个数。

时间复杂度：O(n*lgk)，与前一中方法一致，算法效率不变。但是空间复杂度为O(k)，因此不必使序列的所有的元素都存放到内存中，只需遍历一次序列即可，可分段遍历载入内存，克服了上一种方法的缺陷。

算法步骤：

1.利用遍历序列时数组前K个数建立一个大小为K的小根堆；

2.遍历序列时数组的剩余的每一个元素与小根堆堆顶做比较：如果该元素比堆顶元素小，则忽略该元素并继续遍历下一个元素；如果该元素比堆顶元素大，则用该元素替换堆顶元素，并调整堆使之满足堆的性质；

3.遍历结束后得到的小根堆的K个元素即问题的解。

算法C实现：

/**
 * @file find_large_numbers_heap.c
 * @brief find largest k numbers in a given array.
 * @author chenxilinsidney
 * @version 1.0
 * @date 2015-02-04
 */

#include <stdlib.h>
#include <stdio.h>
// #define NDEBUG
#include <assert.h>

#include "heap.h"
// #define NDBG_PRINT
#include "debug_print.h"

typedef int TYPE;

#define MAX_COUNT      10000000
TYPE array[MAX_COUNT] = {0};

/**
 * @brief find the last index of the input array for largets N numbers.
 * the numbers that index in the array is before the last index is
 * the largest N numbers.
 *
 * @param[in,out]  array          input and output array
 * @param[in]      array_length   array length
 * @param[in]      count          the count of the largest numbers.
 */
void find_heap_sort(TYPE* array, TYPE array_length, TYPE count)
{
    assert(array != NULL && array_length > 0 && count > 0);
    /// array need not to adjust if count larger than or equal to array length
    if (count >= array_length)
        return;
    /// build heap with first count of the array
    Heap heap;
    BuildHeap(&heap, count, array - 1, count);
    /// adjust heap with other elements
    TYPE i, element;
    for (i = count; i < array_length; i++) {
        HeapGet(&heap, &element);
        if (array[i] > element) {
            heap.data[1] = array[i];
            Heapify(heap.data, 1, count);
        }
    }
}

int main(void) {
    /// read data to array
    TYPE count = 0;
    while(count < MAX_COUNT && scanf("%u\n", array + count) == 1) {
        ++count;
    }
    /// find largest N numbers
    TYPE N = 15;
    TYPE i;
    find_heap_sort(array, count, N);
    printf("get the largest %d numbers:\n", N);
    for (i = 0; i < N; i++) {
        printf("%d ", array[i]);
    }
    return EXIT_SUCCESS;
}

【上篇】数据结构专题——堆
【下篇】《编程之美》学习笔记——2.19区间重合判断

作者: xjbymhd

该日志由 xjbymhd 于5年前发表在综合分类下，最后更新于 2019年07月17日.
转载请注明: 《编程之美》学习笔记——2.5寻找最大的K个数 | 学步园 +复制链接

抱歉!评论已关闭.

学步园

《编程之美》学习笔记——2.5寻找最大的K个数

作者: xjbymhd

书签

最新文章New

本站推荐

返回首页