序言
非比较排序,不需要比较,交换,在线性时间内完成排序。缺点:空间要求较多,不是原地排序,典型的空间换取时间。
计数排序
计数排序利用一个特点:已经排好序(例如从小到大)的数组中,第i个元素为x,则数组中一定有:小于等于x的元素有i个。计数排序需要一个空间代价:n + max_num.
桶排序
桶排序包含两个步骤:分发元素到每个桶内;顺序回收桶内元素。伪代码如下:
BUCKET_SORT (A)
- n
← length [A]- For i = 1 to n do
- Insert A[i] into list
B[nA[i]]- For i = 0 to n-1 do
- Sort list B with Insertion sort
- Concatenate the lists B[0], B[1], . . B[n-1] together in order.
代码实现
using namespace std;
void BucketSort(int *array, int length)
{
const int MAX = 256;
int bucket[MAX];
for (unsigned int i(0); i<MAX; i++) bucket[i] = 0;
for(unsigned int i=0; i<length; i++)
bucket[array[i]]++;
for(unsigned i=0, idx=0; i<MAX; i++)
for(unsigned int k=bucket[i]; k>0; k--) // 回收每个桶内元素
array[idx++] = i;
}
/** sort array values, store the results in dst array
@param arr point to source array
@param dst point to destinate array
@param length the array size.
@note Need O( N+Max_Num ) assistant space. space ==> time
**/
void counting_sort(int* arr, int* dst, int length)
{
const int max_num = 255;
int count[max_num];
for (int i(0); i<max_num; i++) count[i] = 0;
for (int i(0); i<length; i++)
count[ arr[i] ]++;
for (int i(1); i<max_num; i++)
count[i] += count[i-1];
for (int i=length-1; i>=0; i--)
{
dst[ count[arr[i]]-1 ] = arr[i]; // 必须减一
count[ arr[i] ]--;
}
}
int main ()
{
#define P_2
#ifdef P_1
{
int myints[] = {10,20,30,5,15};
int length = sizeof(myints)/sizeof(myints[0]);
int* result = new int[length];
counting_sort(myints, result, length);
for (int i(0); i<length; i++) cout << result[i] << " ";
cout << endl;
delete result; result = 0;
}
#endif // P_1
#ifdef P_2
{
const int test_num = 10000000; // 1 million
srand((int)time(0));
cout << "Test number " << "/t:/t/t" << test_num << std::endl;
int* testPtr = new int[test_num];
int* testPtr1 = new int[test_num];
int* testPtr2 = new int[test_num];
for(int idx=0; idx<test_num; idx++)
testPtr2[idx] = testPtr1[idx] = testPtr[idx] = rand()%255;
clock_t t = clock();
// sort
// keep the testPtr1, change the testPtr.
counting_sort(testPtr1, testPtr, test_num);
cout << "count_sort " << "/t:/t/t" << clock() - t << std::endl;
t = clock();
BucketSort(testPtr1, test_num);
cout << "BucketSort " << "/t:/t/t" << clock() - t << std::endl;
t = clock();
std::sort(testPtr2, testPtr2+test_num);
cout << "std::sort " << "/t:/t/t" << clock() - t << std::endl;
// Test the result validity
for(int idx=0; idx<test_num; idx++)
if (testPtr[idx]!=testPtr1[idx] || testPtr1[idx]!= testPtr2[idx])
cout << "wrong";
delete[] testPtr; testPtr = 0;
delete[] testPtr1; testPtr1 = 0;
delete[] testPtr2; testPtr2 = 0;
}
#endif // P_2
#ifdef P_3
#endif // P_3
system("PAUSE");
return 0;
}
测试结果:
计数和桶排序时间上的优势不用说了,但是这些非比较排序算法对空间要求比较高,而且跟排序数据的分布也有关。上面的测试结果是基于所有随机数据都在[0-255]之间的假设前提。数据分布越散乱,空间复杂度越高。
后记
桶排序的思路尤其适合字符串数组的排序,因为acsii字符都位于0,255之间。更加一般的桶排序可以使用链表,有时间再续吧~