散列表

现在的位置: 首页 > 综合 > 正文

RSS

2013年08月11日 ⁄ 综合 ⁄ 共 4616字 ⁄ 字号小中大 ⁄ 评论关闭

这里我总结一下有关hash表的算法技巧：（《算法导论》）

1、最普通的hash技术是：直接寻址表。

这种技术主要应用在当关键字的全域U比较小时，直接分配固定的hash表大小。使用方式（insert、search、delete）类似于普通的数组。

hash表组织方式主要有：separate-chaining和open-addressing

解决冲突碰撞的主要方法有：拉链法（separate-chaining）和开放定址法（open-addressing）

2、hash函数设计

注意点：（1）好的hash函数需要满足产生的hash值尽量散列均匀。

（2）一般设计的hash函数默认的关键字都是自然数。所以如果应用中的关键字集合不是自然数集合时，需要先进行转化。

这里主要介绍三种hash函数的方法：

（1）除法散列法

通过取k除以m的余数，来将关键字k映射到m个槽的某一个中去。

h(k)= k mod m

这里需要注意m值的选取，m不能选2的幂，因为如果m=2^p，则h(k)就是k的p个最低位数字。

可以选择m的值为与2的整数幂不太接近的质数。这里m的值就是hash表的大小。

（2）乘法散列法

构造乘法散列法的步骤有两个：

第一步，用关键字k乘上常数A（0<A<1），并抽取kA的小数部分，然后用m乘以这个值，在将结果取底（floor）

h(k)= [ m(kA mod 1)]

优点在于m的选取没有特定的要求，一般选择2的某个幂次。A一般选择(sqrt(5)-1)/2.

（3）全域散列

全域散列的基本思想就是使用随机化算法，就是从一族仔细设计的函数中，随机的选择一个作为散列函数。

hash函数的一些构造方法：

1、数字分析法

如果事先知道关键字集合，并且每个关键字的位数比hash表的地址数码位数多时，可以从关键字中选出分布均匀的若干位，构成hash地址。

2、平方取中法

当无法确定关键字中哪几位分布较均匀时，可以先求出关键字的平方值，然后按需要取平方值的中间几位作为hash地址。

3、分段叠加

主要有折叠法和移位法。（见书吧）

4、伪随机数法

采用一个伪随机函数做hash函数，即h(key)= random(key).

5、上面说的除法散列法。

程序设计中主要的查找算法：

顺序查找、折半查找、基于树的查找、计算式查找方法——哈希法。

在涉及hash处理的程序中，需要注意如下几点：

1、把hash表的大小选择为素数，这样能够使元素在hash函数的处理下分布得更加均匀。

因为最后求出hash值以后，都会进行取余操作： index = hashValue % hashTable.size()

2、当使用拉链法实现hash时，当它的负载系数超过1时，就需要重新hash，扩大hash表的大小，然后将原来hash表中的数据放到新的hash表中，其实扩展hash表就像扩展vector容器一样，是非常耗时的。

如果是自定义类对象，那么使用上述hash实现时，需要实现自身类的hash函数，因为hash类会调用每个被hash元素类型的hash函数，同时也要提供重载的operator== 和operator!= ，因为这在find函数中比较对象的时候会用到的。

可以进行的操作有：查询、插入、删除

3、当使用开放寻址法实现hash时，当它的负载系数超过0.5时，就需要重新hash。

当发生冲突时，采用的hash函数为:index=( hash(x) + f(i) ) % hashTable.size() .

这里根据f(i)的不同，可以有不同的冲突解决策略：

（1）线性探测： f(i)=i i=1,2,3.... 会产生二次聚集的现象

（2）二次探测：f(i)=i^2 i=1,2,3,.. 定理：如果使用二次试探的策略，且hash表的大小为prime，那么只要保证hash表的为半满，那么新的元素总能成功执行插入操作。

（3）伪随机再探测：使用随机函数产生f(i)的值

可以进行的操作有：查询、插入、删除（删除操作一般都是lazy deletion实现，可以通过状态来标识每一个hash表中的元素）

4、使用二次哈希

可以用第二个hashhash函数来解决冲突问题，选择第二个hash函数非常重要，否则效果会变得很差。

比较好的hash₂(x)= R- (x mod R) ,这里R 是比hashTable.size()小的一个prime。

所以f(i) = i * hash₂(x). 实验证明：二次hash的效果接近于随机hash，但是计算的代价高了。

C++ 中实现的hash_set和hash_map：

1、hash_map:

作用：

Stores and retrieves data quickly from a collection in which each element is a pair that has a sort key whose value is unique and an associated data value.

形式：

template <
   class Key,
   class Type,
   class Traits=hash_compare<Key, less<Key> >,
      class Allocator=allocator<pair <const Key, Type> >
>
class hash_map

特点：

The hash_map is:

An associative container, which a variable size container that supports the efficient retrieval of element values based on an associated key value.
Reversible, because it provides a bidirectional iterator to access its elements.
Hashed, because its elements are grouped into buckets based on the value of a hash function applied to the key values of the elements.
Unique in the sense that each of its elements must have a unique key.
A pair associative container, because its element data values are distinct from its key values.
A template class, because the functionality it provides is generic and so independent of the specific type of data contained as elements or keys. The data types to be used for elements and keys are, instead, specified as parameters in the class template along with the comparison function and allocator.

常用的成员函数和STL库中的map的成员函数基本上是一样的。

下面是使用的几个例子：

#include<iostream>
#include<hash_map>
#include<cstddef>
using namespace std;
using namespace stdext;

int main(){

hash_map<int,int>::iterator hm1_pIter;
hash_map<int,int> hm1;
typedef pair<int,int> Int_Pair;

hm1.insert( Int_Pair(1,10));
hm1.insert( Int_Pair(2,20));
hm1.insert( Int_Pair(3,30));
hm1.insert( Int_Pair(4,40));

cout<<"The original key values of hm1= ";
for(hm1_pIter= hm1.begin() ; hm1_pIter!=hm1.end() ;hm1_pIter++)
cout<<" "<<hm1_pIter->first;
cout<<" . "<<endl;

cout<<"The original hash_mapped values of hm1=";
for(hm1_pIter = hm1.begin() ; hm1_pIter!=hm1.end() ;hm1_pIter++)
cout<<" "<<hm1_pIter->second ;
cout<<" . "<<endl;

//insert operation
pair< hash_map<int,int>::iterator , bool> pr;
pr=hm1.insert(Int_Pair(1,10));
if( true == pr.second )
  cout<<"The element 10 was inserted in hm1 sucessfully"<<endl;
else
  cout<<"The element 10 already exists in hm1/n with a key value of "
  <<(pr.first)->first<<" . "<<endl;

//find operation
hm1_pIter= hm1.find(4);
cout<<"We find key value of 4 : "<< hm1_pIter->second<<" . "<<endl;

//erase iteration
size_t num=hm1.erase(3);
cout<<"We have erased : "<<num<<" key values ."<<endl;

hm1.insert( Int_Pair(5,50));
hm1.insert( Int_Pair(6,60));

//equal_range()

pair< hash_map<int,int>::const_iterator , hash_map<int,int>::const_iterator > myPair;
myPair = hm1.equal_range(5);
cout<<"the lower bound of the element with the key value of 5 is: "<<myPair.first->second<<endl;
cout<<"the upper bound of the element with the key value of 5 is: "<<myPair.second->second<<endl;

return 0;

}

另外一个例子：

typedef struct node{

int x;
int y;
node(int x1,int y1):x(x1),y(y1){}
}Node;

int main(){

Node n1(1,2),n2(3,4);
hash_map<int,Node> myMap;
typedef pair<int,Node> myPair;

myMap.insert(myPair(1,n1));
myMap.insert(myPair(2,n2));

cout<<"可以容纳的最大元素个数"<<myMap.max_size()<<endl; //357913941

return 0;

}

【上篇】reSIProcate中repro例子的参数设置失败
【下篇】Lucene全文搜索学习

作者: tertiary

该日志由 tertiary 于11年前发表在综合分类下，最后更新于 2013年08月11日.
转载请注明: 散列表 | 学步园 +复制链接

抱歉!评论已关闭.

学步园

散列表

作者: tertiary

书签

最新文章New

本站推荐

返回首页