HBase region预划分及查找过程

现在的位置: 首页 > 综合 > 正文

HBase region预划分及查找过程

2013年03月10日 ⁄ 综合 ⁄ 共 9877字 ⁄ 字号小中大 ⁄ 评论关闭

1.Region预划分：

RegionSplitter

java.lang.Object
  org.apache.hadoop.hbase.util.RegionSplitter

切分方式：分别按照不同的Split进行切分

bin/hbase org.apache.hadoop.hbase.util.RegionSplitter -c 60 -f test:rs myTable HexStringSplit

bin/hbase org.apache.hadoop.hbase.util.RegionSplitter -r -o 2 myTable UniformSplit

2.regionname查询：

HBase写记录过程中regionname查找简介：主要是看如何进行region选择，完成按domain域的数据散列，分摊至不同region上
|-->HTable table = new HTable(config, tablename);
|-->Put put = new Put(Bytes.toBytes("test2"));
|-->put.add(Bytes.toBytes(cfs[j]), Bytes.toBytes(String.valueOf(1)), Bytes.toBytes("value_3"));
|-->table.put(put);

HTable-->put() |利用HTable管理记录
|-->doPut(Arrays.asList(put));
     |-->for (Put put : puts)
          |-->writeBuffer.add(put);
               |-->if (n % DOPUT_WB_CHECK == 0 && currentWriteBufferSize > writeBufferSize)
                    |-->flushCommits();
     |-->if (autoFlush || currentWriteBufferSize > writeBufferSize)
          |-->flushCommits();

HTable-->flushCommits()   |提交添加的记录，内存阀值可控制
|-->Object[] results = new Object[writeBuffer.size()];
|-->this.connection.processBatch(writeBuffer, tableName, pool, results);
|-->finally
     |-->for (int i = results.length - 1; i>=0; i--)
          |-->if (results[i] instanceof Result)
               |-->writeBuffer.remove(i);

HConnection--->processBatch()     |HConnectionImplementation 批量处理
|-->processBatchCallback(list, tableName, pool, results, null)
     |-->List<Row> workingList = new ArrayList<Row>(list);
     |-->Map<HRegionLocation, MultiAction<R>> actionsByServer = new HashMap<HRegionLocation, MultiAction<R>>();
     |-->for (int i = 0; i < workingList.size(); i++)   |逐行处理Batch的每一行
          |-->Row row = workingList.get(i);
          |-->HRegionLocation loc = locateRegion(tableName, row.getRow(), true);   |根据行获取对应的Region
          |-->byte[] regionName = loc.getRegionInfo().getRegionName();
          |-->MultiAction<R> actions = actionsByServer.get(loc);
          |-->Action<R> action = new Action<R>(row, i);
          |-->lastServers[i] = loc;
          |-->actions.add(regionName, action);
     |-->for (Entry<HRegionLocation, MultiAction<R>> e: actionsByServer.entrySet()) {   |根据actionsByServer进行执行，执行过程在Callable中完成
               futures.put(e.getKey(), pool.submit(createCallable(e.getKey(), e.getValue(), tableName)));}
     |-->for (Entry<HRegionLocation, Future<MultiResponse>> responsePerServer : futures.entrySet())    |提交完成后，对future的返回对象进行处理

HConnectionImplementation--->locateRegion(final byte [] tableName,final byte [] row, boolean useCache)
|-->if (tableName == null || tableName.length == 0) |非空判断
|-->ensureZookeeperTrackers(); |ZK确认跟踪器
|-->if (Bytes.equals(tableName, HConstants.ROOT_TABLE_NAME)) |根据tableName判断为root,meta，还是一般类型表
     |-->ServerName servername = this.rootRegionTracker.getRootRegionLocation(); |从zk获取root的regionserver
     |-->return new HRegionLocation(HRegionInfo.ROOT_REGIONINFO,     |从ZK中返回root表存储的位置，建立server连接，用于获取meta表信息
            servername.getHostname(), servername.getPort());
|-->else if (Bytes.equals(tableName, HConstants.META_TABLE_NAME))       |从meta表中获取region的信息
     |-->return locateRegionInMeta(HConstants.ROOT_TABLE_NAME, tableName, row,
            useCache, metaRegionLock);
|-->else                                                                |此时从meta中获取信息
     |-->return locateRegionInMeta(HConstants.META_TABLE_NAME, tableName, row,
            useCache, userRegionLock);

HConnectionImplementation--->locateRegionInMeta()        |获取真正的region的location信息
|-->if (useCache) |客户端获取时会做cache，保存Location信息
     |-->location = getCachedLocation(tableName, row);
|-->byte [] metaKey = HRegionInfo.createRegionName(tableName, row, HConstants.NINES, false);
|-->metaLocation = locateRegion(parentTable, metaKey); |定位root/meta的region信息
|-->HRegionInterface server = getHRegionConnection(metaLocation.getHostname(), metaLocation.getPort());
|-->regionInfoRow = server.getClosestRowBefore( metaLocation.getRegionInfo().getRegionName(), metaKey,
            HConstants.CATALOG_FAMILY);
|-->byte [] value = regionInfoRow.getValue(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER);
|-->HRegionInfo regionInfo = (HRegionInfo) Writables.getWritable(value, new HRegionInfo());
|-->value = regionInfoRow.getValue(HConstants.CATALOG_FAMILY, HConstants.SERVER_QUALIFIER);
|-->location = new HRegionLocation(regionInfo, hostname, port); |通过regionInfo和对应的regionserver机器名确认Location
|-->cacheLocation(tableName, location);
|-->return location;

HRegionServer-->getClosestRowBefore()    |用于寻找目标row最接近的行，在此用于meta表中寻找最近的regionname
|-->HRegion region = getRegion(regionName); |根据regionname获取对应的region信息
|-->Result r = region.getClosestRowBefore(row, family);   |在region中寻找最近的row
     |-->coprocessorHost.preGetClosestRowBefore(row, family, result) |coprocessorHost预处理
     |-->checkRow(row, "getClosestRowBefore"); |确认合法性
     |-->startRegionOperation();   |设置读写锁
     |-->Store store = getStore(family);   |
     |-->KeyValue key = store.getRowKeyAtOrBefore(row); |获取最接近的row行，并返回KeyValue对象
     |-->Get get = new Get(key.getRow());
     |-->get.addFamily(family);
     |-->result = get(get, null);
     |-->coprocessorHost.postGetClosestRowBefore(row, family, result); |处理完成后的钩子程序

Store-->getRowKeyAtOrBefore()   |获取具体的family对应的存储单元Store，每一个Store中存储一列簇
|-->KeyValue kv = new KeyValue(row, HConstants.LATEST_TIMESTAMP); |包装row为KeyValue对象值，用于比较
|-->GetClosestRowBeforeTracker state = new GetClosestRowBeforeTracker(     ｜封装比较对象
      this.comparator, kv, ttlToUse, this.region.getRegionInfo().isMetaRegion());
|-->this.lock.readLock().lock(); |设定读写锁
|-->this.memstore.getRowKeyAtOrBefore(state); |分别在meta和HFile中进行搜索
|-->for (StoreFile sf : Lists.reverse(storefiles))
     |-->rowAtOrBeforeFromStoreFile(sf, state);
|-->return state.getCandidate();

HConnection-->createCallable(final HRegionLocation loc, final MultiAction<R> multi, final byte [] tableName) |创建处理线程
|-->call() |在call对象中生成新的ServerCallable处理线程
     |-->ServerCallable<MultiResponse> callable = new ServerCallable<MultiResponse>(connection, tableName, null) {}
          |-->call()
               |-->return server.multi(multi);   |转换为HRegionServer的multi()方法
          |-->connect(boolean reload)
               |-->server = connection.getHRegionConnection(loc.getHostname(), loc.getPort()); |确认为哪个RegionServer

3.图文解析查找过程见： http://canann.iteye.com/blog/1559204

在 HBase中，大部分的操作都是在RegionServer完成的，Client端想要插入，删除，查询数据都需要先找到相应的 RegionServer。什么叫相应的RegionServer？就是管理你要操作的那个Region的RegionServer。Client本身并不知道哪个RegionServer管理哪个Region，那么它是如何找到相应的RegionServer的？本文就是在研究源码的基础上揭秘这个过程。

在前面的文章“HBase存储架构”中我们已经讨论了HBase基本的存储架构。在此基础上我们引入两个特殊的概念：-ROOT-和.META.。这是什么？它们是HBase的两张内置表，从存储结构和操作方法的角度来说，它们和其他HBase的表没有任何区别，你可以认为这就是两张普通的表，对于普通表的操作对它们都适用。它们与众不同的地方是HBase用它们来存贮一个重要的系统信息——Region的分布情况以及每个Region的详细信息。

好了，既然我们前面说到-ROOT-和.META.可以被看作是两张普通的表，那么它们和其他表一样就应该有自己的表结构。没错，它们有自己的表结构，并且这两张表的表结构是相同的，在分析源码之后我将这个表结构大致的画了出来：

我们来仔细分析一下这个结构，每条Row记录了一个Region的信息。

首先是RowKey，RowKey由三部分组成：TableName, StartKey 和 TimeStamp。RowKey存储的内容我们又称之为Region的Name。哦，还记得吗？我们在前面的文章中提到的，用来存放Region的文件夹的名字是RegionName的Hash值，因为RegionName可能包含某些非法字符。现在你应该知道为什么RegionName会包含非法字符了吧，因为StartKey是被允许包含任何值的。将组成RowKey的三个部分用逗号连接就构成了整个RowKey，这里TimeStamp使用十进制
的数字字符串来表示的。这里有一个RowKey的例子：

Table1,RK10000,12345678

然后是表中最主要的Family：info，info里面包含三个Column：regioninfo, server, serverstartcode。其中regioninfo就是Region的详细信息，包括StartKey, EndKey 以及每个Family的信息等等。server存储的就是管理这个Region的RegionServer的地址。

所以当Region被拆分、合并或者重新分配的时候，都需要来修改这张表的内容。

到目前为止我们已经学习了必须的背景知识，下面我们要正式开始介绍Client端寻找RegionServer的整个过程。我打算用一个假想的例子来学习这个过程，因此我先构建了假想的-ROOT-表和.META.表。

我们先来看.META.表，假设HBase中只有两张用户表：Table1和Table2，Table1非常大，被划分成了很多Region，因此在.META.表中有很多条Row用来记录这些Region。而Table2很小，只是被划分成了两个Region，因此在.META.中只有两条Row 用来记录。这个表的内容看上去是这个样子的：

.META.

现在假设我们要从Table2里面插寻一条RowKey是RK10000的数据。那么我们应该遵循以下步骤：

1. 从.META.表里面查询哪个Region包含这条数据。

2. 获取管理这个Region的RegionServer地址。

3. 连接这个RegionServer, 查到这条数据。

好，我们先来第一步。问题是.META.也是一张普通的表，我们需要先知道哪个RegionServer管理了.META.表，怎么办？有一个方法，我们把管理.META.表的RegionServer的地址放到ZooKeeper上面不久行了，这样大家都知道了谁在管理.META.。

貌似问题解决了，但对于这个例子我们遇到了一个新问题。因为Table1实在太大了，它的Region实在太多了，.META.为了存储这些Region信息，花费了大量的空间，自己也需要划分成多个Region。这就意味着可能有多个RegionServer在管理.META.。怎么办？在 ZooKeeper里面存储所有管理.META.的RegionServer地址让Client自己去遍历？HBase并不是这么做的。

HBase的做法是用另外一个表来记录.META.的Region信息，就和.META.记录用户表的Region信息一模一样。这个表就是-ROOT-表。这也解释了为什么-ROOT-和.META.拥有相同的表结构，因为他们的原理是一模一样的。

假设.META.表被分成了两个Region，那么-ROOT-的内容看上去大概是这个样子的：

-ROOT-

这么一来Client端就需要先去访问-ROOT-表。所以需要知道管理-ROOT-表的RegionServer的地址。这个地址被存在ZooKeeper中。默认的路径是：

/hbase/root-region-server

等等，如果-ROOT-表太大了，要被分成多个Region怎么办？嘿嘿，HBase认为-ROOT-表不会大到那个程度，因此-ROOT-只会有一个Region，这个Region的信息也是被存在HBase内部的。

现在让我们从头来过，我们要查询Table2中RowKey是RK10000的数据。整个路由过程的主要代码在org.apache.hadoop.hbase.client.HConnectionManager.TableServers中：

private HRegionLocation locateRegion(final byte [] tableName,



final byte [] row, boolean useCache)

throws IOException{



if (tableName == null || tableName.length == 0) {



throw new IllegalArgumentException(



“table name cannot be null or zero length”);



}



if (Bytes.equals(tableName, ROOT_TABLE_NAME)) {



synchronized (rootRegionLock) {



// This block guards against two threads trying to find the root



// region at the same time. One will go do the find while the



// second waits. The second thread will not do find.



if (!useCache || rootRegionLocation == null) {



this.rootRegionLocation = locateRootRegion();



}



return this.rootRegionLocation;



}



} else if (Bytes.equals(tableName, META_TABLE_NAME)) {



return locateRegionInMeta(ROOT_TABLE_NAME, tableName, row, useCache,

metaRegionLock);



} else {



// Region not in the cache – have to go to the meta RS



return locateRegionInMeta(META_TABLE_NAME, tableName, row, useCache, userRegionLock);



}



}

这是一个递归调用的过程：

获取Table2，RowKey为RK10000的RegionServer



=>



获取.META.，RowKey为Table2,RK10000, 99999999999999的RegionServer



=>



获取-ROOT-，RowKey为.META.,Table2,RK10000,99999999999999,99999999999999的RegionServer



=>



获取-ROOT-的RegionServer



=>



从ZooKeeper得到-ROOT-的RegionServer



=>



从-ROOT-表中查到RowKey最接近（小于）

.META.,Table2,RK10000,99999999999999,99999999999999的一条Row，并得到.META.的RegionServer



=>



从.META.表中查到RowKey最接近（小于）Table2,RK10000, 99999999999999的一条Row，并得到Table2的RegionServer



=>



从Table2中查到RK10000的Row

到此为止Client完成了路由RegionServer的整个过程，在整个过程中使用了添加“99999999999999”后缀并查找最接近（小于）RowKey的方法。对于这个方法大家可以仔细揣摩一下，并不是很难理解。

最后要提醒大家注意两件事情：

在整个路由过程中并没有涉及到MasterServer，也就是说HBase日常的数据操作并不需要MasterServer，不会造成MasterServer的负担。
Client端并不会每次数据操作都做这整个路由过程，很多数据都会被Cache起来。至于如何Cache，则不在本文的讨论范围之内。

【上篇】Apache常用函数
【下篇】CodeForces Good Bye 2013

作者: maha

该日志由 maha 于11年前发表在综合分类下，最后更新于 2013年03月10日.
转载请注明: HBase region预划分及查找过程 | 学步园 +复制链接

抱歉!评论已关闭.

学步园