
Allow regions of specific table to be load-balanced

2014-03-25

Description:

In our experience, a cluster can be well balanced overall and yet one table's regions may be badly concentrated on a few region servers.
For example, one table has 839 regions (380 regions at the time of table creation), 202 of which are on a single server.

It would be desirable for the load balancer to distribute the regions of
specified tables evenly across the cluster. Each such table has a number
of regions many times the cluster size.
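
To make the skew concrete, here is a back-of-the-envelope sketch in plain Java (not HBase code). Only the 839-region total and the 202-region hot server come from the report above; the 10-server cluster size and the counts on the other servers are hypothetical, chosen just to illustrate how far the hot server is from an even share (~84 regions per server).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Back-of-the-envelope check of per-table skew (not HBase code).
 * The cluster size and the counts on the non-hot servers are hypothetical;
 * only the 839-region total and the 202-region hot server come from the
 * report above.
 */
public class TableSkew {
    public static void main(String[] args) {
        int totalRegions = 839;    // regions of the table in question
        int servers = 10;          // hypothetical cluster size
        double evenShare = (double) totalRegions / servers;   // ~84 per server

        Map<String, Integer> regionsPerServer = new HashMap<>();
        regionsPerServer.put("rs-hot", 202);                   // the reported hot server
        for (int i = 1; i < servers; i++) {
            regionsPerServer.put("rs-" + i, (totalRegions - 202) / (servers - 1));
        }

        regionsPerServer.forEach((server, count) ->
            System.out.printf("%s: %d regions (even share ~%.0f)%n", server, count, evenShare));
    }
}
```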

 

Jonathan Gray
was the first to comment:

 

On cluster startup in 0.90, regions are assigned in one of two ways.
By default, the master will attempt to retain the previous assignment of
the cluster. The other option, which I've also used, is round-robin,
which evenly distributes each table.

That plus the change to do round-robin on table create should probably cover per-table distribution fairly well.

I think the next step for the load balancer is a major effort to
switch to something with more of a cost-based approach. Ideally
you don't need even distribution of each table; you want even
distribution of load. If there is one hot table, it will get evenly
balanced anyway.

One thing we could do is get rid of all random assignments and always
try to do some kind of quick load balance or round-robin. It does seem
like randomness always leads to one server getting an unfair share.
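
As a rough illustration of the round-robin idea Gray mentions, here is a minimal sketch (not the actual HBase balancer code): the regions of one table are dealt out to the region servers in turn, so every server ends up within one region of every other server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of round-robin assignment (not the actual HBase balancer code):
 * the regions of one table are dealt out to the region servers in turn,
 * so every server ends up within one region of every other server.
 */
public class RoundRobinAssign {
    public static Map<String, List<String>> assign(List<String> regions, List<String> servers) {
        Map<String, List<String>> plan = new HashMap<>();
        for (String server : servers) {
            plan.put(server, new ArrayList<>());
        }
        for (int i = 0; i < regions.size(); i++) {
            // region i goes to server (i mod number-of-servers)
            plan.get(servers.get(i % servers.size())).add(regions.get(i));
        }
        return plan;
    }
}
```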

Matt Corgan
suggested that consistent hashing could solve this:

Have you guys considered using a
consistent hashing method to choose which server a region belongs to?
You would create ~50 buckets for each server by hashing
serverName_port_bucketNum, and then hash the start key of each region
into the buckets.

There are a few benefits:

  • when you add a server it takes an equal load from all existing servers
  • if you remove a server it distributes its regions equally to the remaining servers
  • adding a server does not cause all regions to shuffle like round robin assignment would
  • assignment is nearly random, but repeatable, so no hot spots
  • when a region splits the front half will stay on the same server, but the back half will usually be sent to another server

And a few drawbacks:

  • each server wouldn't end up with exactly the same number of regions, but they would be close
  • if a hot spot does end up developing, you can't do anything about it,
    at least not unless the scheme supported a list of manual overrides
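
For reference, here is a minimal sketch of the ring Matt Corgan describes above (not HBase code; the CRC32 hash is an arbitrary stand-in just to keep the example self-contained). Each server contributes ~50 virtual buckets keyed by hash(serverName_port_bucketNum), and a region is owned by the first bucket at or after the hash of its start key.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

/**
 * Sketch of the consistent-hashing scheme described above (not HBase code):
 * each server contributes ~50 virtual buckets keyed by
 * hash("serverName_port_bucketNum"); a region belongs to the first bucket
 * at or after the hash of its start key, wrapping around the ring.
 */
public class ConsistentHashRing {
    private static final int BUCKETS_PER_SERVER = 50;
    private final SortedMap<Long, String> ring = new TreeMap<>();

    public ConsistentHashRing(List<String> serverNamesWithPort) {
        for (String server : serverNamesWithPort) {
            for (int bucket = 0; bucket < BUCKETS_PER_SERVER; bucket++) {
                ring.put(hash(server + "_" + bucket), server);
            }
        }
    }

    /** Server responsible for the region whose start key is given. */
    public String serverFor(byte[] regionStartKey) {
        long h = hash(new String(regionStartKey, StandardCharsets.UTF_8));
        SortedMap<Long, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```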

Jonathan Gray
explained why consistent hashing cannot be used:

I think consistent hashing would be a
major step backwards for us and unnecessary because there is no cost of
moving bits around in HBase. The primary benefit of consistent hashing
is that it reduces the amount of data you have to physically move
around. Because of our use of HDFS, we never have to move physical data
around.

Of the benefits in your list, we already implement almost all of them,
and where we don't, they are possible in the current architecture. In
addition, our architecture is extremely flexible and we can do all kinds
of interesting load-balancing techniques driven by actual load
profiles, not just by numbers of shards/buckets as we do today or as would
be done with consistent hashing.

 

The fact that split regions open back up on the same server is
actually an optimization in many cases: it reduces the amount of
time the regions are offline, and when they come back online and do a
compaction to drop references, all the files are more likely to be on
the local DataNode rather than a remote one. In some cases, like time-series
data, you may want the splits to move to different servers. I could imagine
some configurable logic in there to ensure the bottom half goes to a
different server (or maybe the top half would actually be more efficient
to move away, since most of the time you'll write more to the bottom half
and thus want the data locality / quick turnaround). There's likely
going to be a bit of split rework in 0.92 to make it more like the
ZK-based regions-in-transition.

As for binding regions to servers between cluster restarts, this is already implemented and on by default in 0.90.

Consistent hashing also requires a fixed keyspace (right?) and that's a mismatch for HBase's flexibility in this regard.
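
Later HBase releases do expose a per-table balancing switch along the lines the description asks for. The property name used in the sketch below (hbase.master.loadbalance.bytable) does not appear in this thread, so treat it as an assumption to verify against your HBase version; it would normally be set in hbase-site.xml and is shown programmatically here only to keep the example compact.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

/**
 * Sketch only: the property name hbase.master.loadbalance.bytable is an
 * assumption to check against your HBase release; when enabled, it asks
 * the master's balancer to balance each table's regions separately.
 */
public class PerTableBalanceConfig {
    public static Configuration perTableBalancing() {
        Configuration conf = HBaseConfiguration.create();
        conf.setBoolean("hbase.master.loadbalance.bytable", true);
        return conf;
    }
}
```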

 

 

For more of the discussion, see: https://issues.apache.org/jira/browse/HBASE-3373

 

A discussion of the region assignment problem in HBase (in Chinese): http://www.spnguru.com/?p=246

 

 

 

 

 
