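A power outage hit the machine room and all the machines rebooted, after which the static IPs on many of them conflicted and stopped working. Once I had finally sorted out the network, the HBase cluster still would not start, even though Hadoop itself worked fine. The situation looked roughly like this: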
HTTP ERROR: 500
Trying to contact region server null for region , row '', but failed after 3 attempts.
Exceptions:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.10.11.184:60020 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.10.11.184:60020 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.10.11.184:60020 after attempts=1
RequestURI=/master.jsp
Caused by:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 3 attempts.
Exceptions:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.10.11.184:60020 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.10.11.184:60020 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.10.11.184:60020 after attempts=1
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1002)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:55)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:28)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:433)
at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:127)
at org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:125)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
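Every exception above comes down to the client being unable to open a connection to 10.10.11.184:60020. One quick way to confirm that symptom from another machine is a plain TCP connect test. The following is a minimal Python sketch; `port_reachable` is a hypothetical helper, not part of HBase, and it only tells you whether something accepts TCP connections on that port, not whether a healthy RegionServer is behind it.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves `host` and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts and name-resolution errors are all OSError subclasses
        return False
```

For example, `port_reachable("10.10.11.184", 60020)` returning `False` would be consistent with the errors above.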
The HBase master log showed roughly the following errors.
The gist of the log is that nothing could connect to /10.10.11.184:60020, i.e. the RegionServer could not be reached:
0:0:0:0:0:0:0:1:60020
Now=10.10.11.184
optionallogflushinternal=10000ms
2011-12-08 22:45:02,566 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=RegionServer, sessionId=regionserver/0:0:0:0:0:0:0:1:60020
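The telling detail is `sessionId=regionserver/0:0:0:0:0:0:0:1:60020`: 0:0:0:0:0:0:0:1 is the IPv6 loopback address ::1, so the RegionServer reported its own address as loopback instead of 10.10.11.184. A minimal Python sketch of that check follows. The regular expression is an assumption based only on the single log line shown here and may not match other HBase versions; the function names are hypothetical.

```python
import ipaddress
import re

# Assumed format, taken from the one log line above:
# "... sessionId=regionserver/<host>:<port>" at the end of the line
SESSION_RE = re.compile(r"sessionId=regionserver/(?P<host>.+):(?P<port>\d+)\s*$")


def regionserver_bind_address(log_line: str):
    """Extract (host, port) from a RegionServer JvmMetrics log line, or None if absent."""
    match = SESSION_RE.search(log_line)
    if match is None:
        return None
    # The greedy host group stops at the last ':', so IPv6 literals stay intact
    return match.group("host"), int(match.group("port"))


def is_loopback(host: str) -> bool:
    """True if host is a literal loopback address (127.0.0.0/8 or ::1)."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a hostname); cannot decide without resolving it
        return False
```

Running `is_loopback` on the host extracted from the line above returns `True`, which points straight at a name-resolution problem.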
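I opened /etc/hosts and, sure enough, the RegionServer's hostname was mapped to ::1.
As far as I remember, the mapping between this hostname and the current IP is added automatically by NetworkManager when the network interface is initialized.
After changing ::1 to the RegionServer's actual address (or adding a new line mapping the actual address to the hostname) and restarting HBase, the web UI loaded: HBase had started successfully!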
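The hosts-file check can also be scripted. Below is a minimal Python sketch that parses hosts-file text, lists the addresses a given hostname is mapped to, and flags loopback ones such as ::1. It assumes the usual `address name [aliases...]` layout with `#` comments; the function names are illustrative only, and `regionserver1` in the test data is a hypothetical hostname.

```python
import ipaddress


def hostname_mappings(hosts_text, hostname):
    """Return every address that hosts_text maps to hostname, in file order."""
    addresses = []
    for raw_line in hosts_text.splitlines():
        # Strip trailing comments, then skip blank or comment-only lines
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        if hostname in names:
            addresses.append(address)
    return addresses


def loopback_mappings(hosts_text, hostname):
    """Return the mappings of hostname that are loopback addresses (127.0.0.0/8, ::1)."""
    result = []
    for address in hostname_mappings(hosts_text, hostname):
        try:
            if ipaddress.ip_address(address).is_loopback:
                result.append(address)
        except ValueError:
            # Malformed address field: skip it rather than abort the whole scan
            continue
    return result
```

On a RegionServer you might call it as `loopback_mappings(open("/etc/hosts").read(), socket.gethostname())`; a non-empty result for the machine's own hostname is the situation described above.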
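Looking back at the whole failure: first the power went out, then the static IPs stopped working. The problem most likely arose at that point, with the hosts file being modified while the static IPs were failing.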
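Configuring an HBase cluster is actually quite simple, but small slips, usually in places we assume cannot go wrong, combined with reading the logs carelessly, can end up wasting a lot of our time. The mapping between IP addresses and hostnames is one of the most easily overlooked parts of cluster configuration, so the next time you run into a problem like this, it is worth checking the hosts file.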
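Because the hosts file is only one of the sources the system resolver may consult, it can also help to ask the resolver directly what the local hostname resolves to. A minimal Python sketch follows, assuming (without having verified it) that HBase's JVM sees the same answer as `getaddrinfo`; the helper names are hypothetical.

```python
import ipaddress
import socket


def resolved_addresses(hostname):
    """Return the distinct IP addresses the system resolver returns for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP string
    for *_, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def only_loopback(addresses):
    """True if addresses is non-empty and every entry is a loopback address."""
    # Drop any IPv6 zone suffix such as "%eth0" before parsing
    return bool(addresses) and all(
        ipaddress.ip_address(a.split("%", 1)[0]).is_loopback for a in addresses
    )
```

If `only_loopback(resolved_addresses(socket.gethostname()))` returns `True` on a cluster node, other nodes will likely be handed an address they cannot reach, much like the ::1 mapping in this incident.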