In some Hive versions, queries on a Hive table made up of many small files take extremely long

September 6, 2013 ⁄ General
keywords: hive, slow query, many splits, inputformat
level: high
evidence:
comparison across Hive versions:
1 node, select count(1) from t, time less than
2 nodes, same query
(t contains about 100 rows in 100 partitions)

==job log==
many splits: about 100 splits for this job (roughly one per partition)

reason:
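Hive misconfiguration.
the default hive.input.format is org.apache.hadoop.hive.ql.io.HiveInputFormat, which builds splits file by file through the table's FileInputFormat and does not merge small files, so each small file becomes its own split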

solution:
modify hive-default.xml
change hive.input.format to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat, which packs many small files into fewer splits
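A minimal sketch of the property entry, assuming a Hive build that ships org.apache.hadoop.hive.ql.io.CombineHiveInputFormat. The same <property> element works in hive-default.xml as described above or, more conventionally, in hive-site.xml, which overrides the defaults. To try it for a single session first, the Hive CLI accepts `set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;` before running the query.

```xml
<!-- Sketch: combine small files into fewer splits for Hive queries. -->
<!-- Place inside the <configuration> element of hive-site.xml (or hive-default.xml). -->
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
</property>
```

After the change, re-running select count(1) from t should show far fewer splits in the job log than the roughly 100 seen before.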

抱歉!评论已关闭.