keywords: hive, slow query, many splits, inputformat
level: high
evidence:
comparison of two Hive setups running different versions:
1 node: select count(1) from t, run time less than …
2 nodes: the same query
(table t holds only about 100 rows, spread over 100 partitions)
==job log==
many splits: about 100 splits (roughly one per partition) for this job
reason:
Hive misconfiguration.
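hive.input.format used a non-combining input format (FileInputFormat-style splitting, as in org.apache.hadoop.hive.ql.io.HiveInputFormat), which does not merge small files, so each small partition file became its own split and map task.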
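A simplified model of the difference, for illustration only. The function names, the greedy packing and the file sizes below are assumptions, not Hive's actual split code: CombineHiveInputFormat also weighs node and rack locality, and its split size is typically bounded by mapred.max.split.size (mapreduce.input.fileinputformat.split.maxsize on newer Hadoop). The sketch shows why 100 tiny partition files give about 100 splits without combining, but a single split with it.

```python
import math


def per_file_split_count(file_sizes, block_size):
    """Hypothetical model of a non-combining input format: each file yields
    ceil(size / block_size) splits, and never fewer than one."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)


def combined_split_count(file_sizes, max_split_size):
    """Hypothetical model of a combining input format: cut files into chunks
    of at most max_split_size, then greedily pack chunks into splits."""
    chunks = []
    for size in file_sizes:
        # Large files are cut into max-size pieces first.
        while size > max_split_size:
            chunks.append(max_split_size)
            size -= max_split_size
        if size > 0:
            chunks.append(size)

    splits, current = 0, 0  # current = bytes packed into the open split
    for chunk in chunks:
        if current + chunk > max_split_size:
            splits += 1  # close the current split and start a new one
            current = 0
        current += chunk
    if current > 0:
        splits += 1  # the last, partially filled split
    return splits


if __name__ == "__main__":
    MiB = 1024 * 1024
    # Roughly the case in this note: 100 partitions with one tiny file each
    # (1000 bytes per file is an assumed size).
    sizes = [1000] * 100
    print(per_file_split_count(sizes, block_size=128 * MiB))      # 100
    print(combined_split_count(sizes, max_split_size=256 * MiB))  # 1
```

With one split per file, the job pays the per-task startup cost 100 times for a few kilobytes of data, which is why adding a second node does not help.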
solution:
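override the property in hive-site.xml (older Hive releases also read hive-default.xml; since Hive 0.9.0 the shipped hive-default.xml.template is ignored)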
set hive.input.format to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat, which packs many small files into fewer splits
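A minimal sketch for applying the fix to a Hadoop-style configuration file, assuming the override goes into hive-site.xml. The helper names set_property and update_conf_file are hypothetical; only the property name hive.input.format and the class org.apache.hadoop.hive.ql.io.CombineHiveInputFormat come from Hive itself.

```python
import xml.etree.ElementTree as ET

COMBINE = "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat"


def set_property(root, name, value):
    """Add or update <property><name>NAME</name><value>VALUE</value></property>
    under a <configuration> root element (hypothetical helper)."""
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value  # update the existing entry in place
            return
    # No existing entry: append a new <property> block.
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def update_conf_file(path, name, value):
    """Parse a configuration file such as hive-site.xml, set one property,
    and write the file back in place (hypothetical helper)."""
    tree = ET.parse(path)
    set_property(tree.getroot(), name, value)
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

Usage: update_conf_file("hive-site.xml", "hive.input.format", COMBINE). ElementTree's default parser drops XML comments, so keep a backup of the original file. To try the fix for one session before changing the file, run SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; in the Hive CLI and rerun the query.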