1) A streaming combiner does not have to be a Java program.
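To illustrate the point above, a combiner can be an ordinary script. The sketch below is a hypothetical `combiner.py`. The file name, the `parse` and `combine` helpers, and the `key<TAB>count` line format are all assumptions for illustration and do not come from the runs in these notes. It sums the counts of adjacent lines that share a key. If equal keys are not adjacent it only emits partial sums, which is still acceptable as long as the reducer sums the counts again.

```python
#!/usr/bin/env python
import sys
from itertools import groupby


def parse(line):
    # Split "key<TAB>count". A line without a tab is counted once
    # (an assumption of this sketch, not streaming behaviour).
    key, _, value = line.rstrip("\n").partition("\t")
    return key, int(value) if value else 1


def combine(lines):
    # Merge each run of adjacent lines with the same key into one
    # "key<TAB>sum" line.
    for key, group in groupby(map(parse, lines), key=lambda kv: kv[0]):
        yield "%s\t%d" % (key, sum(count for _, count in group))


if __name__ == "__main__":
    # Streaming feeds records on stdin and reads results from stdout.
    for out in combine(sys.stdin):
        print(out)
```

Such a script could presumably be passed as -combiner 'python combiner.py' together with -file combiner.py so that it is shipped to the task nodes.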
2)
Whether the combiner runs depends on -numReduceTasks; here it is set to 1:
mjiang@venus ~/java/eclipse/target-hadoop/Streaming-jar $ hadoop jar ~/hadoop-1.0.0/contrib/streaming/hadoop-streaming-1.0.0.jar -input "/test" -output "o16" -mapper 'cat' -combiner 'cat' -inputformat TextInputFormat -numReduceTasks 1
If there is no reduce phase, the combiner does not run.
3)
The combiner can only process its input one line at a time:
mjiang@venus ~/java/eclipse/target-hadoop/index/src-py $ cat test
adz
bdz
cdz
ddz
edz
fd
gdz
hdz
ifz
jdz
kf
ldfz
mdz
ndz
mjiang@venus ~/java/eclipse/target-hadoop/Streaming-jar $ hadoop jar ~/hadoop-1.0.0/contrib/streaming/hadoop-streaming-1.0.0.jar -input "/test" -output "o16" -mapper 'cat' -combiner 'cut -b1-2' -inputformat TextInputFormat -numReduceTasks 1
mjiang@venus ~/java/eclipse/target-hadoop/index/src-py $ cat 23.re
ad
bd
cd
dd
ed
fd
gd
hd
if
jd
kf
ld
md
nd
OK
But commands such as wc or grep do not work as a combiner.
With -numReduceTasks 0 the combiner is not started either.
4)
One really annoying thing about streaming is that it cannot be debugged.
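A job running on the cluster is hard to step through, but the mapper and combiner commands can at least be exercised locally before submitting. The sketch below is a rough approximation under stated assumptions. The `run_stage` and `simulate` helpers are hypothetical names. The sort step only imitates Hadoop's shuffle with a plain lexicographic sort of whole lines, and the combiner is run exactly once, whereas the real framework may run it any number of times.

```python
import subprocess


def run_stage(command, text):
    # Feed `text` to a shell command on stdin and return its stdout.
    # check=True raises CalledProcessError on a non-zero exit status,
    # which makes a failing command easy to spot.
    result = subprocess.run(command, shell=True, input=text,
                            stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


def simulate(text, mapper, combiner=None, reducer=None):
    # mapper -> sort -> optional combiner -> optional reducer, all local.
    mapped = run_stage(mapper, text)
    shuffled = "".join(line + "\n" for line in sorted(mapped.splitlines()))
    if combiner:
        shuffled = run_stage(combiner, shuffled)
    return run_stage(reducer, shuffled) if reducer else shuffled


if __name__ == "__main__":
    # A few lines from the test file above, with the same commands.
    sample = "adz\nbdz\nfd\nldfz\n"
    print(simulate(sample, "cat", combiner="cut -b1-2"), end="")
```

Because `check=True` is set, a command that exits with a non-zero status (grep does so when no line matches) raises an exception here instead of failing silently.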