Hadoop HelloWord Examples – A Simple Sort

April 17, 2019 ⁄ General

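I've been here for almost two weeks. Term hasn't started yet, so it's been one activity after another, but I've had some free time and I'm using it to get started with Hadoop. I have to say that Hadoop: The Definitive Guide is a good book but not a good introductory one. It opens with a pile of Hadoop architecture, which is unsettling for a beginner: reading about architecture without ever having written a line of Hadoop code leaves you with nothing solid to stand on. After looking around, all I could find was a WordCount demo. Fortunately Xia, a labmate, recommended a document: chapter 9 of the "细细品味Hadoop" series written by 虾皮工作室, which contains several demos that go from simple to complex. Recommended, and my thanks to the author.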

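A gripe: the official Hadoop documentation could learn from the DirectX SDK documentation. That one offers demos ranging from simple to complex, the later ones implement quite a few classic papers and look very cool, and with enough explanation, working through them one by one feels solid and you get better every day. By comparison, the official Hadoop documentation is rather bare.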

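This demo does a simple sort of some data. Once you've learned WordCount and gotten your feet wet, you know that between the map function and reduce there is a shuffle and sort step, which sorts the keys output by map. We use this step to sort our own data. The idea is then simple: in the map phase, emit each input value as a key, with any value you like; in the reduce phase, write the keys emitted by the map phase straight out. To make it more interesting, we can add a count variable that records each key's position in the sorted order. Note that the result is globally sorted only when the job runs with a single reducer, which is Hadoop's default.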
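The idea above can be tried out in plain Java before touching a cluster. This is only a rough sketch under stated assumptions: the class `ShuffleSortSketch` and its methods are hypothetical names, a `TreeMap` stands in for Hadoop's shuffle and sort step, and the sample numbers are made up for illustration. It is not how Hadoop itself is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSortSketch {

    // "Map" phase: emit each number as a key with a dummy value of 1.
    // The TreeMap plays the role of shuffle/sort (an assumption of this
    // sketch): it groups equal keys and keeps them in ascending order.
    public static TreeMap<Integer, List<Integer>> mapAndShuffle(List<Integer> input) {
        TreeMap<Integer, List<Integer>> grouped = new TreeMap<>();
        for (int number : input) {
            grouped.computeIfAbsent(number, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // "Reduce" phase: walk the keys in sorted order and write each key once
    // per occurrence, paired with a running rank that starts at 0.
    public static List<String> reduce(TreeMap<Integer, List<Integer>> grouped) {
        List<String> output = new ArrayList<>();
        int count = 0;
        for (Map.Entry<Integer, List<Integer>> entry : grouped.entrySet()) {
            for (int ignored : entry.getValue()) {
                output.add(entry.getKey() + "\t" + count);
                count++;
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Made-up sample input containing one duplicate.
        List<Integer> input = Arrays.asList(123, 12, 87, 12);
        for (String line : reduce(mapAndShuffle(input))) {
            System.out.println(line);
        }
    }
}
```

Running `main` prints the keys in ascending order with ranks 0 through 3, and the duplicate 12 appears twice because the reduce step writes one line per occurrence.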

The input data could be:

//data1.txt

123
12
87
150
22
23423
9874
9876

//data2.txt

29347
9877
27985
98776
989
767
2345
1532
8702
8702

The full code is as follows:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

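public class DataSort {

	// Emits each input number as the key so that the shuffle/sort step orders it.
	public static class SortMapper extends Mapper<Object, Text, IntWritable, IntWritable> {

		// Placeholder value; only the key matters for sorting.
		private static final IntWritable ONE = new IntWritable(1);

		@Override
		public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
			String line = value.toString().trim();
			if (line.isEmpty()) {
				return; // skip blank lines instead of failing in parseInt
			}
			context.write(new IntWritable(Integer.parseInt(line)), ONE);
		}
	}

	// Receives keys in ascending order and writes each one with its 0-based rank.
	public static class SortReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

		// Running rank; it is a global rank only when one reducer sees every key.
		private final IntWritable count = new IntWritable(0);

		@Override
		public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
			// One output line per occurrence, so duplicate keys are preserved.
			for (IntWritable val : values) {
				context.write(key, count);
				count.set(count.get() + 1);
			}
		}
	}

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();

		// Job.getInstance replaces the Job constructor, which is deprecated in Hadoop 2 and later.
		Job job = Job.getInstance(conf, "DataSort");
		job.setJarByClass(DataSort.class);

		job.setMapperClass(SortMapper.class);
		job.setReducerClass(SortReducer.class);
		// A single reducer keeps the output globally sorted and the rank meaningful.
		job.setNumReduceTasks(1);

		job.setOutputKeyClass(IntWritable.class);
		job.setOutputValueClass(IntWritable.class);

		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}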

Final output:

12   0
22   1
87   2
123   3
150   4
767   5
989   6
1532   7
2345   8
8702   9
8702   10
9874   11
9876   12
9877   13
23423   14
27985   15
29347   16
98776   17

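Finally, let me share a dumb mistake I made. When you extend Mapper and Reducer you are supposed to supply your own map and reduce methods, but I accidentally named my reduce method reducer. Reducer is a concrete class whose default reduce simply passes its input through, so the program compiled and ran as if my reduce logic did not exist, and the final output was always the map's intermediate result. Luckily Xia came over, took a look, and spotted the error. From now on, add the @Override annotation, so that if you mistype the name the compiler will tell you.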
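The mistake is easy to reproduce without Hadoop. In the sketch below, `BaseReducer` is a hypothetical stand-in for a framework class with a pass-through default, not Hadoop's actual `Reducer`; all class names and the `identity:`/`custom:` strings are made up for illustration.

```java
public class OverrideSketch {

    // Hypothetical stand-in for a framework base class: the default reduce()
    // passes its input through, so subclasses are not forced to override it.
    static class BaseReducer {
        public String reduce(String key) {
            return "identity:" + key;
        }
    }

    // The method name is misspelled, so this declares a brand-new method
    // rather than an override. Callers of reduce() still get the default.
    static class TypoReducer extends BaseReducer {
        public String reducer(String key) {
            return "custom:" + key;
        }
    }

    // Correctly named and annotated. Adding @Override to the misspelled
    // reducer() above would turn the typo into a compile-time error.
    static class FixedReducer extends BaseReducer {
        @Override
        public String reduce(String key) {
            return "custom:" + key;
        }
    }

    public static void main(String[] args) {
        BaseReducer typo = new TypoReducer();
        BaseReducer fixed = new FixedReducer();
        System.out.println(typo.reduce("12"));  // prints identity:12
        System.out.println(fixed.reduce("12")); // prints custom:12
    }
}
```

The first call silently falls back to the base behavior, which matches the symptom of seeing only pass-through output from the reduce phase.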


Author: fascia

Posted by fascia five years ago under General; last updated on April 17, 2019.