Cloud Computing Study Notes 04: Running Hadoop's Example Programs: Counting Words with wordcount


April 5, 2018 · General

04 - Running the wordcount example program

Hadoop ships with a jar of example programs:

hadoop-0.20.2-examples.jar

The wordcount.txt file used here contains the following line:

hello hadoop credream hello hadoop credream hello hadoop credream hello hadoop credream hello hadoop credream

This line is repeated on 15 lines.
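To reproduce the input file without typing it by hand, a short script can generate it. This is a minimal sketch and not part of the original notes: the function name `build_text` and the choice of Python are assumptions made for illustration. The number of lines is left as a parameter because the counts of 65 reported later in these notes correspond to 13 lines of five repetitions each, while the text above mentions 15.

```python
from collections import Counter
from pathlib import Path

# The phrase from the notes; each line of wordcount.txt repeats it five times.
PHRASE = "hello hadoop credream"


def build_text(lines, repeats_per_line=5):
    """Return the file content: `lines` identical lines of the repeated phrase."""
    line = " ".join([PHRASE] * repeats_per_line)
    return "".join(line + "\n" for _ in range(lines))


if __name__ == "__main__":
    text = build_text(lines=15)
    Path("wordcount.txt").write_text(text, encoding="utf-8")
    # Each word appears lines * repeats_per_line times in the generated file.
    print(Counter(text.split()))
```

Each word then occurs `lines * repeats_per_line` times, which is the figure wordcount should report for the uploaded file.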

Then upload it to the test directory in HDFS:

hadoop fs -put wordcount.txt /test/
hadoop fs -ls /test

The listing shows that the upload succeeded.

xiaofeng@xiaofeng-PC /opt/hadoop
$ cd /opt/hadoop    # enter the hadoop directory

xiaofeng@xiaofeng-PC /opt/hadoop
$ ls    # list the files in the hadoop directory
CHANGES.txt  conf                   hadoop-0.20.2-core.jar      librecordio
LICENSE.txt  conf-local             hadoop-0.20.2-examples.jar  logs
NOTICE.txt   conf-pseudo            hadoop-0.20.2-test.jar      src
README.txt   conf.lnk               hadoop-0.20.2-tools.jar     webapps
bin          contrib                ivy
build.xml    docs                   ivy.xml
c++          hadoop-0.20.2-ant.jar  lib

hadoop-0.20.2-examples.jar is the jar of example programs that Hadoop provides; wordcount is one of them.

xiaofeng@xiaofeng-PC /opt/hadoop
$ hadoop jar hadoop-0.20.2-examples.jar    # run with no further arguments to list the available examples
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
dbcount: An example job that count the pageview counts from a database.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using monte-carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sleep: A job that sleeps at each map and reduce task.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.

The last entry, wordcount, counts how many times each word occurs in the input files.

xiaofeng@xiaofeng-PC /opt/hadoop
$ hadoop jar hadoop-0.20.2-examples.jar wordcount
Usage: wordcount <in> <out>

This prints the usage of the program: an input path followed by an output path.

xiaofeng@xiaofeng-PC /opt/hadoop
$ hadoop jar hadoop-0.20.2-examples.jar wordcount /test/wordcount.txt /test/result

This runs the wordcount program from hadoop-0.20.2-examples.jar on /test/wordcount.txt in HDFS and writes the counts to the /test/result directory.

While the job is running, you can watch its status at http://localhost:50030/jobtracker.jsp

Running Jobs

Jobid	Priority	User	Name	Map % Complete	Map Total	Maps Completed	Reduce % Complete	Reduce Total	Reduces Completed	Job Scheduling Information
job_201304211117_0001	NORMAL	xiaofeng-pc\xiaofeng	word count	0.00%

Completed Jobs

Jobid	Priority	User	Name	Map % Complete	Map Total	Maps Completed	Reduce % Complete	Reduce Total	Reduces Completed	Job Scheduling Information
job_201304211117_0001	NORMAL	xiaofeng-pc\xiaofeng	word count	100.00%

As the job runs, the Map phase executes first, followed by Reduce. Once the job finishes, it no longer appears under Running Jobs and instead shows 100% under Completed Jobs.
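The Map-then-Reduce order seen in the job tracker can be mimicked locally. The sketch below is an in-memory illustration of the idea behind wordcount, not Hadoop's actual implementation: `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names, and splitting on whitespace is an assumption about how words are tokenized.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group the emitted values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: sum the values collected for each word."""
    return {word: sum(ones) for word, ones in groups.items()}


def word_count(lines):
    """Run the three phases in order over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    sample = ["hello hadoop credream"] * 3
    for word, total in word_count(sample).items():
        print(f"{word}\t{total}")
```

Sorting the keys in the shuffle step is why the words come out in alphabetical order, as they do in the job's result file.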

To view the results, open http://127.0.0.1:50070/, browse into the HDFS filesystem, and open the result directory under test:


Name	Type	Size	Replication	Block Size	Modification Time	Permission	Owner	Group
_logs	dir				2013-04-21 11:52	rwxr-xr-x	xiaofeng-pc\xiaofeng	supergroup
part-r-00000	file	0.03 KB	3	64 MB	2013-04-21 11:52	rw-r--r--	xiaofeng-pc\xiaofeng	supergroup

Click part-r-00000 to view its contents:

credream 65

hadoop 65

hello 65

The word counts have been computed.

You can also view them from the command line:

xiaofeng@xiaofeng-PC /opt/hadoop
$ hadoop fs -ls /test/result    # list the files
Found 2 items
drwxr-xr-x   - xiaofeng-pc\xiaofeng supergroup          0 2013-04-21 11:52 /test/result/_logs
-rw-r--r--   3 xiaofeng-pc\xiaofeng supergroup         31 2013-04-21 11:52 /test/result/part-r-00000

xiaofeng@xiaofeng-PC /opt/hadoop
$ hadoop fs -cat /test/result/part*    # the wildcard matches every file whose name starts with part
credream 65
hadoop 65
hello 65
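The 31-byte size reported for part-r-00000 can be checked by hand. The sketch below assumes the file holds one `word<TAB>count` record per line (Hadoop's default text output separates key and value with a tab, which the listing above renders as a space) and that the words are plain ASCII. `parse_counts` and `expected_size` are hypothetical helper names.

```python
def parse_counts(text):
    """Parse 'word<whitespace>count' lines into a dict of integer counts."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, count = line.rsplit(None, 1)
        counts[word] = int(count)
    return counts


def expected_size(counts):
    """Total bytes of the records, each written as word, tab, count, newline."""
    return sum(len(f"{word}\t{count}\n".encode("utf-8"))
               for word, count in counts.items())


if __name__ == "__main__":
    output = "credream 65\nhadoop 65\nhello 65\n"
    counts = parse_counts(output)
    print(counts)
    print(expected_size(counts))
```

The three records take 12, 10, and 9 bytes, for a total of 31, which matches the size in the `hadoop fs -ls` listing and which the web interface rounds to 0.03 KB.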

The same hadoop-0.20.2-examples.jar can also be found under D:\hadoop4win\opt\hadoop.

2. MapReduce example: testing the word count

First upload a file to the server:

root@linux:/home/wangjian# hadoop fs -copyFromLocal a.txt /wj/a.txt

Then run the count:

root@linux:/home/wangjian# cd /opt/hadoop-0.20.2/
root@linux:/opt/hadoop-0.20.2# hadoop jar hadoop-0.20.2-examples.jar wordcount /wj/a.txt /wj/result2

4. View the results:

By default the result file is named /wj/result2/part-r-00000

root@linux:/opt/hadoop-0.20.2# hadoop fs -cat /wj/result2/part*
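The three steps of this second run (upload, run, view) can be scripted. The following is a sketch that only assembles the same commands shown above and runs them if a `hadoop` executable is on the PATH. The names `wordcount_commands` and `run_all` are hypothetical, error handling is deliberately minimal, and the jar path is relative, so the script is assumed to be started from the Hadoop installation directory.

```python
import shutil
import subprocess

JAR = "hadoop-0.20.2-examples.jar"  # jar name taken from the notes


def wordcount_commands(local_file, hdfs_input, hdfs_output, jar=JAR):
    """Return the upload, run, and view commands as argument lists."""
    return [
        ["hadoop", "fs", "-copyFromLocal", local_file, hdfs_input],
        ["hadoop", "jar", jar, "wordcount", hdfs_input, hdfs_output],
        # The pattern is handed to hadoop unexpanded (assumption: the
        # fs shell resolves it against HDFS, as in the transcript above).
        ["hadoop", "fs", "-cat", hdfs_output + "/part*"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    commands = wordcount_commands("a.txt", "/wj/a.txt", "/wj/result2")
    if shutil.which("hadoop"):
        run_all(commands)
    else:
        # Without Hadoop installed, just show what would be executed.
        for argv in commands:
            print(" ".join(argv))
```

Passing argument lists instead of a single shell string avoids quoting problems if a path contains spaces.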


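Author: Txkytnfs

Posted by Txkytnfs in the General category; last updated April 5, 2018.
When reposting, please credit: Cloud Computing Study Notes 04: Running Hadoop's Example Programs: Counting Words with wordcount | 学步园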
