
Hadoop by Example: Pseudo-Distributed Mode

June 8, 2013

Pseudo-distributed mode simulates a multi-machine cluster on a single host: every Hadoop daemon runs as a separate Java process on the same machine.

1. Set up passwordless SSH login for Hadoop. In a distributed deployment, the control scripts on the NameNode use ssh to start and stop the daemon processes on the DataNode machines; in pseudo-distributed mode the same scripts ssh to localhost.

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa    # generate a key pair with an empty passphrase
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys    # authorize the key for local logins
Note that the sshd service must be running.
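
To confirm the setup worked, ssh to localhost; it should log you in without asking for a password (at most a one-time host-key prompt):

$ ssh localhost
$ exit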

2. Edit the configuration for the pseudo-distributed node setup. Modify conf/core-site.xml and add the following:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
fs.default.name sets the URI (host and port) of the NameNode; in 0.20 the canonical form includes the hdfs:// scheme. mapred.job.tracker gives the JobTracker's address, and dfs.replication is the number of replicas kept of each HDFS block, set to 1 here because there is only a single DataNode.

Older tutorials have you edit hadoop-site.xml; since the 0.20 releases that file has been replaced by core-site.xml (along with hdfs-site.xml and mapred-site.xml), although a hadoop-site.xml on the classpath is still honored for backwards compatibility.
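
As a side note, the documented 0.20 layout puts each of these properties in its own file; keeping all three in core-site.xml also works, since core-site.xml is loaded by every daemon. A quick sketch of the mapping, assuming the stock 0.20.2 tarball layout:

$ ls conf/*-site.xml
# conf/core-site.xml    -> fs.default.name
# conf/hdfs-site.xml    -> dfs.replication
# conf/mapred-site.xml  -> mapred.job.tracker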

3. Format the HDFS filesystem. By default this creates the filesystem image under /tmp/hadoop-<username>/dfs/name, as the Storage line in the log below shows; the exact invocation is sketched next.
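
For copy-paste convenience, a minimal sketch of the invocation (the install path is taken from the log paths below; note there is no space between the dash and "format", and that /tmp is often cleared on reboot, so the formatted filesystem will not necessarily survive one):

$ cd ~/projects/hadoop-0.20.2
$ bin/hadoop namenode -format    # asks for confirmation before erasing an existing filesystem

A successful run prints output like the following: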

/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = songjings-macpro31.local/10.13.42.56
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/08/08 19:10:36 INFO namenode.FSNamesystem: fsOwner=songjing,staff,_lpadmin,com.apple.sharepoint.group.1,_appserveradm,_appserverusr,admin
11/08/08 19:10:36 INFO namenode.FSNamesystem: supergroup=supergroup
11/08/08 19:10:36 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/08/08 19:10:36 INFO common.Storage: Image file of size 98 saved in 0 seconds.
11/08/08 19:10:36 INFO common.Storage: Storage directory /tmp/hadoop-songjing/dfs/name has been successfully formatted.
11/08/08 19:10:36 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at songjings-macpro31.local/10.13.42.56
************************************************************/

4. Start the Hadoop processes:

bin/start-all.sh 
This launches five Java processes: the NameNode, DataNode, SecondaryNameNode (which periodically checkpoints the NameNode's metadata; despite the name it is not a live backup), JobTracker and TaskTracker:
starting namenode, logging to /Users/songjing/projects/hadoop-0.20.2/bin/../logs/hadoop-songjing-namenode-songjings-macpro31.local.out
localhost: starting datanode, logging to /Users/songjing/projects/hadoop-0.20.2/bin/../logs/hadoop-songjing-datanode-songjings-macpro31.local.out
localhost: starting secondarynamenode, logging to /Users/songjing/projects/hadoop-0.20.2/bin/../logs/hadoop-songjing-secondarynamenode-songjings-macpro31.local.out
starting jobtracker, logging to /Users/songjing/projects/hadoop-0.20.2/bin/../logs/hadoop-songjing-jobtracker-songjings-macpro31.local.out
localhost: starting tasktracker, logging to /Users/songjing/projects/hadoop-0.20.2/bin/../logs/hadoop-songjing-tasktracker-songjings-macpro31.local.out
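
A quick way to verify that all five daemons actually came up is jps, which ships with the JDK; the built-in web interfaces are a second check (default 0.20.x ports):

$ jps
# expect NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker
# NameNode web UI:   http://localhost:50070/
# JobTracker web UI: http://localhost:50030/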


5. Run an example job:

First copy local input files into the DFS (a sketch for creating a suitable test-in directory follows):
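
The original input files are not shown here; the following hypothetical pair is chosen only so that the word counts line up with the WordCount output at the end of this post:

$ mkdir test-in
$ echo "hello hello world" > test-in/file01
$ echo "world hadoop hadoop by goodbye" > test-in/file02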

bin/hadoop dfs -put ./test-in input    # uploads test-in into HDFS under the name "input"

Then run the computation:

bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output

11/08/08 19:28:35 INFO input.FileInputFormat: Total input paths to process : 2
11/08/08 19:28:36 INFO mapred.JobClient: Running job: job_201108081923_0001
11/08/08 19:28:37 INFO mapred.JobClient:  map 0% reduce 0%
11/08/08 19:28:55 INFO mapred.JobClient:  map 100% reduce 0%
11/08/08 19:29:09 INFO mapred.JobClient:  map 100% reduce 100%
11/08/08 19:29:14 INFO mapred.JobClient: Job complete: job_201108081923_0001
11/08/08 19:29:14 INFO mapred.JobClient: Counters: 17
11/08/08 19:29:14 INFO mapred.JobClient:   Job Counters 
11/08/08 19:29:14 INFO mapred.JobClient:     Launched reduce tasks=1
11/08/08 19:29:14 INFO mapred.JobClient:     Launched map tasks=2
11/08/08 19:29:14 INFO mapred.JobClient:     Data-local map tasks=2
11/08/08 19:29:14 INFO mapred.JobClient:   FileSystemCounters
11/08/08 19:29:14 INFO mapred.JobClient:     FILE_BYTES_READ=78
11/08/08 19:29:14 INFO mapred.JobClient:     HDFS_BYTES_READ=49
11/08/08 19:29:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=226
11/08/08 19:29:14 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=40
11/08/08 19:29:14 INFO mapred.JobClient:   Map-Reduce Framework
11/08/08 19:29:14 INFO mapred.JobClient:     Reduce input groups=5
11/08/08 19:29:14 INFO mapred.JobClient:     Combine output records=6
11/08/08 19:29:14 INFO mapred.JobClient:     Map input records=2
11/08/08 19:29:14 INFO mapred.JobClient:     Reduce shuffle bytes=84
11/08/08 19:29:14 INFO mapred.JobClient:     Reduce output records=5
11/08/08 19:29:14 INFO mapred.JobClient:     Spilled Records=12
11/08/08 19:29:14 INFO mapred.JobClient:     Map output bytes=81
11/08/08 19:29:14 INFO mapred.JobClient:     Combine input records=8
11/08/08 19:29:14 INFO mapred.JobClient:     Map output records=8
11/08/08 19:29:14 INFO mapred.JobClient:     Reduce input records=6
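
As a side note, running the examples jar with no program name makes it print the list of bundled examples, which is a handy way to double-check the jar name and available programs for your build:

$ bin/hadoop jar hadoop-0.20.2-examples.jar
# prints "An example program must be given as the first argument"
# followed by the valid program names (wordcount, grep, pi, sort, ...)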

View the results:

bin/hadoop dfs -cat output/*

which prints the word counts:

by	1
goodbye	1
hadoop	2
hello	2
world	2

You can also copy the output back to the local filesystem and view it there:

$ bin/hadoop dfs -get output output 
$ cat output/*
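
Two housekeeping notes to finish. Hadoop refuses to write into an output directory that already exists, so delete it before re-running the job; and shut the daemons down cleanly when you are done:

$ bin/hadoop dfs -rmr output    # remove old job output before a re-run
$ bin/stop-all.sh               # stop all five daemons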
