
Hadoop Installation / Hadoop Environment Setup (Apache version)


  This morning I helped a newcomer set up a Hadoop cluster remotely (1.x, or versions below 0.22), and it left quite an impression on me, so here I am writing down the simplest way to set up Apache Hadoop to give newcomers a hand; I will try to be as detailed as possible. Click through to see the Avatorhadoop setup steps.

1. Environment preparation:

  1). Machine preparation: the target machines must be able to ping one another, so virtual machines sitting on different hosts should use "bridged" networking (with host-only/NAT networking, first turn off the host machine's firewall; for the specifics, search for VMware network configuration or KVM bridged networking, and Xen lets you set a LAN IP by hand during installation; if you still get stuck, leave a comment). Turn off the firewall on every machine: /etc/init.d/iptables stop; chkconfig iptables off. Rename the machines; I suggest hadoopservern, where n is the number you give each machine, because hostnames containing special characters such as '_' or '.' will cause startup problems. Edit each machine's /etc/hosts and add the IP-to-hostname mappings, as sketched below.
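
    For reference, a minimal sketch of these host-side steps on one node (run as root; the hadoopserver1-3 hostnames and the 192.168.1.x addresses are made-up examples, not values from this article):

      # stop the firewall now and keep it off after reboot
      /etc/init.d/iptables stop
      chkconfig iptables off

      # set the hostname (use the right number on each machine); on RHEL/CentOS
      # also edit /etc/sysconfig/network so the name survives a reboot
      hostname hadoopserver1

      # /etc/hosts on every machine: IP-to-hostname mappings (addresses assumed)
      # 192.168.1.101   hadoopserver1
      # 192.168.1.102   hadoopserver2
      # 192.168.1.103   hadoopserver3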

  2). Download a stable Hadoop release and extract it, then configure the Java environment (for Java, this is usually done in ~/.bash_profile rather than system-wide, with the machines' security in mind); see the sketch below.
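
    A minimal sketch of the relevant ~/.bash_profile lines (the JDK and Hadoop paths are assumptions; substitute your own install locations), followed by source ~/.bash_profile or a re-login so they take effect:

      # ~/.bash_profile (assumed example paths)
      export JAVA_HOME=/usr/java/jdk1.6.0_30
      export HADOOP_HOME=/home/hadoop/hadoop-1.0.0
      export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH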

  3). Passwordless SSH. There is a small trick here: on hadoopserver1

    ssh-keygen -t rsa -P ''; press Enter through all the prompts

    ssh-copy-id user@host;

    Then copy id_rsa and id_rsa.pub from ~/.ssh/ to the other machines;

    ssh hadoopserver2 and run scp -r ~/.ssh/authorized_keys hadoopserver1:~/.ssh/; with that, passwordless login works in every direction and all the machines can ssh to each other (the whole flow is pulled together in the sketch below). Practice and read widely: the guides online never mention that ssh-copy-id can simplify Hadoop's passwordless setup.
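
    Put together, the key exchange as run from hadoopserver1 looks roughly like this (hadoopserver2 stands in for every other node, and the user name hadoop is an assumption):

      # generate a key pair with an empty passphrase
      ssh-keygen -t rsa -P ''

      # append our public key to authorized_keys locally and on each other node
      ssh-copy-id hadoop@hadoopserver1
      ssh-copy-id hadoop@hadoopserver2

      # reuse the same key pair on the other nodes so they can also ssh everywhere
      scp ~/.ssh/id_rsa ~/.ssh/id_rsa.pub hadoop@hadoopserver2:~/.ssh/

      # verify: this should log in without asking for a password
      ssh hadoopserver2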

2. Steps:

  1). On hadoopserver1 (the NameNode), edit the following files under conf/ in the directory where Hadoop was extracted:

    core-site.xml:

      

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://hadoopserver1:9000</value>
        </property>

        <property>
          <name>hadoop.tmp.dir</name>
          <value>/xxx/hadoop-version/tmp</value>
        </property>

</configuration>

 

    hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
          <name>dfs.permissions</name>
          <value>false</value>
        </property>

        <property>
          <name>dfs.replication</name>
          <value>3</value>
        </property>

        <property>
          <name>dfs.name.dir</name>
          <value>/xxx/hadoop-version/name</value>
        </property>

        <property>
          <name>dfs.data.dir</name>
          <value>/xxx/hadoop-version/data</value>
        </property>

        <property>
          <name>dfs.block.size</name>
          <value>670720</value>
        </property>
<!--
<property>
  <name>dfs.secondary.http.address</name>
  <value>0.0.0.0:60090</value>
  <description>
    The secondary namenode http server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:60010</value>
  <description>
    The address where the datanode server will listen to.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:60075</value>
  <description>
    The datanode http server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:60020</value>
  <description>
    The datanode ipc server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>



<property>
  <name>dfs.http.address</name>
  <value>0.0.0.0:60070</value>
  <description>
    The address and the base port where the dfs namenode web ui will listen on.
    If the port is 0 then the server will start on a free port.
  </description>
</property>
-->

<property>
  <name>dfs.support.append</name>
  <value>true</value>
  <description>Does HDFS allow appends to files?
               This is currently set to false because there are bugs in the
               "append code" and is not supported in any prodction cluster.
  </description>
</property>

</configuration>
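
    Before starting anything, it helps to make sure the directories referenced above exist and are writable by the Hadoop user; a minimal sketch, reusing the /xxx/hadoop-version placeholder from the configs (substitute your real path, and run it on every node):

      mkdir -p /xxx/hadoop-version/tmp /xxx/hadoop-version/name /xxx/hadoop-version/data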

    mapred-site.xml:

      

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>


        <property>
          <name>mapred.job.tracker</name>
          <value>hadoopserver1:9001</value>
        </property>

        <property>
          <name>mapred.tasktracker.map.tasks.maximum</name>
          <value>2</value>
        </property>

        <property>
          <name>mapred.tasktracker.reduce.tasks.maximum</name>
          <value>2</value>
        </property>
<!--
<property>    
  <name>mapred.job.tracker.http.address</name>
  <value>0.0.0.0:50030</value>
  <description>
    The job tracker http server address and port the server will listen on.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>mapred.task.tracker.http.address</name>
  <value>0.0.0.0:60060</value>
  <description>
    The task tracker http server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>
-->


</configuration>

    In the masters file, put the hostname of the machine that should run the SecondaryNameNode; it tells Hadoop to start the SecondaryNameNode on that machine;

    The slaves file lists the DataNode (and TaskTracker) nodes, one hostname per line, as in the example below.
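
    A minimal example of the two files, assuming the hadoopserver1-3 hostnames used throughout this walkthrough (adjust to your own machines):

      conf/masters (the SecondaryNameNode host):
        hadoopserver2

      conf/slaves (DataNode/TaskTracker hosts, one per line):
        hadoopserver2
        hadoopserver3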

  2). Edit hadoop-env.sh:

    Point JAVA_HOME at your Java installation directory.

    Add one startup option: export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true", which makes the daemons bind to IPv4 addresses (see the sketch below);
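
    The two lines in conf/hadoop-env.sh would look roughly like this (the JDK path is an assumed example):

      export JAVA_HOME=/usr/java/jdk1.6.0_30
      export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"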

  3). Distribute manually: scp -r <hadoop directory> hadoopserver2...n:/same/parent/directory/ (a loop version is sketched below)
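
    A minimal sketch of this step as run from hadoopserver1 (the /home/hadoop parent directory and the target hostnames are assumptions):

      # copy the configured Hadoop directory to every other node, keeping the same path
      for host in hadoopserver2 hadoopserver3; do
          scp -r /home/hadoop/hadoop-1.0.0 $host:/home/hadoop/
      done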

  4). Start up:

    bin/hadoop namenode -format

    bin/start-all.sh

  5). Open http://<hadoopserver1's IP>:50070 in a browser to check the status of the cluster; see the sanity check below.
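
    To double-check that everything came up, a quick sanity check from the shell (run from the Hadoop directory on hadoopserver1):

      # each daemon should show up in the Java process list:
      #   on hadoopserver1: NameNode, JobTracker (plus SecondaryNameNode if it runs here)
      #   on the slaves:    DataNode, TaskTracker
      jps

      # summary of live DataNodes and HDFS capacity
      bin/hadoop dfsadmin -report

    The JobTracker web UI listens on port 50030 by default (see the commented-out mapred.job.tracker.http.address above).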
