
Hadoop Installation / Hadoop Environment Setup (Apache version)


  This morning I helped a newcomer set up a Hadoop cluster remotely (1.x, or versions below 0.22), and it left quite an impression on me, so here I am writing down the simplest way to set up Apache Hadoop to give newcomers a hand; I will try to be as detailed as possible. Click through to see the Avatorhadoop setup steps.

1. Environment preparation:

  1). Machine preparation: the target machines must be able to ping one another, so virtual machines sitting on different hosts should use "bridged" networking (with host-only/NAT networking, first turn off the host machine's firewall; for the specifics, search for VMware network configuration or KVM bridged networking, and Xen lets you set a LAN IP by hand during installation; if you still get stuck, leave a comment). Turn off the firewall on every machine: /etc/init.d/iptables stop; chkconfig iptables off. Rename the machines; I suggest hadoopservern, where n is the number you give each machine, because hostnames containing special characters such as '_' or '.' will cause startup problems. Edit each machine's /etc/hosts and add the IP-to-hostname mappings, as sketched below.
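
    For reference, a minimal sketch of these host-side steps on one node (run as root; the hadoopserver1-3 hostnames and the 192.168.1.x addresses are made-up examples, not values from this article):

      # stop the firewall now and keep it off after reboot
      /etc/init.d/iptables stop
      chkconfig iptables off

      # set the hostname (use the right number on each machine); on RHEL/CentOS
      # also edit /etc/sysconfig/network so the name survives a reboot
      hostname hadoopserver1

      # /etc/hosts on every machine: IP-to-hostname mappings (addresses assumed)
      # 192.168.1.101   hadoopserver1
      # 192.168.1.102   hadoopserver2
      # 192.168.1.103   hadoopserver3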

  2). Download a stable Hadoop release and extract it, then configure the Java environment (for Java, this is usually done in ~/.bash_profile rather than system-wide, with the machines' security in mind); see the sketch below.
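
    A minimal sketch of the relevant ~/.bash_profile lines (the JDK and Hadoop paths are assumptions; substitute your own install locations), followed by source ~/.bash_profile or a re-login so they take effect:

      # ~/.bash_profile (assumed example paths)
      export JAVA_HOME=/usr/java/jdk1.6.0_30
      export HADOOP_HOME=/home/hadoop/hadoop-1.0.0
      export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH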

  3). Passwordless SSH. There is a small trick here: on hadoopserver1

    ssh-keygen -t rsa -P ''; press Enter through all the prompts

    ssh-copy-id user@host;

    Then copy id_rsa and id_rsa.pub from ~/.ssh/ to the other machines;

    ssh hadoopserver2 and run scp -r ~/.ssh/authorized_keys hadoopserver1:~/.ssh/; with that, passwordless login works in every direction and all the machines can ssh to each other (the whole flow is pulled together in the sketch below). Practice and read widely: the guides online never mention that ssh-copy-id can simplify Hadoop's passwordless setup.
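
    Put together, the key exchange as run from hadoopserver1 looks roughly like this (hadoopserver2 stands in for every other node, and the user name hadoop is an assumption):

      # generate a key pair with an empty passphrase
      ssh-keygen -t rsa -P ''

      # append our public key to authorized_keys locally and on each other node
      ssh-copy-id hadoop@hadoopserver1
      ssh-copy-id hadoop@hadoopserver2

      # reuse the same key pair on the other nodes so they can also ssh everywhere
      scp ~/.ssh/id_rsa ~/.ssh/id_rsa.pub hadoop@hadoopserver2:~/.ssh/

      # verify: this should log in without asking for a password
      ssh hadoopserver2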

2. Steps:

  1). On hadoopserver1 (the NameNode), edit the following files under conf/ in the directory where Hadoop was extracted:

    core-site.xml:

      

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://hadoopserver1:9000</value>
        </property>

        <property>
          <name>hadoop.tmp.dir</name>
          <value>/xxx/hadoop-version/tmp</value>
        </property>

</configuration>

 

    hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
          <name>dfs.permissions</name>
          <value>false</value>
        </property>

        <property>
          <name>dfs.replication</name>
          <value>3</value>
        </property>

        <property>
          <name>dfs.name.dir</name>
          <value>/xxx/hadoop-version/name</value>
        </property>

        <property>
          <name>dfs.data.dir</name>
          <value>/xxx/hadoop-version/data</value>
        </property>

        <property>
          <name>dfs.block.size</name>
          <value>670720</value>
        </property>
<!--
<property>
  <name>dfs.secondary.http.address</name>
  <value>0.0.0.0:60090</value>
  <description>
    The secondary namenode http server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:60010</value>
  <description>
    The address where the datanode server will listen to.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:60075</value>
  <description>
    The datanode http server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:60020</value>
  <description>
    The datanode ipc server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>



<property>
  <name>dfs.http.address</name>
  <value>0.0.0.0:60070</value>
  <description>
    The address and the base port where the dfs namenode web ui will listen on.
    If the port is 0 then the server will start on a free port.
  </description>
</property>
-->

<property>
  <name>dfs.support.append</name>
  <value>true</value>
  <description>Does HDFS allow appends to files?
               This is currently set to false because there are bugs in the
               "append code" and is not supported in any prodction cluster.
  </description>
</property>

</configuration>
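
    Before starting anything, it helps to make sure the directories referenced above exist and are writable by the Hadoop user; a minimal sketch, reusing the /xxx/hadoop-version placeholder from the configs (substitute your real path, and run it on every node):

      mkdir -p /xxx/hadoop-version/tmp /xxx/hadoop-version/name /xxx/hadoop-version/data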

    mapred-site.xml:

      

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>


        <property>
          <name>mapred.job.tracker</name>
          <value>hadoopserver1:9001</value>
        </property>

        <property>
          <name>mapred.tasktracker.map.tasks.maximum</name>
          <value>2</value>
        </property>

        <property>
          <name>mapred.tasktracker.reduce.tasks.maximum</name>
          <value>2</value>
        </property>
<!--
<property>    
  <name>mapred.job.tracker.http.address</name>
  <value>0.0.0.0:50030</value>
  <description>
    The job tracker http server address and port the server will listen on.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>mapred.task.tracker.http.address</name>
  <value>0.0.0.0:60060</value>
  <description>
    The task tracker http server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>
-->


</configuration>

    In the masters file, put the hostname of the machine that should run the SecondaryNameNode; it tells Hadoop to start the SecondaryNameNode on that machine;

    The slaves file lists the DataNode (and TaskTracker) nodes, one hostname per line, as in the example below.
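
    A minimal example of the two files, assuming the hadoopserver1-3 hostnames used throughout this walkthrough (adjust to your own machines):

      conf/masters (the SecondaryNameNode host):
        hadoopserver2

      conf/slaves (DataNode/TaskTracker hosts, one per line):
        hadoopserver2
        hadoopserver3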

  2). Edit hadoop-env.sh:

    Point JAVA_HOME at your Java installation directory.

    Add one startup option: export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true", which makes the daemons bind to IPv4 addresses (see the sketch below);
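
    The two lines in conf/hadoop-env.sh would look roughly like this (the JDK path is an assumed example):

      export JAVA_HOME=/usr/java/jdk1.6.0_30
      export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"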

  3). Distribute manually: scp -r <hadoop directory> hadoopserver2...n:/same/parent/directory/ (a loop version is sketched below)
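
    A minimal sketch of this step as run from hadoopserver1 (the /home/hadoop parent directory and the target hostnames are assumptions):

      # copy the configured Hadoop directory to every other node, keeping the same path
      for host in hadoopserver2 hadoopserver3; do
          scp -r /home/hadoop/hadoop-1.0.0 $host:/home/hadoop/
      done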

  4). Start up:

    bin/hadoop namenode -format

    bin/start-all.sh

  5). Open http://<hadoopserver1's IP>:50070 in a browser to check the status of the cluster; see the sanity check below.
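
    To double-check that everything came up, a quick sanity check from the shell (run from the Hadoop directory on hadoopserver1):

      # each daemon should show up in the Java process list:
      #   on hadoopserver1: NameNode, JobTracker (plus SecondaryNameNode if it runs here)
      #   on the slaves:    DataNode, TaskTracker
      jps

      # summary of live DataNodes and HDFS capacity
      bin/hadoop dfsadmin -report

    The JobTracker web UI listens on port 50030 by default (see the commented-out mapred.job.tracker.http.address above).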
