现在的位置: 首页 > 综合 > 正文

Sqoop客户端安装以及使用小结

2014年09月05日 ⁄ 综合 ⁄ 共 3670字 ⁄ 字号 评论关闭

SQOOP是一款开源的工具,主要用于在HADOOP(Hive)与传统的数据库(mysql、postgresql...)间进行数据的传递,下面从SQOOP用户手册上摘录一段描述

Sqoop is a tool designed to transfer data between Hadoop andrelational databases. You can use Sqoop to import data from arelational database management system (RDBMS) such as MySQL or Oracle  into the Hadoop Distributed File System (HDFS),transform the data
in Hadoop MapReduce, and then export the data back into an RDBMS.

下面就是我在自己机子上配置sqoop客户端和使用的过程。

系统:ubuntu12.04

前提:

  1. 客户端已经配置好hadoop环境。
  2. 设置环境变量HADOOP_HOME。

 

版本:

  1. hadoop:hadoop-1.0.3 
  2. sqoop: Sqoop 1.4.1-incubating

 

客户端配置:

  1. 配置sqoop
    1. http://mirror.bit.edu.cn/apache/sqoop/1.4.1-incubating/ 下载 sqoop-1.4.1-incubating__hadoop-0.20.tar.gz 

      解压  tar zxvf sqoop-1.4.1-incubating__hadoop-0.20.tar.gz
    2. 修改SQOOP的文件bin/configure-sqoop,注释掉hbase检查(除非你准备使用HABASE等HADOOP上的组件,否则每次都出现warining信息)
      即注释掉:
      if [ ! -d "${HBASE_HOME}" ]; then
      echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
      echo 'Please set $HBASE_HOME to the root of your HBase installation.'

      fi
    3. 修改PATH,增加sqoop命令。

       $ vi  ~/.profile
       #增加
       export PATH=/path/to/your/sqoopHome/bin:$PATH
      使修改生效:
      $  source ~/.profile 

      现在在终端运行 sqoop就可以了。
  2. 配置JDBC数据库驱动
    1. 数据库安装
      本机要安装关系型数据的客户端本次是用的postgresql。
      apt-get install postgresql-client
    2. 与sqoop连接
      Sqoop automatically supports several databases, including MySQL. Connect strings beginning with jdbc:mysql:// are
      handled automatically in Sqoop. (A full list of databases with built-in support is provided in the "Supported Databases" section. For some, you may need to install the JDBC driver yourself.)

      While JDBC is a compatibility layer that allows a program to access many different databases through a common API, slight differences in the SQL language spoken by each database may mean that Sqoop can’t use every database out of the box, or that some databases
      may be used in an inefficient manner.

      When you provide a connect string to Sqoop, it inspects the protocol scheme to determine appropriate vendor-specific logic to use. If Sqoop knows about a given database, it will work automatically. If not, you may need to specify the driver class to load via --driver.
      This will use a generic code path which will use standard SQL to access the database. Sqoop provides some databases with faster, non-JDBC-based access mechanisms. These can be enabled by specfying the --direct parameter.

      Sqoop includes vendor-specific support for the following databases:

       

      Database version --direct support? connect string matches
      HSQLDB 1.8.0+ No jdbc:hsqldb:*//
      MySQL 5.0+ Yes jdbc:mysql://
      Oracle 10.2.0+ No jdbc:oracle:*//
      PostgreSQL 8.3+ Yes (import only) jdbc:postgresql://

      因为只支持postgresql的import功能所以postgresql的驱动还是要自己装。

      到网站http://jdbc.postgresql.org/download.html上下载相应的jdbc driver。并放在sqoop安装目录的lib自目录下。

  3. 权限问题
     从客户端操作hadoop集群时,会有权限问题,设置请参考wiki文章:
     hadoop权限配置

使用例程:

  • sqoop import   -conf ~ / hadoop_work/hadoop-1.0.3/etc/conf_sqoop/mapred-site.xml   -connect jdbc: postgresql :/ / 10.200.187.36/ccms_api_test
       -username ccms        -table retail_tb_order_item           -target-dir mjiang/test3-m = 1

    -conf 是sqoop对hadoop的个性化配置,如hadoop默认是压缩存储,在这里可以取消。
    其他参数可参考文档Sqoop User Guide (v1.3.0-cdh3u4)

可能会出现的问题:

  • 权限问题,hadoop权限设置请参考 hadoop权限配置
    12/07/26 15:52:10 ERROR security.UserGroupInformation: PriviledgedActionException as: mjiang cause: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission
    denied: user = mjiang, access = WRITE, inode = "staging": hadoop: supergroup: rwxr-xr-x
  • 数据库对应驱动没有配置参考上面配置JDBC数据库驱动
    12/07/26 11:03: 03 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
  • 重复导入时,自动生成的.java文件 已存在错误。这个错误可以自动跳过,不影响运行。
    ERROR orm.CompilationManager: Could not rename /tmp/sqoop-mjiang/compile/475a1785ffb0902fcf0ad2b98d9eb24f/retail_tb_order.java to /home/mjiang/program/hadoop/sqoop/java/1/retail_tb_order.java
    org.apache.commons.io.FileExistsException: Destination '/home/mjiang/program/hadoop/sqoop/java/1/retail_tb_order.java' already exists

有用的网站:

Sqoop User Guide (v1.3.0-cdh3u4)

抱歉!评论已关闭.