
Hive SerDe: an example of a table backed by a custom deserializer

November 14, 2014

1. Overview

Suppose a text file f1.txt has the following format:

1  tom

2        jame

3             mango

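The first column is id and the second is name. The two columns are separated by a run of whitespace of variable length (spaces, tabs, and so on).

We want to create a user table that can read f1.txt. Specifying a field delimiter when creating the table does not work here, because the separator is not a single fixed character. This is where Hive's serialization/deserialization mechanism (SerDe) comes in.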
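To illustrate why a single fixed delimiter falls short, here is a minimal plain-Java sketch. The class and method names are hypothetical and not part of the original project, and splitting on one space is only a rough stand-in for what a delimiter-based table definition does. It compares that approach with StringTokenizer, which treats any run of whitespace as one separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of the original project.
class WhitespaceSplitDemo {

    // Split on one fixed delimiter character; every extra space
    // produces an extra, empty field.
    static List<String> splitOnSingleSpace(String line) {
        return Arrays.asList(line.split(" ", -1));
    }

    // Split on any run of whitespace (space, tab, newline, ...),
    // which is StringTokenizer's default behaviour.
    static List<String> splitOnWhitespaceRuns(String line) {
        List<String> tokens = new ArrayList<String>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "1  tom"; // two spaces between the columns
        System.out.println(splitOnSingleSpace(line).size()); // 3 fields: "1", "", "tom"
        System.out.println(splitOnWhitespaceRuns(line));     // prints [1, tom]
    }
}
```

With the fixed delimiter, the number of fields depends on how many spaces happen to be in each line, so the columns cannot be mapped reliably; the tokenizer always yields exactly the two values.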

2. Create a new Maven project and add hive-serde 0.11.0 and hadoop-core 1.0.3 as dependencies.

Create a SerdeTest class that implements the Deserializer interface:

  • In initialize(), describe the table's fields and their types.
  • In deserialize(Writable text), parse text into id and name.
  • getObjectInspector() returns ObjectInspectorFactory.getStandardStructObjectInspector(structFieldNames, structFieldObjectInspectors).

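package com.renren.hive.tools;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.Deserializer;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.SerDeStats;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.ObjectInspectorOptions;
import org.apache.hadoop.io.Writable;
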
public class SerdeTest implements Deserializer {
	private List<String> structFieldNames = new ArrayList<String>();
	private List<ObjectInspector> structFieldObjectInspectors = new ArrayList<ObjectInspector>();

	@Override
	public ObjectInspector getObjectInspector() throws SerDeException {
		// Describe each row as a struct made of the fields registered in initialize().
		return ObjectInspectorFactory.getStandardStructObjectInspector(
				structFieldNames, structFieldObjectInspectors);
	}

	@Override
	public Object deserialize(Writable text) throws SerDeException {
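		// StringTokenizer's default delimiters are whitespace characters,
		// so any run of spaces or tabs separates the columns.
		List<Object> result = new ArrayList<Object>();
		StringTokenizer tokenizer = new StringTokenizer(text.toString());
		int index = 0;

		while (tokenizer.hasMoreTokens()) {
			String token = tokenizer.nextToken();
			if (index == 0) {
				// First column: id, parsed as an integer.
				try {
					result.add(Integer.valueOf(token));
				} catch (NumberFormatException e) {
					throw new SerDeException("Invalid id: " + token, e);
				}
			} else {
				// Later tokens are kept as strings; the second one is name.
				result.add(token);
			}
			index++;
		}

		return result;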
	}

	@Override
	public void initialize(Configuration arg0, Properties arg1)
			throws SerDeException {
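		// Reset the lists so that a repeated initialize() call does not
		// register the columns twice.
		structFieldNames.clear();
		structFieldObjectInspectors.clear();
		// Column 1: id (int). Column 2: name (string).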
		structFieldNames.add("id");
		structFieldObjectInspectors.add(ObjectInspectorFactory
				.getReflectionObjectInspector(Integer.TYPE,
						ObjectInspectorOptions.JAVA));
		structFieldNames.add("name");
		structFieldObjectInspectors.add(ObjectInspectorFactory
				.getReflectionObjectInspector(String.class,
						ObjectInspectorOptions.JAVA));
	}

	@Override
	public SerDeStats getSerDeStats() {
		// This deserializer does not collect any statistics.
		return null;
	}

}
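The parsing logic can be tried out without a Hive installation. The following is a minimal sketch under stated assumptions: RowParserSketch and parseRow are hypothetical names, the helper only mirrors the tokenizing done in deserialize(), and the strict "exactly two fields" check is an added assumption rather than something SerdeTest enforces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-alone helper; it mirrors deserialize() but has no Hive dependency.
class RowParserSketch {

    // Parse one line into [Integer id, String name].
    // Assumption: a valid row has exactly two whitespace-separated fields.
    static List<Object> parseRow(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line);
        if (tokenizer.countTokens() != 2) {
            throw new IllegalArgumentException("expected 2 fields: " + line);
        }
        List<Object> row = new ArrayList<Object>();
        try {
            row.add(Integer.valueOf(tokenizer.nextToken()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("id is not an integer: " + line, e);
        }
        row.add(tokenizer.nextToken());
        return row;
    }

    public static void main(String[] args) {
        System.out.println(parseRow("3             mango")); // prints [3, mango]
    }
}
```

Rejecting malformed rows up front is one possible design choice; a deserializer could just as well return null fields for them, depending on how bad input should show up in query results.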

3. Build the jar and add it to hive/lib:

mvn clean package

Copy the generated jar, hive-serde-tool-1.0.1-SNAPSHOT.jar, into hive_home/lib, and add the following to hive-site.xml:

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///home/dp/hive/lib/hive-serde-tool-1.0.1-SNAPSHOT.jar</value>
</property>

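4. Create the Hive table and specify the SerDe. No column list is given; Hive takes the columns from the ObjectInspector returned by the deserializer.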

  

 hive -e "create table test row format serde 'com.renren.hive.tools.SerdeTest'"

5. Load and query the data

 

   hive -e "load data  local inpath 'f1.txt' overwrite into table test"

    hive -e "select * from test"
