The MapReduce Framework Explained



July 16, 2013 ⁄ Cloud Computing

Original article: http://www.cnblogs.com/sharpxiajun/p/3151395.html

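Let's start talking about MapReduce. MapReduce is Hadoop's computation framework. I got into Hadoop through Hive and then moved on to HDFS, and while studying HDFS I could already sense how tightly HDFS and MapReduce are bound together. This probably has to do with how I approach technical research: when I start learning a technology, I always ask what it can actually do, and only once I truly understand what problem it solves does the rest of my learning speed up. While studying HDFS I realized that, to understand the point of the Hadoop framework, HDFS and MapReduce cannot be separated, which is why my understanding always felt shallow when I was writing about the distributed file system. Today I am starting on MapReduce. My writing has improved a lot since last week, but whether I can do this article justice remains to be seen; I can only try.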

A first look at MapReduce

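MapReduce is a computation framework. Because it is a framework for computation, it follows a simple pattern: there is an input, MapReduce processes that input according to its predefined computation model, and the result is an output, which is what we are after.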

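What we need to learn is how this computation model runs. When a MapReduce job runs, the work is split into two phases, the map phase and the reduce phase. Each phase takes key/value pairs as its input and produces key/value pairs as its output. The programmer's job is to define the function for each phase: the map function and the reduce function.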
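To make the two phases concrete before turning to real Hadoop code, here is a minimal sketch in plain Java with no Hadoop dependency. It is not how Hadoop is implemented; it only imitates the model on a single machine. The class name `MapReduceSketch` and the methods `map`, `reduce`, and `run` are hypothetical names chosen for this illustration, and the grouping step inside `run` stands in for the work the framework does between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process imitation of the map/reduce model (not Hadoop code).
public class MapReduceSketch {

    // Map phase: one input record (a line of text) becomes zero or more (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: one key plus every value emitted for it becomes one output value.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: run map on every record, group the pairs by key, then run reduce per key.
    public static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(run(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

The only parts a programmer would write under this model are `map` and `reduce`; everything in `run` is the kind of plumbing a framework takes over.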

A basic MapReduce example

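Before explaining how MapReduce works, let's look at WordCount, the "hello world" of MapReduce. This example ships with every version of the Hadoop distribution, so it is easy to find. I paste the code here anyway to make the explanation that follows easier. The code is as follows: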

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{