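Hadoop currently comes in two open-source distributions: the Apache release, and Cloudera's distribution, which is optimized on top of the Apache release and is also known as CDH3.
The two compare as follows: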
| Component | CDH3 | Apache | Description |
| --- | --- | --- | --- |
| Hadoop Common | ● | ● | The common utilities that support the other Hadoop subprojects. |
| Hadoop Distributed File System (HDFS) | ● | ● | A distributed file system that provides high-throughput access to application data. |
| Hadoop MapReduce | ● | ● | A software framework for distributed processing of large data sets on compute clusters. |
| Flume | ● | | A distributed, reliable, and available service for efficiently moving large amounts of data as the data is … |
| Sqoop | ● | | A tool that imports data from relational databases into Hadoop clusters. |
| Hue | ● | | A graphical user interface to work with CDH. |
| Pig | ● | ● | A high-level data-flow language and execution framework for parallel computation. Enables you to analyze large … |
| Hive | ● | ● | A data warehouse infrastructure that provides data summarization and ad hoc querying. A powerful data warehousing … |
| HBase | ● | ● | A scalable, distributed database that supports structured data storage for large tables. Provides large-scale … |
| ZooKeeper | ● | ● | A high-performance coordination service for distributed applications. A highly reliable and available service … |
| Oozie | ● | | A server-based workflow engine specialized in running workflow jobs with actions that execute Hadoop jobs. |
| Whirr | ● | | Provides a fast way to run cloud services. |
| Snappy | ● | | A compression/decompression library. |
| Avro | | ● | A data serialization system. |
| Cassandra | | ● | A scalable multi-master database with no single points of failure. |
| Chukwa | | ● | A data collection system for managing large distributed systems. |
| Mahout | | ● | A scalable machine learning and data mining library. |
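In theory, CDH3 should support all of the components and subprojects of the Apache release.
The similarities and differences between the two Hadoop distributions are as follows: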
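System: Starting with CDH3b3, the hadoop.job.ugi parameter is no longer supported; use the UserGroupInformation.doAs() method instead. For other changes, see https://ccp.cloudera.com/display/CDHDOC/Incompatible+Changes
Installation: Cloudera CDH3 is based on the stable Hadoop 0.20.2 release and integrates many patches. CDH is available as RPM packages or as a tarball (Cloudera recommends the RPM route), whereas Apache Hadoop 0.20.2 is distributed only as a tarball.
Cluster management: Apache Hadoop maintains the cluster with the start/stop-dfs.sh and start/stop-all.sh scripts, whereas CDH runs the /etc/init.d/hadoop-0.20-* scripts as root.
Users: After a successful installation, Cloudera CDH adds two users: hdfs (for the HDFS file system), …
Configuration: Cloudera CDH uses alternatives to switch among multiple sets of configuration files, whereas Apache …
Eclipse plugin: Cloudera CDH does not ship an Eclipse plugin by default, so you have to build it yourself, and its plugin and Apache's …
Security: CDH3 supports Kerberos authentication; Apache …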
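To make the cluster-management and configuration points more concrete, here is a minimal Python sketch that only builds the command lines involved and does not run them. Several details are assumptions not taken from the text above: the daemon names appended to the hadoop-0.20-* pattern, the alternatives link path /etc/hadoop-0.20/conf, the alternatives name hadoop-0.20-conf, and the example directory conf.my_cluster. Check them against your own installation before using anything like this.

```python
# Assumed daemon suffixes for the /etc/init.d/hadoop-0.20-* scripts.
CDH3_DAEMONS = ["namenode", "secondarynamenode", "datanode",
                "jobtracker", "tasktracker"]


def init_script_commands(action, daemons=CDH3_DAEMONS):
    """Return one init-script command (as an argv list) per daemon."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return [["/etc/init.d/hadoop-0.20-" + name, action] for name in daemons]


def alternatives_install_command(conf_dir, priority=50):
    """Return the argv that registers conf_dir as a candidate configuration.

    The link path and alternatives name below are assumptions.
    """
    return ["alternatives", "--install", "/etc/hadoop-0.20/conf",
            "hadoop-0.20-conf", conf_dir, str(priority)]


if __name__ == "__main__":
    # Print the commands; they would have to be run as root.
    for argv in init_script_commands("start"):
        print(" ".join(argv))
    print(" ".join(
        alternatives_install_command("/etc/hadoop-0.20/conf.my_cluster")))
```

Returning argv lists rather than executing anything keeps the sketch safe to run on a machine without Hadoop; on a real node the lists could be passed to subprocess.run under root.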