[Database] Hadoop Enterprise Cluster Architecture

1. Extract and configure hadoop-2.5.2 on h1.hadoop

 

2. In the Hadoop installation directory, create the tmp and data directories
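
The directory step can be sketched as a small shell helper. This is a minimal sketch: the function name make_hadoop_dirs is hypothetical, and the installation path in the usage line is taken from the commands later in this walkthrough. How the two directories are referenced from the Hadoop configuration files is not shown in the original, so it is left out here.

```shell
# Hypothetical helper: create the tmp and data directories under the
# Hadoop installation directory passed as $1.
make_hadoop_dirs() {
    # -p creates missing parents and does not fail if the directories exist
    mkdir -p "$1/tmp" "$1/data"
}
```

Usage on h1.hadoop, assuming the installation lives under /home/grid: make_hadoop_dirs /home/grid/hadoop-2.5.2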

 

3. Use awk to generate the copy scripts (for the JDK and hadoop-2.5.2)

 

awk '{ print "scp -rp hadoop-2.5.2 " $1 ":/home/grid" }' /home/grid/hadoop-2.5.2/etc/hadoop/slaves >dd.sh

source ./dd.sh

 

awk '{ print "scp -rp jdk1.7.0_79 " $1 ":/home/grid" }' /home/grid/hadoop-2.5.2/etc/hadoop/slaves >dd_java.sh

source ./dd_java.sh
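
To make clear what the two awk commands above produce, here is a minimal sketch that runs the same pattern against a throwaway slaves file. The function name gen_copy_script and the temporary file are assumptions for illustration; the three hostnames are sample entries, not necessarily the contents of the real slaves file. Unlike the original one-liners, the sketch skips blank lines (the NF condition) so that an empty line in the file does not produce a broken scp command.

```shell
# Hypothetical helper: print one scp command per host listed in the file $2,
# copying the directory $1 to /home/grid on that host.
gen_copy_script() {
    awk -v dir="$1" 'NF { print "scp -rp " dir " " $1 ":/home/grid" }' "$2"
}

# Demo against a temporary slaves file with three sample hostnames.
slaves_demo=$(mktemp)
printf '%s\n' h3.hadoop.com h4.hadoop.com h5.hadoop.com > "$slaves_demo"
gen_copy_script hadoop-2.5.2 "$slaves_demo"
# prints:
#   scp -rp hadoop-2.5.2 h3.hadoop.com:/home/grid
#   scp -rp hadoop-2.5.2 h4.hadoop.com:/home/grid
#   scp -rp hadoop-2.5.2 h5.hadoop.com:/home/grid
rm -f "$slaves_demo"
```

Because the generated scp commands use relative source paths, the resulting script has to be sourced from the directory that contains hadoop-2.5.2 and jdk1.7.0_79 (here presumably /home/grid).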

 

4. Format the namenode

bin/hdfs namenode -format

 

 

5. Start Hadoop

sbin/start-dfs.sh

sbin/start-yarn.sh

 

 

6. Check the processes

/home/grid/jdk1.7.0_79/bin/jps

 

Master node:

h1.hadoop.com

 

Secondary namenode node:

h2.hadoop.com

 

The other 3 datanode nodes:

h3.hadoop.com

 

h4.hadoop.com

h5.hadoop.com
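
Checking five hosts by hand gets tedious, so the jps check can be sketched as a loop. This is a sketch under assumptions: the function names has_daemon and check_cluster are hypothetical, passwordless ssh for the current user to every node is assumed, and the JDK is assumed to sit at the same path on all nodes (which is what the copy step above sets up). jps prints one line per Java process in the form "<pid> <class name>", and has_daemon relies on that.

```shell
# Hypothetical helper: read jps output on stdin and succeed only if a
# process whose name is exactly $1 is listed.
has_daemon() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Hypothetical helper: run jps on every node of the cluster over ssh.
check_cluster() {
    for host in h1.hadoop.com h2.hadoop.com h3.hadoop.com h4.hadoop.com h5.hadoop.com; do
        echo "== $host =="
        ssh "$host" /home/grid/jdk1.7.0_79/bin/jps
    done
}
```

For example, ssh h3.hadoop.com /home/grid/jdk1.7.0_79/bin/jps | has_daemon DataNode returns success only when a DataNode process is running on h3. With this layout one would expect NameNode on the master, SecondaryNameNode on h2, and DataNode on h3 to h5.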

 

7. Run the example

7.1 Prepare the input

mkdir input

cd input

echo "hello word" >test1.txt

echo "hello hadoop" >test2.txt

cd ..

bin/hadoop fs -put /home/grid/hadoop-2.5.2/input /in

bin/hadoop fs -ls -R /in

 

7.2 Run the example

bin/hadoop jar /home/grid/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /in/* /out

 

8. View the results

 

bin/hadoop fs -ls -R /out

bin/hadoop fs -cat /out/part-r-00000
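
As a cross-check of the job output, the same count can be reproduced locally with standard Unix tools. This is a minimal sketch, not part of Hadoop: the function name local_wordcount is hypothetical, and it assumes that words are separated by whitespace and that the job ran with a single reducer, so that part-r-00000 holds the complete result as tab-separated word and count pairs sorted by word.

```shell
# Hypothetical helper: count whitespace-separated words in the given files
# and print "word<TAB>count" lines sorted by word.
local_wordcount() {
    cat "$@" \
        | tr -s '[:space:]' '\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2 "\t" $1 }'
}
```

Run from the Hadoop installation directory as local_wordcount input/test1.txt input/test2.txt. For the two input files created in step 7.1 it prints hadoop 1, hello 2 and word 1 (one pair per line), which should agree with the contents of /out/part-r-00000.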