
Lazy Notes: Setting Up a Hadoop 2.7.1 Cluster

2016-07-02 13:15:45

 

 

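  • Summary
    • Apart from the hosts file and passwordless SSH, install and configure everything on a single machine first.
    • Once that machine is ready, clone the virtual machine, then configure hosts and passwordless SSH on the copies.
    • Use a 64-bit JDK. When I did this install at work I used a 32-bit JDK, Hadoop's native libraries failed to load, and I lost a lot of time compiling them myself.
    • Set up passwordless SSH between the nodes.
    • Beyond that, just watch the owner and group of the files. They need not be "hadoop"; set them to suit your environment:
      • sudo chown -R hadoop /opt
      • sudo chgrp -R hadoop /opt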

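  • Files to prepare
    1. linuxmint17x64
    2. jdk1.8 x64     (must be 64-bit, unless you want to compile Hadoop's native libraries yourself)
    3. hadoop2.7.1
    4. VirtualBox
    5. MobaXterm as the SSH client (PuTTY works too; use whatever you prefer)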
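  • Installing and configuring the virtual machines

      We need three virtual machines. Install one first, download Hadoop, set up the JDK and the environment variables on it, and then clone it.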

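    1. Install the first virtual machine
      1. I won't go through the installation itself, only the points to watch:
        • Enable the shared clipboard.
        • Give the hadoop user ownership of /opt:
          •   sudo chown -R hadoop /opt
          •   sudo chgrp -R hadoop /opt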
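        • Install openssh-server, which Linux Mint does not seem to ship by default:  sudo apt-get install openssh-server
        • Disable the firewall:  sudo ufw disable
        • Check the firewall status:  sudo ufw status  (it should report "Status: inactive")
        • Install vim:  sudo apt-get install vim
        • Change the hostname. Give each of the three machines a different hostname so they are easy to tell apart; mine are master-hadoop, slave1-hadoop and slave2-hadoop.
          • Debian family: sudo vi /etc/hostname
          • Red Hat family: sudo vi /etc/sysconfig/network
          • Reboot.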
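        • Install the JDK
          • Connect to the virtual machine with MobaXterm (check the IP yourself; the first machine is usually 192.168.56.101).
          • Create a lib directory for components that will be needed, such as the JDK:
            • mkdir /opt/lib
          • Upload the downloaded JDK to /opt/lib (in MobaXterm you can simply drag and drop it).
          • Unpack the JDK:  tar -zxvf jdk-8u92-linux-x64.tar.gz
          • Rename the folder:  mv jdk1.8.0_92 jdk8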
          • Look at the directory structure and check that owner and group are both hadoop. They do not have to be hadoop, but keeping every Hadoop-related file and directory in one group avoids permission problems.
            hadoop@hadoop-pc / $ cd /opt/
            hadoop@hadoop-pc /opt $ ll
            total 16
            drwxr-xr-x  4 hadoop hadoop 4096 Jul  2 00:33 ./
            drwxr-xr-x 23 root   root   4096 Jul  1 23:23 ../
            drwxr-xr-x  3 hadoop hadoop 4096 Nov 29  2015 firefox/
            drwxr-xr-x  3 hadoop hadoop 4096 Jul  2 01:04 lib/
            hadoop@hadoop-pc /opt $ cd lib/
            hadoop@hadoop-pc /opt/lib $ ll
            total 177156
            drwxr-xr-x 3 hadoop hadoop      4096 Jul  2 01:04 ./
            drwxr-xr-x 4 hadoop hadoop      4096 Jul  2 00:33 ../
            drwxr-xr-x 8 hadoop hadoop      4096 Apr  1 12:20 jdk8/
            -rw-rw-r-- 1 hadoop hadoop 181389058 Jul  2 01:00 jdk-8u92-linux-x64.tar.gz
            hadoop@hadoop-pc /opt/lib $ mkdir package
            hadoop@hadoop-pc /opt/lib $ mv jdk-8u92-linux-x64.tar.gz package/
            hadoop@hadoop-pc /opt/lib $ ll
            total 16
            drwxr-xr-x 4 hadoop hadoop 4096 Jul  2 01:08 ./
            drwxr-xr-x 4 hadoop hadoop 4096 Jul  2 00:33 ../
            drwxr-xr-x 8 hadoop hadoop 4096 Apr  1 12:20 jdk8/
            drwxrwxr-x 2 hadoop hadoop 4096 Jul  2 01:08 package/
            hadoop@hadoop-pc /opt/lib $ cd jdk8/
            hadoop@hadoop-pc /opt/lib/jdk8 $ ll
            total 25916
            drwxr-xr-x 8 hadoop hadoop     4096 Apr  1 12:20 ./
            drwxr-xr-x 4 hadoop hadoop     4096 Jul  2 01:08 ../
            drwxr-xr-x 2 hadoop hadoop     4096 Apr  1 12:17 bin/
            -r--r--r-- 1 hadoop hadoop     3244 Apr  1 12:17 COPYRIGHT
            drwxr-xr-x 4 hadoop hadoop     4096 Apr  1 12:17 db/
            drwxr-xr-x 3 hadoop hadoop     4096 Apr  1 12:17 include/
            -rwxr-xr-x 1 hadoop hadoop  5090294 Apr  1 11:33 javafx-src.zip*
            drwxr-xr-x 5 hadoop hadoop     4096 Apr  1 12:17 jre/
            drwxr-xr-x 5 hadoop hadoop     4096 Apr  1 12:17 lib/
            -r--r--r-- 1 hadoop hadoop       40 Apr  1 12:17 LICENSE
            drwxr-xr-x 4 hadoop hadoop     4096 Apr  1 12:17 man/
            -r--r--r-- 1 hadoop hadoop      159 Apr  1 12:17 README.html
            -rw-r--r-- 1 hadoop hadoop      525 Apr  1 12:17 release
            -rw-r--r-- 1 hadoop hadoop 21104834 Apr  1 12:17 src.zip
            -rwxr-xr-x 1 hadoop hadoop   110114 Apr  1 11:33 THIRDPARTYLICENSEREADME-JAVAFX.txt*
            -r--r--r-- 1 hadoop hadoop   177094 Apr  1 12:17 THIRDPARTYLICENSEREADME.txt
            hadoop@hadoop-pc /opt/lib/jdk8 $

             

          • Set JAVA_HOME and the environment variables in /etc/profile (the lines after "#ADD HERE" are the additions)
            # /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
            # and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).

            if [ "$PS1" ]; then
              if [ "$BASH" ] && [ "$BASH" != "/bin/sh" ]; then
                # The file bash.bashrc already sets the default PS1.
                # PS1='\h:\w\$ '
                if [ -f /etc/bash.bashrc ]; then
                  . /etc/bash.bashrc
                fi
              else
                if [ "`id -u`" -eq 0 ]; then
                  PS1='# '
                else
                  PS1='$ '
                fi
              fi
            fi

            # The default umask is now handled by pam_umask.
            # See pam_umask(8) and /etc/login.defs.

            if [ -d /etc/profile.d ]; then
              for i in /etc/profile.d/*.sh; do
                if [ -r $i ]; then
                  . $i
                fi
              done
              unset i
            fi

            #ADD HERE
            JAVA_HOME=/opt/lib/jdk8
            CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
            PATH=$JAVA_HOME/bin:$PATH
            export JAVA_HOME
            export CLASSPATH
            export PATH
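
The `PATH=$JAVA_HOME/bin:$PATH` line prepends unconditionally, so every time /etc/profile is sourced the JDK directory is added to PATH again. A minimal sketch of a guarded prepend follows; `path_prepend` is a hypothetical helper name, not something the article or the JDK provides, and the demo works on a sample string rather than the real PATH.

```shell
# Sketch: prepend a directory to a PATH-style string only if it is absent.
# path_prepend is a hypothetical helper name.
path_prepend() {
  # $1 = directory to add, $2 = current PATH-style string
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;          # already present: unchanged
    *)        printf '%s\n' "$1${2:+:$2}" ;; # absent: put it in front
  esac
}

jdk_bin=/opt/lib/jdk8/bin
# Simulate the profile being sourced three times on a sample PATH.
demo_path=/usr/local/bin:/usr/bin:/bin
for i in 1 2 3; do
  demo_path=$(path_prepend "$jdk_bin" "$demo_path")
done
echo "$demo_path"
```

In a profile script the same idea would read `PATH=$(path_prepend "$JAVA_HOME/bin" "$PATH")`.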

             

          • Check the Java version and the environment variables
            hadoop@hadoop-pc / $ java -version
            java version "1.8.0_92"
            Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
            Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
            hadoop@hadoop-pc / $ echo $JAVA_HOME
            /opt/lib/jdk8
            hadoop@hadoop-pc / $ echo $CLASSPATH
            .:/opt/lib/jdk8/lib/dt.jar:/opt/lib/jdk8/lib/tools.jar
            hadoop@hadoop-pc / $ echo $PATH
            /opt/lib/jdk8/bin:/opt/lib/jdk8/bin:/opt/lib/jdk8/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
            hadoop@hadoop-pc / $
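
Since a 32-bit JDK is the mistake this article warns about, the check can be scripted. The sketch below looks for the "64-Bit" marker that the version output above contains; `is_64bit_jvm` is a hypothetical helper, and it assumes the JVM prints that marker in its version banner.

```shell
# Sketch: decide whether `java -version` output describes a 64-bit JVM.
# is_64bit_jvm is a hypothetical helper, not part of the JDK or Hadoop.

# Reads version text on stdin; succeeds if it mentions a 64-bit VM.
is_64bit_jvm() {
  grep -q '64-Bit'
}

# `java -version` prints to stderr, so redirect it into the pipe.
if command -v java >/dev/null 2>&1; then
  if java -version 2>&1 | is_64bit_jvm; then
    echo "64-bit JVM found"
  else
    echo "JVM does not report 64-Bit; Hadoop's native libraries may not load"
  fi
fi
```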

             

        • Create the directories Hadoop needs
          • tmp directory
            •   mkdir /opt/hadoop-tmp
          • HDFS directory
            • mkdir /opt/hadoop-dfs
            • name directory
              • mkdir /opt/hadoop-dfs/name
            • data directory
              • mkdir /opt/hadoop-dfs/data
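
The four mkdir calls can be collapsed into one with `mkdir -p`, which also makes the step safe to repeat. This is only a sketch: `make_hadoop_dirs` is a hypothetical helper, and the demo runs against a throwaway directory rather than /opt.

```shell
# Sketch: create the Hadoop working directories under a base directory.
# make_hadoop_dirs is a hypothetical helper; the article uses /opt as the base.
make_hadoop_dirs() {
  # $1 = base directory
  # -p creates missing parents and does not fail if a directory exists.
  mkdir -p "$1/hadoop-tmp" "$1/hadoop-dfs/name" "$1/hadoop-dfs/data"
}

# Demo against a temporary directory instead of /opt.
demo_base=$(mktemp -d)
make_hadoop_dirs "$demo_base"
find "$demo_base" -mindepth 1 -type d | sort
```

On the real machine the call would be `make_hadoop_dirs /opt`.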
        • Upload Hadoop
          • Upload the Hadoop archive to /opt with MobaXterm, or download it from within /opt:  wget  http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
          • tar -zxvf hadoop-2.7.1.tar.gz
          • mv hadoop-2.7.1.tar.gz lib/package/  (back up the archive to package)
          • mv hadoop-2.7.1 hadoop  (rename the folder)
        • Edit the Hadoop configuration files
        • The Hadoop configuration files are under /opt/hadoop/etc/hadoop
          1. core-site.xml
            <configuration>
                <property>
                    <name>fs.defaultFS</name>
                    <value>hdfs://master:9000</value>
                </property>
                <property>
                    <name>hadoop.tmp.dir</name>
                    <value>file:/opt/hadoop-tmp</value>
                    <description>A base for other temporary directories.</description>
                </property>
            </configuration>
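
If the NameNode host or the tmp directory differs on your cluster, the file can be generated from parameters instead of edited by hand. A minimal sketch, assuming the same two properties as the configuration above; `write_core_site` is a hypothetical helper, not a Hadoop tool, and the demo writes to a temporary directory.

```shell
# Sketch: write a core-site.xml with the two properties used in this article.
# write_core_site is a hypothetical helper, not a Hadoop tool.
write_core_site() {
  # $1 = output file, $2 = NameNode host, $3 = tmp directory
  cat > "$1" <<EOF
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://$2:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:$3</value>
    </property>
</configuration>
EOF
}

# Demo: generate the file in a temporary directory and show it.
demo_conf=$(mktemp -d)
write_core_site "$demo_conf/core-site.xml" master /opt/hadoop-tmp
cat "$demo_conf/core-site.xml"
```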

               

                
          2. hdfs-site.xml
            <configuration>
                <property>
                    <name>dfs.namenode.secondary.http-address</name>
                    <value>master:9001</value>
                </property>
                <property>
                    <name>dfs.namenode.name.dir</name>
                    <value>file:/opt/hadoop-dfs/name</value>
                </property>
                <property>
                    <name>dfs.datanode.data.dir</name>
                    <value>file:/opt/hadoop-dfs/data</value>
                </property>
                <property>
                    <name>dfs.replication</name>
                    <value>3</value>
                </property>
                <property>
                    <name>dfs.webhdfs.enabled</name>
                    <value>true</value>
                </property>
            </configuration>
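
One thing worth checking here: dfs.replication is set to 3, but this cluster has only two DataNodes (slave1 and slave2), and HDFS cannot place more replicas of a block than there are DataNodes, so blocks would stay under-replicated. The sketch below compares the two numbers. All three function names are hypothetical, and the parser assumes each XML tag sits on its own line.

```shell
# Sketch: compare dfs.replication with the number of DataNode hosts.
# All helper names are hypothetical; assumes one XML tag per line.
get_property() {
  # $1 = XML file, $2 = property name; prints the first value after the name.
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

count_datanodes() {
  # $1 = slaves file; counts non-blank lines.
  grep -c '[^[:space:]]' "$1"
}

check_replication() {
  # $1 = hdfs-site.xml, $2 = slaves file
  local want have
  want=$(get_property "$1" dfs.replication)
  have=$(count_datanodes "$2")
  if [ "$want" -gt "$have" ]; then
    echo "dfs.replication=$want exceeds $have DataNodes"
    return 1
  fi
  echo "dfs.replication=$want fits $have DataNodes"
}

# Demo with the values used in this article.
demo_dir=$(mktemp -d)
printf '%s\n' '<configuration>' '<property>' '<name>dfs.replication</name>' \
  '<value>3</value>' '</property>' '</configuration>' > "$demo_dir/hdfs-site.xml"
printf '%s\n' slave1 slave2 > "$demo_dir/slaves"
check_replication "$demo_dir/hdfs-site.xml" "$demo_dir/slaves" || true
```

With two DataNodes, a value of 2 would pass this check.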

               


          3. mapred-site.xml
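            • cp mapred-site.xml.template mapred-site.xml  (Hadoop 2.7.1 ships only the template, so create the file from it first)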
            <configuration>
              <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
              </property>
              <property>
                <name>mapreduce.jobhistory.address</name>
                <value>master:10020</value>
              </property>
              <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>master:19888</value>
              </property>
            </configuration>

               


          4. yarn-site.xml
            <configuration>
              <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
              </property>
              <property>
                <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
              </property>
              <property>
                <name>yarn.resourcemanager.address</name>
                <value>master:8032</value>
              </property>
              <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>master:8030</value>
              </property>
              <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>master:8035</value>
              </property>
              <property>
                <name>yarn.resourcemanager.admin.address</name>
                <value>master:8033</value>
              </property>
              <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>master:8088</value>
              </property>
            </configuration>

          5. slaves
            slave1
            slave2

          6. hadoop-env.sh
            • Set JAVA_HOME
              export JAVA_HOME=/opt/lib/jdk8

               

          7. yarn-env.sh
            • Add the JAVA_HOME environment variable
              export JAVA_HOME=/opt/lib/jdk8
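
Both env scripts need the same JDK path, and a mistyped path in one of them is easy to miss. One way to keep them consistent is to apply the setting with a small function. This is a sketch only: `set_java_home` is a hypothetical helper, and it assumes the script either has an uncommented `export JAVA_HOME=` line to replace or none at all.

```shell
# Sketch: point the JAVA_HOME export in an env script at a given JDK.
# set_java_home is a hypothetical helper, not part of Hadoop.
set_java_home() {
  # $1 = env script (hadoop-env.sh or yarn-env.sh), $2 = JDK directory
  local file=$1 jdk=$2 tmp
  tmp=$(mktemp)
  if grep -q '^export JAVA_HOME=' "$file"; then
    # Replace the existing uncommented export line.
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$jdk|" "$file" > "$tmp"
  else
    # No active export yet: keep the file and append one.
    cat "$file" > "$tmp"
    printf 'export JAVA_HOME=%s\n' "$jdk" >> "$tmp"
  fi
  # Write back in place so the script keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a temporary stand-in for an env script.
demo_env=$(mktemp)
printf '%s\n' '# sample env script' 'export JAVA_HOME=${JAVA_HOME}' > "$demo_env"
set_java_home "$demo_env" /opt/lib/jdk8
cat "$demo_env"
```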

               

 

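  • The first virtual machine is now essentially configured. Clone it twice (make each a full clone and reinitialize the MAC address) to get three virtual machines: master, slave1 and slave2.
    • Change the hostnames of slave1 and slave2 to slave1-hadoop and slave2-hadoop.
    • Edit the hosts file on all three machines:
      • 192.168.56.101 master
        192.168.56.102 slave1
        192.168.56.103 slave2
    • The IPs may differ; check each virtual machine's actual IP.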
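
Because the same three entries go onto every machine, it helps if adding them is repeatable. The sketch below appends an entry only when the name is not mapped yet. `add_host` is a hypothetical helper, and the demo edits a temporary file; on a real node the target would be /etc/hosts, written as root.

```shell
# Sketch: add "IP name" lines to a hosts-style file, skipping mapped names.
# add_host is a hypothetical helper.
add_host() {
  # $1 = hosts file, $2 = IP address, $3 = host name
  if awk -v h="$3" '
       /^[ \t]*#/ { next }                                  # ignore comments
       { for (i = 2; i <= NF; i++) if ($i == h) found = 1 } # name columns
       END { exit !found }
     ' "$1"; then
    return 0   # name already mapped: leave the file alone
  fi
  printf '%s %s\n' "$2" "$3" >> "$1"
}

# Demo on a temporary copy; the IPs are the ones used in this article.
demo_hosts=$(mktemp)
printf '127.0.0.1 localhost\n' > "$demo_hosts"
add_host "$demo_hosts" 192.168.56.101 master
add_host "$demo_hosts" 192.168.56.102 slave1
add_host "$demo_hosts" 192.168.56.103 slave2
add_host "$demo_hosts" 192.168.56.101 master   # second call changes nothing
cat "$demo_hosts"
```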
  • Let master log in to the other two machines, and to itself, without a password
    • Run the following on master.
    • ssh-keygen -t rsa -P ''  (accept the defaults throughout; enter the password whenever you are asked for one)
    • ssh-copy-id hadoop@master
    • ssh-copy-id hadoop@slave1
    • ssh-copy-id hadoop@slave2
    • When this is done, test with ssh slave1; it should connect to slave1 without asking for a password.
      hadoop@master-hadoop ~ $ ssh-keygen -t rsa -P ''
      Generating public/private rsa key pair.
      Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
      Created directory '/home/hadoop/.ssh'.
      Your identification has been saved in /home/hadoop/.ssh/id_rsa.
      Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
      The key fingerprint is:
      5c:c9:4c:0c:b6:28:eb:21:b9:6f:db:6e:3f:ee:0d:9a hadoop@master-hadoop
      The key's randomart image is:
      +--[ RSA 2048]----+
      |    oo.   |
      |    o =..   |
      |  . . . =   |
      |  . o . .    |
      | o o  S    |
      |  + .      |
      | . .  .    |
      |  ....o.o    |
      |  .o+E++..   |
      +-----------------+
      hadoop@master-hadoop ~ $ ssh-copy-id hadoop@slave1
      The authenticity of host 'slave1 (192.168.56.102)' can't be established.
      ECDSA key fingerprint is d8:fc:32:ed:a7:2c:e1:c7:d7:15:89:b9:f6:97:fb:c3.
      Are you sure you want to continue connecting (yes/no)? yes
      /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
      /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
      hadoop@slave1's password:

      Number of key(s) added: 1

      Now try logging into the machine, with:  "ssh 'hadoop@slave1'"
      and check to make sure that only the key(s) you wanted were added.
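
The three ssh-copy-id calls differ only in the host name, so they fit a loop. A minimal sketch follows; `copy_key_to_nodes` and the `DRY_RUN` switch are hypothetical names used only here. It defaults to printing the commands so it can be tried without touching any machine.

```shell
# Sketch: push the public key to every node in one loop.
# copy_key_to_nodes and DRY_RUN are hypothetical names used only here.
copy_key_to_nodes() {
  # $1 = remote user, remaining args = host names
  local user=$1 host
  shift
  for host in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh-copy-id $user@$host"   # print only
    else
      ssh-copy-id "$user@$host"        # asks for that host's password
    fi
  done
}

# Dry run by default; set DRY_RUN=0 on master to actually copy the key.
copy_key_to_nodes hadoop master slave1 slave2
```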

       

       

  •  Format the NameNode (run from /opt/hadoop on master)
    •   ./bin/hdfs namenode -format
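
Formatting is meant to be done once: running it again on a NameNode that is already in use wipes the HDFS metadata. A small guard can catch that. In this sketch `safe_to_format` is a hypothetical helper, and it assumes a formatted name directory contains current/VERSION. It only prints advice and never runs the format itself.

```shell
# Sketch: refuse to format when the NameNode directory already holds metadata.
# safe_to_format is a hypothetical helper. It assumes a formatted name
# directory contains current/VERSION.
safe_to_format() {
  # $1 = dfs.namenode.name.dir as a plain path
  [ ! -e "$1/current/VERSION" ]
}

name_dir=/opt/hadoop-dfs/name
if safe_to_format "$name_dir"; then
  echo "no existing metadata in $name_dir; run: ./bin/hdfs namenode -format"
else
  echo "$name_dir is already formatted; formatting again would erase HDFS metadata"
fi
```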
  •   Start Hadoop to verify the setup
    • ./sbin/start-all.sh
    • A normal log looks like this:
      hadoop@master-hadoop /opt/hadoop/sbin $ ./start-all.sh
      This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
      Starting namenodes on [master]
      master: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-master-hadoop.out
      slave1: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave1-hadoop.out
      slave2: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave2-hadoop.out
      Starting secondary namenodes [master]
      master: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadoop-secondarynamenode-master-hadoop.out
      starting yarn daemons
      starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-master-hadoop.out
      slave1: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave1-hadoop.out
      slave2: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave2-hadoop.out

    •   Check jps on all three nodes
      hadoop@master-hadoop /opt/hadoop/sbin $ jps
      5858 ResourceManager
      5706 SecondaryNameNode
      5514 NameNode
      6108 Jps
      hadoop@slave2-hadoop ~ $ jps
      3796 Jps
      3621 NodeManager
      3510 DataNode
      hadoop@slave1-hadoop ~ $ jps
      3786 Jps
      3646 NodeManager
      3535 DataNode
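
Checking the jps listings by eye works for three nodes but is easy to get wrong. The sketch below reads jps output on stdin and reports any expected daemon that is missing. `missing_daemons` is a hypothetical helper; the expected daemon names and the sample input are taken from the jps output in this article.

```shell
# Sketch: check that jps output lists every expected daemon.
# missing_daemons is a hypothetical helper; it reads jps output on stdin.
missing_daemons() {
  # args = expected daemon names; prints the absent ones, fails if any are.
  local listing name missing=0
  listing=$(cat)
  for name in "$@"; do
    # Compare the second column exactly, so NameNode does not match
    # SecondaryNameNode.
    if ! printf '%s\n' "$listing" |
         awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }'; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Demo with the master's jps output from this article.
printf '%s\n' '5858 ResourceManager' '5706 SecondaryNameNode' \
  '5514 NameNode' '6108 Jps' |
  missing_daemons NameNode SecondaryNameNode ResourceManager && echo "master OK"
```

On a live node the input would come from the tool itself, for example `jps | missing_daemons DataNode NodeManager` on a slave.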

  •   Everything is running normally; the installation is complete.