
Setting up Hadoop in standalone mode


I. Install the JDK

Download a suitable JDK version and install it to a directory of your choice. Mine is installed in /usr/lib/jdk1.8.0_40.

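Once the JDK is installed, set its environment variables in /etc/profile by appending the following lines to the end of the file:

export JAVA_HOME=/usr/lib/jdk1.8.0_40
export JRE_HOME=/usr/lib/jdk1.8.0_40/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH

Then run source /etc/profile (or log in again) so the variables take effect.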
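If you script this step, it helps to append each line only once, so that rerunning the script does not pile up duplicate exports in /etc/profile. A minimal sketch, assuming bash; the function name `append_once` is hypothetical and not part of any JDK or Hadoop tooling:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already present in it.
append_once() {
  local line="$1" file="$2"
  # -q: quiet, -x: match the whole line, -F: treat the line as a fixed string.
  # If the file does not exist yet, grep fails and the line is appended.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage from a root shell (e.g. after `sudo -i`), one call per line.
# Single quotes keep $JAVA_HOME and $PATH unexpanded in the written file.
# append_once 'export JAVA_HOME=/usr/lib/jdk1.8.0_40' /etc/profile
# append_once 'export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH' /etc/profile
```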

 

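II. Install ssh

sudo apt-get install ssh

ssh is mainly used to manage remote daemons. In standalone mode Hadoop runs as a single Java process and starts no daemons, so ssh is not actually needed here.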

III. Download Hadoop

http://hadoop.apache.org/releases.html

A stable release is recommended. I downloaded hadoop-2.6.4 and placed it under /usr/local/.

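Hadoop is written in Java and runs on the JVM, so it needs a working Java environment. After downloading Hadoop, configure environment variables so that Hadoop can find the JDK and so that the Hadoop commands are on your PATH.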
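Before going further, you can confirm that JAVA_HOME really points at a Java installation. A minimal sketch, assuming bash; `check_java_home` is a hypothetical helper that only checks for an executable `bin/java` under the given directory (defaulting to `$JAVA_HOME`):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the directory looks like a Java install,
# i.e. it contains an executable bin/java.
check_java_home() {
  local dir="${1:-$JAVA_HOME}"
  if [ -n "$dir" ] && [ -x "$dir/bin/java" ]; then
    echo "ok: $dir/bin/java"
    return 0
  fi
  echo "JAVA_HOME is unset or has no bin/java: '$dir'" >&2
  return 1
}

# Usage after `source /etc/profile`:
# check_java_home && java -version
```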

1. In ~/.bashrc, set the environment variables for the user that runs Hadoop:

export JAVA_HOME=/usr/lib/jdk1.8.0_40
export HADOOP_INSTALL=/usr/local/hadoop-2.6.4
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"

After saving the file, run

source ~/.bashrc

so that the new variables take effect in the current shell.
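To check that the new PATH entries are active in the current shell, here is a small sketch, assuming bash; `on_path` is a hypothetical helper that tests whether a directory is one of the colon-separated entries of `$PATH` (or of a list passed as the second argument):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if directory $1 is an entry in $PATH,
# or in the colon-separated list given as $2.
on_path() {
  local dir="$1" list="${2:-$PATH}"
  # Wrapping both sides in colons avoids partial matches such as /usr/bin2.
  case ":$list:" in
    *":$dir:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage after `source ~/.bashrc`:
# on_path "$HADOOP_INSTALL/bin" && on_path "$HADOOP_INSTALL/sbin" && echo "Hadoop is on PATH"
```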

2. Configure Hadoop

Open hadoop-env.sh in /usr/local/hadoop-2.6.4/etc/hadoop/ and add:

export JAVA_HOME=/usr/lib/jdk1.8.0_40
export JRE_HOME=/usr/lib/jdk1.8.0_40/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
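The stock hadoop-env.sh in Hadoop 2.x already contains an `export JAVA_HOME=${JAVA_HOME}` line, so instead of appending you can point that existing line at the JDK. A minimal sketch, assuming bash and GNU sed (as on Ubuntu, where `sed -i` needs no suffix argument); `set_java_home` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the existing "export JAVA_HOME=..." line in a
# hadoop-env.sh file, or append one if the file has none.
set_java_home() {
  local env_file="$1" jdk="$2"
  if grep -q '^export JAVA_HOME=' "$env_file"; then
    # GNU sed: -i edits in place; '|' is the delimiter because paths contain '/'.
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$jdk|" "$env_file"
  else
    printf 'export JAVA_HOME=%s\n' "$jdk" >> "$env_file"
  fi
}

# Usage, with the paths from this article:
# set_java_home /usr/local/hadoop-2.6.4/etc/hadoop/hadoop-env.sh /usr/lib/jdk1.8.0_40
```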

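At this point, Hadoop is configured for standalone mode.

To check, run the following from the Hadoop directory (/usr/local/hadoop-2.6.4):

./bin/hadoop version

You should see output like this: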

Hadoop 2.6.4
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 5082c73637530b0b7e115f9625ed7fac69f937e6
Compiled by jenkins on 2016-02-12T09:45Z
Compiled with protoc 2.5.0
From source with checksum 8dee2286ecdbbbc930a6c87b65cbc010
This command was run using /usr/local/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar

This shows that Hadoop is set up correctly.
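To run this check from a script instead of by eye, you can extract the version number from the first line of that output. A sketch, assuming bash and the output format shown above; `hadoop_version` is a hypothetical helper that reads the `hadoop version` text on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version from `hadoop version` output on stdin.
# Relies on the first line having the form "Hadoop <version>";
# exits non-zero if it does not.
hadoop_version() {
  awk 'NR == 1 && $1 == "Hadoop" { print $2; found = 1 }
       END { exit (found ? 0 : 1) }'
}

# Usage, from the Hadoop directory:
# v=$(./bin/hadoop version | hadoop_version) && echo "found Hadoop $v"
```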

 

 

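Next, run one of the example programs bundled with Hadoop to test the installation. The steps below use the grep example; the same examples jar also includes wordcount.

1. In the Hadoop directory, create an input folder and copy the configuration files from etc/hadoop into it to serve as test data: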

mkdir input

cp etc/hadoop/* input/

2. Run the job and count the matches

From the Hadoop directory, run:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar grep input output '[a-z.]+'

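This runs the grep example from the examples jar: it finds every string in the files under input that matches the regular expression [a-z.]+ (a run of lowercase letters and dots), counts how often each distinct match occurs, and writes the counts to the output directory.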
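To see what the job computes, the same counting can be approximated locally with standard Unix tools. This is not how Hadoop does it (the grep example runs one MapReduce job that searches and counts, then a second job that sorts by count); it is only a sketch for cross-checking results, assuming bash, GNU grep and coreutils. `local_grep_count` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count every match of a regex across files,
# most frequent first. Roughly mirrors the grep example's output
# (count, then match), but runs locally without Hadoop.
local_grep_count() {
  local regex="$1"; shift
  # LC_ALL=C keeps [a-z] limited to ASCII lowercase letters.
  # -o prints each match on its own line, -h drops file names, -E: extended regex.
  LC_ALL=C grep -ohE -- "$regex" "$@" \
    | LC_ALL=C sort \
    | uniq -c \
    | LC_ALL=C sort -k1,1nr -k2,2
}

# Usage, from the Hadoop directory:
# local_grep_count '[a-z.]+' input/* | head
```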

You should see job counters like these:

  File System Counters
    FILE: Number of bytes read=632564
    FILE: Number of bytes written=1415622
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
  Map-Reduce Framework
    Map input records=1151
    Map output records=1151
    Map output bytes=22396
    Map output materialized bytes=24704
    Input split bytes=126
    Combine input records=0
    Combine output records=0
    Reduce input groups=70
    Reduce shuffle bytes=24704
    Reduce input records=1151
    Reduce output records=1151
    Spilled Records=2302
    Shuffled Maps =1
    Failed Shuffles=0
    Merged Map outputs=1
    GC time elapsed (ms)=0
    CPU time spent (ms)=0
    Physical memory (bytes) snapshot=0
    Virtual memory (bytes) snapshot=0
    Total committed heap usage (bytes)=667942912
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=32250
  File Output Format Counters
    Bytes Written=15798

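This indicates that the job ran successfully.

View the results (each line is a count followed by the matched string, most frequent first):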

cat output/*
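By default, Hadoop's FileOutputCommitter writes an empty `_SUCCESS` marker file into the output directory when a job completes successfully, so a script can check for it before reading the results. A minimal sketch, assuming bash; `job_succeeded` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a MapReduce output directory
# contains the _SUCCESS marker written on successful completion.
job_succeeded() {
  local out="${1:-output}"
  [ -f "$out/_SUCCESS" ]
}

# Usage, from the Hadoop directory:
# job_succeeded output && cat output/part-r-*
```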

 

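To run the job again, first delete the output folder with rm -r output/, because Hadoop will not start a job whose output directory already exists.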
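If you rerun the example often, the cleanup can be folded into a small helper. A sketch, assuming bash; `reset_output` is a hypothetical name, and it uses plain `rm` because in standalone mode the output lives on the local file system:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a previous local output directory (if any)
# so that the job can be started again.
reset_output() {
  local out="${1:-output}"
  if [ -d "$out" ]; then
    rm -r -- "$out"
  fi
}

# Usage, from the Hadoop directory:
# reset_output output && ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar grep input output '[a-z.]+'
```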