Setup a single Hadoop 2.4 on Mac OS X 10.9.3
install
brew install hadoop
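To confirm what Homebrew installed and where (assuming the default Cellar path used throughout below):
$ brew info hadoop
$ ls /usr/local/Cellar/hadoop/2.4.0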
Setup passphraseless ssh
- try
ssh localhost
- $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
- $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
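On OS X, ssh localhost also requires Remote Login to be enabled under System Preferences > Sharing. It is also worth tightening the key file permissions:
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost # should now connect without prompting for a passphrase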
Environment
- check /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/hadoop-env.sh
export JAVA_HOME="$(/usr/libexec/java_home)"
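To double-check the JDK path that line will resolve to (assumes a JDK is already installed):
$ /usr/libexec/java_home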
- cd /usr/local/Cellar/hadoop/2.4.0
- try bin/hadoop
$ bin/hadoop version
Hadoop 2.4.0
Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1583262
Compiled by jenkins on 2014-03-31T08:29Z
Compiled with protoc 2.5.0
From source with checksum 375b2832a6641759c6eaf6e3e998147
This command was run using /usr/local/Cellar/hadoop/2.4.0/libexec/share/hadoop/common/hadoop-common-2.4.0.jar
try Standalone mode
cd /usr/local/Cellar/hadoop/2.4.0
mkdir input
cp libexec/etc/hadoop/*.xml input
bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z]+'
cat output/*
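MapReduce refuses to start if the output directory already exists, so remove it before rerunning the example:
$ rm -rf output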
try Pseudo-Distributed mode
vi libexec/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
vi libexec/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
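By default HDFS keeps its data under /tmp/hadoop-username, which may be cleared on reboot. Optionally, add properties like these to the same hdfs-site.xml to use a persistent location (the /usr/local/var/hadoop path is just an example directory you would create yourself, not a brew default):
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/var/hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/var/hadoop/dfs/data</value>
</property>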
run MapReduce job locally
hdfs file system
- Remove any data left over from previous runs (replace username with your user name):
$ rm -fr /tmp/hadoop-username; rm -fr /private/tmp/hadoop-username
- Format the filesystem:
$ bin/hdfs namenode -format
Look for "INFO common.Storage: Storage directory /tmp/hadoop-username/dfs/name has been successfully formatted." in the output.
start daemon
- Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
Check the Java processes for org.apache.hadoop.hdfs.server.namenode.NameNode & org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.
Check the logs with ls -lstr libexec/logs/
Check that http://localhost:9000/ is listening (this is the fs.defaultFS RPC port from core-site.xml, not a web page). Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:50070/
hdfs command
- Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/username
$ bin/hdfs dfs -mkdir /user/username/input
$ bin/hdfs dfs -ls /user/
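The nested directories can also be created in a single call with -p, which hdfs dfs -mkdir supports in Hadoop 2.x:
$ bin/hdfs dfs -mkdir -p /user/username/input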
$ jps
29398 Jps
25959 DataNode
25839 NameNode
26109 SecondaryNameNode
run mapreduce
- Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put libexec/etc/hadoop input
- Run some of the examples provided:
$ bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'
- Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
$ bin/hdfs dfs -cat output/*
stop hdfs
- Stop NameNode daemon and DataNode daemon:
$ sbin/stop-dfs.sh
run MapReduce job on YARN
start hdfs
- sbin/start-dfs.sh
- bin/hdfs dfs -rm -r output
- bin/hdfs dfs -rm -r input
config yarn
vi libexec/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
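If mapred-site.xml does not exist yet, Hadoop 2.4 ships a template you can copy first:
$ cp libexec/etc/hadoop/mapred-site.xml.template libexec/etc/hadoop/mapred-site.xml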
vi libexec/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Start ResourceManager daemon and NodeManager daemon:
- sbin/start-yarn.sh
- jps
99082 SecondaryNameNode
98803 NameNode
99215 Jps
97753 NodeManager
97649 ResourceManager
98929 DataNode
- Browse the web interface for the ResourceManager; by default it is available at:
ResourceManager - http://localhost:8088/
run a mapreduce
- bin/hdfs dfs -put libexec/etc/hadoop input
- bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'
- bin/hdfs dfs -cat /user/yinlei/output/part-r-00000
4 dfs.class
4 dfs.audit.logger
3 dfs.server.namenode.
2 dfs.audit.log.maxbackupindex
2 dfs.period
2 dfs.audit.log.maxfilesize
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.replication
1 dfs.file
1 dfs.data.dir
1 dfs.name.dir
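When finished, stop the YARN daemons, then HDFS:
- sbin/stop-yarn.sh
- sbin/stop-dfs.sh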