This is how I set up Hadoop on Windows. For anyone looking to set up Hadoop, go through this: http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html. Everything else I tried was too complicated, had steps missing, or was inconsistent.
How to run a Hadoop MapReduce program from the command line February 15, 2012
A few days ago I had problems with the Hadoop Eclipse plugin, so I had to use the command line to run my Hadoop programs. If you also need to run your MapReduce program from the command line, here is how:
1) Open your Windows command prompt. First, create a jar file with your class files in it. You will also need to create a manifest file to specify the main class in the jar file.
manifest.txt contents (we are saving manifest.txt in the same location as our class files just for the sake of this example; you can keep it in a different location, but then you will need to give the full path to the manifest file when creating the jar file)
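For reference, a minimal manifest.txt needs only the Main-Class line; I am assuming the entry point is the Histogram class, based on the jar listing below (substitute your own driver class). Note that the jar tool requires the file to end with a newline, otherwise the last line is silently ignored.

```
Main-Class: Histogram
```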
Jar file command
C:\Users\pjain\workspace\HadoopTest\t>jar cvfm ch.jar manifest.txt *.class
added manifest
adding: Histogram$MapClass.class(in = 2201) (out= 823)(deflated 62%)
adding: Histogram$Reduce.class(in = 2197) (out= 807)(deflated 63%)
adding: Histogram.class(in = 2360) (out= 1057)(deflated 55%)
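If you are curious what the `jar cvfm` step actually does, the same jar (manifest plus class files) can be built programmatically with the JDK's java.util.jar API. This is just a sketch equivalent to the command above; the Histogram main class and the ch.jar name are taken from this example, so substitute your own.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class MakeJar {
    public static void main(String[] args) throws IOException {
        // Equivalent of manifest.txt: declare the entry-point class.
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, "Histogram");

        Path out = Paths.get("ch.jar");
        try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(out), mf)) {
            // Add every .class file in the current directory,
            // as `jar cvfm ch.jar manifest.txt *.class` would.
            try (DirectoryStream<Path> classes =
                     Files.newDirectoryStream(Paths.get("."), "*.class")) {
                for (Path p : classes) {
                    jos.putNextEntry(new JarEntry(p.getFileName().toString()));
                    jos.write(Files.readAllBytes(p));
                    jos.closeEntry();
                }
            }
        }
        System.out.println("wrote " + out);
    }
}
```

The resulting ch.jar can then be handed to `hadoop jar` exactly like the one produced by the jar tool.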
2) Start a Cygwin shell and cd to the directory where you created the jar file, or copy the jar file over to the current directory of your shell prompt.
pjain@c0250 ~ $ cp /cygdrive/c/users/pjain/workspace/hadooptest/t/*.jar .
3) Copy the input files for the MapReduce program to the Hadoop file system (HDFS). We first create an input directory in HDFS so the input files are easier to manage, then copy over all the files from the "in" directory in the current Windows directory. Listing HDFS afterwards shows that the files are present.
pjain@c0250 ~ $ hadoop fs -mkdir input
pjain@c0250 ~ $ hadoop fs -put in/* input
pjain@c0250 ~ $ hadoop fs -lsr
drwxr-xr-x   - risk2000\pjain supergroup          0 2012-02-09 14:53 /user/risk2000/pjain/input
-rw-r--r--   1 risk2000\pjain supergroup  236903179 2012-02-09 14:49 /user/risk2000/pjain/input/apat63_99.txt
-rw-r--r--   1 risk2000\pjain supergroup  264075431 2012-02-09 14:50 /user/risk2000/pjain/input/cite75_99.txt
4) To run the jar file, use the command below. Also make sure the output directory is not already present; Hadoop will create it for you, and if it already exists you need to delete it first.
pjain@c0250 ~ $ hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/risk2000/pjain/output
pjain@c0250 ~ $ hadoop jar ch.jar /user/risk2000/pjain/input/cite75_99.txt /user/risk2000/pjain/output
Now you can run MapReduce programs from the command line.
Issues with Hadoop eclipse plugin February 2, 2012
I have now been playing with Hadoop for a day and bumped into issues with the Eclipse plugin for Hadoop. (Later, I ran the MapReduce program on the command line; you can find the notes posted here.)
I have a Windows 7, 64-bit machine with Eclipse Indigo 3.7.1, and installed Hadoop 0.20.2 (why did I install 0.20.2).
OK, and then I copied over the Eclipse plugin from the Hadoop directory: /contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar
Everything seems fine until I try to run the application, and I find that I cannot run it on Hadoop. You can click Run on Hadoop and nothing happens.
Plugin version : 0.20.2
Hadoop version : 0.20.2
So then I found that this error was fixed later. I downloaded Hadoop 0.20.203, copied over its plugin from /contrib/eclipse-plugin/hadoop-eclipse-plugin-0.20.203.0.jar into the Eclipse plugin directory, and started Eclipse.
Plugin version : 0.20.203.0
Hadoop version : 0.20.2
Now I can see that I can run my application.
But I have another issue: I cannot look into the DFS location! WTH Hadoop!
I get a new error: Connecting to DFS has encountered a problem. An internal error has occurred during "Connecting to DFS localhost. org/apache/commons/configuration/Configuration" (that last line suggests the commons-configuration classes are missing from the plugin's classpath).
The Hadoop 0.20.203.0 plugin for Eclipse fixes one problem and introduces a new one: I can run the application on Hadoop, but I cannot see the HDFS file system through the plugin. So in the end I resorted to using these commands to view files on the HDFS file system.
// to view the files in the directory
hadoop fs -ls
// to delete files from a directory
hadoop fs -rm myfolder/sub2/*
// to delete empty directory
hadoop fs -rm myfolder/sub2
// to delete recursively
hadoop fs -rmr myfolder/sub2
// copy file sample.txt to HDFS from current directory in native file system
hadoop fs -put sample.txt .
// get file sample.txt from HDFS to current directory in native file system
hadoop fs -get sample.txt .
Later I managed to run my MapReduce program on the command line; you can find the notes posted here.
java.io.IOException: Failed to set permissions of path: file:/tmp/hadoop-pjain/mapred/staging/pjain1500477637/.staging to 0700 February 1, 2012
Ran into this exception while installing Hadoop stable version 0.20.203.0 on my local dev workstation and running the sample examples.
Truncated exception:
java.io.IOException: Failed to set permissions of path: file:/tmp/hadoop-pjain/mapred/staging/pjain1500477637/.staging to 0700
    at org.apache.hadoop.fs.RawLocalFileSystem.checkReturnValue(RawLocalFileSystem.java:525)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:499)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:797)
    ...
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
After banging my head trying to resolve this, I found out it is a known issue with this (stable!) release.
So I reverted to stable version 0.20.2 and ran the samples, and everything worked fine.
Hope that helps.