A few days ago I had problems with the Hadoop Eclipse plugin, so I had to run my Hadoop programs from the command line. If you also need to run your MapReduce program from the command line, here is how:
1) Open your Windows command prompt. First, create a jar file containing your class files. You will also need a manifest file that specifies the main class of the jar file.
manifest.txt contents (we are saving manifest.txt in the same location as our class files just for the sake of this example; you can keep it in a different location, but then you will need to give the full path of the manifest file when creating the jar file)
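A minimal sketch of writing such a manifest from a Cygwin shell. The class name Histogram is an assumption, inferred from the class files that go into the jar in this example and taken to be in the default package; replace it with your own main class, including its package name if it has one.

```shell
# Assumption: the main class is Histogram in the default package.
# The trailing \n matters: the jar tool may not parse the last line
# of a manifest that does not end with a newline.
printf 'Main-Class: Histogram\n' > manifest.txt

# Show what was written.
cat manifest.txt
```

Run this in the directory that holds the class files, so that the jar command can refer to manifest.txt without a path.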
Jar file command
C:\Users\pjain\workspace\HadoopTest\t>jar cvfm ch.jar manifest.txt *.class
added manifest
adding: Histogram$MapClass.class(in = 2201) (out= 823)(deflated 62%)
adding: Histogram$Reduce.class(in = 2197) (out= 807)(deflated 63%)
adding: Histogram.class(in = 2360) (out= 1057)(deflated 55%)
2) Start a Cygwin shell and change to the directory where you created the jar file, or copy the jar file over to the current directory of your shell prompt.
pjain@c0250 ~ $ cp /cygdrive/c/users/pjain/workspace/hadooptest/t/*.jar .
3) Copy the input files for the MapReduce program to the Hadoop file system (HDFS). We first create an input directory in HDFS so it is easier to manage input files. Then we copy all files from the “in” directory under the current directory into it. Finally, we list the HDFS contents to confirm that the files are present.
pjain@c0250 ~ $ hadoop fs -mkdir input
pjain@c0250 ~ $ hadoop fs -put in/* input
pjain@c0250 ~ $ hadoop fs -lsr
drwxr-xr-x - risk2000\pjain supergroup 0 2012-02-09 14:53 /user/risk2000/pjain/input
-rw-r--r-- 1 risk2000\pjain supergroup 236903179 2012-02-09 14:49 /user/risk2000/pjain/input/apat63_99.txt
-rw-r--r-- 1 risk2000\pjain supergroup 264075431 2012-02-09 14:50 /user/risk2000/pjain/input/cite75_99.txt
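If you stage input files often, the three commands above can be wrapped in a small shell function. This is only a sketch: the function name stage_input and the HADOOP_CMD variable are hypothetical names introduced here, not part of Hadoop. HADOOP_CMD defaults to hadoop and exists only so the steps can be previewed by setting it to echo.

```shell
# Hypothetical helper: copy a local directory of input files into HDFS.
# HADOOP_CMD is an assumed override hook (not a Hadoop setting); it
# defaults to the real "hadoop" command.
stage_input() {
  local_dir="$1"   # local directory with the input files, e.g. in
  hdfs_dir="$2"    # target directory in HDFS, e.g. input
  hadoop="${HADOOP_CMD:-hadoop}"

  # Stop early if there is nothing to upload.
  if [ -z "$(ls -A "$local_dir" 2>/dev/null)" ]; then
    echo "no input files in $local_dir" >&2
    return 1
  fi

  # Create the HDFS directory, upload every file, then list the result.
  "$hadoop" fs -mkdir "$hdfs_dir" &&
  "$hadoop" fs -put "$local_dir"/* "$hdfs_dir" &&
  "$hadoop" fs -ls "$hdfs_dir"
}

# Example call, matching the commands in this step:
#   stage_input in input
```

Setting HADOOP_CMD=echo before calling stage_input prints the hadoop arguments instead of running them, which is a cheap way to check the paths first.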
4) To run the jar file, use the command below. Make sure the output directory does not already exist; Hadoop will create one for you. If it is already present, you need to delete it first, otherwise the job will fail.
pjain@c0250 ~ $ hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/risk2000/pjain/output
pjain@c0250 ~ $ hadoop jar ch.jar /user/risk2000/pjain/input/cite75_99.txt /user/risk2000/pjain/output
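The delete-then-run sequence can also be scripted so that the output directory is removed only when it actually exists. The sketch below uses hadoop fs -test -e for the existence check. The function name run_job and the HADOOP_CMD variable are hypothetical names introduced for this example, and the jar is assumed to name its main class in the manifest, as in step 1.

```shell
# Hypothetical helper: run a MapReduce jar with a clean output directory.
# HADOOP_CMD is an assumed override hook (not a Hadoop setting); it
# defaults to the real "hadoop" command.
run_job() {
  jar="$1"      # jar file whose manifest names the main class
  input="$2"    # HDFS input path
  output="$3"   # HDFS output path; must not exist when the job starts
  hadoop="${HADOOP_CMD:-hadoop}"

  # "fs -test -e" exits with status 0 when the path exists.
  if "$hadoop" fs -test -e "$output"; then
    # -rmr matches the Hadoop release used in this post;
    # newer releases spell it "fs -rm -r".
    "$hadoop" fs -rmr "$output"
  fi

  "$hadoop" jar "$jar" "$input" "$output"
}

# Example call, using the paths from this post:
#   run_job ch.jar /user/risk2000/pjain/input/cite75_99.txt /user/risk2000/pjain/output
```

Keep in mind that this deletes any earlier results in the output path without asking, so point it only at a directory you are willing to lose.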
Now you can run MapReduce programs from the command line.