Prash's Blog

How to run Hadoop Map Reduce program from Command line

February 15, 2012

Filed under: Hadoop — prazjain @ 9:59 am

A few days ago I had problems with the Hadoop Eclipse plugin, so I had to run my Hadoop programs from the command line. If you also need to run your MapReduce program from the command line, here is how:

1) Open your Windows command prompt and create a jar file containing your class files. You will also need a manifest file that specifies the main class in the jar file.

manifest.txt contents (for this example we save manifest.txt in the same location as our class files; you can keep it somewhere else, but then you need to give the full path of the manifest file when creating the jar file)

Main-Class: Histogram
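
One thing that can trip you up here: the manifest file has to end with a newline, otherwise the jar tool may not parse its last line and the main class entry gets lost. A minimal sketch for writing the file from a shell such as cygwin, assuming the main class is Histogram as in this example:

```shell
# Write manifest.txt with a trailing newline; the jar tool may not parse
# the last line of a manifest that is not newline-terminated.
printf 'Main-Class: Histogram\n' > manifest.txt

# Print the file so the entry can be checked by eye.
cat manifest.txt
```

If you create the file in a text editor instead, just press Enter after the last line before saving.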

Jar file command

C:\Users\pjain\workspace\HadoopTest\t>jar cvfm ch.jar manifest.txt *.class
added manifest
adding: Histogram$MapClass.class(in = 2201) (out= 823)(deflated 62%)
adding: Histogram$Reduce.class(in = 2197) (out= 807)(deflated 63%)
adding: Histogram.class(in = 2360) (out= 1057)(deflated 55%)

2) Start a cygwin shell. Either change to the directory where you created the jar file, or copy the jar file into the current directory of your shell prompt.

pjain@c0250 ~
$ cp /cygdrive/c/users/pjain/workspace/hadooptest/t/*.jar .

3) Copy the input files for the MapReduce program to HDFS, the Hadoop file system. First create an input directory in HDFS so the input files are easier to manage. Then copy all the files from the local “in” directory (inside the shell's current directory) into it. Finally, list the files to confirm that they are present in HDFS.

pjain@c0250 ~
$ hadoop fs -mkdir input

pjain@c0250 ~
$ hadoop fs -put in/* input

pjain@c0250 ~
$ hadoop fs -lsr
drwxr-xr-x - risk2000\pjain supergroup 0 2012-02-09 14:53 /user/risk2000/pjain/input
-rw-r--r-- 1 risk2000\pjain supergroup 236903179 2012-02-09 14:49 /user/risk2000/pjain/input/apat63_99.txt
-rw-r--r-- 1 risk2000\pjain supergroup 264075431 2012-02-09 14:50 /user/risk2000/pjain/input/cite75_99.txt

4) Run the jar file with the command below. Make sure the output directory does not already exist; Hadoop will create one for you. If it is already present, delete it first.

pjain@c0250 ~
$ hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/risk2000/pjain/output

pjain@c0250 ~
$ hadoop jar ch.jar /user/risk2000/pjain/input/cite75_99.txt /user/risk2000/pjain/output
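
If you rerun the job often, the delete-then-run sequence above can be wrapped in a small shell function. This is only a sketch: clean_output, run_job and the HADOOP variable are names I made up for illustration, and it assumes your Hadoop version supports "fs -test -e" and "fs -rmr" (newer releases prefer "fs -rm -r").

```shell
# HADOOP is a hypothetical override so the launcher can be swapped out;
# it defaults to the hadoop command used throughout this post.
HADOOP="${HADOOP:-hadoop}"

# Delete an HDFS path only when it already exists.
clean_output() {
    # "fs -test -e" exits with status 0 when the path exists.
    if "$HADOOP" fs -test -e "$1"; then
        "$HADOOP" fs -rmr "$1"
    fi
}

# Clear the old output directory, then run the job from the jar.
# Arguments: jar file, HDFS input path, HDFS output path.
run_job() {
    clean_output "$3" && "$HADOOP" jar "$1" "$2" "$3"
}
```

With these functions loaded in the shell, the run in this step becomes: run_job ch.jar /user/risk2000/pjain/input/cite75_99.txt /user/risk2000/pjain/output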

Now you can run MapReduce programs from the command line.

