Prash's Blog

How to run a Hadoop MapReduce program from the command line February 15, 2012

Filed under: Hadoop — prazjain @ 9:59 am

A few days ago I had problems with the Hadoop Eclipse plugin, so I had to use the command line to run my Hadoop programs. If you also need to run your MapReduce program from the command line, here is how:

1) Open your Windows command prompt. First, create a jar file with your class files in it. You will also need to create a manifest file to specify the main class of the jar.

manifest.txt contents (we save manifest.txt in the same location as our class files just for the sake of this example; you can keep it in a different location, but then you will need to give the full path to the manifest file when creating the jar):


Main-Class: Histogram

Jar file command


C:\Users\pjain\workspace\HadoopTest\t>jar cvfm ch.jar manifest.txt *.class
added manifest
adding: Histogram$MapClass.class(in = 2201) (out= 823)(deflated 62%)
adding: Histogram$Reduce.class(in = 2197) (out= 807)(deflated 63%)
adding: Histogram.class(in = 2360) (out= 1057)(deflated 55%)
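
For reference, here is a minimal sketch of what the Histogram driver class could look like, using the old org.apache.hadoop.mapred API that ships with 0.20.2. The class and inner-class names match the jar listing above, but the map and reduce bodies are assumptions (a simple count of the second comma-separated field, which for cite75_99.txt is the cited patent); substitute your own logic.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class Histogram {

    // Assumed mapper: emits the second comma-separated field with a count of 1.
    public static class MapClass extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text outKey = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String[] fields = value.toString().split(",");
            if (fields.length == 2) {
                outKey.set(fields[1]);
                output.collect(outKey, ONE);
            }
        }
    }

    // Assumed reducer: sums the counts per key.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(Histogram.class);
        conf.setJobName("histogram");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(MapClass.class);
        conf.setReducerClass(Reduce.class);
        // Input and output paths come from the command line, as in step 4 below.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}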

2) Start a Cygwin shell and change to the directory where you created the jar file, or copy the jar file over to the current directory of your shell prompt.


pjain@c0250 ~
$ cp /cygdrive/c/users/pjain/workspace/hadooptest/t/*.jar .

3) Copy the input files for the MapReduce program into the Hadoop file system, HDFS. We first create an input directory in HDFS so the input files are easier to manage, then copy over all the files from the "in" directory under the current Windows directory. Listing the file system then confirms the files are present in HDFS.


pjain@c0250 ~
$ hadoop fs -mkdir input

pjain@c0250 ~
$ hadoop fs -put in/* input

pjain@c0250 ~
$ hadoop fs -lsr
drwxr-xr-x - risk2000\pjain supergroup 0 2012-02-09 14:53 /user/risk2000/pjain/input
-rw-r--r-- 1 risk2000\pjain supergroup 236903179 2012-02-09 14:49 /user/risk2000/pjain/input/apat63_99.txt
-rw-r--r-- 1 risk2000\pjain supergroup 264075431 2012-02-09 14:50 /user/risk2000/pjain/input/cite75_99.txt

4) To run the jar file, use the command below. Also make sure the output directory is not already present; Hadoop will create one for you, and the job fails if it already exists, so delete it first if needed.


pjain@c0250 ~
$ hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/risk2000/pjain/output

pjain@c0250 ~
$ hadoop jar ch.jar /user/risk2000/pjain/input/cite75_99.txt /user/risk2000/pjain/output
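
Note that because we set Main-Class in the manifest, the arguments after the jar name go straight to the program: the input path, then the output path. If the jar had no manifest main class, hadoop jar would expect the class name as the first argument instead, i.e. hadoop jar ch.jar Histogram <input> <output>.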

Now you can run MapReduce programs from the command line.


com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxGet_cookie February 14, 2012

Filed under: Google App Engine — prazjain @ 2:20 pm

I was running HtmlUnit 2.9 with Google App Engine in the development environment, and hit upon this exception when loading a URL.

The truncated exception is pasted below:


com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxGet_cookie
 at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:595)
 at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
 ...
Caused by: java.lang.RuntimeException: Exception invoking jsxGet_cookie
 at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:198)
 ... 65 more
Caused by: java.lang.IllegalArgumentException: Invalid port: -1
 at org.apache.http.cookie.CookieOrigin.<init>(CookieOrigin.java:58)
 ... 82 more

This is due to a bug in the Google App Engine development environment, and the following is a workaround to fix it in HtmlUnit for the time being. The root cause is visible in the last frame: java.net.URL.getPort() returns -1 when a URL has no explicit port, and that unresolved value reaches org.apache.http.cookie.CookieOrigin, which rejects negative ports. Falling back to the default HTTP port avoids the IllegalArgumentException.

Workaround


// Fall back to the default HTTP port when the URL's port is unresolved (-1),
// so that CookieOrigin never sees the invalid value.
webClient.setCookieManager(new CookieManager()
{
    @Override
    protected int getPort(final URL url)
    {
        final int r = super.getPort(url);
        return r != -1 ? r : 80;
    }
});
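
For context, here is a minimal self-contained sketch of how the patched cookie manager plugs into a WebClient. The target URL is only a placeholder, and the surrounding class is mine, not from the original post.

import java.net.URL;

import com.gargoylesoftware.htmlunit.CookieManager;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class CookiePortWorkaround {
    public static void main(String[] args) throws Exception {
        WebClient webClient = new WebClient();
        // Install the patched manager before loading any page.
        webClient.setCookieManager(new CookieManager() {
            @Override
            protected int getPort(final URL url) {
                final int r = super.getPort(url);
                return r != -1 ? r : 80; // fall back to the default HTTP port
            }
        });
        HtmlPage page = webClient.getPage("http://example.com/"); // placeholder URL
        System.out.println(page.getTitleText());
    }
}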


Metro Journey Planner Android App February 7, 2012

Filed under: Android — prazjain @ 3:54 pm

There are loads of journey planner apps out there, but none that were good for the Indian metro trains.
So I decided to write one; here is the link: https://market.android.com/details?id=com.jainpraz.apps.travel.metro.ind
Some things I have added in this app that are missing from other similar apps for the Indian market: the ability to find the shortest route from source to destination, and an intuitive user interface that is easy to use.
Try it out; feedback is of course welcome.
I will keep updating the application's user interface and functionality from time to time,
but if there is something urgent you would like addressed soon, do drop me a line.

Currently supports Bangalore and New Delhi metros.
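
The app's own routing code is not shown here, but to illustrate the shortest-route idea: when every hop between adjacent stations counts the same, a plain breadth-first search over the station graph finds the route with the fewest stops. A minimal sketch follows; the station names and connections are made up for illustration.

import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class RouteFinder {
    // Adjacency list: station name -> directly connected stations.
    private final Map<String, List<String>> graph = new HashMap<String, List<String>>();

    public void connect(String a, String b) {
        // Metro links run both ways.
        adj(a).add(b);
        adj(b).add(a);
    }

    private List<String> adj(String s) {
        List<String> n = graph.get(s);
        if (n == null) {
            n = new LinkedList<String>();
            graph.put(s, n);
        }
        return n;
    }

    // Breadth-first search: fewest hops == shortest route when all hops are equal.
    public List<String> shortestRoute(String from, String to) {
        Map<String, String> previous = new HashMap<String, String>();
        Queue<String> queue = new ArrayDeque<String>();
        previous.put(from, from);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk the previous-pointers back to reconstruct the route.
                LinkedList<String> route = new LinkedList<String>();
                for (String s = to; !s.equals(from); s = previous.get(s)) {
                    route.addFirst(s);
                }
                route.addFirst(from);
                return route;
            }
            for (String next : adj(current)) {
                if (!previous.containsKey(next)) {
                    previous.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList(); // no route found
    }

    public static void main(String[] args) {
        RouteFinder finder = new RouteFinder();
        // Hypothetical stations, for illustration only.
        finder.connect("Rajiv Chowk", "Central Secretariat");
        finder.connect("Central Secretariat", "Kashmere Gate");
        System.out.println(finder.shortestRoute("Rajiv Chowk", "Kashmere Gate"));
    }
}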

Cheers


Issues with Hadoop eclipse plugin February 2, 2012

Filed under: Hadoop — prazjain @ 10:17 am

I have been playing with Hadoop for a day now and bumped into issues with the Eclipse plugin for Hadoop. (Later, I ran the MapReduce program on the command line; you can find the notes posted here.)

I have a Windows 7 64-bit machine with Eclipse Indigo 3.7.1, and installed Hadoop 0.20.2 (why did I install 0.20.2?).

OK, then I copied over the Eclipse plugin from the Hadoop directory: /contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar

Everything seems fine until I try to run the application and find that I cannot run it on Hadoop: you can click Run on Hadoop and nothing happens.

Plugin version : 0.20.2

Hadoop version : 0.20.2

Then I found that this error was fixed in a later release, so I downloaded Hadoop 0.20.203, copied its plugin from /contrib/eclipse-plugin/hadoop-eclipse-plugin-0.20.203.0.jar into the Eclipse plugins directory, and started Eclipse with:


eclipse -clean

Plugin version : 0.20.203.0

Hadoop version : 0.20.2

Now I can run my application.

But now I have another issue: I cannot look into the DFS location! WTH, Hadoop!

I get a new error: Connecting to DFS has encountered a problem. An internal error has occurred during “Connecting to DFS localhost”. org/apache/commons/configuration/Configuration

So the Hadoop 0.20.203.0 plugin for Eclipse fixes one problem and introduces a new one: I can run the application on Hadoop, but I cannot browse the HDFS file system through the plugin. (The class name in the message, org/apache/commons/configuration/Configuration, suggests the plugin jar is missing the commons-configuration library it depends on.) In the end I resorted to using the commands below to work with files on HDFS.


// to list the files in the current HDFS directory

hadoop fs -ls

// to delete files from a directory

hadoop fs -rm myfolder/sub2/*

// to delete a directory recursively (plain -rm only removes files, so use -rmr even for an empty directory)

hadoop fs -rmr myfolder/sub2

// copy file sample.txt from the current directory in the native file system to HDFS

hadoop fs -put sample.txt .

// get file sample.txt from HDFS into the current directory in the native file system

hadoop fs -get sample.txt .
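
If you would rather do the same from Java than from the shell, here is a minimal sketch using Hadoop's FileSystem API. It assumes the same hdfs://localhost:9000 pseudo-distributed setup seen in the other post, and the paths are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBrowser {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes the pseudo-distributed setup used elsewhere in these notes.
        conf.set("fs.default.name", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // Equivalent of: hadoop fs -ls myfolder
        for (FileStatus status : fs.listStatus(new Path("myfolder"))) {
            System.out.println(status.getPath() + (status.isDir() ? " <dir>" : ""));
        }

        // Equivalent of: hadoop fs -rmr myfolder/sub2 (second argument = recursive)
        fs.delete(new Path("myfolder/sub2"), true);

        // Equivalent of: hadoop fs -put sample.txt .
        fs.copyFromLocalFile(new Path("sample.txt"), new Path("."));

        fs.close();
    }
}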

Later I managed to run my MapReduce program on the command line; you can find the notes posted here.


java.io.IOException: Failed to set permissions of path: file:/tmp/hadoop-pjain/mapred/staging/pjain1500477637/.staging to 0700 February 1, 2012

Filed under: Hadoop — prazjain @ 10:50 am

I ran into this exception when installing the stable Hadoop version 0.20.203.0 on my local dev workstation and running the sample examples.

Truncated exception:


java.io.IOException: Failed to set permissions of path: file:/tmp/hadoop-pjain/mapred/staging/pjain1500477637/.staging to 0700
 at org.apache.hadoop.fs.RawLocalFileSystem.checkReturnValue(RawLocalFileSystem.java:525)
 at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:499)
 at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
 at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
 at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:797)
 ...
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

After banging my head trying to resolve this, I found out it is a known issue with this (stable!) release: the local file system permission checks fail on Windows (tracked, as far as I can tell, as HADOOP-7682).

So I reverted to the earlier stable version 0.20.2 and ran the samples, and everything worked fine.

Hope that helps.