Prash's Blog

How to run Hadoop Map Reduce program from Command line February 15, 2012

Filed under: Hadoop — prazjain @ 9:59 am

A few days ago I had problems with my Hadoop Eclipse plugin, so I had to use the command line to run my Hadoop programs. If you also need to run your MapReduce program from the command line, here is how:

1) Open your Windows command prompt. First create a jar file with your class files in it. You will also need a manifest file to specify the main class in the jar file.

manifest.txt contents (we save manifest.txt in the same location as our class files just for the sake of this example; you can keep it elsewhere, but then you need to give the full path of the manifest file when creating the jar file). Make sure the last line of the manifest ends with a newline, otherwise the jar tool may not read it properly.


Main-Class: Histogram

Jar file command


C:\Users\pjain\workspace\HadoopTest\t>jar cvfm ch.jar manifest.txt *.class
added manifest
adding: Histogram$MapClass.class(in = 2201) (out= 823)(deflated 62%)
adding: Histogram$Reduce.class(in = 2197) (out= 807)(deflated 63%)
adding: Histogram.class(in = 2360) (out= 1057)(deflated 55%)
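To double-check that the manifest really made it into the jar, something like the following sketch can help. It is not part of the original steps: the class name ManifestCheck and its helper methods are made up for illustration, and it only uses the standard java.util.jar API. It writes a small jar whose manifest names a main class and reads the Main-Class entry back; readMainClass can equally be pointed at a real jar such as ch.jar.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper, not from the original post.
class ManifestCheck {

    /** Writes a jar that contains only a manifest naming the given main class. */
    static void writeJar(Path jar, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the main attributes are not written out.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out, manifest)) {
            // No class entries are added; only the manifest matters for this check.
        }
    }

    /** Returns the Main-Class recorded in the jar's manifest, or null if there is none. */
    static String readMainClass(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    /** Writes a temporary jar, reads its Main-Class back, and cleans up. */
    static String roundTrip(String mainClass) {
        try {
            Path jar = Files.createTempFile("manifest-check", ".jar");
            try {
                writeJar(jar, mainClass);
                return readMainClass(jar);
            } finally {
                Files.deleteIfExists(jar);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Main-Class: " + roundTrip("Histogram"));
    }
}
```

If readMainClass returns null for your own jar, the manifest was probably not picked up when the jar was created.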

2) Start the Cygwin shell and change to the directory above where you created the jar file, or copy the jar file over to the current directory of your shell prompt.


pjain@c0250 ~
$ cp /cygdrive/c/users/pjain/workspace/hadooptest/t/*.jar .

3) Copy the input files for the MapReduce program to the Hadoop file system (HDFS). We first create an input directory in HDFS so it is easier to manage the input files, then copy all the files from the “in” directory under the current local directory into it. The listing confirms that the files are now present in HDFS.


pjain@c0250 ~
$ hadoop fs -mkdir input

pjain@c0250 ~
$ hadoop fs -put in/* input

pjain@c0250 ~
$ hadoop fs -lsr
drwxr-xr-x - risk2000\pjain supergroup 0 2012-02-09 14:53 /user/risk2000/pjain/input
-rw-r--r-- 1 risk2000\pjain supergroup 236903179 2012-02-09 14:49 /user/risk2000/pjain/input/apat63_99.txt
-rw-r--r-- 1 risk2000\pjain supergroup 264075431 2012-02-09 14:50 /user/risk2000/pjain/input/cite75_99.txt

4) To run the jar file use the command below. Make sure the output directory does not already exist; Hadoop will create one for you. If it is already present, you need to delete it first.


pjain@c0250 ~
$ hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/risk2000/pjain/output

pjain@c0250 ~
$ hadoop jar ch.jar /user/risk2000/pjain/input/cite75_99.txt /user/risk2000/pjain/output
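Note that the command above does not name the class to run. As far as I understand it, hadoop jar takes the class from the Main-Class entry of the jar's manifest when there is one, and otherwise expects the class name as the first argument after the jar. The sketch below only illustrates that rule; MainClassResolution and its methods are hypothetical names and this is not Hadoop's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of how the arguments after the jar name are interpreted.
class MainClassResolution {

    /** Picks the class to run: the manifest entry wins, else the first argument. */
    static String mainClass(String manifestMainClass, List<String> argsAfterJar) {
        if (manifestMainClass != null) {
            return manifestMainClass;
        }
        if (argsAfterJar.isEmpty()) {
            throw new IllegalArgumentException("no Main-Class in manifest and no class name given");
        }
        return argsAfterJar.get(0);
    }

    /** Returns the arguments that are passed on to the program itself. */
    static List<String> programArgs(String manifestMainClass, List<String> argsAfterJar) {
        if (manifestMainClass != null) {
            return new ArrayList<>(argsAfterJar);
        }
        if (argsAfterJar.isEmpty()) {
            throw new IllegalArgumentException("no Main-Class in manifest and no class name given");
        }
        // Without a manifest entry, the first argument was consumed as the class name.
        return new ArrayList<>(argsAfterJar.subList(1, argsAfterJar.size()));
    }

    public static void main(String[] args) {
        List<String> afterJar = Arrays.asList(
                "/user/risk2000/pjain/input/cite75_99.txt",
                "/user/risk2000/pjain/output");
        System.out.println(mainClass("Histogram", afterJar));
        System.out.println(programArgs("Histogram", afterJar));
    }
}
```

So if you skip the manifest, the equivalent command would need the class name (here Histogram) placed between ch.jar and the input path.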

Now you can run MapReduce programs through command line.

 

com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxGet_cookie February 14, 2012

Filed under: Google App Engine — prazjain @ 2:20 pm

I was running HtmlUnit 2.9 with Google App Engine in the development environment and hit this exception when loading a URL:

Truncated Exception is pasted below:


com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxGet_cookie
 at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:595)
 at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
 ...
Caused by: java.lang.RuntimeException: Exception invoking jsxGet_cookie
 at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:198)
 ... 65 more
Caused by: java.lang.IllegalArgumentException: Invalid port: -1
 at org.apache.http.cookie.CookieOrigin.<init>(CookieOrigin.java:58)
 ... 82 more

This is due to a bug in Google App Engine; the following is a workaround to fix it in HtmlUnit for the time being.

Workaround


webClient.setCookieManager(new CookieManager()
{
    protected int getPort(final URL url)
    {
        final int r = super.getPort(url);
        return r != -1 ? r : 80;
    }
});
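The -1 in the exception is what java.net.URL reports when a URL has no explicit port. The workaround above falls back to 80, which fits plain http URLs. A possible variation, sketched here with the standard library only (PortFallback and effectivePort are hypothetical names, and this is not HtmlUnit code), is to fall back to the protocol's default port so that https URLs get 443:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper showing the port fallback logic in isolation.
class PortFallback {

    /** Returns the explicit port, else the protocol default, else 80. */
    static int effectivePort(URL url) {
        int port = url.getPort(); // -1 when the URL does not name a port
        if (port != -1) {
            return port;
        }
        int defaultPort = url.getDefaultPort(); // 80 for http, 443 for https
        return defaultPort != -1 ? defaultPort : 80;
    }

    /** Parses a URL string, turning the checked exception into an unchecked one. */
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("http://example.com/").getPort());         // -1
        System.out.println(effectivePort(parse("http://example.com/")));    // 80
        System.out.println(effectivePort(parse("https://example.com/")));   // 443
        System.out.println(effectivePort(parse("http://localhost:8888/"))); // 8888
    }
}
```

The same expression could replace the hard-coded 80 inside the overridden getPort, if you also load https pages.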

 

Metro Journey Planner Android App February 7, 2012

Filed under: Android — prazjain @ 3:54 pm

There are loads of journey planner apps out there, but none that were good for the Indian metro trains.
So I decided to write one. Here is the link: https://market.android.com/details?id=com.jainpraz.apps.travel.metro.ind
Some things I have added in this app that are missing from other similar apps for the Indian market: the ability to find the shortest route from source to destination, and an intuitive user interface.
The user interface is much easier to use.
Try it out.
Of course, feedback is welcome.
I will keep updating the application's user interface and functionality from time to time,
but if there is something urgent you would like addressed soon, do drop me a line.

Currently supports Bangalore and New Delhi metros.

Cheers

 

Issues with Hadoop eclipse plugin February 2, 2012

Filed under: Hadoop — prazjain @ 10:17 am

I have been playing with Hadoop for a day now and have bumped into issues with the Eclipse plugin for Hadoop (later, I ran my MapReduce program on the command line; the notes are in the post "How to run Hadoop Map Reduce program from Command line").

I have a Windows 7 64-bit machine, Eclipse Indigo 3.7.1, and Hadoop 0.20.2 installed (why did I install 0.20.2?).

I then copied over the Eclipse plugin from the Hadoop directory /contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar

Everything seemed fine until I tried to run the application and found that I could not run it on Hadoop: you click on Run on Hadoop and nothing happens.

Plugin version : 0.20.2

Hadoop version : 0.20.2

I then found that this error had been fixed later, so I downloaded Hadoop 0.20.203, copied its plugin from /contrib/eclipse-plugin/hadoop-eclipse-plugin-0.20.203.0.jar into the Eclipse plugin directory, and started Eclipse:


eclipse -clean

Plugin version : 0.20.203.0

Hadoop version : 0.20.2

Now I can run my application.

But I have another issue: I cannot look into the DFS location! WTH Hadoop!

I get a new error : Connecting to DFS has encountered a problem. An internal error has occurred during “Connecting to DFS localhost. org/apache/commons/configuration/Configuration”

So the Hadoop 0.20.203.0 plugin for Eclipse fixes one problem and introduces a new one: I can run the application on Hadoop, but I cannot see the HDFS file system through the plugin. In the end I resorted to using these commands to work with files on the HDFS file system.


# view the files in the directory
hadoop fs -ls

# delete files from a directory
hadoop fs -rm myfolder/sub2/*

# delete an empty directory (if -rm refuses to remove a directory, use -rmr as shown next)
hadoop fs -rm myfolder/sub2

# delete recursively
hadoop fs -rmr myfolder/sub2

# copy file sample.txt to HDFS from the current directory in the native file system
hadoop fs -put sample.txt .

# get file sample.txt from HDFS to the current directory in the native file system
hadoop fs -get sample.txt .

Later I managed to run my MapReduce program on the command line; the notes are in the post "How to run Hadoop Map Reduce program from Command line".

 

java.io.IOException: Failed to set permissions of path: file:/tmp/hadoop-pjain/mapred/staging/pjain1500477637/.staging to 0700 February 1, 2012

Filed under: Hadoop — prazjain @ 10:50 am

I ran into this exception when installing the Hadoop stable version 0.20.203.0 on my local dev workstation and running the sample examples.

Truncated Exception :


java.io.IOException: Failed to set permissions of path: file:/tmp/hadoop-pjain/mapred/staging/pjain1500477637/.staging to 0700
 at org.apache.hadoop.fs.RawLocalFileSystem.checkReturnValue(RawLocalFileSystem.java:525)
 at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:499)
 at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
 at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
 at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:797)
 ...
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

After banging my head trying to resolve this, I found out that it is a known issue with this (stable!) release.

So I reverted to stable version 0.20.2 and ran the samples, and they worked fine.

Hope that helps.

 

java.lang.VerifyError: Expecting a stackmap frame at branch target January 19, 2012

Filed under: Google App Engine,Java — prazjain @ 6:30 pm

If you get this same error :


WARNING: Error for /doit
java.lang.VerifyError: Expecting a stackmap frame at branch target 27 in method com.yourpackage.YourServlet.doGet(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V at offset 11
 at java.lang.Class.getDeclaredConstructors0(Native Method)
 at java.lang.Class.privateGetDeclaredConstructors(Class.java:2404)
 at java.lang.Class.getConstructor0(Class.java:2714)
 at java.lang.Class.newInstance0(Class.java:343)
 at java.lang.Class.newInstance(Class.java:325)
 at org.mortbay.jetty.servlet.Holder.newInstance(Holder.java:153)
 at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:428)
 at org.mortbay.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:339)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
 at com.google.appengine.tools.development.HeaderVerificationFilter.doFilter(HeaderVerificationFilter.java:35)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
 at com.google.appengine.api.blobstore.dev.ServeBlobFilter.doFilter(ServeBlobFilter.java:60)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
 at com.google.apphosting.utils.servlet.TransactionCleanupFilter.doFilter(TransactionCleanupFilter.java:43)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
 at com.google.appengine.tools.development.StaticFileFilter.doFilter(StaticFileFilter.java:122)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
 at com.google.appengine.tools.development.BackendServersFilter.doFilter(BackendServersFilter.java:97)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
 at com.google.appengine.tools.development.DevAppEngineWebAppContext.handle(DevAppEngineWebAppContext.java:78)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at com.google.appengine.tools.development.JettyContainerService$ApiProxyHandler.handle(JettyContainerService.java:362)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


then you need to add the option -XX:-UseSplitVerifier to your Default VM arguments.

In Eclipse: Window -> Preferences -> Java -> Installed JREs

Select the JRE you are using and edit it to include -XX:-UseSplitVerifier
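If the error persists, it may be worth confirming that the option actually reached the JVM that runs your code. A minimal sketch, assuming you can run a few lines inside that JVM (for example from a servlet or a test main); SplitVerifierCheck is a hypothetical name, and the JVM arguments are read through the standard java.lang.management API:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports whether the JVM was started with the option.
class SplitVerifierCheck {

    static final String FLAG = "-XX:-UseSplitVerifier";

    /** Returns true if the given JVM arguments contain the option exactly. */
    static boolean hasFlag(List<String> jvmArgs) {
        return jvmArgs.contains(FLAG);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to main), e.g. -Xmx512m.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println(FLAG + " present: " + hasFlag(jvmArgs));
    }
}
```

If the option is missing from the printed list, the launch configuration in use probably does not pick up the Default VM arguments of the JRE you edited.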

Hope that helps

 

oauth.signpost.exception.OAuthNotAuthorizedException: Authorization failed (server replied with a 401). This can happen if the consumer key was not correct or the signatures did not match January 6, 2012

Filed under: Android — prazjain @ 1:00 pm

I got this error when trying to post a simple tweet from my Android app to Twitter.

I had a sample test app created in Twitter that I had been using for a couple of days, and it worked fine with its consumer key and secret. But just before releasing the Android app I created a new Twitter app for the release and BOOM! I got this error! All I did was replace the CONSUMER KEY and CONSUMER SECRET.

I searched all over the net, but nothing solved my issue.

Here is the resolution that worked.

In Twitter you can create apps that are either browser based or client based. Earlier, Twitter asked the user to choose their app type, as seen in the screenshot below.

Here the user can easily specify the type of application he intends to make. The callback URL is optional because your Android application will overwrite it anyway to receive a callback.

Previous version of Twitter's Application creation page


The new Twitter application creation page looks like this :

New Version of Twitter's application creation page


Here the callback URL is still optional, but what the description does not mention is that this field is also used to determine whether your application is a client or a browser based application.

Only if you fill in a callback URL will your application be considered a browser based application; otherwise it is considered a client application.

In my case, I had set a callback URL for the first test app, but when I created the second app on Twitter I left it out (because it is an optional field), and hence the authorization failed.

So remember to give your application Read and Write access and assign it a callback URL, so that Twitter knows it is a browser based application.

I hope that helps.

 

Cannot convert lambda expression to type ‘string’ because it is not a delegate type November 7, 2011

Filed under: ADO — prazjain @ 12:34 pm

This error happens over and over, but only after enough time has passed that you have forgotten how you fixed it last time.

            foreach (string colName in breakup)
            {
                IEnumerable res = from row in QueryCSVModel.FilteredData.Table
                                  select row[colName];    //error on select : Cannot convert lambda expression to type 'string' because it is not a delegate type
                //do something here
            }

This is the commonly suggested solution (repeated here in case it helps you):

  • Add reference to System.Data.DataSetExtensions
  • Add using for System.Data and System.Linq

But this still did not work for me.

What I had missed was calling AsEnumerable() on the DataTable. The code below works:

            foreach (string colName in breakup)
            {
                IEnumerable res = from row in QueryCSVModel.FilteredData.Table.AsEnumerable()
                                  select row[colName];
                //do something here
            }

 

How to Select or Project some columns from a Datatable programmatically without SQL November 7, 2011

Filed under: ADO — prazjain @ 11:26 am

If you want to filter rows, you give a conditional expression and get back only the rows that satisfy it.

But what if you have around 50 columns and do not want to see all of them in the result of a select query?

So how do you remove the extra columns? This is how I worked around removing the columns I did not need.

Steps:

  1. Get the rows that satisfy the filter criteria
  2. Import them into a new table
  3. If a column name is not part of SelectedColumns (a string property), remove it from the data table’s column collection. (Of course you cannot remove it inside the ‘foreach’ loop, because you cannot modify the collection underlying an enumerator while you are iterating over it. So store the columns in a temporary collection and remove them afterwards.)
  4. That's it, you have your results!

        private void btnExecute_Click(object sender, RoutedEventArgs e)
        {
            StringBuilder queryBuilder = new StringBuilder();
            //Filter the rows that satisfy the conditional expression
            DataRow[] rows = QueryCSVModel.Data.Table.Select(QueryCSVModel.FilterCriteria);
            //Clone the table to create a new one, and import the eligible rows
            DataTable dt = QueryCSVModel.Data.Table.Clone();
            foreach (DataRow item in rows)
            {
                dt.ImportRow(item);
            }
            List<DataColumn> toRemove = new List<DataColumn>();
            foreach (DataColumn col in dt.Columns)
            {
                //Any column name that is not desired (not in selected columns) is to be removed.
                if (!string.IsNullOrWhiteSpace(QueryCSVModel.SelectedColumns) && !QueryCSVModel.SelectedColumns.Contains(col.ColumnName))
                {
                    toRemove.Add(col);
                }
            }
            // iterate over data column collection and remove the unwanted columns.
            foreach (DataColumn col in toRemove)
            {
                dt.Columns.Remove(col);
            }
            //that's it, you have your data with fewer columns
            QueryCSVModel.FilteredData = dt.DefaultView;
        }
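The collect-first, remove-later pattern from step 3 is not specific to ADO.NET. Here is the same idea as a small standalone sketch in Java, where removing entries while iterating fails in a similar way (with a ConcurrentModificationException). ColumnProjection and project are hypothetical names, and a row is modelled as a simple map from column name to value. Unlike the string Contains check in the code above, this sketch matches column names exactly against a set, which avoids accidental substring matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of projecting a row onto the selected columns.
class ColumnProjection {

    /** Returns a copy of the row that keeps only the selected columns (all if none selected). */
    static Map<String, Object> project(Map<String, Object> row, Set<String> selected) {
        Map<String, Object> result = new LinkedHashMap<>(row);
        // First pass: only collect the unwanted column names.
        List<String> toRemove = new ArrayList<>();
        for (String column : result.keySet()) {
            if (!selected.isEmpty() && !selected.contains(column)) {
                toRemove.add(column);
            }
        }
        // Second pass: remove them once the iteration over the map is finished.
        for (String column : toRemove) {
            result.remove(column);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "sample");
        row.put("price", 9.5);
        Set<String> selected = new LinkedHashSet<>(Arrays.asList("id", "price"));
        System.out.println(project(row, selected).keySet());
    }
}
```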
     

 

OleDbException is not a valid path Make sure the path name is spelled correctly November 3, 2011

Filed under: ADO — prazjain @ 5:39 pm

It has been ages since I did ADO programming, and naturally I hit this common error when loading my CSV file into memory.

Problem:

      System.Data.OleDb.OleDbException was unhandled

      Message=’C:\CSVTasks\TXT2_RMMTRADE_20101124EOD_20101125_RERUN.CSV’ is not a valid path.

      Make sure that the path name is spelled correctly  and that you are connected to the server on which the file resides.

      Source=Microsoft JET Database Engine ErrorCode=-2147467259

Reason:

This happens if you give the full file path in the connection string, or if you give just the relative file path in the select command text when creating the data adapter.

example:

            // if you have given full file path below, then it will cause this issue
            string connStr = "Provider=Microsoft.Jet.OleDb.4.0;Data Source=" + filePathFull + ";Extended Properties=\"Text;HDR=YES;FMT=Delimited\"";
            OleDbConnection conn = new OleDbConnection(connStr);
            conn.Open();

Or, if you give just the file name (a relative path) when loading the file in the adapter, you will get this exception:

            // if you have given just the file name (not full path) below then it will cause this issue
            OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM " + fileName, conn);
            DataSet ds = new DataSet("QueryCSV");
            adapter.Fill(ds);

Solution:

To avoid this issue:

  1. Give the full path to the directory where your CSV or text file resides in the connection string.
  2. Give the full file path in the select command text when creating the data adapter.

Code sample

    using System.Data;
    using System.Data.OleDb;

    public class CSVReader
    {
        public static DataView GetData(string fileFullPath, string directoryFullPath)
        {
            // give full path to DIR here
            string connStr = "Provider=Microsoft.Jet.OleDb.4.0;Data Source=" + directoryFullPath + ";Extended Properties=\"Text;HDR=YES;FMT=Delimited\"";
            // 'using' disposes the connection even if Open or Fill throws
            using (OleDbConnection conn = new OleDbConnection(connStr))
            {
                conn.Open();
                // give full path to the file here
                OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM " + fileFullPath, conn);
                DataSet ds = new DataSet("QueryCSV");
                adapter.Fill(ds);
                DataTable dt = ds.Tables[0];
                return dt.AsDataView();
            }
        }
    }

 

 