Prash's Blog

Search Engine in Java August 5, 2009

Filed under: Java — prazjain @ 3:45 pm

In this post I will try to dissect the search engine I built in Java in my free time at college (that was 2005... ah, those were the days).

Do not worry if you are a first-timer and do not know how search engines work. I will add references to the articles I found helpful while building mine (around four years back!), when I had zilch knowledge of how search engines really worked.

When I took this up, I still had to worry about my daily classes, quizzes, end terms and grades :), so I did the best I could in the month and a half I had to build mine. In that time I had to research everything I could about the various parts of a search engine, and also figure out which components I could realistically develop on my own within those time constraints. For a few things, I went with off-the-shelf libraries that were already available.

This is how I started, and I would recommend it as mandatory reading for understanding what components a search engine has and how they are glued together. While I was doing this, I never discussed it with anyone else and had no other resource explaining what a search engine is about, so if you are in the same boat, do not worry. Just follow the link to Sergey Brin and Lawrence Page's paper, "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (and go through it as many times as needed until you feel comfortable):

http://infolab.stanford.edu/~backrub/google.html

I am a believer in head fake learning: picking up important skills indirectly while working toward some other goal. Through projects like this one and a few others, I got much better at concepts like multi-threading and caching data shared across multiple threads.

I will start digging out the source code from wherever I stored it, and then write a multi-part series explaining it. The whole series could take a while to complete.

Update, six months after writing this post: search engines are no longer the hot thing, and there are quite a few open-source ones available now. So I will save myself some effort and close this topic here: unless someone asks for it, I will not be publishing the series on my blog.

Cheers
