Have you ever looked for an easy way to add search capabilities to your website or application?

For some reason, when I started looking a year ago I didn’t have much luck. I was fairly familiar with an Apache product named Lucene (probably one of my favorite product names ever… I just love how it sounds), but it really didn’t seem to work well with web applications. It seems much more useful for searching documents (Word docs, text, PDF, etc.) and indexing them for a search. That makes it a very useful tool for a company portal, but not for a website that is built with dynamic content or has JSP includes.

Then I stumbled across another Apache Project named Nutch (pretty neat name also).

Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.

In a nutshell, that means you can search your website or application with Lucene :) It may not be all that useful for smaller sites, but some of our applications have around 100 pages of content, so users always appreciate a search option. Nutch is still in its early stages, but we have used it a few times and it already seems very solid. The crawling part can be a bit confusing if you have to set up Cygwin on Windows for the first time. But once that step is completed and you have indexed a few sites, it becomes pretty easy. As with most open source tools, the documentation is a bit tricky, but give it a shot if you need to integrate search capabilities into your applications!
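
If the crawl-then-index idea sounds abstract, here is a toy sketch of it in plain Python. To be clear, this is not how Nutch or Lucene are actually called, and it uses none of their APIs. The page paths, the HTML, and every function name below are made up for illustration. It just pulls the text out of a few in-memory pages, builds a simple word-to-page index, and looks up a query against it.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML page and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(html):
    """Return the text found between the tags of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page paths that contain it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for word in tokenize(extract_text(html)):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical pages standing in for what a crawler would have fetched.
PAGES = {
    "/index.jsp": "<html><body><h1>Welcome</h1><p>Office supplies and news</p></body></html>",
    "/contact.jsp": "<html><body><p>Contact our office by phone</p></body></html>",
}

INDEX = build_index(PAGES)
print(search(INDEX, "office"))
print(search(INDEX, "office phone"))
```

The real tools do far more than this (following links, ranking results, handling many document formats), but the basic shape of "fetch pages, pull out the text, index it, query the index" is the same.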

Here are a couple of examples of sites that use Nutch:

OfficeGateway
More of a basic implementation for searching the website for information.
OfficeGateway.ca

Krugle
A more in-depth implementation that searches thousands (I'm guessing) of files when you enter a search term.
Krugle.com

Searching may not seem too important to some, but it seems to come up quite a bit in our projects… so I guess it might just be important after all! Check out Nutch if you need a solid search feature today!

Cheers,
Adrian