Desktop search engine uses open source tools

A new desktop search and indexing tool written in Java uses several open source programming tools -- the PDFBox, Lucene and POI Apache APIs -- to search and index many of the most common types of documents. The tool supports searching HTML, Word, Excel, RTF, PDF, OpenOffice/StarOffice and plain-text documents.

As this site's resident desktop management expert, I've written about a few desktop search engines, such as the Google Desktop search application, as well as my current favorite, the Microsoft Outlook plug-in Lookout.

Relatively new to the game is DocSearcher 3.88, a desktop search and indexing tool written entirely in Java. DocSearcher 3.88 uses several open source programming tools -- the PDFBox, Lucene and POI Apache APIs -- to search and index many of the most common types of documents. The tool supports searching HTML, Word, Excel, RTF, PDF, OpenOffice/StarOffice and plain-text documents. It can search either the body text or, for formats that carry it, metadata such as document name and author.
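To make the body-versus-metadata distinction concrete, here is a minimal sketch of a fielded index in plain Java. It is not DocSearcher's code and it does not use Lucene, which does the real work in the tool; the class name `FieldedIndexSketch` and the field names "body" and "author" are my own assumptions, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a toy inverted index that keeps a separate term list per field.
public class FieldedIndexSketch {
    // field name -> (lowercased term -> sorted set of document paths)
    private final Map<String, Map<String, Set<String>>> postings = new HashMap<>();

    // Tokenize the text on non-alphanumeric characters and record each term under the given field.
    public void add(String path, String field, String text) {
        Map<String, Set<String>> terms = postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.computeIfAbsent(token, t -> new TreeSet<>()).add(path);
            }
        }
    }

    // Return the paths of documents whose given field contains the term (case-insensitive).
    public List<String> search(String field, String term) {
        Map<String, Set<String>> terms =
                postings.getOrDefault(field, Collections.emptyMap());
        Set<String> hits =
                terms.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
        return new ArrayList<>(hits);
    }
}
```

Because each field has its own term list, a search for a name in the "author" field will not match a document that merely mentions that name in its body, which is the behavior you would expect from a metadata search.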

The program can also spider a remote Web site and create a locally searchable directory of that site.

DocSearcher requires no installation, only the presence of the Java Runtime Environment, so it can run directly from any directory. After you launch the program's .JAR file, you'll want to create an index for a given document directory. If you already have an index produced by an instance of DocSearcher, you can import it rather than regenerate it from scratch. Indexes can be updated on demand, when the program launches or after the index has aged a certain number of days. The program's search interface produces a report that can be saved as an HTML file.
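The three update options (on demand, at launch, or after a set number of days) boil down to a small decision rule. The following is a rough sketch of how such a rule might look; the enum, method and parameter names are hypothetical and are not taken from DocSearcher.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative only: decides whether an index should be rebuilt automatically.
public class IndexRefreshSketch {
    // Hypothetical names for the three update modes described above.
    public enum Policy { ON_DEMAND, ON_STARTUP, AFTER_MAX_AGE }

    public static boolean shouldUpdate(Policy policy, boolean atStartup,
                                       Instant lastIndexed, Instant now, int maxAgeDays) {
        switch (policy) {
            case ON_STARTUP:
                // Refresh every time the program launches.
                return atStartup;
            case AFTER_MAX_AGE:
                // Refresh once the index is at least maxAgeDays whole days old.
                return Duration.between(lastIndexed, now).toDays() >= maxAgeDays;
            default:
                // ON_DEMAND: never automatic; the user triggers the update.
                return false;
        }
    }
}
```

With a seven-day limit, an index last built eight days ago would be flagged for an update, while one built three days ago would be left alone.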

The program can also create a self-contained index that you can place on a CD-ROM or DVD along with the application itself, meaning you can essentially package a set of documents on a disc with its own search engine. It can send an e-mail notification whenever a given index is updated. It also supports third-party document handlers, so people can write their own handlers for additional file types and plug them in as needed.
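I have not looked at DocSearcher's actual handler interface, so treat the following as a guess at the general shape of such a plug-in mechanism rather than its real API. The `DocumentHandler` interface and the registry class are hypothetical; the idea is simply that each handler turns one file type into plain text that the indexer can consume.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Illustrative only: maps file extensions to pluggable text extractors.
public class HandlerRegistrySketch {
    // Hypothetical plug-in contract: convert raw file bytes into searchable text.
    public interface DocumentHandler {
        String extractText(byte[] content);
    }

    private final Map<String, DocumentHandler> byExtension = new HashMap<>();

    // Register a handler for an extension such as "txt" (stored case-insensitively).
    public void register(String extension, DocumentHandler handler) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    // Look up the handler for a file name; empty if there is no extension or no handler.
    public Optional<DocumentHandler> handlerFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(byExtension.get(extension));
    }
}
```

A file type with no registered handler (say, a Microsoft Project file) simply comes back empty, and the indexer can skip it until someone writes a handler for it.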

The program still has a few limitations. For one, it cannot sort search results. Another is that it does not yet search Outlook .PST files (although this capability may be added down the road). The tool's author has stated plans to add support for generic XML files, Microsoft Project documents and metadata from common multimedia files (such as MP3 ID3 tags) as well.

 


Serdar Yegulalp is editor of the Windows Power Users Newsletter. Check it out for the latest advice and musings on the world of Windows network administrators -- and please share your thoughts as well!


More information from SearchWinSystems.com


This was first published in March 2006
