Tip

Desktop search engine uses open source tools

As this site's resident desktop management expert, I've written about a few desktop search engines, including the Google Desktop search application and my current favorite, the Microsoft Outlook plug-in Lookout.

Relatively new to the game is DocSearcher 3.88, a desktop search and indexing tool written entirely in Java. DocSearcher 3.88 uses several open source programming tools -- the PDFBox, Lucene and POI Apache APIs -- to search and index many of the most common types of documents. The tool supports searching HTML, Word, Excel, RTF, PDF, OpenOffice/StarOffice and plain-text documents, and it can search either the body of a document or, for formats that carry it, the document's metadata (document name, author, etc.).
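
For readers who haven't looked inside a search tool before, here is a minimal sketch of the general idea behind body and metadata searching. It is not DocSearcher's code and it does not use the Lucene API; the class name `TinyIndex`, the field names and the sample documents are all invented for illustration, and a real tool would first extract the text from each file format.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index (hypothetical): maps (field, term) to document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        """Index one document; `fields` maps a field name to its text."""
        for field, text in fields.items():
            for term in tokenize(text):
                self.postings[(field, term)].add(doc_id)

    def search(self, query, field="body"):
        """Return ids of documents whose `field` contains every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get((field, terms[0]), set()))
        for term in terms[1:]:
            result &= self.postings.get((field, term), set())
        return result


# Sample documents are made up; the text is assumed to be already extracted.
index = TinyIndex()
index.add("report.pdf", {"body": "Quarterly sales figures", "author": "Alice Smith"})
index.add("notes.txt", {"body": "Meeting notes about sales", "author": "Bob Jones"})

body_hits = index.search("sales")                    # search document bodies
author_hits = index.search("alice", field="author")  # search one metadata field
```

Keeping the field name as part of the index key is one simple way to let the same index answer both body and metadata queries.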

The program can also spider a remote Web site and create a locally searchable directory of that site.

DocSearcher requires no installation, only the presence of the Java Runtime Environment, so it can run directly from any directory. After you launch the program's .JAR file, you'll want to create an index for a given document directory. If you already have an index produced by an instance of DocSearcher, you can import it rather than regenerate it from scratch. Indexes can be updated on demand, when the program launches or after the index has aged a certain number of days. The program's search interface produces a report that can be saved as an HTML file.
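
The three update options are easy to picture as a small decision made at launch. The sketch below is only an illustration of that logic, not how DocSearcher implements it; the function name `needs_update`, the policy strings and the seven-day default are assumptions of mine.

```python
from datetime import datetime, timedelta


def needs_update(last_indexed, now, policy, max_age_days=7):
    """Decide at launch whether an index should be refreshed.

    The policy names are hypothetical labels for the three options
    described in the text: "on_demand", "on_launch" and "by_age".
    """
    if policy == "on_launch":
        return True  # always refresh when the program starts
    if policy == "by_age":
        # refresh once the index is at least max_age_days old
        return now - last_indexed >= timedelta(days=max_age_days)
    if policy == "on_demand":
        return False  # only refresh when the user explicitly asks
    raise ValueError(f"unknown policy: {policy}")


# Example dates are arbitrary.
now = datetime(2006, 3, 15)
fresh = datetime(2006, 3, 13)  # indexed 2 days ago
stale = datetime(2006, 3, 1)   # indexed 14 days ago
```

With these sample dates, the age-based policy would refresh the 14-day-old index and leave the 2-day-old one alone.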

The program can also create a self-contained index that you can place on a CD-ROM or DVD along with the application itself, meaning you can essentially package a set of documents on a disc with its own search engine. The program can also send an e-mail notification whenever a given index is updated. It also supports third-party document handlers, so people can write their own search handlers and plug them in as needed.
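
To give a feel for what a pluggable document handler might look like, here is a rough sketch of a handler registry keyed by file extension. DocSearcher's actual handler interface is not described here, so everything below (`HANDLERS`, `register_handler`, `extract` and the two sample handlers) is hypothetical.

```python
import os

# Hypothetical registry: file extension -> function that turns bytes into text.
HANDLERS = {}


def register_handler(extension, extract_text):
    """Register a text-extraction function for one file extension."""
    HANDLERS[extension.lower()] = extract_text


def extract(filename, raw_bytes):
    """Return indexable text for a file, or None if no handler is registered."""
    ext = os.path.splitext(filename)[1].lower()
    handler = HANDLERS.get(ext)
    if handler is None:
        return None
    return handler(raw_bytes)


def plain_text_handler(raw_bytes):
    """Decode plain text, replacing any bytes that are not valid UTF-8."""
    return raw_bytes.decode("utf-8", errors="replace")


def csv_handler(raw_bytes):
    """Example third-party handler: treat commas as word separators."""
    return raw_bytes.decode("utf-8", errors="replace").replace(",", " ")


register_handler(".txt", plain_text_handler)
register_handler(".csv", csv_handler)
```

The point of a design like this is that the indexer never needs to know about a new format; someone only has to register one more function.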

The program still has a few limitations. For one, it cannot sort search results. For another, it does not yet search Outlook .PST files (although this capability may be added down the road). The tool's author has also stated plans to add support for generic XML files, Microsoft Project documents and metadata from common multimedia files (such as MP3 ID3 tags).

 


Serdar Yegulalp is editor of the Windows Power Users Newsletter. Check it out for the latest advice and musings on the world of Windows network administrators -- and please share your thoughts as well!


More information from SearchWinSystems.com


This was first published in March 2006
