Wipe up link rot with Web-based HTML error checker
Category: Web site link checker
Name of tool: HTML Toolbox
Company name: NetMechanic Inc.
Price: free to $200 and more, depending on the number of pages on your site
Windows platforms supported: All
Quick description: A Web-based service providing a series of site tune-up tools that check for broken links, misspellings, and other common errors.
*** = Hey, not bad. One notch below very cool
All done via a browser. There is no software to download and install (see cons). The product tests for bad links, HTML coding errors, and other mistakes on your Web pages.
Extremely easy and straightforward to use. You go to their web site and click on a button to start the testing process.
Reports should take into account the Microsoft XML page syntax and newest Netscape browser versions.
Every web user knows what it is like to visit a Web page, only to be stopped by a broken link or some other annoying mistake. The challenge for web site operators is to keep up with the ever-changing nature of the web. That means checking your links, correcting grammar and spelling errors, and handling other mundane housekeeping chores.
Back when I began my own Web site at strom.com several years ago, I had a rather foolhardy notion of zero tolerance for mistakes. Today, I am more forgiving, but it still sticks in my craw when I come across a rotten link on my pages. Often the problem isn't my doing -- someone Out There in the great cyber universe has reorganized their web site, and an outbound link of mine goes nowhere now.
Over the years I have tried a wide variety of tools to combat what is colorfully called link rot. One method is the most tedious but also the most reliable: go to your site, bring up a page, and start clicking on the links therein. Then go to another page, and another. It is time consuming, but you'll find the rotten links almost every time.
If you are looking for a more automated technique, then you need a link checker. There are literally hundreds of tools and services available. Over the years I have used a number of link checker products, and the trouble is trying to wade through their reports to find the really critical errors that need fixing. Some of the products are so picky as to be useless, and others operate at too gross a level to do much good either.
One product that strikes a nice balance is from NetMechanic, called HTML Toolbox. It provides just the right kind of information, and its reports are fairly easy to read and to act on.
HTML Toolbox actually runs five separate tests: a link checker that finds dead links; HTML Check, to spot and fix HTML coding errors; Browser Compatibility Check, to find unsupported HTML tags in both Netscape and Microsoft browser versions; Load Time Check, to find pages which are slow to load; and a spelling checker. The tests vary in their utility.
The toolbox is actually a service: there is no software for you to download, and everything runs inside your browser. You submit your URL for analysis, and in a few minutes you get an email notification with a link to your reports. There are two basic types of reports: a summary, which lists the pages that were scanned along with a 1-5 "star" rating and the number of errors found on each, and a more detailed report for each particular page, which you reach through a link in the summary.
You can perform a one-time test on up to five pages on your site for free, and upgrade for $35 to check up to 100 pages on your site. The fee-based products also operate on a fixed schedule (weekly, biweekly, or monthly), so you can periodically check your site and get reports emailed to you with the results. That is perhaps the most valuable service, because you can never check your pages too often, given how links can change so quickly. For example, when I ran the tool on one of my pages, I found out that the magazine Web Review had changed all of its archives of my previously published articles, and many articles were temporarily gone. Such is the nature of the web.
Thus, as you would suspect, the main focus of the toolbox is the link checker report. The free version can test up to 25 links on a particular page, while the subscription version can test up to 5000 links. The report shows the line number, the link itself (with a hyperlink to the actual site, so you can see for yourself if it is broken), and a status indicator.
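For readers curious about what goes into a report like this, here is a minimal Python sketch that produces the same three columns (line number, link, and status) for a single page. It is not NetMechanic's implementation, which is not public; the names `LinkCollector`, `http_status`, and `check_links` are made up for illustration, and the rule that any response code below 400 counts as "OK" is an assumption.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect (line_number, url) pairs for every <a href="..."> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # getpos() returns (line, column) of the tag being parsed.
                    self.links.append((self.getpos()[0], value))


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error code
    except (URLError, ValueError, OSError):
        return None  # DNS failure, timeout, malformed URL, and so on


def check_links(html, fetch_status=http_status):
    """Return a list of (line_number, url, status_label) rows for one page."""
    collector = LinkCollector()
    collector.feed(html)
    report = []
    for line, url in collector.links:
        status = fetch_status(url)
        # Assumption: anything below 400 is treated as a working link.
        ok = status is not None and status < 400
        report.append((line, url, "OK" if ok else "BROKEN"))
    return report
```

Passing your own `fetch_status` function lets you try the report logic without touching the network. Some servers reject HEAD requests, so a real checker would probably fall back to GET before declaring a link dead.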
There is another report that attempts to do more than check: it inserts in-line comments or corrections into your HTML. I found this one less than useful, particularly as it couldn't deal with the XML tagging from Microsoft Word, which is one of the programs I use to generate my web pages. (From my perspective, the Microsoft version of XML is enough of a standard that the service should recognize it as such.)
If you pay a fee, this tool will also make changes to your HTML coding, and allow you to compare the old and new versions side by side. That sounds good in theory, but most of my Web pages don't have hard line breaks, and the new code comes with embedded line breaks, so the side-by-side comparison doesn't work well.
The page load time report shows the average time, in seconds, that it will take to view your page over various kinds of browser connections. It also shows the location of each of the different web servers that supply the data on your pages. That can be useful, particularly if your page pulls something from somebody else's web server and you have forgotten about it.
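The arithmetic behind a load time estimate is simple enough to sketch: divide the size of the page in bits by the speed of the connection. The Python below is a rough illustration only. The connection names and speeds in `CONNECTION_SPEEDS_BPS` and the function `estimate_load_seconds` are assumptions of mine, not figures from NetMechanic, and the calculation ignores latency, protocol overhead, and server delays, all of which make real-world times longer.

```python
# Nominal line speeds in bits per second (illustrative values, not NetMechanic's).
CONNECTION_SPEEDS_BPS = {
    "28.8k modem": 28_800,
    "56k modem": 56_000,
    "ISDN (128k)": 128_000,
    "T1 (1.5M)": 1_544_000,
}


def estimate_load_seconds(page_bytes, speeds=CONNECTION_SPEEDS_BPS):
    """Best-case transfer time in seconds for a page of page_bytes bytes.

    page_bytes should include the HTML plus every image and other file
    the page pulls in.
    """
    page_bits = page_bytes * 8
    return {name: page_bits / bps for name, bps in speeds.items()}


if __name__ == "__main__":
    for name, seconds in estimate_load_seconds(36_000).items():
        print(f"{name:>12}: {seconds:6.1f} s")
```

A 36,000-byte page works out to 288,000 bits, which is 10 seconds on a 28.8k modem before any overhead is counted.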
I was also less fond of the spell checker test. It didn't fare well with my site, given that I have lots of company names and computer terms that weren't in its dictionary. You can create a custom dictionary and load it as part of the service, but that may be more trouble than it is worth. And the browser compatibility test neglects to include the latest Netscape version in its reports. Even if it did, the report would still be hard to understand.
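To see why a custom dictionary matters on a site full of product names, consider this small Python sketch of a dictionary-based checker. It is only an illustration of the general idea; the function name `find_unknown_words` and the tiny word lists are hypothetical, and the real service's dictionary and matching rules are not documented here.

```python
import re


def find_unknown_words(text, dictionary, custom_words=()):
    """Return words in text found in neither word list.

    Matching is case-insensitive; each unknown word is reported once,
    in lowercase, in order of first appearance.
    """
    known = {w.lower() for w in dictionary} | {w.lower() for w in custom_words}
    unknown = []
    for word in re.findall(r"[A-Za-z']+", text):
        lowered = word.lower()
        if lowered not in known and lowered not in unknown:
            unknown.append(lowered)
    return unknown


if __name__ == "__main__":
    base = {"checks", "the", "links"}
    sample = "NetMechanic checks teh links"
    # Without a custom list, the company name is flagged along with the typo.
    print(find_unknown_words(sample, base))
    # With the company name added, only the real typo remains.
    print(find_unknown_words(sample, base, custom_words=["NetMechanic"]))
```

The trade-off is the one described above: the custom list cuts down on false alarms, but somebody has to build and maintain it.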
Overall, link checking is still far from an exact science, but NetMechanic's service is a good start and can help you keep your links up to date on your site. And it is priced fairly and the reports are clear enough to use in your regular site maintenance routines.
**** = Very cool, very useful
*** = Hey, not bad. One notch below very cool
** = A tad shaky to install and use but has some value.
* = Don't waste your time. Minimal real value.
Bio: David Strom is president of his own consulting firm in Port Washington, NY. He has tested hundreds of computer products over the past two decades working as a computer journalist, consultant, and corporate IT manager. Since 1995 he has written a weekly series of essays on web technologies and marketing called Web Informant. You can send him email at firstname.lastname@example.org.