Signature-based data recovery: A last-ditch technique

If your file allocation table has been destroyed, your best bet is signature-based data recovery. But it's tedious and has the potential to inflict further damage, so use it only if you're out of options.

So far the data recovery techniques discussed in this guide all have one thing in common: They depend on your ability to read at least one copy of the disk's file allocation table. Therefore, as valuable as they are, they don't do you a whole lot of good if your file allocation table has been completely destroyed.

Now I want to show you one last data recovery technique that you can use if the file allocation table is damaged or missing, or if all the other recovery techniques have failed.

Note: There's a reason why I refer to this technique as a last-ditch effort. It works, but it's tedious and has the potential to inflict further damage. Use this method only if you're out of options.

Windows uses a file's extension to determine which application the file is associated with. For example, if you double-click on a file with the .doc extension, Windows will attempt to open the file in Microsoft Word.

But there is more to a file type than just the extension. Suppose you renamed a .pdf file so that it had a .doc extension. If you double-clicked on the file, Windows would attempt to open it in Microsoft Word, but Word would probably tell you that the file isn't a valid Word document. No matter what a file is named, Microsoft Word can tell the difference between a Word document and other types of files, and so can most other applications. This is because most data files begin with a header that identifies the file's format.

From a data recovery standpoint, this means that many file types have a unique signature. A file's header is a string of bytes at the beginning of the file that identifies the file type, and some types of data files also end with a distinctive string of bytes. Therefore, if you wanted to recover all the Microsoft Word documents from a corrupt hard drive, you could use Norton Disk Editor to search the disk for the signature associated with Word documents. Each time you locate a file, you would copy it to another disk. This is easier said than done: Before you can even think about doing a signature-based recovery, you need to determine what the signatures are.
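To make the idea concrete, here is a minimal Python sketch of identifying a file by its header bytes. The signature table is illustrative and far from complete (the values are well-known published signatures), and the identify function is a hypothetical name, not part of Norton Disk Editor or any recovery product.

```python
from typing import Optional

# Illustrative (not exhaustive) table of well-known header signatures.
# Each entry: (offset from start of file, signature bytes, description).
KNOWN_HEADERS = [
    (0, b"%PDF-", "PDF document"),
    (0, b"{\\rtf", "Rich Text Format document"),
    (0, bytes.fromhex("D0 CF 11 E0 A1 B1 1A E1"),
     "OLE2 compound file (e.g. Word, Excel or PowerPoint 97-2003)"),
    (0, b"PK\x03\x04", "ZIP archive"),
    (0, bytes.fromhex("FF D8 FF"), "JPEG image"),
    (0, bytes.fromhex("89 50 4E 47 0D 0A 1A 0A"), "PNG image"),
]


def identify(header: bytes) -> Optional[str]:
    """Return the description of the first signature found in header, or None."""
    for offset, signature, description in KNOWN_HEADERS:
        # Compare the bytes at the expected position with the signature.
        if header[offset:offset + len(signature)] == signature:
            return description
    return None
```

Note that the OLE2 entry covers several Office applications at once, which is exactly the naming problem discussed later in this article.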

Determining file signatures for the data recovery process
Figuring out what the signatures are is by far the most difficult part of the recovery process, because file types differ in where they put their signatures. Some file types place the signature at the beginning of the file, some put it at the end, and some do both.

As if this weren't confusing enough, there may be a few bytes before or after the signature. In other words, the signature isn't necessarily at the very beginning or the very end of the file. Furthermore, some file types have multiple possible signatures (RTF files have at least three). Luckily, you won't run into that with most file types.
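One way to model these complications is sketched below, assuming a signature can be described as one or more alternative byte strings that must appear within a fixed number of bytes of either the start or the end of the file. The Signature class and its window parameter are hypothetical. The PDF trailer (%%EOF) and the Exif/JFIF identifiers in the examples are standard strings; RTF's alternatives are not listed because they aren't given here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signature:
    """Hypothetical description of a file-type signature.

    alternatives: byte strings, any one of which counts as a match.
    at_end:       look near the end of the file instead of the start.
    window:       how many bytes from the start (or end) to search, since the
                  signature may be preceded or followed by a few other bytes.
    """
    alternatives: List[bytes]
    at_end: bool = False
    window: int = 16

    def matches(self, data: bytes) -> bool:
        # Only examine the leading or trailing `window` bytes of the file.
        region = data[-self.window:] if self.at_end else data[:self.window]
        return any(alt in region for alt in self.alternatives)


# Illustrative examples: PDF files end with %%EOF (often followed by a newline);
# camera .jpg files may carry either an Exif or a JFIF identifier near the start.
PDF_TRAILER = Signature([b"%%EOF"], at_end=True, window=8)
CAMERA_JPEG = Signature([b"Exif\x00\x00", b"JFIF\x00"], window=16)
```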

The trick to determining a file's signature is to use known files to figure out what the signature is. To do so, install a spare hard drive in your recovery machine and format it using the same file system as the disk that you will be recovering. Also, make sure that any existing data on the drive is overwritten, because you don't want left-over data complicating things for you. (A quick format does not overwrite data. A full format writes zeros to the whole disk on Windows Vista and later, but on Windows XP and earlier it only checks for bad sectors, so on those versions you'd need a disk-wiping utility instead.)

Once you've prepared the drive, you must copy a few files to it. For example, if you need to recover some documents created in Word 2003, use Word 2003 to create a few sample files on your spare hard disk. (Tip: Avoid creating large files; smaller files are easier to analyze.)

Now take a look at the first and last 50 or so bytes of each of those sample files, and look for anything the files have in common. (Note: Use more than just a few files, so that bytes that happen to match by coincidence don't end up in your signature.)
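The comparison can be automated. The sketch below, assuming your sample files sit in a folder on the spare drive and are each at least n bytes long (so that their tails line up), reads the first and last n bytes of each file and marks every position where the samples disagree with ??. The function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def head_and_tail(path: Path, n: int = 50) -> Tuple[bytes, bytes]:
    """Return the first and last n bytes of a sample file."""
    data = path.read_bytes()
    return data[:n], data[-n:]


def consensus(samples: List[bytes]) -> List[Optional[int]]:
    """For each position, the byte shared by every sample, or None if they differ.

    Only positions present in every sample (up to the shortest one) are compared.
    """
    length = min(len(s) for s in samples)
    result: List[Optional[int]] = []
    for i in range(length):
        values = {s[i] for s in samples}
        result.append(values.pop() if len(values) == 1 else None)
    return result


def as_pattern(cons: List[Optional[int]]) -> str:
    """Render a consensus as hex, with ?? marking bytes that differ between samples."""
    return " ".join("??" if b is None else f"{b:02X}" for b in cons)
```

For example, collecting heads = [head_and_tail(p)[0] for p in sorted(Path(r"E:\samples").glob("*.doc"))] (the folder is hypothetical) and printing as_pattern(consensus(heads)) shows the header bytes the samples share; doing the same with the tails shows any common trailer.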

To show how the process works, I have copied a bunch of .jpg files from my digital camera onto a disk in order to determine the signature of a .jpg file. The comparison looks something like this:

Filename        First 16 bytes (hex)
DSC01709.JPG    FF D8 FF E1 28 24 45 78 69 66 00 00 49 49 2A 00
DSC01710.JPG    FF D8 DD E1 28 C7 45 78 69 66 00 00 49 49 2A 00
DSC01711.JPG    FF D8 FF E1 29 FD 45 78 69 66 00 00 49 49 2A 00
DSC01712.JPG    FF D8 FF E1 21 F7 45 78 69 66 00 00 49 49 2A 00
DSC01713.JPG    FF D8 FF E1 22 E7 45 78 69 66 00 00 49 49 2A 00
DSC01714.JPG    FF D8 FF E1 29 76 45 78 69 66 00 00 49 49 2A 00

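You can see a good deal of consistency among the first several bytes of the files. In fact, there is enough consistency that you could use 45 78 69 66 00 00 49 49 2A 00 as a signature. Several of the first few bytes match as well, but a couple of bytes in that region vary from file to file, so you want to base the signature on the longest unbroken run of consistent bytes for the best accuracy. (Those ten bytes are the ASCII string "Exif", two null bytes, and the start of a little-endian TIFF header.) Note that this signature begins six bytes into the file, so when you find it on the disk, the file itself starts six bytes earlier.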
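Picking out the longest consistent run can also be done programmatically. This minimal sketch (the function name is hypothetical) applies that rule to the six rows in the table above and reports both the run and its offset from the start of the file, which you'll need later to work out where each file begins.

```python
from typing import List, Tuple

# The first 16 bytes of each sample file, copied from the table above.
ROWS = [bytes.fromhex(h) for h in (
    "FF D8 FF E1 28 24 45 78 69 66 00 00 49 49 2A 00",
    "FF D8 DD E1 28 C7 45 78 69 66 00 00 49 49 2A 00",
    "FF D8 FF E1 29 FD 45 78 69 66 00 00 49 49 2A 00",
    "FF D8 FF E1 21 F7 45 78 69 66 00 00 49 49 2A 00",
    "FF D8 FF E1 22 E7 45 78 69 66 00 00 49 49 2A 00",
    "FF D8 FF E1 29 76 45 78 69 66 00 00 49 49 2A 00",
)]


def longest_consistent_run(rows: List[bytes]) -> Tuple[int, bytes]:
    """Return (offset, bytes) of the longest run of positions where every row agrees."""
    length = min(len(r) for r in rows)
    best_start, best_len = 0, 0
    run_start = None
    # Iterate one past the end so a run that reaches the last byte is closed too.
    for i in range(length + 1):
        same = i < length and all(r[i] == rows[0][i] for r in rows)
        if same and run_start is None:
            run_start = i
        elif not same and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, rows[0][best_start:best_start + best_len]


offset, signature = longest_consistent_run(ROWS)
print(offset, signature.hex(" ").upper())  # prints: 6 45 78 69 66 00 00 49 49 2A 00
```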

Note: The signature above is based on .jpg files created by a Sony digital camera, and is probably different from the one used by other .jpg files. For example, the signature used by a standard JFIF-format .jpg file is 4A 46 49 46 (the ASCII string "JFIF"). In JFIF files, this signature is found near the beginning of the file. The files created by my camera do contain this string, but it is found deep within the file. Therefore, it's probably easier to search on the header string.

Performing signature based data recovery
Suppose you determined that the signature listed above was valid for the .jpg files you wanted to recover. The actual recovery process is simple. Note: The technique I'm about to share assumes that the files in question are not fragmented. If the files you're trying to recover are fragmented, you're better off with a commercial data recovery product that can perform signature-based recovery, such as Ontrack's EasyRecovery Professional.

To recover data yourself, boot your system using the floppy we created earlier in this guide. Now go into Norton's Disk Editor and select the Cluster command from the Object menu. Enter the disk's full cluster range, and click OK. Once Disk Editor loads the disk's contents, select the Find command from the Tools menu. Click on the text box in the Hex section and enter the file's signature in hexadecimal format. Click the Find button and Disk Editor will locate the first occurrence of the signature that you have entered.
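Disk Editor's Find command stops at the first match. If you have captured a raw, sector-by-sector image of the damaged disk to a file on another drive (an assumption: this guide doesn't cover creating one, and the image file is hypothetical), a short script can list every occurrence of a signature without writing anything to the damaged disk. This sketch reads the image in chunks so it doesn't need to fit in memory.

```python
from typing import Iterator


def find_signature(image_path: str, hex_signature: str,
                   chunk_size: int = 1 << 20) -> Iterator[int]:
    """Yield the byte offset of every occurrence of a hex signature in a raw image.

    len(signature) - 1 bytes are carried over between chunks so that a
    signature straddling a chunk boundary is still found (and never twice).
    """
    signature = bytes.fromhex(hex_signature)  # spaces between bytes are allowed
    if not signature:
        raise ValueError("empty signature")
    overlap = len(signature) - 1
    carry = b""
    base = 0  # file offset of the first byte of carry + chunk
    with open(image_path, "rb") as image:
        while True:
            chunk = image.read(chunk_size)
            if not chunk:
                break
            buffer = carry + chunk
            pos = buffer.find(signature)
            while pos != -1:
                yield base + pos
                pos = buffer.find(signature, pos + 1)
            # Keep the tail of the buffer for the next round.
            keep = min(overlap, len(buffer))
            carry = buffer[len(buffer) - keep:]
            base += len(buffer) - keep
```

Usage might look like: for hit in find_signature("disk.img", "45 78 69 66 00 00 49 49 2A 00"): print(hit), where disk.img is a hypothetical image file.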

Now, using the research that you performed earlier, determine where the file starts (for the camera .jpg files above, six bytes before the signature), and click on the beginning of the file. Select the Mark command from the Edit menu. Now scroll to the end of the file, which will usually be followed by some blank space or a run of periods. (Note: For larger files, you may be scrolling for a long time.)
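The same start-and-end logic can be sketched in Python, assuming the file is stored contiguously and that the "blank space" after it is a run of zero bytes. The function name and the min_zero_run and max_size thresholds are hypothetical choices. The heuristic is only as good as its assumptions: a file that itself contains a long run of zeros will be cut short, and on a real disk the space after a file may hold leftover data rather than zeros.

```python
def carve(image: bytes, match_offset: int, signature_offset: int,
          min_zero_run: int = 512, max_size: int = 16 * 1024 * 1024) -> bytes:
    """Cut one contiguous file out of a raw image.

    start: the signature sits signature_offset bytes into the file, so the
           file begins that many bytes before the match.
    end:   the first run of at least min_zero_run zero bytes after the start,
           or max_size bytes past the start if no such run is found.
    """
    start = match_offset - signature_offset
    if start < 0:
        raise ValueError("signature offset points before the start of the image")
    limit = min(len(image), start + max_size)
    end = image.find(b"\x00" * min_zero_run, start, limit)
    if end == -1:
        end = limit
    return image[start:end]
```

For the camera files in this article, signature_offset would be 6.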

When you find the end of the file, select the Write Object To command from the Tools menu. When the Write dialog box appears, select the File option and click OK. You'll be prompted to enter a filename and a path of where you want to write the recovered file.

Uncovering file names
As you can see, the signature-based recovery method is fairly straightforward. One problem is that it does not allow you to recover filenames. You won't know which files you have recovered until you actually open them.

Not knowing which files you have recovered isn't always a big deal. If you know you've recovered a bunch of Microsoft Word documents, you can name them File1.doc, File2.doc, File3.doc, etc. You could then open each one individually and rename the recovered documents to a more appropriate file name.
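If you script the recovery, the write step and this File1.doc, File2.doc naming scheme can be combined. A minimal sketch, assuming the carved data is already in memory; the function name and output folder are hypothetical, and the folder should be on a different disk from the one you're recovering so that writing the output can't overwrite data you still need.

```python
from pathlib import Path
from typing import Iterable, List


def save_recovered(chunks: Iterable[bytes], out_dir: str, extension: str) -> List[Path]:
    """Write each carved chunk to out_dir as File1<ext>, File2<ext>, and so on."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for number, chunk in enumerate(chunks, start=1):
        path = target / f"File{number}{extension}"
        path.write_bytes(chunk)
        written.append(path)
    return written
```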

But there are times when not knowing exactly what you have recovered can be a more serious problem. For example, some versions of Microsoft Office use identical file headers for each application in the suite (Word, Excel, PowerPoint, etc.). So you could potentially recover a file that you know is a Microsoft Office document, but not have a clue as to how to name the file because you don't know what type of Microsoft Office document it is. As if that weren't confusing enough, some versions of Quattro Pro supposedly also use the same file header as some Microsoft Office documents.

You could just start out by naming the recovered files with a .doc extension. If some files don't open in Word, you could try other extensions until all the files eventually open. But that takes a lot of time and could expose you to risk. For example, if you're performing a recovery on someone else's PC, you have no idea whether their Microsoft Office documents are safe or infected with macro viruses. Blindly opening unknown files is risky.

During signature-based recoveries, you often find files on the disk that do not match any known file-type signatures. You'll want to recover these files in case they're something important, but you'll have no idea what you are recovering.

The solution? Use a file viewer to examine the files you've recovered. One utility that works well for signature-based recoveries is Quick View Plus. This tool determines a file's type by looking at its contents, not its file extension, so it can figure out what a file is even if you can't. Furthermore, since Quick View Plus opens files in a viewer rather than in the application that created them, you don't have to worry about launching a macro virus.

As you can see, there is a way of recovering files even if your system's file allocation table has been damaged beyond repair. Signature-based recovery techniques work, but for the reasons listed above, they should be used only as a last-ditch attempt at data recovery.


Data Recovery Techniques for Windows
- Introduction
- How to recover data
- How to create a boot disk to run Norton Disk Editor
- How disk cluster size affects data recovery processes
- How long file names complicate data recovery
- How to recover deleted files on FAT via Disk Editor
- How data recovery for NTFS differs from FAT
- How to recover data from corrupt NTFS boot sector
- Signature-based data recovery: A last ditch technique

About the author: Brien M. Posey, MCSE, is a Microsoft Most Valuable Professional for his work with Windows 2000 Server, Exchange Server and IIS. He has served as CIO for a nationwide chain of hospitals and was once in charge of IT security for Fort Knox. He writes regularly for SearchWincomputing.com and other TechTarget sites.

This was first published in June 2006
