If you just want the application then skip directly to the Download section at the bottom of this page.
On August 8, 2006 Microsoft released KB920958, a security update for Windows 2000 SP4. My system's Windows Update service picked this up on August 11th and installed it. Though I was not aware of the connection at the time, I started noticing blocks of “rainbow” garbage across the bottom of some images, such as the picture of two cats to the right. At first I either ignored it, as I was busy with other tasks, or re-downloaded the file, saw that it looked ok, and moved on.
On Friday September 24th I noticed some corrupted images that I knew I had fixed earlier in the week and started digging into what was happening. After much swapping of hardware and other tests I discovered the fault was with Microsoft's NTFS file compression, and Google then quickly pointed me to a hotfix (KB925308), which I installed. This fix seems to work and files are no longer getting corrupted, though files that were previously corrupted are still broken. At the time I wrote ScanDF, Microsoft had not included the KB925308 fix in their automatic updates, but they seem to have done so at a later date.
While installing the hotfix prevented further corruption of my data, I still had the problem of locating the existing corrupted files so that I could recover them from backup sources. To locate these files I wrote “scandf”, which is a small application that scans and reports on files that seem to be corrupted.
Scandf
Scandf is a command line utility that will scan a directory tree and report on files that seem to be corrupted. The name of this utility comes from the fact that the ends of the corrupted files are filled with bytes of hexadecimal value DF. When the Microsoft file compression code is unable to decompress a file it returns a stream of DF bytes rather than the file data you would expect. Oddly, Microsoft's decompression code did not log any errors about its failure to return the expected data for a file.
The link for downloading scandf is at the bottom of this page.
Typical usage from a command prompt is:
scandf c:\ >scan.txt
Scandf will scan the entire C: drive and write the results to scan.txt, which you can then inspect. Note that when the output is redirected to a file, scandf also logs to stderr so that you can view the progress of the scan.
Usage is:
scandf [flags] filename [filename...]
The flags are:
- /b or /bare - By default scandf notes how many bytes were corrupted in each file and also writes a summary at the end. If you need a plain list of the corrupted files for processing with other applications then use the “/bare” option.
- /c or /compressed - By default scandf checks all files on the assumption that you may have received a corrupted file from someone and saved it to a non-compressed folder or that you have already turned compression off in an effort to prevent further corruption. You can cause scandf to check only the compressed files by including /compressed on the command line.
- /p or /progress - Show the directory names in the window's title bar as they are scanned. This is enabled by default. Use /-p (the same negated form as /-s below) to disable showing the directory names during the scan.
- /x:# or /size:# - (default is /size:3600). My own testing found that the only files affected by this issue had between 3639 and 4095 bytes in the last 4096 byte block. Scandf checks the file size and by default only checks those files that have between 3600 and 4096 bytes in the last block. You can override this behavior with the /size argument. Use /size:0 to force scandf to scan all files that have between 1 and 4096 bytes in the last block. Note that it is possible for a regular (uncorrupted) file to end in 0xDF bytes, though I consider it unlikely that a normal data file would have 3,600 or more bytes of 0xDF at the end.
- /s or /subdir - Scandf scans sub-directories by default. Use /-s to disable scanning of sub-directories.
- /v or /verbose - Emit extra debugging output.
The filename argument(s) can be either directories such as C:\ or you can specify files such as *.jpeg. You can combine this to check all JPEG images on drive C: by using C:\*.JPEG.
Note that scandf processes the arguments from left to right. Something like “scandf c:\ /bare” would cause scandf to first scan C:\ with normal output and then it would switch to bare mode.
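The left-to-right rule can be illustrated with a small Python sketch. This is not scandf's own C source; the function name `plan_scans` is hypothetical, only the /b and /bare flags are modeled, and every other flag is simply skipped.

```python
def plan_scans(args):
    """Pair each scan target with the bare-mode setting in effect when it is reached.

    Hypothetical illustration: a flag only affects the targets that come
    after it on the command line.
    """
    bare = False
    plan = []
    for arg in args:
        if arg.lower() in ("/b", "/bare"):
            bare = True              # applies to later targets only
        elif arg.startswith("/"):
            continue                 # other flags are not modeled in this sketch
        else:
            plan.append((arg, bare))
    return plan


if __name__ == "__main__":
    print(plan_scans(["c:\\", "/bare"]))   # target is planned before /bare takes effect
    print(plan_scans(["/bare", "c:\\"]))   # target is planned in bare mode
```

In practice this means flags should be placed before the directories or files they are meant to apply to.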
How scandf works
The NTFS file compression code organizes files into 4096 byte blocks. If the last block of the file contains 3639 to 4095 bytes then there is a chance it may be corrupted. The corruption is that the contents of the entire last block get filled with hex value DF. Scandf first checks the file length and, if the last block contains at least 3600 bytes, reads the last block of the file to see if it is entirely "DF" bytes. If so, scandf reports that the file is possibly corrupted. The scanner is fairly fast as it can skip roughly 88% of the files because they contain fewer than 3600 bytes in the last 4096 byte block, and even for the files it checks the scanner only needs to read the very end of each file.
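The check described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the actual scandf source (which is written in C); the names `tail_length` and `looks_corrupted` are hypothetical, and the `min_tail` parameter is assumed to behave like the /size threshold.

```python
import os

BLOCK_SIZE = 4096        # block size used by NTFS compression, per the text above
DEFAULT_MIN_TAIL = 3600  # mirrors the default /size:3600


def tail_length(file_size):
    """Number of bytes in the file's last 4096-byte block (0 for an empty file)."""
    if file_size == 0:
        return 0
    remainder = file_size % BLOCK_SIZE
    return remainder if remainder else BLOCK_SIZE


def looks_corrupted(path, min_tail=DEFAULT_MIN_TAIL):
    """Return True if the last block is large enough and consists only of 0xDF bytes."""
    size = os.path.getsize(path)
    tail = tail_length(size)
    if tail == 0 or tail < min_tail:
        return False                 # skipped without reading any file data
    with open(path, "rb") as f:
        f.seek(size - tail)          # jump straight to the last block
        data = f.read(tail)
    return data == b"\xdf" * tail
```

Because most files are rejected on size alone, only a small fraction ever need to be opened, and for those only the final block is read.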
Version Notes
- Version 1.0 - September 25, 2006
- Initial public release of scandf.
- Version 1.11 - September 26, 2006
- REPORT_MAX was increased from 4085 to 4086 bytes based on a user report. If a file has fewer than 3641 or more than 4086 bytes of overwritten data then scandf writes a message at the end of the scan asking that I get e-mailed with the numbers. I'm collecting the data as I'm curious about the range of how much or how little data in each file can be corrupted by KB920958.
- The directory scanner now always does two passes on each directory. With version 1.0 scandf would look at the file search pattern and if it was not “*” or “*.*” then it would run two passes in each directory. On the first pass scandf uses the user supplied search pattern, *.jpeg for example, and ignores the sub-directories. If sub-directory scanning is enabled scandf would then do a second pass searching “*” to list all sub-directories and would scan each of them. If the original search pattern was “*” or “*.*” then scandf could do it all in one pass, but that meant files would be listed “out of order.” For example, if you had files named A, B, C, E, F, G and a sub-directory named D that contained X, Y, and Z, the files would get scanned as A, B, C, X, Y, Z, E, F, G. Now that scandf always does two passes the scan order is A, B, C, E, F, G, X, Y, Z, meaning that all top-level files get scanned before moving on to sub-directories.
- Version 1.12 - September 27, 2006
- I replaced the “ScanDir” progress logging with writing the progress to the console title, which allows me to watch the scanning progress while also more easily seeing error messages and/or corrupted files being found.
- Version 1.13 - October 2, 2006
- Added the # of directories scanned to the normal mode summary at the end.
- Cleaned up the layout of the verbose mode summary.
- Only show progress in window title if _bShowProgress is set.
- Added support for flag- with the hyphen after the flag.
- Version 1.14 - November 20, 2006
- REPORT_MIN was lowered from 3641 to 3639 bytes based on a user report.
- REPORT_MAX was increased from 4086 to 4089 bytes based on a user report. If a file has fewer than 3639 or more than 4089 bytes of overwritten data then scandf writes a message at the end of the scan asking that I get e-mailed with the numbers. I'm collecting the data as I'm curious about the range of how much or how little data in each file can be corrupted by KB920958.
- Version 1.15 - November 28, 2006
- REPORT_MAX was increased from 4089 to 4094 bytes based on a user report. If a file has fewer than 3639 or more than 4094 bytes of overwritten data then scandf writes a message at the end of the scan asking that I get e-mailed with the numbers. I'm collecting the data as I'm curious about the range of how much or how little data in each file can be corrupted by KB920958.
- November-2008: I was working on a customer's Windows 2003 SBS server and noticed that it had been affected by the same issue brought on by the Windows 2000 KB920958 update. Unfortunately, the customer did not know the history of the server well (their IT people had changed), so I don't know if the corruption of this server occurred when it was copied to a Windows 2000 machine or if the issue also affected Windows 2003.
- Version 1.16 - September 17, 2009
- REPORT_MAX was increased from 4094 to 4095 bytes based on a user report. If a file has fewer than 3639 or more than 4095 bytes of overwritten data then scandf writes a message at the end of the scan asking that I get e-mailed with the numbers. I'm collecting the data as I'm curious about the range of how much or how little data in each file can be corrupted by KB920958.
- The web-link at the end of the built in help page was updated to point at the current home for ScanDF.
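The files-first ordering described in the version 1.11 notes can be sketched in Python as follows. This is only an illustration of the ordering, not the scandf source: the function name `scan_order` is hypothetical, names are sorted here purely to make the order predictable, and Python's `fnmatch` patterns are assumed as a stand-in for Windows wildcard matching (the two differ, for example in how “*.*” is treated).

```python
import fnmatch
import os


def scan_order(root, pattern="*", recurse=True):
    """Yield file paths with every file in a directory listed before its sub-directories."""
    with os.scandir(root) as it:
        entries = sorted(it, key=lambda e: e.name.lower())
    # Pass 1: files in this directory that match the user's pattern.
    for entry in entries:
        if entry.is_file() and fnmatch.fnmatch(entry.name, pattern):
            yield entry.path
    # Pass 2: descend into each sub-directory, regardless of the pattern.
    if recurse:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                yield from scan_order(entry.path, pattern, recurse)
```

With files A, B, C, E, F, G and a sub-directory D holding X, Y, Z, this yields the top-level files first and then the contents of D.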
Download
Scandf-1.16.zip version 1.16 (18,699 bytes, md5 2227e0903e170672bf8e463a721c9f38). You can copy scandf.exe to any directory in your path such as c:\windows\system32. Please note that this is a console application and not a Windows GUI program, meaning you can't just double-click on scandf.exe to use it. Instead, start a Command Prompt (press the Windows key plus the R key and then type cmd in the pop-up box) and run scandf from there.
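If you want to confirm that your download matches the md5 value listed above, a short Python sketch such as the following can be used. The helper names `md5_of_file` and `matches_published_md5` are hypothetical, and the file is assumed to have been saved locally under the name shown in the download link.

```python
import hashlib

# md5 value published in the download line above
EXPECTED_MD5 = "2227e0903e170672bf8e463a721c9f38"


def md5_of_file(path, chunk_size=65536):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_published_md5("Scandf-1.16.zip")` should return True for an intact copy of the version 1.16 archive.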
Scandf was written in C using Microsoft's Visual C++ 6.0. The source code is available. If you make any changes, improvements, or bug fixes then please let me know.
This software was developed with great care. However, I cannot guarantee that it will run on your computer(s) flawlessly. I am providing the utility “as is” without warranty of any kind. I am also providing the source code so that you can see what the utility does and build your own version if desired. Please do not mirror or copy this software onto other sites. (Making your own private copies is ok.)
Contact
If you wish to e-mail me about scandf then please mention “scandf” in either the subject line or message body to get through my spam filter.