Wednesday 2 November 2011

NTFS & Long/short file names


I have found an issue with NTFS and long/short file names which was previously unknown to me. This is going to involve some brain-melting, but stick with it...

This is based on an actual scenario I have personally encountered where several million files had to be moved from one disk to another. The real original file names have been changed for the sake of clarity and anonymity.


Take two files in an NTFS folder. One is called Test Document.DOC and one is called TESTDO~1.DOC. It doesn't matter how these files came to be. The timestamps on the files don't matter either, except that TESTDO~1.DOC was the first file created in the folder, so that name was already taken when Test Document.DOC arrived and needed a short name of its own.

If a command prompt window is opened in the folder in question, 'dir/x' shows that the short file name for Test Document.DOC is TESTDO~2.DOC. TESTDO~1.DOC doesn't have a short file name per se, because it already qualifies as an 8.3 filename. All good so far.

C:\LFNTest>dir/x
 Volume in drive C is DISK
 Volume Serial Number is 1234-ABCD

 Directory of C:\LFNTest
28/10/2011  10:17    <DIR>                       .
28/10/2011  10:17    <DIR>                       ..
28/10/2011  10:06    <DIR>                       Result
28/10/2011  10:05            44,032 TESTDO~2.DOC Test Document.DOC
08/09/2011  14:30            33,792              TESTDO~1.DOC
               2 File(s)         77,824 bytes
               3 Dir(s)  12,345,678,910 bytes free
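
As a rough illustration of what "already qualifies as an 8.3 filename" means, here is a small Python sketch. The function name fits_8_3 and the list of permitted punctuation are my own assumptions for illustration; the real rules have more corner cases (reserved device names, for example), so treat it as an approximation rather than the exact test NTFS applies.

```python
import string

# Characters assumed to be legal in an 8.3 name besides letters and digits.
# This list is an approximation chosen for illustration.
_EXTRA = "!#$%&'()-@^_`{}~"
_ALLOWED = set(string.ascii_uppercase + string.digits + _EXTRA)


def fits_8_3(name):
    """Return True if name looks like a valid 8.3 name (simplified check)."""
    upper = name.upper()                  # case is ignored in this sketch
    base, dot, ext = upper.rpartition(".")
    if not dot:                           # no extension at all
        base, ext = upper, ""
    if not 1 <= len(base) <= 8 or len(ext) > 3:
        return False
    # Spaces and extra dots are not in the allowed set, so they fail here.
    return all(ch in _ALLOWED for ch in base + ext)


for candidate in ("TESTDO~1.DOC", "Test Document.DOC"):
    print(candidate, fits_8_3(candidate))
```

With the two files from the listing, the first name passes the check and the second does not, which is why only Test Document.DOC is given a separate short name.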


The important point comes up when the need arises to copy them to another folder. Robocopy was chosen as the tool to use, but the same result is achieved with a basic copy/xcopy command, so the behaviour appears to come from how NTFS handles file names rather than from any one copy tool.

In this case, let's use the old reliable 'copy' command, but we will use the '/y' switch so that we're not prompted about overwriting files, thus partly replicating the behaviour of Robocopy. There is a folder called Result which is a subfolder of our test folder.

C:\LFNTest>copy /y *.* result

Test Document.DOC
TESTDO~1.DOC
    2 file(s) copied.

Now let's examine the subfolder:

C:\LFNTest>dir Result
 Volume in drive C is DISK
 Volume Serial Number is 1234-ABCD

 Directory of C:\LFNTest\Result
28/10/2011  10:22    <DIR>          .
28/10/2011  10:22    <DIR>          ..
08/09/2011  14:30            33,792 Test Document.DOC
               1 File(s)         33,792 bytes
               2 Dir(s)  12,345,645,118 bytes free


Hang on, where's the other file gone to? Let's do a 'dir/x'.

C:\LFNTest>dir/x Result
 Volume in drive C is DISK
 Volume Serial Number is 1234-ABCD

 Directory of C:\LFNTest\Result
28/10/2011  10:22    <DIR>                       .
28/10/2011  10:22    <DIR>                       ..
08/09/2011  14:30            33,792 TESTDO~1.DOC Test Document.DOC
               1 File(s)         33,792 bytes
               2 Dir(s)  12,345,645,118 bytes free

So which file is it? An examination through Windows Explorer would lead one to think that it is the document with the long file name, but the timestamp and file size tell a different story.

What appears to be happening is that Test Document.DOC is copied first. There is no file in the destination folder with a short file name of TESTDO~1.DOC, so this short file name is assigned to the new copy. The copy then moves on to the real TESTDO~1.DOC. As that name is already in the short file name format, and the short file name is already in use in the destination folder, the copy is treated as an update to the existing file. The long file name remains that of Test Document.DOC, but the contents, size and timestamp are those of TESTDO~1.DOC.
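
The sequence described above can be replayed with a toy model in Python. To be clear, this is not how NTFS is implemented: ToyFolder, fits_83 and the crude alias generator are all invented for illustration, and the only assumption carried over from the real behaviour is that a file can be found by either its long name or its short name.

```python
def fits_83(name):
    """Very rough 8.3 test: no spaces, base of 1-8 chars, extension up to 3."""
    base, dot, ext = name.rpartition(".")
    if not dot:
        base, ext = name, ""
    return (" " not in name and "." not in base
            and 1 <= len(base) <= 8 and len(ext) <= 3)


class ToyFolder:
    """Toy folder in which each file answers to a long name and a short name."""

    def __init__(self):
        self.entries = []  # each entry: {"long": ..., "short": ..., "size": ...}

    def _find(self, name):
        # A lookup matches on EITHER name, ignoring case.
        key = name.upper()
        for entry in self.entries:
            if key in (entry["long"].upper(), entry["short"].upper()):
                return entry
        return None

    def _make_short(self, long_name):
        # Crude alias: first six non-space characters, then ~N, then extension.
        base, dot, ext = long_name.rpartition(".")
        if not dot:
            base, ext = long_name, ""
        stem = base.replace(" ", "").upper()[:6]
        suffix = "." + ext.upper()[:3] if ext else ""
        n = 1
        while self._find(f"{stem}~{n}{suffix}") is not None:
            n += 1
        return f"{stem}~{n}{suffix}"

    def copy_in(self, name, size):
        existing = self._find(name)
        if existing is not None:
            # The name resolves to an existing file, so this is an overwrite:
            # the contents change, the long name stays as it was.
            existing["size"] = size
            return existing
        short = name.upper() if fits_83(name) else self._make_short(name)
        entry = {"long": name, "short": short, "size": size}
        self.entries.append(entry)
        return entry


result = ToyFolder()
result.copy_in("Test Document.DOC", 44032)   # copied first, takes TESTDO~1.DOC
result.copy_in("TESTDO~1.DOC", 33792)        # resolves to the file above
for entry in result.entries:
    print(entry["size"], entry["short"], entry["long"])
```

The model ends up with a single entry carrying the long name Test Document.DOC, the short name TESTDO~1.DOC and the 33,792-byte contents, which mirrors the 'dir/x' listing of the Result folder.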

A potential workaround is to copy TESTDO~1.DOC manually before the other file, so that the short file name is no longer available when Test Document.DOC is copied. However, this is not really an option when a very large number of files is being copied. Renaming TESTDO~1.DOC was ruled out because of a requirement to keep the file names in their original state. Renaming purely for the copy procedure and changing the name back afterwards was also ruled out, because of a requirement to maintain the timestamp of the folder in which the files are stored: a rename changes the folder's timestamp to the current time. (Robocopy has a switch to retain the timestamps of the files and folders being copied.)
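
For what it's worth, the "copy the short-named file first" idea could in principle be scripted rather than done by hand. The Python sketch below only decides the order; looks_like_alias and order_for_copy are hypothetical helper names, the pattern for spotting alias-style names is my own guess at what is worth flagging, and I have not tested this approach against millions of files.

```python
import re


def looks_like_alias(name):
    """Return True if a file's real name has the shape of a short-name alias,
    such as TESTDO~1.DOC (simplified pattern, for illustration only)."""
    base, dot, ext = name.rpartition(".")
    if not dot:
        base, ext = name, ""
    return (" " not in name
            and len(base) <= 8
            and len(ext) <= 3
            and re.fullmatch(r"[^.]+~[0-9]+", base) is not None)


def order_for_copy(names):
    """Put alias-shaped names first so they claim their own names in the
    destination before any long-named file can be assigned them."""
    # sorted() is stable, so the original order is kept within each group.
    return sorted(names, key=lambda n: not looks_like_alias(n))


print(order_for_copy(["Test Document.DOC", "TESTDO~1.DOC"]))
```

A script would then copy the files one at a time in the returned order. The same looks_like_alias test could also be used on its own as a pre-flight check, to list the files that are at risk before any copying starts.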
