Unicode filenames on FAT-32?

jake.libber

As far as I understand, NTFS supports Unicode filenames (UTF-16, as Microsoft claims?).

But the official MSDN documentation is very vague about which codepage(s) are used to store filenames (file paths) on FAT-32.

Here it says that OEM code page (CP437 I assume) is used to store filenames: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317748.aspx

But here it turns out that there can be different OEM codepages with CP437 being one of them: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317752.aspx

And we all know that utilities like mount support many more codepages for FAT than just the set of OEM codepages.

So what is the actual codepage for FAT-32 filenames? It depends on the system codepage at the time when FAT volume was created? Can FAT support true Double Byte Character Set codepages like UTF-16? Or Multi Byte Character Set codepages like UTF-8 is the limit?

And more specific question: What happens when I use CreateFileW function (which, as MSDN states, uses UTF-16 as filename codepage) to create a file on FAT-32 volume?

Thanatos

You might have to experiment here. This is a great question, and I'm not 100% confident, but:

So what is the actual codepage for FAT-32 filenames? It depends on the system codepage at the time when FAT volume was created?

The "OEM codepage", whatever that is for the system.

Can FAT support true Double Byte Character Set codepages like UTF-16? Or Multi Byte Character Set codepages like UTF-8 is the limit?

No, I don't believe the classic FAT directory entry is directly capable of either UTF-16 or UTF-8. That said, Microsoft stores the Unicode filename out of band, in additional directory entries (the VFAT long filename scheme). A file thus has two filenames: a short 8.3 name and a long Unicode name. (This is also how you can have filenames longer than 8.3 characters.)
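To make the "two filenames" idea a bit more concrete, here is a small sketch of how much out-of-band space a long name would take. It assumes the usual VFAT layout, where each long-filename directory entry holds 13 UTF-16 code units; that figure comes from the on-disk format, not from anything I tested for this answer, and the function name is made up for illustration.

```python
import math

# Assumption: one VFAT long-filename directory entry stores 13 UTF-16 code units.
UNITS_PER_LFN_ENTRY = 13


def lfn_entries_needed(name: str) -> int:
    """Estimate how many long-filename entries a name would occupy."""
    # UTF-16LE uses 2 bytes per code unit (characters outside the BMP take two units).
    code_units = len(name.encode("utf-16-le")) // 2
    return math.ceil(code_units / UNITS_PER_LFN_ENTRY)


print(lfn_entries_needed("readme.txt"))                    # 1
print(lfn_entries_needed("a-rather-long-file-name.txt"))   # 3
```

These entries sit in front of the ordinary 8.3 entry, which is why older software that only understands 8.3 names can still see the file.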

And more specific question: What happens when I use CreateFileW function (which, as MSDN states, uses UTF-16 as filename codepage) to create a file on FAT-32 volume?

The Unicode filename, as passed to CreateFileW, is stored directly in the out-of-band long filename. It is also re-encoded into the OEM codepage (whatever that happens to be on the system), and the result is stored as the short 8.3 name. If it cannot be converted into the OEM codepage, or does not fit in 8.3 characters, Windows will give the file a short name like FILENA~1.TXT.
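The fallback described above can be sketched roughly as follows. This is a toy illustration only, not Windows' real short-name algorithm: the function names are hypothetical, CP437 is assumed as the OEM codepage, the numeric tail is always ~1, and details like spaces, multiple dots and reserved characters are ignored.

```python
def can_encode(text: str, codepage: str) -> bool:
    """True if every character of text exists in the given codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def short_name_sketch(long_name: str, oem_codepage: str = "cp437") -> str:
    """Hypothetical, heavily simplified 8.3 alias; not Windows' real algorithm."""
    stem, dot, ext = long_name.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = long_name, ""

    fits_8_3 = 1 <= len(stem) <= 8 and len(ext) <= 3
    if fits_8_3 and can_encode(long_name, oem_codepage):
        # The name survives conversion; short names are stored upper-cased.
        return long_name.upper()

    # Otherwise drop characters the OEM codepage lacks, truncate, and add a
    # numeric tail. Assumption: the tail is always ~1 (Windows picks a free one).
    kept_stem = "".join(c for c in stem if can_encode(c, oem_codepage))[:6]
    kept_ext = "".join(c for c in ext if can_encode(c, oem_codepage))[:3]
    alias = kept_stem.upper() + "~1"
    return alias + ("." + kept_ext.upper() if kept_ext else "")


print(short_name_sketch("filenames.txt"))    # FILENA~1.TXT
print(short_name_sketch("asdf\u039b.txt"))   # ASDF~1.TXT (U+039B is Λ)
```

A name that both fits 8.3 and converts cleanly, such as asdfΣ.txt with CP437, simply comes back upper-cased from this sketch.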

Some citations for these answers:

First, this page tells us that the OEM code page != the Windows code page:

Non-Unicode applications that create FAT files sometimes have to use the standard C runtime library conversion functions to translate between the Windows code page character set and the OEM code page character set. With Unicode implementations of the file system functions, it is not necessary to perform such translations.

On a typical American system, the OEM code page is CP437, but the Windows code page is Windows-1252. (The FooA calls, I believe, use the Windows code page, which depends on the locale.)
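To see why that translation matters, here is a quick sketch using Python's built-in cp437 and cp1252 codecs as stand-ins for the OEM and Windows code pages of a US system. The same character gets a different byte in each, so bytes written under one code page and read under the other come out as a different character.

```python
ch = "\u00e9"  # é, present in both code pages

oem_bytes = ch.encode("cp437")    # OEM code page on a typical US system
ansi_bytes = ch.encode("cp1252")  # Windows ("ANSI") code page on the same system

print(oem_bytes.hex())   # 82
print(ansi_bytes.hex())  # e9

# Reading the Windows-1252 byte as if it were CP437 gives some other character.
misread = ansi_bytes.decode("cp437")
print(misread == ch)     # False
```

The Unicode (FooW) functions sidestep this because the caller never has to pick a code page.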

If you have a FAT volume available, you can see this in action. The character "Σ" (U+03A3) is not present in Windows-1252, but it is present in CP437. You can see both the short and long filenames with dir /X. With a file named asdfΣ.txt, you'll see:

ASDFΣ.TXT    asdfΣ.txt

However, with a file named "asdfΛ.txt" (Λ is not present in either CP437 or Windows-1252), you'll see:

ASDF~1.TXT   asdf?.txt

(You'll likely see ?, because cmd.exe's font cannot display a Λ.)

For information about long filenames, see this Wikipedia article.

Also, interestingly, if you name a file "asdf©.txt", you might get:

ASDFC.TXT    asdfc.txt

… I'm not 100% sure here, but I think Windows cleverly decided to substitute "c" for ©, and did likewise for displaying it. If you change the font to something not raster based, like Consolas, you'll see:

ASDFC.TXT    asdf©.txt

And this is why you should use the FooW functions.
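If you want to check the three test characters without a FAT volume, the following sketch uses Python's cp437 and cp1252 codecs as stand-ins for the two code pages. Note that Python's codecs are strict, so they will not reproduce the "c" for © substitution that Windows appears to do; the helper name is just for illustration.

```python
def in_codepage(ch: str, codepage: str) -> bool:
    """True if the character can be encoded in the given codepage."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# Sigma, Lambda and the copyright sign, as used in the dir /X experiments.
results = {
    ch: (in_codepage(ch, "cp437"), in_codepage(ch, "cp1252"))
    for ch in ("\u03a3", "\u039b", "\u00a9")
}

for ch, (in_oem, in_ansi) in results.items():
    print(f"U+{ord(ch):04X}  cp437={in_oem}  cp1252={in_ansi}")
# U+03A3  cp437=True  cp1252=False
# U+039B  cp437=False  cp1252=False
# U+00A9  cp437=False  cp1252=True
```

That matches the experiments: Σ survives in the short name, Λ forces the ~1 alias, and © has no exact CP437 equivalent.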
