Unpredictable FileWriter

grabantot

Why does FileWriter write a different number of bytes for different values? FileWriter delegates its write(int) method to StreamEncoder, but its source code is not available. I know there are different encodings, but FileWriter doesn't provide a way to set one. Why should one use FileWriter if its behavior is so strange?

public static void main(String[] args) {
    try (FileWriter fos = new FileWriter("out.txt")) {
        fos.write(127);                     //writes 1 byte (for i<128)
        fos.write(2047);                    //writes 2 bytes (for 127<i<2048)
        fos.write(Integer.MAX_VALUE);       //writes 3 bytes (for 2047<i)
    } catch (IOException ex) {
        Logger.getLogger(Experiments.class.getName()).log(Level.SEVERE, null, ex);
    }
}

Notepad shows only one symbol in the file (if you comment out the third fos.write, there will be two symbols in Notepad). So how can I make it work and read my file unambiguously?

chiastic-security

Nice little puzzle!

What's happening is that the int you're providing is being converted to a char, and then it's going through a CharsetEncoder to turn it into bytes. Since you're not specifying an encoding, FileWriter uses the platform's default charset, and I strongly suspect that's UTF-8 in your case. UTF-8 encodes a character as anywhere from one to four bytes; a single Java char never needs more than three.
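If you want to check which encoding you are actually getting rather than guess, here is a minimal sketch. The class name DefaultEncodingCheck and the temporary file are just for illustration; getEncoding() is inherited from OutputStreamWriter and returns the encoding's historical name, so the exact string varies between platforms and JDK versions.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original question.
final class DefaultEncodingCheck {

    // Opens a FileWriter on a throwaway temp file and asks it which encoding it uses.
    static String fileWriterEncoding() {
        try {
            File tmp = File.createTempFile("encoding-check", ".txt");
            tmp.deleteOnExit();
            try (FileWriter writer = new FileWriter(tmp)) {
                // Must be called while the writer is open; it returns null once closed.
                return writer.getEncoding();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Default charset:     " + Charset.defaultCharset());
        System.out.println("FileWriter encoding: " + fileWriterEncoding());
    }
}
```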

The conversion from int to char leaves you with a 16-bit unsigned value. You might think that this would always be encoded as two bytes, but UTF-8 encodes ASCII characters as themselves, which is why anything up to 127 is encoded as a single byte. This, of course, means that some other characters must need more than two bytes (by a simple counting argument). When you give it 2047, that still fits in two bytes in UTF-8; but your last example, Integer.MAX_VALUE, gets encoded as three.

Note that Integer.MAX_VALUE is being converted first to a 16-bit unsigned char, so its value is actually 65535.
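To see the truncation and the byte counts without touching a file, something like the following should do. It is only a sketch that assumes UTF-8 is the encoding in play; Utf8Lengths, toChar and utf8Length are names I made up, and the cast to char stands in for what Writer.write(int) does with the 16 high-order bits.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class Utf8Lengths {

    // Keeps only the low 16 bits, as Writer.write(int) is documented to do.
    static char toChar(int value) {
        return (char) value;
    }

    // Number of bytes UTF-8 needs for the char that the int is truncated to.
    static int utf8Length(int value) {
        return String.valueOf(toChar(value)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int[] samples = {127, 128, 2047, 2048, Integer.MAX_VALUE};
        for (int v : samples) {
            System.out.printf("%d -> char %d -> %d byte(s)%n",
                    v, (int) toChar(v), utf8Length(v));
        }
    }
}
```

The boundaries fall at 128 (first two-byte value) and 2048 (first three-byte value), and Integer.MAX_VALUE comes out as char 65535 with three bytes.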

The source for StreamEncoder isn't part of the public API, it seems, but it's there if you look for it: the class is sun.nio.cs.StreamEncoder in the OpenJDK sources.

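What Notepad's doing, I've no idea. It can read UTF-8, though without a byte order mark it has to guess the encoding. I suspect the real problem is the characters themselves: 127 is the DEL control character and 65535 (U+FFFF) is a Unicode noncharacter, so there is nothing sensible for an editor to display for either of them.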

Although I've tried to explain here what's going on underneath, the bottom line is that you shouldn't be using FileWriter to write anything other than characters.
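As a rough sketch of what I'd do instead: pick the encoding explicitly when writing text and use the same one when reading it back, and use an OutputStream if what you really want is bytes. The class and method names below are invented for the example, and it writes to temporary files rather than out.txt.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for illustration only.
final class ExplicitEncoding {

    // Writes text as UTF-8 and reads it back with the same charset.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("text", ".txt");
            try {
                try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    writer.write(text);
                }
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the same three ints as raw bytes: OutputStream.write(int)
    // keeps only the low 8 bits, so each call produces exactly one byte.
    static byte[] rawBytes() {
        try {
            Path file = Files.createTempFile("bytes", ".bin");
            try {
                try (OutputStream out = Files.newOutputStream(file)) {
                    out.write(127);
                    out.write(2047);
                    out.write(Integer.MAX_VALUE);
                }
                return Files.readAllBytes(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("h\u00e9llo"));
        System.out.println(rawBytes().length + " raw bytes");
    }
}
```

With the charset named on both sides, what you read is what you wrote regardless of the platform default. The byte version always gives one byte per call, which is probably what the original experiment expected.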
