Why does FileWriter write a different number of bytes?
FileWriter delegates its write(int) method to StreamEncoder, but its code is not available. I know there are different encodings, but FileWriter doesn't provide a way to set one. Why should one use FileWriter if its behavior is so strange?
import java.io.FileWriter;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Experiments {
    public static void main(String[] args) {
        try (FileWriter fos = new FileWriter("out.txt")) {
            fos.write(127);               // writes 1 byte (for i < 128)
            fos.write(2047);              // writes 2 bytes (for 127 < i < 2048)
            fos.write(Integer.MAX_VALUE); // writes 3 bytes (for i >= 2048)
        } catch (IOException ex) {
            Logger.getLogger(Experiments.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
Notepad shows only one symbol in the file (if you comment out the third fos.write, it shows two symbols). So how can I make this work and read my file back unambiguously?
Nice little puzzle!
What's happening is that the int you're providing is being converted to a char, and then it goes through a CharsetEncoder to turn it into bytes. Since you're not specifying an encoding, FileWriter uses the platform's default charset, and I strongly suspect that's UTF-8 for you. UTF-8 encodes a single char as one, two or three bytes.
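To make that pipeline concrete, here is a minimal sketch that mimics it by hand, assuming the default charset is UTF-8 as suspected above. It keeps only the low 16 bits of the int (as Writer.write(int) is documented to do) and runs the resulting char through a UTF-8 CharsetEncoder. The class and method names are hypothetical, and this is an illustration rather than FileWriter's actual internal code path.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class EncoderPipeline {

    // Hypothetical helper: mimics write(int) followed by a UTF-8 CharsetEncoder.
    static byte[] encodeLikeWriteInt(int c) {
        char ch = (char) c; // only the low 16 bits survive, as Writer.write(int) documents
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(new char[] { ch }));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // A lone surrogate (0xD800..0xDFFF) cannot be encoded on its own
            throw new IllegalArgumentException("cannot encode U+" + Integer.toHexString(ch), e);
        }
    }

    public static void main(String[] args) {
        for (int c : new int[] { 127, 2047, Integer.MAX_VALUE }) {
            System.out.println(c + " -> " + encodeLikeWriteInt(c).length + " byte(s)");
        }
    }
}
```

Running main reports 1, 2 and 3 bytes for 127, 2047 and Integer.MAX_VALUE respectively, which matches the byte counts in the question.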
The conversion from int to char leaves you with a 16-bit unsigned value. You might think this would be encoded as two bytes, but UTF-8 encodes ASCII characters as themselves, which is why anything up to 127 is encoded as a single byte. This, of course, means that some values will now need more than two bytes (by a simple counting argument): every UTF-8 byte spends some bits on marking its role, so a two-byte sequence carries only 11 bits of payload. When you give it 2047, the largest 11-bit value, it gets encoded in UTF-8 as two bytes; but your last example, Integer.MAX_VALUE, gets encoded as three.
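The counting argument is easier to see with the bit layout written out. Below is a hand-rolled sketch of UTF-8 for a single char (illustrative only, not how the JDK implements it; the class name is hypothetical): one byte holds 7 payload bits, two bytes hold 11 (the lead byte is tagged 110 and the continuation byte 10), and anything from 2048 up needs a third byte to fit all 16 bits. Lone surrogates (0xD800 to 0xDFFF) are rejected, since UTF-8 only represents them as part of a surrogate pair.

```java
class Utf8ByHand {

    // Hand-written UTF-8 encoder for one char (one UTF-16 code unit).
    static byte[] encode(char ch) {
        int c = ch;
        if (c < 0x80) {                       // 7 bits:  0xxxxxxx
            return new byte[] { (byte) c };
        }
        if (c < 0x800) {                      // 11 bits: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (c >> 6)),
                (byte) (0x80 | (c & 0x3F))
            };
        }
        if (c >= 0xD800 && c <= 0xDFFF) {     // only valid as half of a surrogate pair
            throw new IllegalArgumentException("lone surrogate: " + Integer.toHexString(c));
        }
        // 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),
            (byte) (0x80 | ((c >> 6) & 0x3F)),
            (byte) (0x80 | (c & 0x3F))
        };
    }

    public static void main(String[] args) {
        for (int value : new int[] { 127, 128, 2047, 2048, 65535 }) {
            byte[] bytes = encode((char) value);
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println(value + " -> " + bytes.length + " byte(s): " + hex.toString().trim());
        }
    }
}
```

For 2047 this produces DF BF and for 65535 it produces EF BF BF, the same bytes that StandardCharsets.UTF_8 gives for those chars.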
Note that Integer.MAX_VALUE is first converted to a 16-bit unsigned char, so its value is actually 65535.
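A quick way to confirm that truncation: Writer.write(int) is documented to use only the 16 low-order bits of its argument, which is exactly what a cast to char does. The class and method names below are hypothetical.

```java
class LowSixteenBits {

    // Keeps only the low 16 bits, like Writer.write(int); the char widens back to an int.
    static int asWrittenChar(int c) {
        return (char) c;
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;                  // 0x7FFFFFFF
        System.out.println(asWrittenChar(max));       // 65535
        System.out.println(max & 0xFFFF);             // 65535, same result via a mask
        System.out.println(asWrittenChar(0x10041));   // 65, i.e. 'A'
    }
}
```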
StreamEncoder isn't in the public API documentation because it's an internal class (sun.nio.cs.StreamEncoder), but its source is in OpenJDK if you look for it.
I've no idea what Notepad is doing, but I suspect it isn't decoding the file as UTF-8.
Although I've tried to explain what's going on underneath, the bottom line is that you shouldn't use FileWriter to write anything other than characters.
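If the goal is a file you can read back unambiguously, here is a minimal sketch of the two usual options (class and method names are illustrative, and temp files stand in for out.txt). For raw bytes, write to an OutputStream, where write(int) stores exactly one byte, the low 8 bits. For text, wrap the stream in an OutputStreamWriter with an explicit Charset and decode with the same charset when reading. On Java 11 and later, FileWriter itself also has constructors that take a Charset, e.g. new FileWriter("out.txt", StandardCharsets.UTF_8).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitEncoding {

    // Raw bytes: OutputStream.write(int) stores exactly one byte, the low 8 bits.
    static void writeRawBytes(Path path, int[] values) throws IOException {
        try (OutputStream out = Files.newOutputStream(path)) {
            for (int v : values) {
                out.write(v);
            }
        }
    }

    // Text: name the charset explicitly instead of relying on the platform default.
    static void writeText(Path path, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(Files.newOutputStream(path), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // Read back with the same charset that was used for writing.
    static String readText(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        int[] values = { 127, 2047, Integer.MAX_VALUE };

        Path raw = Files.createTempFile("raw", ".bin");
        writeRawBytes(raw, values);
        System.out.println(Files.size(raw));          // 3: 2047 and MAX_VALUE both truncate to 0xFF

        Path text = Files.createTempFile("text", ".txt");
        String s = new String(new char[] { (char) 127, (char) 2047, (char) Integer.MAX_VALUE });
        writeText(text, s);
        System.out.println(Files.size(text));         // 6: 1 + 2 + 3 UTF-8 bytes
        System.out.println(readText(text).equals(s)); // true

        Files.delete(raw);
        Files.delete(text);
    }
}
```

The raw-byte route loses information for values above 255, so it only suits data that really is bytes; for text, the round trip works because the writer and reader agree on the charset.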