What character encoding is used for Eclipse VM arguments?

Senthilkumar Annadurai

We read an important parameter as a VM argument; it is a path to a file. Users are now passing VM arguments that contain Korean characters (their folders have Korean names), and the program has started to break because the Korean characters are read as question marks. The experiment below shows the situation.

I tried debugging a program in Eclipse. In "Debug Configurations", on the "Arguments" tab, under "VM arguments", I entered:

-Dfilepath=D:\XXXX\카운터

But when I read it in the program like this

String filepath = System.getProperty("filepath");

I get output with question marks, like this:

D:\XXXX\???

As far as I understand, the Eclipse debug GUI uses a suitable encoding to display the characters correctly, but when the value is read in the program a different encoding is used, one that cannot represent these characters.

What default encoding does Java use to read the VM arguments supplied to it?

How can I change the encoding in Eclipse so that the program reads the characters properly?

Beck Yang

My conclusion is that the conversion depends on the default encoding (the Windows setting "Language for non-Unicode programs"). Here is the program I used for testing:

package test;

import java.io.FileOutputStream;

public class Test {
    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        // The literal shows the expected characters; sysprop and args show what the JVM actually received.
        sb.append("[카운터] sysprop=[").append(System.getProperty("cenv"));
        if (args.length > 0) {
            sb.append("], cmd args=[").append(args[0]);
        }
        sb.append("], file.encoding=").append(System.getProperty("file.encoding"));
        // Write the result to a file as UTF-8 instead of System.out,
        // so the console encoding cannot distort it.
        FileOutputStream fout = new FileOutputStream("/testout");
        try {
            fout.write(sb.toString().getBytes("UTF-8"));
        } finally {
            fout.close();
        }
        // Thread.sleep(10000); // Uncomment to inspect the arguments with Process Explorer
    }
}

Test1: "Language for non-Unicode programs" is Korean(Korea)

Execute in a command prompt: java -Dcenv=카운터 test.Test 카운터 (the Korean characters are correct when I check the arguments in Process Explorer)

Result: [카운터] sysprop=[카운터], cmd args=[카운터], file.encoding=MS949

Test2: "Language for non-Unicode programs" is Chinese(Traditional, Taiwan)

Execute in a command prompt (pasted from the clipboard): java -Dcenv=카운터 test.Test 카운터 (the Korean characters do not display in the command window, but they are correct when I check the arguments in Process Explorer)

Result: [카운터] sysprop=[???], cmd args=[???], file.encoding=MS950

Test3: "Language for non-Unicode programs" is Chinese(Traditional, Taiwan)

Launch from Eclipse with both Program arguments and VM arguments set. The command line shown in Process Explorer is C:\pg\jdk160\bin\javaw.exe -agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:50672 -Dcenv=카운터 -Dfile.encoding=UTF-8 -classpath S:\ws\wtest\bin test.Test 카운터, which is the same as what the Properties dialog of the Eclipse Debug view shows.

Result: [카운터] sysprop=[???], cmd args=[bin], file.encoding=UTF-8

Replace the Korean characters with "碁石", which exists in both the MS950 and MS949 charsets:

  • Test1 Result: [碁石] sysprop=[碁石], cmd args=[碁石], file.encoding=MS949
  • Test2 Result: [碁石] sysprop=[碁石], cmd args=[碁石], file.encoding=MS950
  • Test3 Result: [碁石] sysprop=[碁石], cmd args=[碁石], file.encoding=UTF-8

Replace the Korean characters with "鈥焢", which exists in the MS950 charset:

  • Test1 Result: [鈥焢] sysprop=[??], cmd args=[??], file.encoding=MS949
  • Test2 Result: [鈥焢] sysprop=[鈥焢], cmd args=[鈥焢], file.encoding=MS950
  • Test3 Result: [鈥焢] sysprop=[鈥焢], cmd args=[鈥焢], file.encoding=UTF-8

Replace the Korean characters with "宽广", which exists in the GBK charset:

  • Test1 Result: [宽广] sysprop=[??], cmd args=[??], file.encoding=MS949
  • Test2 Result: [宽广] sysprop=[??], cmd args=[??], file.encoding=MS950
  • Test3 Result: [宽广] sysprop=[??], cmd args=[??], file.encoding=UTF-8
  • Test4: to verify my assumption, I changed "Language for non-Unicode programs" to Chinese(Simplified, PRC) and executed java -Dcenv=宽广 test.Test 宽广 in a command prompt

    Result: [宽广] sysprop=[宽广], cmd args=[宽广], file.encoding=GBK

During testing I always checked the command line in Process Explorer to make sure all the characters were correct. However, the command-line argument characters are converted using the default encoding before main(String[] args) of the Java class is invoked. Test3 shows that this is the Windows system setting and not file.encoding. If a character does not exist in the charset of that default encoding, the program receives a question mark in its place.
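The effect of such a lossy conversion can be reproduced in isolation. The sketch below is only an illustration, not what java.exe does internally: it encodes a string into a charset and decodes it back, using US-ASCII as a stand-in for "a charset that lacks the Korean characters" because it is guaranteed to exist on every JVM. The class and method names are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // Encode the text into the given charset and decode it again.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for every character the charset cannot represent.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "카운터" written with Unicode escapes so the source file encoding does not matter.
        String korean = "\uCE74\uC6B4\uD130";

        // US-ASCII cannot represent Hangul, so each character becomes '?'.
        System.out.println(roundTrip(korean, StandardCharsets.US_ASCII)); // prints ???

        // UTF-8 can represent every character, so the text survives unchanged.
        System.out.println(roundTrip(korean, StandardCharsets.UTF_8).equals(korean)); // prints true
    }
}
```

This matches the pattern in the results above: characters that exist in the active charset come through intact, and the rest turn into question marks.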

I'm not sure whether the problem is caused by java.exe/javaw.exe or by Windows, but passing non-ASCII parameters via command-line arguments is not a good idea.

By the way, I also tried executing the command from a .bat file (saved as UTF-8), in case anyone is interested.
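One possible way to avoid non-ASCII characters on the command line, offered here as a suggestion rather than something tested in the experiments above, is to pass the value in an ASCII-only form and decode it inside the program. The sketch below uses percent-encoding via java.net.URLEncoder and java.net.URLDecoder; the property name filepath.encoded and the class name are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class AsciiSafeProperty {
    // Turn an arbitrary string into an ASCII-only, percent-encoded form.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Reverse of encode(): recover the original string inside the program.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical property name; supply it as -Dfilepath.encoded=<encoded value>.
        String raw = System.getProperty("filepath.encoded");
        if (raw == null) {
            // No property given: show the encoded form of the path from the question.
            raw = encode("D:\\XXXX\\\uCE74\uC6B4\uD130");
            System.out.println("-Dfilepath.encoded=" + raw);
        }
        String filepath = decode(raw);
        // Print only the length, since the console may not be able to show the characters.
        System.out.println("decoded length: " + filepath.length());
    }
}
```

Because the encoded value contains only ASCII characters, it should not be affected by whichever code page is used to convert the command line.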

Test5: "Language for non-Unicode programs" is Korean(Korea)

The command line in Process Explorer is java -Dcenv=移댁슫?? test.Test 移댁슫?? (the Korean characters are garbled)

Result: [카운터] sysprop=[移댁슫??], cmd args=[移댁슫??], file.encoding=MS949

Test6: "Language for non-Unicode programs" is Korean(Korea)

Add another VM argument. The command line in Process Explorer is java -Dfile.encoding=UTF-8 -Dcenv=移댁슫?? test.Test 移댁슫?? (the Korean characters are garbled)

Result: [카운터] sysprop=[移댁슫??], cmd args=[移댁슫??], file.encoding=UTF-8

Test7: "Language for non-Unicode programs" is Chinese(Traditional, Taiwan)

The command line in Process Explorer is java -cp s:\ws\wtest\bin -Dcenv=儦渥?? test.Test 儦渥?? (the Korean characters are garbled)

Result: [카운터] sysprop=[儦渥??], cmd args=[儦渥??], file.encoding=MS950
