I have a text file that uses various characters in the 128+ range in currently non-standard ways. The file command just says Non-ISO extended-ASCII.
From the context I can recognise these:
Octal 201: u + umlaut
204: a + umlaut
216: A + umlaut
224: o + umlaut
341: double s
(There are many others, which I suspect are graphical symbols, not characters.)
In addition, an example:
example: E0X A ANCIENT.IMG 2 0 C:\DOS\DISKOPT.EXE A: /O /Sa /M2
─┬─ ┬ ──┬──────── ┬ ─ ──────┬────────── ──────┬─────
│ │ │ │ │ │
load E0X ─┘ └─────────┐ │ │ │
│ │ │ │ │
with ANCIENT.IMG ┘ │ │ │ │
│ │ │ │
for drive A: ──────────┘ │ │ │
│ │ │
let DISKOPT work ──────────│──────────┴──────────────────┘
│
and write the result back to disk if finished.
(The graphical chars are octal 263, 277, 302, 304, 331.)
And here is the link to the file: e0x.arj. It is the E0X.ENG file, but I guess it is the same encoding in all the text files.
Which character set is this, and how can I make it readable on a modern computer?
Most probably the character positions you mention are octal numbers: 201 (which is customarily written as 0201 to make it clear it's octal) is decimal 129, or 0x81.
Those characters are consistent with several DOS codepages.
If it's German, I'd bet that it's 437 or 850. Any editor should be able to read that text file and write it in a different character set.
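You can verify the guess without any special tools, since Python ships a cp437 codec. A minimal sketch (the octal values are the ones listed in the question; the expected letters are my assumption of what "u + umlaut" etc. should decode to):

```python
# Decode the question's octal byte values under MS-DOS codepage 437.
samples = {0o201: "ü", 0o204: "ä", 0o216: "Ä", 0o224: "ö", 0o341: "ß"}

for byte, expected in samples.items():
    decoded = bytes([byte]).decode("cp437")
    print(f"octal {byte:o} -> {decoded}")
    assert decoded == expected
```

Every byte matches the letters recognised from context, which is strong evidence for codepage 437 (they also match 850, where these positions are identical).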
For example, you can read it with Notepad++ and save it in UTF-8 if you are sure you need that.
P.S.: After reading the file that you attached, I can see that the E0X.ENG charset is MS-DOS codepage 437. You can see it converted to UTF-8 at https://pastebin.com/LdnQCpk4.
If you run Linux, you can automate the conversion with GNU recode. If you run DOS, this recode utility https://docs.seneca.nl/Smartsite-Docs/Features-Modules/Features/Tools/Recode-commandline-utility.html should do the same.
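If you'd rather not install recode, the same conversion is a few lines of Python. A sketch, assuming the source file is E0X.ENG (here a tiny stand-in file with the question's bytes is fabricated so the script runs standalone; E0X.TXT is a name I made up for the output):

```python
# Convert a CP437 text file to UTF-8, as an alternative to GNU recode.
from pathlib import Path

# Stand-in for the real E0X.ENG: the five bytes listed in the question.
Path("E0X.ENG").write_bytes(bytes([0o201, 0o204, 0o216, 0o224, 0o341]))

text = Path("E0X.ENG").read_bytes().decode("cp437")   # raw bytes -> Unicode text
Path("E0X.TXT").write_text(text, encoding="utf-8")    # re-encode as UTF-8

print(text)  # üäÄöß
```

On the real file this also turns the box-drawing bytes (octal 263, 277, 302, 304, 331) into the proper Unicode line-drawing characters, so the diagrams survive the conversion.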