如何使用PosteRazor使用的文本编码查找文件的路径?

改变

PosteRazor使用明显过时的GUI,无法正确显示我的文件名:

PosteRazor-加载输入图像

为了方便起见,我希望能够通过从Nautilus复制并粘贴其路径来在PosteRazor中打开任何文件。这可以在其他应用程序中使用,但是可悲的是,PosteRazor无法理解路径:

PosteRazor-加载输入图像

如何将Nautilus生成的路径转换为与PosteRazor兼容的文本编码?

The Ubuntu package for PosteRazor lists a dependency on the Fast Light Toolkit (FLTK). Its programmer's documentation on Unicode looks like it might contain the necessary information to answer my question, but I'm not sure how to interpret it.

Details

  • Some sample content:

    • A path as it natively appears in Nautilus:

      /home/ak/café/north-america.jpg
      
    • The same path as it natively appears in PosteRazor:

      PosteRazor中显示的路径<code>/home/ak/café/north-america.jpg</ code>

    • The clipboard contents after copying the path from Nautilus:

      $ xclip -out -selection clipboard -target TARGETS
      TIMESTAMP
      TARGETS
      MULTIPLE
      x-special/gnome-copied-files
      text/uri-list
      UTF8_STRING
      COMPOUND_TEXT
      TEXT
      STRING
      text/plain;charset=utf-8
      text/plain
      $ xclip -out -selection clipboard -target STRING | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 e9 2f 6e 6f  |/home/ak/caf./no|
      00000010  72 74 68 2d 61 6d 65 72  69 63 61 2e 6a 70 67     |rth-america.jpg|
      0000001f
      $ xclip -out -selection clipboard -target UTF8_STRING | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 c3 a9 2f 6e  |/home/ak/caf../n|
      00000010  6f 72 74 68 2d 61 6d 65  72 69 63 61 2e 6a 70 67  |orth-america.jpg|
      00000020
      $ xclip -out -selection clipboard -target text/plain | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 5c 75 30 30  |/home/ak/caf\u00|
      00000010  65 39 2f 6e 6f 72 74 68  2d 61 6d 65 72 69 63 61  |e9/north-america|
      00000020  2e 6a 70 67                                       |.jpg|
      00000024
      $ xclip -out -selection clipboard -target 'text/plain;charset=utf-8' | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 c3 a9 2f 6e  |/home/ak/caf../n|
      00000010  6f 72 74 68 2d 61 6d 65  72 69 63 61 2e 6a 70 67  |orth-america.jpg|
      00000020
      
    • The clipboard contents after copying the path from PosteRazor:

      $ xclip -out -selection clipboard -target TARGETS
      STRING
      $ xclip -out -selection clipboard -target STRING | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 c3 a9 2f 6e  |/home/ak/caf../n|
      00000010  6f 72 74 68 2d 61 6d 65  72 69 63 61 2e 6a 70 67  |orth-america.jpg|
      00000020
      
    • PosteRazor after copying the path from Nautilus and pasting it into PosteRazor:

      具有“确定”按钮的PosteRazor变灰

    • PosteRazor after copying the path from PosteRazor and pasting it into PosteRazor:

      确定按钮处于活动状态的PosteRazor

    • The path copied from PosteRazor and pasted into Chromium:

      /home/ak/café/norrth-america.jpg
      
    • The path copied from PosteRazor and pasted into Chromium and then copied from Chromium and pasted back into PosteRazor:

      确定按钮处于活动状态的PosteRazor

    • The clipboard contents after copying that from Chromium:

      $ xclip -out -selection clipboard -target TARGETS
      TIMESTAMP
      TARGETS
      MULTIPLE
      SAVE_TARGETS
      COMPOUND_TEXT
      STRING
      TEXT
      UTF8_STRING
      text/plain
      $ xclip -out -selection clipboard -target STRING | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 c3 a9 2f 6e  |/home/ak/caf../n|
      00000010  6f 72 72 74 68 2d 61 6d  65 72 69 63 61 2e 6a 70  |orrth-america.jp|
      00000020  67                                                |g|
      00000021
      $ xclip -out -selection clipboard -target UTF8_STRING | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 c3 a9 2f 6e  |/home/ak/caf../n|
      00000010  6f 72 72 74 68 2d 61 6d  65 72 69 63 61 2e 6a 70  |orrth-america.jp|
      00000020  67                                                |g|
      00000021
      $ xclip -out -selection clipboard -target text/plain | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 c3 a9 2f 6e  |/home/ak/caf../n|
      00000010  6f 72 72 74 68 2d 61 6d  65 72 69 63 61 2e 6a 70  |orrth-america.jp|
      00000020  67                                                |g|
      00000021
      
    • The path copied from PosteRazor and pasted into GNOME Terminal:

      路径在GNOME终端中正确显示

    • The path copied from PosteRazor and pasted into GNOME Terminal and then copied from GNOME Terminal and pasted back into PosteRazor:

      具有“确定”按钮的PosteRazor变灰

    • The clipboard contents after copying that from GNOME Terminal:

      $ xclip -out -selection clipboard -target TARGETS
      TIMESTAMP
      TARGETS
      MULTIPLE
      SAVE_TARGETS
      UTF8_STRING
      COMPOUND_TEXT
      TEXT
      STRING
      text/plain;charset=utf-8
      text/plain
      $ xclip -out -selection clipboard -target STRING | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 e9 2f 6e 6f  |/home/ak/caf./no|
      00000010  72 74 68 2d 61 6d 65 72  69 63 61 2e 6a 70 67     |rth-america.jpg|
      0000001f
      $ xclip -out -selection clipboard -target UTF8_STRING | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 c3 a9 2f 6e  |/home/ak/caf../n|
      00000010  6f 72 74 68 2d 61 6d 65  72 69 63 61 2e 6a 70 67  |orth-america.jpg|
      00000020
      $ xclip -out -selection clipboard -target 'text/plain' | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 5c 75 30 30  |/home/ak/caf\u00|
      00000010  65 39 2f 6e 6f 72 74 68  2d 61 6d 65 72 69 63 61  |e9/north-america|
      00000020  2e 6a 70 67                                       |.jpg|
      00000024
      $ xclip -out -selection clipboard -target 'text/plain;charset=utf-8' | hexdump -C
      00000000  2f 68 6f 6d 65 2f 61 6b  2f 63 61 66 c3 a9 2f 6e  |/home/ak/caf../n|
      00000010  6f 72 74 68 2d 61 6d 65  72 69 63 61 2e 6a 70 67  |orth-america.jpg|
      00000020
      
green

Update: Following command can be used:

xclip -out -selection clipboard -target STRING | iconv --from-code ISO-8859-15 --to-code UTF-8 | xclip -in -selection clipboard

For explanation read the full answer.


To completely understand the answer, you need to have an understanding of Unicode code points and unicode encoding.

Below are short definitions and explanations of the required terms, but I recommend you read about them from the sources mentioned at the end of the answer.

  • Unicode Code Space: A range of integers from 0 to 10FFFF16.

  • Unicode Code Points: Any value in the Unicode codespace. A code point corresponds to a character, though not all code points are assigned to encoded characters.

  • UTF-8: UTF-8 (UCS Transformation Format - 8-bit) is a variable-width encoding that can represent every character in the Unicode character set. UCS stands for Universal Character Set.

    The first 128 characters (US-ASCII) need one byte. The next 1,920 characters need two bytes to encode. This covers the remainder of almost all Latin-derived alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Tāna alphabets, as well as Combining Diacritical Marks.

    This indicates that the character é which is causing problems takes two bytes to encode in UTF-8. We will verify it using some commands.

  • ISO/IEC 8859-15: 8-bit single-byte coded graphic character sets.


To test, I made a directory /home/green/Pictures/café/.

After copying the location from nautilus, the outputs of the commands were as follows:

Command #1:

$ xclip -out -selection clipboard -target STRING | hexdump -C
00000000  2f 68 6f 6d 65 2f 67 72  65 65 6e 2f 50 69 63 74  |/home/green/Pict|
00000010  75 72 65 73 2f 63 61 66  e9 2f                    |ures/caf./|
0000001a

Note that the encoding of café is 63 61 66 e9, which is all right as the Unicode Code Point U+00E9 represents {LATIN SMALL LETTER E WITH ACUTE} or é.

Command #2:

$ xclip -out -selection clipboard -target UTF8_STRING | hexdump -C
00000000  2f 68 6f 6d 65 2f 67 72  65 65 6e 2f 50 69 63 74  |/home/green/Pict|
00000010  75 72 65 73 2f 63 61 66  c3 a9 2f                 |ures/caf../|
0000001b

In the above output, café is encoded as 63 61 66 c3 a9. It is all right too because the UTF-8 encoding of code point U+00E9 (corresponding to é) is \xC3\xA9 (\x is used to represent that the following characters are hexadecimal numbers).

\xC3 represents 1 byte and so does \xA9. Thus, UTF-8 needs 2 bytes to represent é.

After copying the same text from PosteRazor the outputs of the commands were:

Command #1:

$ xclip -out -selection clipboard -target STRING | hexdump -C
00000000  2f 68 6f 6d 65 2f 67 72  65 65 6e 2f 50 69 63 74  |/home/green/Pict|
00000010  75 72 65 73 2f 63 61 66  c3 a9 2f                 |ures/caf../|
0000001b

Clearly, the Unicode Code Points are messed up. Now, we have two code points (c3 and a9) where there should be only one (e9).

Unsurprisingly, the two code points i.e. U+00C3 and U+00A9 stand for {LATIN CAPITAL LETTER A WITH TILDE} AND {COPYRIGHT SIGN}, which is what we saw in PosteRazor.

Command #2:

$ xclip -out -selection clipboard -target UTF8_STRING | hexdump -C
00000000  2f 68 6f 6d 65 2f 67 72  65 65 6e 2f 50 69 63 74  |/home/green/Pict|
00000010  75 72 65 73 2f 63 61 66  c3 a9 2f                 |ures/caf../|
0000001b

The output for this command seems to have remained unchanged, but there is a subtle difference.

In the previous output \xc3\xa9 formed a single character whereas now \xc3 forms one character on its own and \xa9 forms another character (which are à and ©, respectively).


Now we know what is happening, but how is it happening? To simulate the same thing, we will use Python. I'm using Python 3.3.0 here.

>>> import unicodedata
>>> a = u'/home/green/Pictures/café'
>>> a
'/home/green/Pictures/café'
>>> a = a.encode('utf-8')
>>> a
b'/home/green/Pictures/caf\xc3\xa9'
>>> a = a.decode('iso-8859-15')
>>> a
'/home/green/Pictures/café'
>>> a = a.encode('utf-8')
>>> a
b'/home/green/Pictures/caf\xc3\x83\xc2\xa9'

You can see that if we first encode the string using UTF-8 and then decode using ISO-8859-15, then we get the same string which we get while using PosteRazor.

Now, notice the following code. Here too, we have copied and pasted the location from nautilus:

>>> z = u'/home/green/Pictures/café'
>>> z
'/home/green/Pictures/café'
>>> z = z.encode('iso-8859-15')
>>> z
b'/home/green/Pictures/caf\xe9'
>>> z = z.decode('iso-8859-15')
>>> z
'/home/green/Pictures/café'

Had we encoded the string using ISO-8859-15 initially, we'd have gotten the perfect result.

Note that \xe9 is the encoding for é in ISO-8859-15, which apparently needs one byte. This is the same as the Unicode code point U+00E9 which, when encoded in UTF-8, needs 2 bytes and is represented by \xc3\xa9.

现在我们知道发生了什么事情以及如何进行,我们如何纠正它?好了,您可以将路径转换为ISO-8859-15字符集,也可以仅使用GUI选择文件。


资料来源和更多信息:

本文收集自互联网,转载请注明来源。

如有侵权,请联系[email protected] 删除。

编辑于
0

我来说两句

0条评论
登录后参与评论

相关文章

来自分类Dev

如何使用python查找文件中最频繁出现的单词对?

来自分类Dev

使用os.walk查找文件路径

来自分类Dev

使用Java查找文件

来自分类Dev

C ++如何使用带有多个通配符的路径查找文件

来自分类Dev

使用-perm查找文件并更改权限

来自分类Dev

如何使用CSV查找文件替换数据集中的列名

来自分类Dev

使用CRT例程查找文件大小

来自分类Dev

如何使用os.walk查找文件的路径?

来自分类Dev

使用-perm查找文件并更改权限

来自分类Dev

查找文件并使用Shell脚本打包

来自分类Dev

如何使用Powershell查找文件夹的大小?

来自分类Dev

如何使用特定的ACL查找文件/目录

来自分类Dev

如何使用XP cmd目录查找文件?

来自分类Dev

如何根据文件名查找文件的路径?

来自分类Dev

如何使用jQuery在输入元素html中查找文件

来自分类Dev

无法使用FileReader查找文件

来自分类Dev

如何使用find命令在Unix中查找文件?

来自分类Dev

如何使用终端在Linux中的Windows上查找文件

来自分类Dev

如何使用范围参数查找文件?

来自分类Dev

如何使用定位查找文件的出现

来自分类Dev

如何使用正则表达式查找文件夹的真实路径

来自分类Dev

如何从路径列表中使用正则表达式查找文件或文件夹

来自分类Dev

使用 /\.c/ 查找文件

来自分类Dev

vba 使用增量查找文件路径?

来自分类Dev

使用 find 查找文件

来自分类Dev

从终端查找文件的路径

来自分类Dev

如何使用powershell查找文件大小

来自分类Dev

VBScript 如何查找文件路径?

来自分类Dev

如何使用CMake遞歸查找文件的路徑?