How to read a UTF-8 encoded list of string tokens into a vector?

z--

I have a UTF-8 encoded text file with one token per line, and I would like to read it into a vector. This is R version 3.0.1 on MS Windows. I understand that the default encoding is UTF-8, right?

I am looking for a code snippet like the ones on

http://www.mayin.org/ajayshah/KB/R/html/r4.html

from 'R by example'

http://www.mayin.org/ajayshah/KB/R/index.html

However, they do not have a UTF-8 example, only ASCII.

IRTFM

You can either read it in with read.table() and then extract the column as a vector, or with scan().

 vect <- scan(file="path/to/file1.txt", what=character(0) )

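If UTF-8 is your session's native encoding, you do not need to specify it. On Windows, however, R 3.x normally uses the system code page rather than UTF-8 as its native encoding, so stating the encoding explicitly is the safer choice there: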

vect <- scan(file="path/to/file1.txt", what=character(0), encoding="UTF-8" )
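As a minimal sketch of the above, the following writes a small UTF-8 file and reads it back. The sample tokens and the temporary file are made up for illustration and are not from the question. It assumes one token per line, so sep="\n" is passed to scan() to keep a token that contains spaces in one piece; readLines() is shown as an alternative that should give the same result here. Note that encoding="UTF-8" only marks the strings as UTF-8; it does not convert them.

```r
# Hypothetical sample tokens, written with Unicode escapes so the
# script itself stays plain ASCII.
tokens <- c("caf\u00e9", "na\u00efve", "\u65e5\u672c\u8a9e")

# Write the tokens as raw UTF-8 bytes, one per line, to a temporary file.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeLines(enc2utf8(tokens), con, useBytes = TRUE)
close(con)

# Option 1: scan(), splitting on newlines only and marking the
# resulting strings as UTF-8.
vect <- scan(file = path, what = character(), sep = "\n",
             encoding = "UTF-8", quiet = TRUE)

# Option 2: readLines(), which returns one element per line.
vect2 <- readLines(path, encoding = "UTF-8")

# Both vectors should match the tokens that were written.
identical(vect, tokens)
identical(vect2, tokens)
```

If the strings need to be converted to the native encoding rather than just marked, scan() also accepts a fileEncoding argument, although characters that the native encoding cannot represent may then be lost.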

The NEWS file for R 3.0.0 said:

"readLines() and scan() (and hence read.table()) in a UTF-8 locale now discard a UTF-8 byte-order-mark (BOM). Such BOMs are allowed but not recommended by the Unicode Standard: however Microsoft applications can produce them and so they are sometimes found on websites.

The encoding name "UTF-8-BOM" for a connection will ensure that a UTF-8 BOM is discarded."

So perhaps the need for the encoding argument indicates either that you were in a non-UTF-8 locale and didn't tell us, or that you were using an outdated R version?
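The "UTF-8-BOM" connection encoding mentioned in the NEWS entry can be tried out with a sketch like the one below. The file contents are invented for illustration, and the tokens are deliberately plain ASCII so that the result does not depend on the locale. Only the three leading BOM bytes matter here.

```r
# Build a temporary file that starts with the UTF-8 BOM (EF BB BF),
# followed by two hypothetical tokens, one per line.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeLines(c("alpha", "beta"), con)
close(con)

# Open a connection that declares the input as UTF-8 with a possible
# BOM; the BOM is dropped before the lines reach readLines().
con <- file(path, open = "r", encoding = "UTF-8-BOM")
bom_tokens <- readLines(con)
close(con)

# The first token should not carry any leading BOM bytes.
identical(bom_tokens, c("alpha", "beta"))
```

Going through an explicit connection like this should work regardless of whether the session is in a UTF-8 locale, which is the case the NEWS entry distinguishes.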
