How to make the output from Text::CSV utf8?

H. Shindoh

I have a CSV file, say win.csv, whose text is encoded in windows-1252. First I use iconv to convert it to UTF-8.

$ iconv -o test.csv -f windows-1252 -t utf-8 win.csv

Then I read the converted CSV file with the following Perl script (utfcsv.pl).

#!/usr/bin/perl 
use utf8;
use Text::CSV;
use Encode::Detect::Detector;

my $csv = Text::CSV->new({ binary => 1, sep_char => ';',});
open my $fh, "<encoding(utf8)", "test.csv";

while (my $row = $csv->getline($fh)) { 
  my $line = join " ", @$row;
  my $enc = Encode::Detect::Detector::detect($line);
  print "($enc) $line\n";
}

$csv->eof || $csv->error_diag();
close $fh;
$csv->eol("\r\n");
exit;

Then the output is like the following.

(UTF-8) .........
() .....

Namely, the encoding of every line is detected as UTF-8 (or ASCII). But the actual output does not seem to be UTF-8. In fact, if I save the output to a file

$ ./utfcsv.pl > output.txt

then the encoding of output.txt is detected as windows-1252.

Question: How can I get the output text in UTF-8?

Notes:

  1. Environment: openSUSE 13.2 x86_64, perl 5.20.1
  2. I do not use Text::CSV::Encoded because its installation fails. (Besides, test.csv has already been converted to UTF-8, so using Text::CSV::Encoded would be strange.)
  3. I use the following script to check the encoding. (I also use it to find out the encoding of the initial CSV file win.csv.)

#!/usr/bin/perl 
use Encode::Detect::Detector;
open my $in, "<", $ARGV[0] or die "open failed: $!";
while (my $line = <$in>) {
  my $enc = Encode::Detect::Detector::detect($line);
  chomp $enc;
  if ($enc) {
    print "$enc\n";
  }
}

Borodin

You have set the encoding of the input file handle (which, by the way, should be <:encoding(utf8) -- note the colon) but you haven't specified the encoding of the output channel, so Perl will send unencoded character values to the output.

The Unicode values for characters that will fit in a single byte -- Basic Latin (ASCII) between 0 and 0x7F, and Latin-1 Supplement between 0x80 and 0xFF -- are very similar to Windows code page 1252. In particular, a small letter u with a diaeresis (ü) is 0xFC in both Unicode and CP1252, so the text will look like CP1252 if it is output unencoded, instead of as the two-byte sequence 0xC3 0xBC, which is the same code point encoded in UTF-8.
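To make the byte-level difference concrete, here is a minimal sketch using the core Encode module. The helper name hex_bytes is purely illustrative and not part of the original code; it encodes the single character U+00FC both ways and prints the resulting bytes in hex.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00FC LATIN SMALL LETTER U WITH DIAERESIS as a one-character Perl string
my $char = "\x{FC}";

# Illustrative helper: encode a string and return its bytes as hex pairs
sub hex_bytes {
    my ($encoding, $string) = @_;
    my $bytes = encode($encoding, $string);
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $cp1252 = hex_bytes('cp1252', $char);
my $utf8   = hex_bytes('UTF-8',  $char);

print "cp1252: $cp1252\n";   # prints "cp1252: FC"
print "UTF-8:  $utf8\n";     # prints "UTF-8:  C3 BC"
```

The single byte FC is what an unencoded print sends for this character, which is why a detector reports the saved output as windows-1252.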

If you use binmode on STDOUT to set the encoding, then the data will be output correctly, but it is simplest to use the open pragma like this:

use open qw/ :std :encoding(utf-8) /;

which will set the encoding for STDIN, STDOUT and STDERR, as well as any newly-opened file handles. That means you don't have to specify it when you open the CSV file, and your code will look like the script below.

Note that I have also added use strict and use warnings, which are essential in any Perl program. I have also used autodie to remove the need for checks on the status of all IO operations, and I have taken advantage of the way Perl interpolates an array inside double quotes: it puts a space between the elements (the default value of $"), which avoids the need for a join call.

#!/usr/bin/perl

use utf8;
use strict;
use warnings 'all';
use open qw/ :std :encoding(utf-8) /;
use autodie;

use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, sep_char => ';' });

open my $fh, '<', 'test.csv';

while ( my $row = $csv->getline($fh) ) {
    print "@$row\n";
}

close $fh;
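For comparison, the binmode alternative mentioned earlier might look roughly like the sketch below. It affects STDOUT only, so the input handle would still need its own <:encoding(...) layer when the file is opened. The sample string is an assumption made up for illustration, not data from the question.

```perl
use strict;
use warnings;

# Push a UTF-8 encoding layer onto STDOUT only; other handles are unaffected
binmode STDOUT, ':encoding(UTF-8)';

# Illustrative character string containing U+00FC (not from the original CSV)
my $text = "M\x{FC}nchen";

# With the layer in place, U+00FC is written as the two bytes C3 BC
print "$text\n";
```

The open pragma in the script above does the same thing for the three standard handles and also sets the default for any file opened afterwards, which is why it is the shorter option.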
