How can I properly read the sequence of bytes from a hyper::client::Response and print it to the console as a UTF-8 string?

Robert Rossmann

I am exploring Rust and trying to make a simple HTTP request (using the hyper crate) and print the response body to the console. The response implements std::io::Read. Reading various documentation sources and basic tutorials, I have arrived at the following code, which I compile & execute using RUST_BACKTRACE=1 cargo run:

use hyper::client::Client;
use std::io::Read;

pub fn print_html(url: &str) {
    let client = Client::new();
    let req = client.get(url).send();

    match req {
        Ok(mut res) => {
            println!("{}", res.status);

            let mut body = String::new();

            match res.read_to_string(&mut body) {
                Ok(body) => println!("{:?}", body),
                Err(why) => panic!("String conversion failure: {:?}", why)
            }
        },
        Err(why) => panic!("{:?}", why)
    }
}

Expected:

The HTML content of the body, as delivered by the HTTP server, printed to the console in human-readable form.

Actual:

200 OK
thread '<main>' panicked at 'String conversion failure: Error { repr: Custom(Custom { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") }) }', src/printer.rs:16
stack backtrace:
   1:        0x109e1faeb - std::sys::backtrace::tracing::imp::write::h3800f45f421043b8
   2:        0x109e21565 - std::panicking::default_hook::_$u7b$$u7b$closure$u7d$$u7d$::h0ef6c8db532f55dc
   3:        0x109e2119e - std::panicking::default_hook::hf3839060ccbb8764
   4:        0x109e177f7 - std::panicking::rust_panic_with_hook::h5dd7da6bb3d06020
   5:        0x109e21b26 - std::panicking::begin_panic::h9bf160aee246b9f6
   6:        0x109e18248 - std::panicking::begin_panic_fmt::haf08a9a70a097ee1
   7:        0x109d54378 - libplayground::printer::print_html::hff00c339aa28fde4
   8:        0x109d53d76 - playground::main::h0b7387c23270ba52
   9:        0x109e20d8d - std::panicking::try::call::hbbf4746cba890ca7
  10:        0x109e23fcb - __rust_try
  11:        0x109e23f65 - __rust_maybe_catch_panic
  12:        0x109e20bb1 - std::rt::lang_start::hbcefdc316c2fbd45
  13:        0x109d53da9 - main
error: Process didn't exit successfully: `target/debug/playground` (exit code: 101)

Thoughts

Since I received 200 OK from the server, I believe I have received a valid response (I have also confirmed this empirically by making the same request in a more familiar programming language). Therefore, the error must be caused by me incorrectly converting the byte sequence into a UTF-8 string.

Alternatives

I also attempted the following solution, which lets me print the body to the console as a series of hexadecimal bytes. I know this approach is fundamentally limited, because a UTF-8-encoded character can occupy 1 to 4 bytes. Converting individual bytes into characters would therefore work only for the 128 ASCII characters (0x00 to 0x7F), which are the only characters UTF-8 encodes in a single byte.

use hyper::client::Client;
use std::io::Read;

pub fn print_html(url: &str) {
    let client = Client::new();
    let req = client.get(url).send();

    match req {
        Ok(res) => {
            println!("{}", res.status);

            // Print each byte as exactly two hex digits so the output is unambiguous.
            for byte in res.bytes() {
                print!("{:02x}", byte.unwrap());
            }
        },
        Err(why) => panic!("{:?}", why)
    }
}
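
The following small standalone sketch shows why decoding has to happen on the complete buffer rather than byte by byte. It does not use hyper, and the sample string "café" is an arbitrary choice. The character é (U+00E9) is encoded in UTF-8 as the two bytes 0xC3 0xA9. Mapping each byte to a char on its own therefore produces different characters, while std::str::from_utf8 applied to the whole slice decodes correctly. The helper names are hypothetical.

```rust
use std::str;

/// Maps every byte to the char with the same numeric value (0..=255).
/// This effectively treats the data as Latin-1; it is NOT a UTF-8 decoder.
fn decode_per_byte(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Decodes the complete buffer as UTF-8, returning None if it is invalid.
fn decode_whole(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    // "é" (U+00E9) is the two-byte sequence 0xC3 0xA9 in UTF-8.
    let bytes = "café".as_bytes();
    println!("{} bytes", bytes.len());
    println!("whole buffer: {:?}", decode_whole(bytes));
    println!("per byte:     {:?}", decode_per_byte(bytes));
    // A lone continuation byte (0xA9) is not valid UTF-8 by itself.
    println!("lone byte:    {:?}", decode_whole(&bytes[4..]));
}
```

The per-byte version yields "cafÃ©", because 0xC3 and 0xA9 are read as the separate characters Ã and ©.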

malbarbo

We can confirm with the iconv command that the data returned from http://www.google.com is not valid UTF-8:

$ wget http://google.com -O page.html
$ iconv -f utf-8 page.html > /dev/null
iconv: illegal input sequence at position 5591
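The same check can be done in Rust without iconv. Below is a minimal sketch that uses only the standard library. When std::str::from_utf8 fails, it returns a Utf8Error, and that error's valid_up_to() gives the length of the valid prefix. That length is the byte offset where decoding fails, comparable to the position iconv reports. The file name page.html matches the wget command above, and first_invalid_utf8 is a hypothetical helper name.

```rust
use std::fs;
use std::str;

/// Returns None if `bytes` is entirely valid UTF-8; otherwise returns the byte
/// offset at which the first invalid sequence starts.
fn first_invalid_utf8(bytes: &[u8]) -> Option<usize> {
    match str::from_utf8(bytes) {
        Ok(_) => None,
        Err(e) => Some(e.valid_up_to()),
    }
}

fn main() {
    // Assumes page.html was saved in the current directory by the wget command above.
    match fs::read("page.html") {
        Ok(bytes) => match first_invalid_utf8(&bytes) {
            None => println!("page.html is valid UTF-8"),
            Some(pos) => println!("invalid UTF-8 at byte offset {}", pos),
        },
        Err(e) => eprintln!("could not read page.html: {}", e),
    }
}
```

For example, a single 0xE9 byte (é in ISO-8859-1) embedded in otherwise ASCII text, as in b"ab\xE9cd", is reported at offset 2.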

For some other URLs (like http://www.reddit.com), the code works fine.

If we assume that the most part of the data is valid UTF-8, we can use String::from_utf8_lossy to workaround the problem:

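use hyper::client::Client;
use std::io::Read;

pub fn print_html(url: &str) {
    let client = Client::new();
    let req = client.get(url).send();

    match req {
        Ok(mut res) => {
            println!("{}", res.status);

            // Read the raw bytes; unlike read_to_string, read_to_end does not validate UTF-8.
            let mut body = Vec::new();

            match res.read_to_end(&mut body) {
                // Invalid sequences become U+FFFD; `{}` prints the text without Debug escaping.
                Ok(_) => println!("{}", String::from_utf8_lossy(&body)),
                Err(why) => panic!("Failed to read response body: {:?}", why),
            }
        }
        Err(why) => panic!("{:?}", why),
    }
}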
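
The lossy conversion needs nothing more than something that implements std::io::Read, so it can be tried without a network request. In the sketch below, std::io::Cursor stands in for the hyper response, and body_to_string is a hypothetical helper name.

```rust
use std::io::{self, Cursor, Read};

/// Reads everything from `reader` and converts it to a String, replacing any
/// invalid UTF-8 sequence with U+FFFD REPLACEMENT CHARACTER.
fn body_to_string<R: Read>(mut reader: R) -> io::Result<String> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    // from_utf8_lossy returns a Cow<str>; into_owned() yields a String either way.
    Ok(String::from_utf8_lossy(&buf).into_owned())
}

fn main() {
    // 0xFF can never appear in UTF-8, so it is replaced by U+FFFD.
    let fake_response = Cursor::new(vec![b'o', b'k', b' ', 0xFF, b'!']);
    match body_to_string(fake_response) {
        Ok(text) => println!("{}", text),
        Err(e) => eprintln!("read failed: {}", e),
    }
}
```

Unlike the read_to_string call in the question, this never fails on invalid UTF-8. The trade-off is that the original bytes at those positions are lost.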

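Note that Read::read_to_string and Read::read_to_end return Ok with the number of bytes read on success, not the data itself; the data is appended to the buffer you pass in. In the question's code, Ok(body) shadows the String buffer, so that match arm would print the byte count rather than the body.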
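
The small standalone check below demonstrates both points, again using std::io::Cursor as a stand-in for the response (read_body is a hypothetical helper). First, read_to_string returns a byte count and fills the buffer. Second, on invalid UTF-8 it fails with an error of kind InvalidData, which matches the error in the question.

```rust
use std::io::{self, Cursor, ErrorKind, Read};

/// Returns the byte count reported by read_to_string together with the text
/// it appended to the buffer.
fn read_body(mut reader: impl Read) -> io::Result<(usize, String)> {
    let mut body = String::new();
    let n = reader.read_to_string(&mut body)?; // n is a byte count, not the data
    Ok((n, body))
}

fn main() {
    // "é" takes two bytes, so these 5 characters are 6 bytes.
    match read_body(Cursor::new("héllo")) {
        Ok((n, body)) => println!("read {} bytes: {}", n, body),
        Err(e) => eprintln!("error: {}", e),
    }

    // 0xFF is never valid UTF-8, so read_to_string reports InvalidData.
    match read_body(Cursor::new(vec![b'a', 0xFF])) {
        Err(e) if e.kind() == ErrorKind::InvalidData => println!("invalid UTF-8: {}", e),
        other => println!("unexpected result: {:?}", other),
    }
}
```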

Related

How can I print a utf8 string in the console using nodejs
In Jython, how can I create unicode string from UTF-8 byte sequence?
How can I print how many bytes has been read to console with Assembly?
How can i read client request?
Can I mix UTF-16 conversion with UTF-8 conversion between bytes and string?
How can I enable UTF-8 support in the Linux console?
How can I read a file into bytes, and find a string of hex to match?
Is it possible to print UTF-8 string with Boost and STL in windows console?
How can I read Swift message from file properly?
how to print the data from post request on console
How can I print 4 bytes as an integer
Python: How can I print bytes?
How can i read bytes from a very heavy file? then store store them in a String e.g. .pdf .zip .xlsx files
How to read collapsed UTF-8 string
How do I detect that a string ends in the middle of a UTF-8 sequence?
In Python 3.8.2, how do I convert a string that contains a '\uxxxx' sequence into utf-8?
How can I remove the BOM from a UTF-8 file?
How to print Persian and Arabic (utf-8) characters in Java console?
How can I convert a String in ASCII(Unicode Escaped) to Unicode(UTF-8) if I am reading from a file?
How can a 21 byte UTF-8 sequence come from just 5 characters?
How do I properly use std::string on UTF-8 in C++?
How can I get the client IP from a request object with Restify?
How can I request a client certificate only from a particular CA
How can I read files from directory and send as JSON to client?
How can I remove escape characters from string? UTF issue?
Check if the bytes sequence is valid UTF-8 sequence in Javascript
How can I read a header from an http request in golang?
