How do I correctly encode a string for urlopen?

Josh Johnson

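Problem: I have a text file with names written in Russian. I read each name from the file and send a request to Wikipedia, using the line from the file as the page title. I then want to get information about all the images on that page.

Program: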

    import re
    from urllib.request import urlopen

    from bs4 import BeautifulSoup

    with open('names-video.txt', "r", encoding='Windows-1251') as file:
        for line in file.readlines():
            print(line)
            name = "_".join(line.split())
            print(name)
            html = urlopen(f'https://ru.wikipedia.org/wiki/{name}')
            bs = BeautifulSoup(html, 'html.parser')
            images = bs.findAll('img', {'src': re.compile('.jpg')})

            print(images[0])

names-video.txt

Алимпиев, Виктор Гелиевич 
Андреев, Алексей Викторович (художник)
Баевер, Антонина
Булдаков, Алексей Александрович
Жестков, Максим Евгеньевич
Канис, Полина Владимировна
Мустафин, Денис Рафаилович
Преображенский, Кирилл Александрович
Селезнёв, Владимир Викторович
Сяйлев, Андрей Фёдорович
Шерстюк, Татьяна Александровна

Error message:

error from callback <bound method SocketHandler.handle_message of <amino.socket.SocketHandler object at 0x0000018B92600FA0>>: 'ascii' codec can't encode characters in position 10-17: ordinal not in range(128)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\websocket\_app.py", line 344, in _callback
    callback(*args)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 80, in handle_message
    self.client.handle_socket_message(data)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\client.py", line 345, in handle_socket_message
    return self.callbacks.resolve(data)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 204, in resolve
    return self.methods.get(data["t"], self.default)(data)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 192, in _resolve_chat_message
    return self.chat_methods.get(key, self.default)(data)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 221, in on_text_message
    def on_text_message(self, data): self.call(getframe(0).f_code.co_name, objects.Event(data["o"]).Event)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 209, in call
    handler(data)
  File "C:\Users\1\Desktop\python-bots\music_bot\bot.py", line 56, in on_text_message
    html = urlopen(f'https://ru.wikipedia.org/wiki/{name}')
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 517, in open
    response = self._open(req, data)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1385, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1342, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1266, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1104, in putrequest
    self._output(self._encode_request(request))
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1184, in _encode_request
    return request.encode('ascii')

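Question: For some reason the code breaks at urlopen(), while print(line) and print(name) work fine. What could be the problem? I have been trying to solve this for a while and would appreciate any solution.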

Anon Coward

You need to percent-encode the non-ASCII characters so that the result is a valid URI:

    from urllib.parse import quote
    ...
            name = "_".join(line.split())
            # Percent-encode the non-ASCII characters (quote() encodes them as UTF-8 by default)
            name = quote(name)
            print(name)
    ...
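
As a minimal, self-contained sketch of the same idea, the URL construction can be pulled into a small helper and checked without making a network request. The helper name `wiki_url` is hypothetical and not part of the original code; the sample title is taken from the question's names-video.txt.

```python
from urllib.parse import quote, unquote


def wiki_url(line: str) -> str:
    """Build a ru.wikipedia.org URL for a page title read from the file.

    Hypothetical helper for illustration; not part of the original program.
    """
    # Collapse runs of whitespace (including the trailing newline) into
    # underscores, exactly as the question's code does.
    name = "_".join(line.split())
    # quote() encodes the text as UTF-8 and percent-encodes every byte that
    # is not an ASCII letter, a digit, or one of "_.-~" ("/" is also kept
    # by default), so the result contains only ASCII characters.
    return "https://ru.wikipedia.org/wiki/" + quote(name)


url = wiki_url("Баевер, Антонина\n")
print(url)

# This is the step that failed in the traceback: http.client encodes the
# request line as ASCII. With a percent-encoded URL it no longer raises.
url.encode("ascii")

# Decoding the percent-escapes gives back the readable title.
print(unquote(url))
```

The returned string can then be passed to `urlopen()` in place of the f-string from the question. Note that `quote()` also escapes the comma and parentheses in the titles; percent-escapes of this kind are standard URL syntax, so the server decodes them back to the original title.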
