I have been trying to fetch JSON from a URL with a typical approach such as:
import urllib.request, json
with urllib.request.urlopen("my_url") as url:
    data = json.loads(url.read().decode())
Nevertheless, this fails with a JSONDecodeError, because there is a control character inside one of the strings in the object:
{..."\tvalue"...}
I DID modify my source data to not include control characters (something I might not always be able to do), and nevertheless Python keeps saying the control character is there.
I decided to read the URL response into a string and replace the inner control characters there:
my_str = url.read()
my_str = my_str.replace('"\\t','"')
But this way, the special characters throughout the JSON get replaced in odd ways: {...sábado...} becomes {...sa\cxx\c1bado}, or something like that.
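The odd replacement described above looks like what happens when raw bytes are turned into text without being decoded. Here is a small sketch of the difference, assuming the response is UTF-8 encoded (an assumption; the actual encoding is not stated here). The sample word is taken from the example, and the variable names are only illustrative.

```python
# Assumption: the server sends UTF-8 bytes, as urlopen(...).read() would return.
raw = "s\u00e1bado".encode("utf-8")

# str() on a bytes object gives its Python repr, with non-ASCII bytes
# shown as backslash escapes instead of the original character.
as_bytes_repr = str(raw)

# Decoding first turns the bytes back into real text.
as_text = raw.decode("utf-8")

print(as_bytes_repr)
print(as_text)
```

If this is what happened, any replacement should be done on the decoded text rather than on the bytes or their repr.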
How can I sanitize my JSON input from control characters without destroying my special characters?
EDIT:
Sorry, I forgot to mention something, given the first answer:
I did try adding strict=False, but then my JSON went... well, nuts. The double quotes became single quotes, and some of them would disappear, so when I printed it, I got something like:
{
'some_key':'some_value',
'another_key':'another_value_without_closing_quote,
a_key_without_opening_quote': 'value'
}
I have no clue why that was the case.
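One possible explanation for the single quotes, offered only as a guess since the printing code is not shown: json.loads returns a Python dict, and printing a dict shows its Python repr, which uses single quotes. This would not by itself explain quotes going missing. The sketch below uses a made-up payload modeled on the example above.

```python
import json

# Hypothetical payload with a raw tab inside a string value.
raw = '{"day": "s\u00e1bado", "key": "\tvalue"}'.encode("utf-8")

# strict=False lets the parser accept control characters inside strings.
data = json.loads(raw.decode("utf-8"), strict=False)

# The repr of a dict is Python syntax, not JSON: single quotes appear here.
as_repr = repr(data)

# json.dumps serializes back to JSON text, with double quotes;
# ensure_ascii=False keeps characters such as the accented a readable.
as_json = json.dumps(data, ensure_ascii=False)

print(as_repr)
print(as_json)
```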
I ended up solving my predicament by first reading the original JSON into a string. I then passed the string through a sanitizing method that removed escape characters and replaced 'damaged' characters with their original special characters.
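The actual sanitizing method is not shown, so the following is only a minimal sketch of one way such a step could look. It assumes the response is UTF-8 and that simply deleting raw control characters is acceptable. The function name load_json_bytes and the sample payload are hypothetical.

```python
import json
import re

# Raw control characters (U+0000 to U+001F) are not allowed inside JSON
# strings. Between tokens they are only optional whitespace, so deleting
# them does not change the structure of the document.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f]")

def load_json_bytes(raw: bytes):
    """Decode the bytes, drop raw control characters, then parse."""
    text = raw.decode("utf-8")  # assumption: the payload is UTF-8
    cleaned = _CONTROL_CHARS.sub("", text)
    return json.loads(cleaned)

# Hypothetical payload: a real tab inside a value plus an accented character.
raw = '{"day": "s\u00e1bado", "key": "\tvalue"}'.encode("utf-8")
data = load_json_bytes(raw)
print(data)
```

Decoding before cleaning is what keeps the special characters intact here. One trade-off: a tab or newline that was meant to be part of a value is lost, so substituting a space may suit some data better.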