Why do I get "expected string or buffer" when I am passing a string?

Posted by debugcn in Dev

Montana Burr

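I am using Python's re.sub function. It raises a TypeError: "expected string or buffer." After debugging and adding a number of assert statements to check that I am passing a string to re.sub, I still cannot tell why I get the exception. Below are my code, the error stack trace, and the related questions I have already read through.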

import json
import re
import string
import numpy as np
from num2words import num2words
def readFile(filename):
    p = re.compile('[1-9]*[1-9]')
    def n2w(_string):
        isInt = True
        stringToReturn = ""
        try:
            stringToReturn = num2words(int(_string))
        except:
            stringToReturn = _string
        assert isinstance(stringToReturn,str)
        return stringToReturn
    def convertNumbersToWords(_string):
        #Error: expected string?
        assert isinstance(_string,str)
        _string_copy = p.sub(_string,n2w)
        return _string_copy
    questions = []
    articleTitles = []
    articleTexts = []
    answers = [] # Stores questions and article titles and article contents and their associated answers, which are stored as strings.
    # I can access the questions by using [:,0]
    #TODO: Find a way to store questions and article content as keys.
    # TODO: Convert unicode to string.
    #NOTE: I use questions_answers rather than articleTitles_answers because articles can have multiple answers.
    with open(filename) as file:
        data = json.load(file)
        articles = data["data"]
        # Iterate through articles, looking for question/answer pairs.
        for article in articles:
            article_title = str(article["title"].encode('utf-8','replace')) # Converts Unicode object to string.
            article_paragraphs = article["paragraphs"]

            article_text = "".join([str(paragraph["context"].encode('ascii','replace')) for paragraph in article_paragraphs])
            if (len(article_paragraphs) == 0):
                print("O")
            for paragraph in article_paragraphs:
                qas_pairs = paragraph["qas"]
                # Check if this paragraph has questions.
                if (len(qas_pairs) == 0):
                    print("O")
                for qas_pair in qas_pairs:
                    # Note: There's another attribute called "context", which may come in handy.
                    answer = qas_pair["answers"][0]
                    answer_text = str(answer["text"].encode('ascii','replace')) # Converts Unicode object to string.
                    # Get where to find the answers.
                    #answer_start = answer["answer_start"]
                    #answer_end = answer_start + len(answer_text) - 1
                    question = str(qas_pair["question"].encode('ascii','replace'))
                    # Replace numeric characters with English words.
                    question = convertNumbersToWords(question)
                    answer_text = convertNumbersToWords(answer_text)
                    article_title = convertNumbersToWords(article_title)
                    article_text = convertNumbersToWords(article_text)
                    # Remove special characters.
                    from string import punctuation
                    question = question.strip(punctuation)
                    answer_text = answer_text.strip(punctuation)
                    article_title = article_title.strip(punctuation)
                    article_text = article_text.strip(punctuation)
                    questions.append(question)
                    articleTitles.append(article_title)
                    articleTexts.append(article_text)
                    answers.append(answer_text)
    print("All done")
    extractedData = np.array(questions,articleTitles,articleTexts,answers)
    return extractedData

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in ()
----> 1 trainingData = readFile("train-v1.1.json")
      2 from sys import getsizeof
      3 print("Finished loading training data.")
      4 print("Size of training data: ",getsizeof(trainingData))

in readFile(filename)
     51 question = str(qas_pair["question"].encode('ascii','replace'))
     52 # Replace numeric characters with English words.
---> 53 question = convertNumbersToWords(question)
     54 answer_text = convertNumbersToWords(answer_text)
     55 article_title = convertNumbersToWords(article_title)

in convertNumbersToWords(_string)
     16 #Error: expected string?
     17 assert isinstance(_string,str)
---> 18 _string_copy = p.sub(_string,n2w)
     19 return _string_copy
     20 questions = []

TypeError: expected string or buffer

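Other questions

TypeError: expected string or buffer
TypeError: expected string or buffer error when using regex in python re.search
TypeError: expected string or buffer

Those questions are specifically about whether the regex function receives a string; since I have done a lot of work to make sure that it does, I don't think they apply here.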

Justin Ezekiel

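For starters, you probably want to change _string_copy = p.sub(_string,n2w) to _string_copy = p.sub(n2w,_string): the compiled pattern's sub method takes the replacement first and the string to search second, so in your call the function n2w ends up where the string is expected. It would also help if you could provide a sample of the JSON file. Then, although I am not sure what you want, you might consider changing extractedData = np.array(questions,articleTitles,articleTexts,answers) to extractedData = np.array([questions,articleTitles,articleTexts,answers])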
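
To make the argument order concrete, here is a minimal sketch of the corrected call. It is an illustration rather than the asker's final code: DIGIT_WORDS is a hypothetical stand-in for num2words so that the example runs with the standard library only, and the function names are made up for this sketch. It also shows a second point that follows from how re works: the replacement callback receives a match object, not a string, so it has to call match.group() before converting anything.

```python
import re

# Same pattern as in the question: a run of the digits 1-9.
p = re.compile(r'[1-9]*[1-9]')

# Hypothetical stand-in for num2words (assumption, not the real library):
# spells out each digit separately.
DIGIT_WORDS = {
    "1": "one", "2": "two", "3": "three", "4": "four", "5": "five",
    "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def n2w(match):
    # re hands the callback a match object; the matched text is match.group(0).
    digits = match.group(0)
    return " ".join(DIGIT_WORDS[d] for d in digits)

def convert_numbers_to_words(text):
    # Pattern.sub(repl, string): replacement first, input string second.
    return p.sub(n2w, text)

def swapped_call_fails(text):
    # Reproduces the question's argument order: the function lands in the
    # string slot, which is what triggers the TypeError.
    try:
        p.sub(text, n2w)
    except TypeError:
        return True
    return False

print(convert_numbers_to_words("I have 2 cats"))
print(swapped_call_fails("I have 2 cats"))
```

With the arguments swapped, the assert on _string passes because _string really is a string; the TypeError comes from n2w being passed in the position where the input string belongs.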

Edited on 2021-07-24