Python, Pandas and NLTK: TypeError 'int' object is not callable when calling apply on a Series

WaywardLerka

I am trying to get word frequencies for the terms within each tweet contained in a dataframe. This is my code:

import pandas as pd
import numpy as np
import nltk
import string
import collections
from collections import Counter
nltk.download('stopwords')
sw= set(nltk.corpus.stopwords.words ('english'))
punctuation = set (string.punctuation)
data= pd.read_csv('~/Desktop/tweets.csv.zip', compression='zip')

print (data.columns)
print(data.text)
data['text'] = [str.lower () for str in data.text if str.lower () not in sw and str.lower () not in punctuation] 
print(data.text)
data["text"] = data["text"].str.split()
data['text'] = data['text'].apply(lambda x: [item for item in x if item not in sw])
print(data.text)
data['text'] = data.text.astype(str)
print(type(data.text))
tweets=data.text

data['words']= tweets.apply(nltk.FreqDist(tweets))
print(data.words)

And this is my error and traceback:

Name: text, Length: 14640, dtype: object
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/Users/leska/.spyder-py3/untitled1.py', wdir='C:/Users/leska/.spyder-py3')

  File "C:\Users\leska\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\leska\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/leska/.spyder-py3/untitled1.py", line 30, in <module>
    data['words']= tweets.apply(nltk.FreqDist(tweets))

  File "C:\Users\leska\Anaconda3\lib\site-packages\pandas\core\series.py", line 4018, in apply
    return self.aggregate(func, *args, **kwds)

  File "C:\Users\leska\Anaconda3\lib\site-packages\pandas\core\series.py", line 3883, in aggregate
    result, how = self._aggregate(func, *args, **kwargs)

  File "C:\Users\leska\Anaconda3\lib\site-packages\pandas\core\base.py", line 506, in _aggregate
    result = _agg(arg, _agg_1dim)

  File "C:\Users\leska\Anaconda3\lib\site-packages\pandas\core\base.py", line 456, in _agg
    result[fname] = func(fname, agg_how)

  File "C:\Users\leska\Anaconda3\lib\site-packages\pandas\core\base.py", line 440, in _agg_1dim
    return colg.aggregate(how, _level=(_level or 0) + 1)

  File "C:\Users\leska\Anaconda3\lib\site-packages\pandas\core\series.py", line 3902, in aggregate
    result = func(self, *args, **kwargs)

TypeError: 'int' object is not callable

I have verified that the type of data.text is a Pandas Series.

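I previously tried a solution that I thought did the job, using tokenization and building a word list to get the word counts, but it produced a frequency distribution over all the tweets rather than one for each tweet. This is the code I tried, based on an earlier question: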

import pandas as pd
import numpy as np
import nltk
import string
import collections
from collections import Counter
nltk.download('stopwords')
sw= set(nltk.corpus.stopwords.words ('english'))
punctuation = set (string.punctuation)
data= pd.read_csv('~/Desktop/tweets.csv.zip', compression='zip')

print (data.columns)
print (len(data.tweet_id))
tweets = data.text
test = pd.DataFrame(data)
test.column = ["text"]
# Exclude stopwords with Python's list comprehension and pandas.DataFrame.apply.
test['tweet_without_stopwords'] = test['text'].apply(lambda x: ' '.join([word for word in x.split() if word not in (sw) and word for word in x.split() if word not in punctuation]))
print(test)
chirps = test.text
splitwords = [ nltk.word_tokenize( str(c) ) for c in chirps ]
allWords = []
for wordList in splitwords:
    allWords += wordList
allWords_clean = [w.lower () for w in allWords if w.lower () not in sw and w.lower () not in punctuation]   
tweets2 = pd.Series(allWords)

words = nltk.FreqDist(tweets2)

I really need the terms and their counts for each tweet, and I am stumped as to what I am doing wrong.

Redowan Delowar

The problem is the way you applied the function to the column in your first code snippet.

# this line caused the problem
data['words']= tweets.apply(nltk.FreqDist(tweets))
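
To make the cause concrete, here is a minimal sketch using only the standard library. It assumes collections.Counter as a stand-in for nltk.FreqDist (FreqDist is a Counter subclass), and call_like_apply is a hypothetical helper that only mimics the final call in the traceback; it is not how pandas is implemented. The point is that nltk.FreqDist(tweets) is evaluated first, so apply receives a dict-like object whose values are integer counts, and one of those integers ends up being called.

```python
from collections import Counter

# Stand-in for nltk.FreqDist(tweets); FreqDist subclasses Counter,
# so a plain Counter is enough to illustrate the problem.
tweets = ["hello world", "hello again", "hello world"]
freq = Counter(tweets)  # counts whole tweets, not the words inside them

print(callable(freq))     # the instance is a mapping, not a function
print(callable(Counter))  # the class itself can be called


def call_like_apply(func, value):
    """Hypothetical helper: call func on value, as the last traceback frame does."""
    return func(value)


try:
    # The values stored in the mapping are plain ints, so calling one fails.
    call_like_apply(freq["hello world"], "hello world")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Passing a function object, rather than the result of calling one, avoids this.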

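Series.apply accepts any callable, but nltk.FreqDist(tweets) is the result of a call, not a function. Suppose that after cleaning the tweets you end up with this simple dataframe and want to apply nltk.FreqDist to compute the word frequencies in each tweet.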

import pandas as pd

df = pd.DataFrame(
    {
        "tweets": [
            "Hello world",
            "I am the abominable snowman",
            "I did not copy this text",
        ]
    }
)

The dataframe looks like this:

|    | tweets                      |
|---:|:----------------------------|
|  0 | Hello world                 |
|  1 | I am the abominable snowman |
|  2 | I did not copy this text    |

Now let's find the word frequencies in each of the three sentences here.

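import nltk

# word_tokenize needs NLTK's tokenizer data to be downloaded beforehand

# define the fdist function
def find_fdist(sentence):
    # split one sentence into tokens, then count each token
    tokens = nltk.tokenize.word_tokenize(sentence)
    fdist = nltk.FreqDist(tokens)

    # convert to a plain dict so each cell holds {word: count}
    return dict(fdist)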

# apply the function on `tweets` column
df["words"] = df["tweets"].apply(find_fdist)

The resulting dataframe should look like this:

|    | tweets                      | words                                                         |
|---:|:----------------------------|:--------------------------------------------------------------|
|  0 | Hello world                 | {'Hello': 1, 'world': 1}                                      |
|  1 | I am the abominable snowman | {'I': 1, 'am': 1, 'the': 1, 'abominable': 1, 'snowman': 1}    |
|  2 | I did not copy this text    | {'I': 1, 'did': 1, 'not': 1, 'copy': 1, 'this': 1, 'text': 1} |
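
If the stop-word and punctuation filtering from the question should happen in the same step, one possible sketch is below. It uses only the standard library: STOPWORDS is a tiny hypothetical list for illustration (the question uses nltk.corpus.stopwords.words('english')), count_terms is a made-up name, and whitespace splitting stands in for word_tokenize.

```python
import string
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only; the question
# builds its set from nltk.corpus.stopwords.words('english') instead.
STOPWORDS = {"i", "am", "the", "did", "not", "this"}


def count_terms(tweet, stopwords=STOPWORDS):
    """Return {term: count} for one tweet, lower-cased and without stop words."""
    # Remove punctuation characters, then split on whitespace.
    cleaned = tweet.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [tok for tok in cleaned.split() if tok not in stopwords]
    return dict(Counter(tokens))


tweets = [
    "Hello world",
    "I am the abominable snowman",
    "I did not copy this text",
]

# One dictionary per tweet, rather than one distribution over all tweets.
per_tweet = [count_terms(t) for t in tweets]
for tweet, counts in zip(tweets, per_tweet):
    print(tweet, "->", counts)
```

With pandas, the same function could be passed by name, for example df["words"] = df["tweets"].apply(count_terms), so that it runs once per row.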
