How do I use universal POS tags with the nltk.pos_tag() function?

背后

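I have a piece of text and I want to count how many 'ADJ', 'PRON', 'VERB', 'NOUN', etc. it contains. I know there is the .pos_tag() function, but it gives me different tags, and I would like the results to be expressed as 'ADJ', 'PRON', 'VERB', 'NOUN'. This is my code: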

import nltk
from nltk.corpus import state_union, brown
from nltk.corpus import stopwords
from nltk import ne_chunk

from nltk.tokenize import PunktSentenceTokenizer
from nltk.tokenize import word_tokenize
from nltk.tokenize import RegexpTokenizer
from nltk.stem import WordNetLemmatizer 

from collections import Counter

sentence = "this is my sample text that I want to analyze with programming language"

# tokenize the text (make a list with every word)
sample_tokenization = word_tokenize(sentence)
print("THIS IS TOKENIZED SAMPLE TEXT, LIST OF WORDS:\n\n", sample_tokenization)
print()

# tag the words (word_tokenize() already returns a list, so no split() is needed)
tagged_words = nltk.pos_tag(sample_tokenization)
print(tagged_words)
print()


# count how often each tag occurs in the text
count_of_word_type = Counter(word_type for word, word_type in tagged_words)
count_of_word_type_list = count_of_word_type.most_common()  # list of (tag, count) tuples
print(count_of_word_type_list)


for w_type, num in count_of_word_type_list:
    print(w_type, num)
print()

The code above works, but I want to find a way to get tags like these:

Tag    Meaning               English Examples
ADJ    adjective             new, good, high, special, big, local
ADP    adposition            on, of, at, with, by, into, under
ADV    adverb                really, already, still, early, now
CONJ   conjunction           and, or, but, if, while, although
DET    determiner, article   the, a, some, most, every, no, which
NOUN   noun                  year, home, costs, time, Africa
NUM    numeral               twenty-four, fourth, 1991, 14:24
PRT    particle              at, on, out, over per, that, up, with
PRON   pronoun               he, their, her, its, my, I, us
VERB   verb                  is, say, told, given, playing, would
.      punctuation marks     . , ; !
X      other                 ersatz, esprit, dunno, gr8, univeristy

I saw that there is a chapter about this here: https://www.nltk.org/book/ch05.html

It says:

from nltk.corpus import brown
brown_news_tagged = brown.tagged_words(categories='news', tagset='universal')

But I don't know how to apply that to my example sentence. Thanks for your help.

睡觉

From https://github.com/nltk/nltk/blob/develop/nltk/tag/__init__.py#L135

>>> from nltk.tag import pos_tag
>>> from nltk.tokenize import word_tokenize

# Default Penn Treebank tagset.
>>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
[('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is', 'VBZ'),
("n't", 'RB'), ('all', 'PDT'), ('that', 'DT'), ('bad', 'JJ'), ('.', '.')]

# Universal POS tags.
>>> pos_tag(word_tokenize("John's big idea isn't all that bad."), tagset='universal')
[('John', 'NOUN'), ("'s", 'PRT'), ('big', 'ADJ'), ('idea', 'NOUN'), ('is', 'VERB'),
("n't", 'ADV'), ('all', 'DET'), ('that', 'DET'), ('bad', 'ADJ'), ('.', '.')]
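
Applied to the sentence from the question, the same tagset='universal' argument can be combined with the Counter logic that is already there. The following is only a minimal sketch: the helper names count_tags and tag_universal are made up for this example and are not part of NLTK, and it assumes the tokenizer, tagger and universal tagset resources have already been fetched with nltk.download() (the exact resource names depend on the NLTK version).

```python
from collections import Counter


def count_tags(tagged_words):
    """Count how often each tag occurs in a list of (word, tag) pairs."""
    return Counter(tag for _word, tag in tagged_words)


def tag_universal(text):
    """Tokenize `text` and tag it with the universal tagset (hypothetical helper)."""
    # Imported here so count_tags() stays usable even without NLTK installed.
    from nltk import pos_tag, word_tokenize
    return pos_tag(word_tokenize(text), tagset="universal")


if __name__ == "__main__":
    sentence = "this is my sample text that I want to analyze with programming language"
    try:
        tagged = tag_universal(sentence)
    except (ImportError, LookupError) as err:
        # NLTK raises LookupError when a required resource has not been downloaded.
        print("NLTK or its data is missing:", err)
    else:
        # Print each universal tag with its count, most frequent first.
        for tag, num in count_tags(tagged).most_common():
            print(tag, num)
```

The only change compared with the code in the question is the tagset argument; the counting itself does not depend on which tagset the tagger returns.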
