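I am using nltk and its Tree package in Python 2.7, and I want to extract every rule from a tree, each annotated with its grandparent node. I have the following tree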
t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]), Tree('VP', [Tree('V', ['chased']), Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])])
and its productions
t.productions()
[S -> NP VP, NP -> D N, D -> 'the', N -> 'dog', VP -> V NP, V -> 'chased', NP -> D N, D -> 'the', N -> 'cat']
For the tree:
              S
      ________|_____
     |              VP
     |         _____|___
     NP       |         NP
  ___|___     |      ___|___
 D       N    V     D       N
 |       |    |     |       |
the     dog chased the     cat
What I want is something of the form:
[S -> NP VP, S ^ NP -> D N, NP ^ D -> 'the', NP ^ N -> 'dog'.......]
I have looked at the ParentedTree class, but I don't see how to use it to solve my problem.
You need to modify/override the productions method.
Code:
from nltk.tree import Tree
from nltk.compat import string_types
from nltk.grammar import Production, Nonterminal
from nltk.tree import _child_names

def productions(t, parent):
    if not isinstance(t._label, string_types):
        raise TypeError('Productions can only be generated from trees having node labels that are strings')
    # t._label ==> parent + " ^ " + t._label
    prods = [Production(Nonterminal(parent + " ^ " + t._label), _child_names(t))]
    for child in t:
        if isinstance(child, Tree):
            # Pass the current label down so the child's rule is annotated with it
            prods += productions(child, t._label)
    return prods

t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]), Tree('VP', [Tree('V', ['chased']), Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])])

# To add a parent of 'S' named 'Start':
# prods = productions(t, "Start")

# To skip the parent of 'S':
prods = [Production(Nonterminal(t._label), _child_names(t))]
for child in t:
    if isinstance(child, Tree):
        prods += productions(child, t._label)
print prods
Output:
[S -> NP VP, S ^ NP -> D N, NP ^ D -> 'the',
NP ^ N -> 'dog', S ^ VP -> V NP, VP ^ V -> 'chased',
VP ^ NP -> D N, NP ^ D -> 'the', NP ^ N -> 'cat']
For more information, see the productions method in nltk.tree.
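The recursion above does not depend on anything specific to NLTK: each call only needs the label of the node one level up. As a minimal sketch of that idea, the version below uses plain nested tuples of the form (label, children) as a stand-in for Tree and returns rule strings instead of Production objects. The function name annotated_rules and the tuple layout are assumptions made for illustration, not part of NLTK.

```python
def annotated_rules(node, parent=None):
    """Return rule strings for `node`, prefixing each left-hand side
    (except the root's) with the label of its parent."""
    label, children = node
    lhs = label if parent is None else parent + " ^ " + label
    # Subtrees contribute their bare label to the right-hand side;
    # leaves are words and are shown quoted.
    rhs = " ".join(c[0] if isinstance(c, tuple) else repr(c) for c in children)
    rules = [lhs + " -> " + rhs]
    for child in children:
        if isinstance(child, tuple):
            # The current label becomes the annotation for the child's rule.
            rules += annotated_rules(child, label)
    return rules


# Hypothetical tuple encoding of the tree from the question.
tree = ("S", [("NP", [("D", ["the"]), ("N", ["dog"])]),
              ("VP", [("V", ["chased"]),
                      ("NP", [("D", ["the"]), ("N", ["cat"])])])])

rules = annotated_rules(tree)
for rule in rules:
    print(rule)
```

With an actual nltk Tree, the label lookup and the subtree check would be replaced by the tree's own accessors, as in the answer's code.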