Coreference resolution using Stanford CoreNLP

tradt

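I am new to the Stanford CoreNLP toolkit and am trying to use it in a project that resolves coreferences in news text. To use the Stanford CoreNLP coreference system, one normally creates a pipeline that requires tokenization, sentence splitting, part-of-speech tagging, lemmatization, named entity recognition, and parsing. For example: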

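import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

Properties props = new Properties();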
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

// read some text in the text variable
String text = "As competition heats up in Spain's crowded bank market, Banco Exterior de Espana is seeking to shed its image of a state-owned bank and move into new activities.";

// create an empty Annotation just with the given text
Annotation document = new Annotation(text);

// run all Annotators on this text
pipeline.annotate(document);

After that, the sentence annotations are easy to get:

List<CoreMap> sentences = document.get(SentencesAnnotation.class);

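However, I am using other tools for preprocessing and need a standalone coreference resolution system. Creating the token and parse tree annotations and setting them on the annotation is pretty easy: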

// create new annotation
Annotation annotation = new Annotation();

// create token annotations for each sentence from the input file
List<CoreLabel> tokens = new ArrayList<>();
for (int tokenCount = 0; tokenCount < parsedSentence.size(); tokenCount++) {

    // one input line per token, already split into columns
    ArrayList<String> parsedLine = parsedSentence.get(tokenCount);
    String word = parsedLine.get(1);
    String lemma = parsedLine.get(2);
    String posTag = parsedLine.get(3);
    String namedEntity = parsedLine.get(4);
    String partOfParseTree = parsedLine.get(6);

    CoreLabel token = new CoreLabel();
    token.setWord(word);
    token.setLemma(lemma); // keep the lemma separate from the surface form
    token.setTag(posTag);
    token.setNER(namedEntity);
    tokens.add(token);
}

// set tokens annotations to annotation
annotation.set(TokensAnnotation.class, tokens);

// set parse tree annotations to annotation
Tree stanfordParseTree = Tree.valueOf(inputParseTree);
annotation.set(TreeAnnotation.class, stanfordParseTree);

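Creating the sentence annotations, however, is pretty tricky, because to my knowledge there is no documentation that explains it in detail. I can create the data structure for the sentence annotations and set it on the annotation: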

List<CoreMap> sentences = new ArrayList<CoreMap>();
annotation.set(SentencesAnnotation.class, sentences);

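It can't be that hard, but there is no documentation on how to create sentence annotations from the token annotations, i.e. how to fill the ArrayList with actual sentence annotations.

Any ideas?

By the way, if I use the token and parse tree annotations provided by my preprocessing tools, take only the sentence annotations from the StanfordCoreNLP pipeline, and apply the StanfordCoreNLP standalone coreference resolution system, I get correct results. So the only missing piece for a complete standalone coreference resolution system is the ability to create sentence annotations from the token annotations.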

Sebastian Schuster

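If you already have a list of tokenized sentences, there is an Annotation constructor that takes a List<CoreMap> sentences argument and sets up the document.

For each sentence, you want to create a CoreMap object as follows. (I also added sentence and token indices to each sentence and token object, respectively.)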

int sentenceIdx = 1;
List<CoreMap> sentences = new ArrayList<CoreMap>();
for (List<ArrayList<String>> parsedSentence : parsedSentences) {
    CoreMap sentence = new CoreLabel();
    List<CoreLabel> tokens = new ArrayList<>();
    for(int tokenCount = 0; tokenCount < parsedSentence.size(); tokenCount++) {

        ArrayList<String> parsedLine = parsedSentence.get(tokenCount);
        String word = parsedLine.get(1);
        String lemma = parsedLine.get(2);
        String posTag = parsedLine.get(3);
        String namedEntity = parsedLine.get(4); 
        String partOfParseTree = parsedLine.get(6);

        CoreLabel token = new CoreLabel();
        token.setWord(word);
        token.setLemma(lemma);
        token.setTag(posTag);
        token.setNER(namedEntity);
        token.setIndex(tokenCount + 1);
        tokens.add(token);
    }

    // set tokens annotations and id of sentence 
    sentence.set(TokensAnnotation.class, tokens);
    sentence.set(SentenceIndexAnnotation.class, sentenceIdx++);

    // parse this sentence's bracketed tree string (inputParseTree) and attach it to the sentence
    Tree stanfordParseTree = Tree.valueOf(inputParseTree);
    sentence.set(TreeAnnotation.class, stanfordParseTree);

    // add sentence to list of sentences
    sentences.add(sentence);
}
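
The loop above iterates over parsedSentences, which neither the question nor the answer shows being built. As a rough sketch only, the following assumes the preprocessing tool writes one token per line with tab-separated columns and a blank line between sentences. That file layout and the ColumnReader class name are assumptions made for illustration, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnReader {

    /**
     * Groups token lines into sentences.
     * Assumed input: tab-separated columns, one token per line,
     * with a blank line marking the end of each sentence.
     */
    static List<List<ArrayList<String>>> readSentences(List<String> lines) {
        List<List<ArrayList<String>>> parsedSentences = new ArrayList<>();
        List<ArrayList<String>> current = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // a blank line closes the current sentence, if it has any tokens
                if (!current.isEmpty()) {
                    parsedSentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            // limit -1 keeps trailing empty columns so column indices stay stable
            current.add(new ArrayList<>(Arrays.asList(line.split("\t", -1))));
        }
        // the last sentence may not be followed by a blank line
        if (!current.isEmpty()) {
            parsedSentences.add(current);
        }
        return parsedSentences;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1\tThe\tthe\tDT\tO",
                "2\tbank\tbank\tNN\tO",
                "",
                "1\tIt\tit\tPRP\tO");
        System.out.println(readSentences(lines).size() + " sentences read");
    }
}
```

Each inner ArrayList<String> then plays the role of parsedLine in the loop, so parsedLine.get(1) is the word column under this assumed layout.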

Then you can create an Annotation instance from the sentences list:

Annotation annotation = new Annotation(sentences);
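
The snippets above take inputParseTree as given, while the token loop reads a per-token partOfParseTree column that is never used. If the preprocessing tool emits the parse as per-token fragments, the bracketed string has to be assembled first. The sketch below assumes CoNLL-style parse bits, where "*" marks the position of the token's (POS word) leaf. The thread does not say which format the tool uses, so both that format and the ParseBitJoiner name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ParseBitJoiner {

    /**
     * Builds one bracketed tree string from per-token fragments.
     * Assumption: each fragment looks like "(S(NP*" or "*))", and "*"
     * is the placeholder for that token's "(POS word)" leaf.
     */
    static String join(List<String> words, List<String> posTags, List<String> parseBits) {
        if (words.size() != posTags.size() || words.size() != parseBits.size()) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        StringBuilder tree = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            String leaf = "(" + posTags.get(i) + " " + words.get(i) + ")";
            // substitute the leaf for the placeholder in this token's fragment
            tree.append(parseBits.get(i).replace("*", leaf));
        }
        // put a space before every opening bracket for a conventional layout
        return tree.toString().replace("(", " (").trim();
    }

    public static void main(String[] args) {
        String tree = join(
                Arrays.asList("The", "bank", "grows"),
                Arrays.asList("DT", "NN", "VBZ"),
                Arrays.asList("(S(NP*", "*)", "(VP*))"));
        System.out.println(tree);
    }
}
```

The resulting string could then be passed to Tree.valueOf in place of inputParseTree. Tokens that are themselves brackets would need to be escaped first (for example as -LRB- and -RRB-), which this sketch does not handle.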


Edited on 2021-06-4
