Python 3.5로 바이너리 파일에서 읽기

debugcn 에 게시 Dev

Lefbar

이 코드를 사용합니다.

from struct import Struct

import struct

def read_chunk(fmt, fileobj):

    chunk_struct = Struct(fmt)
    chunk = fileobj.read(chunk_struct.size)
    return chunk_struct.unpack(chunk)

def read_record(fileobj):
    author_id, len_author_name = read_chunk('ii', f)
    author_name, nu_of_publ = read_chunk(str(len_author_name)+'si', f) # 's' or 'c' ?
    record = {  'author_id': author_id,
                'author_name': author_name,
                'publications': [] }
    for pub in range(nu_of_publ):
        pub_id, len_pub_title = read_chunk('ii', f)
        pub_title, num_pub_auth = read_chunk(str(len_pub_title)+'si', f)
        record['publications'].append({
                'publication_id': pub_id,
                'publication_title': pub_title,
                'publication_authors': [] })
        for auth in range(num_pub_auth):
            len_pub_auth_name = read_chunk('i', f)
            pub_auth_name = read_chunk(str(len_pub_auth_name)+'s', f)
            record['publications']['publication_authors'].append({'name': pub_auth_name})
        year_publ, nu_of_cit = read_chunk('ii', f)
        # Finish building your record with the remaining fields...
        for cit in range(nu_of_cit):
            cit_id, len_cit_title = read_chunk('ii', f)
            cit_title, num_cit_auth = read_chunk(str(len_cit_title)+'si', f)
        for cit_auth in range(num_cit_auth):
            len_cit_auth_name = read_chunk('i', f)
            cit_auth_name = read_chunk(str(len_cit_auth_name)+'s', f)
        year_cit_publ = read_chunk('i', f)
    return record

def parse_file(filename):
    records = []
    with open(filename, 'rb') as f:
        while True:
            try:
                records.append(read_record(f))
            except struct.error:
                break

이 파일을 읽으려면 :

https://drive.google.com/open?id=0B3SYAHrxLP69NHlWc25KeXFHNVE

이 형식으로 :

read_record 함수 내에서 올바른 변수 author_id, len_author_name, author_name을 읽었지만 nu_of_publ 및 아래 변수는 올바르게 읽히지 않습니다.

무엇이 잘못되었는지 아십니까?

이 코드를 실행할 때 :

author_id, len_author_name = read_chunk('LL', f)
    author_name, nu_of_publ= read_chunk(str(len_author_name)+'sL', f)
    #nu_of_publ = read_chunk('I', f)# 's' or 'c' ?
    record = {  'author_id': author_id,
                'author_name': author_name,
                'publications': [] }
    print (record, nu_of_publ)


for pub in range(nu_of_publ):
        pub_id, len_pub_title = read_chunk('LL', f)
        print (pub_id, len_pub_title)

나는이 결과를 취합니다.

{ 'author_name': b'Scott Shenker ','author_id ': 1,'출판물 ': []} 256 15616 1953384704

그러나 256 대신 200, 15616 대신 1을 인쇄합니다.

반지

이 형식은 올바르지 않습니다.

author_name, nu_of_publ = read_chunk(str(len_author_name)+'si', f)

N 개의 문자와 정수의 구조를 정의하고 있습니다. 이러한 구조는 c에 정의 된 구조가있는 경우와 동일한 방식으로 정렬 됩니다 .

struct {
    char author_name[N];
    int nu_of_publ;
};

정렬은 4의 배수 인 위치에 모든 int의 시작을 배치합니다. 이는 CPU가 이러한 주소에 액세스하도록 최적화되어 있기 때문에 수행됩니다 (C에서).

따라서 저자의 이름 길이가 6이면 다음 정수를 읽기 전에 다음 2 바이트를 건너 뜁니다.

구조를 분리하는 하나의 솔루션 :

author_name = read_chunk(str(len_author_name)+'s', f)
nu_of_publ, = read_chunk('i', f)

참고 : nu_of_publ ( nu_of_publ,) 뒤의 쉼표 는에서 반환 한 튜플을 풀기위한 것 read_chunk입니다.

또 다른 해결책은 spec= 의 테이블을 기반으로 처음에 구조를 지정 하는 것입니다 .

author_name, nu_of_publ = read_chunk('={}si'.format(len_author_name), f)

이 기사는 인터넷에서 수집됩니다. 재 인쇄 할 때 출처를 알려주십시오.

침해가 발생한 경우 연락 주시기 바랍니다[email protected] 삭제

에서 수정2021-06-14

몇 마디 만하겠습니다

0리뷰

로그인참여 후 검토

Related 관련 기사

기사

Python 3.5로 바이너리 파일에서 읽기

Python 3.5로 바이너리 파일에서 읽기

Clisp에서 바이너리 모드로 파일 쓰기 / 읽기

Python에서 바이너리 파일 읽기

R로 바이너리 파일 읽기

Deno로 바이너리 파일 읽기

memoryview로 바이너리 파일 읽기

바이너리 파일을 C에서 bool 배열로 읽기

mongodb에서 바이너리 로그 파일 읽기

Python 2.7에서 바이너리 모드로 읽은 모의 파일

Mac에서 실행 가능한 바이너리로 Python3 컴파일

바이너리 파일에서 읽기

바이너리 파일에서 읽기

바이너리 파일에서 읽기

Python에서 바이너리 파일을 바이트 단위로 읽는 가장 빠른 방법

C-바이너리 파일에서 모든 구조체로 읽기

Python : 문자 오류를 파일에서 읽기로 바꾸기

C # BinaryReader로 파이썬 바이너리 파일 읽기

Python 3.6 : 비어 있지 않은 바이너리 파일 읽기는 Python에서 비어있는 것으로 해석됩니다.

바이너리 파일에서 읽지 않는 프로그램

바이너리 / 16 진수로 파일 읽기

hexdump 및 바이너리 파일을 텍스트로 읽기

MIPS의 배열로 바이너리 파일 읽기

바이너리 파일을 ints 배열로 읽기 C ++

부분별로 바이너리 파일 읽기

C ++로 압축 된 바이너리 파일 읽기

바이너리 파일을 std :: vector <bool>로 읽기

ifstream으로 바이너리 파일 읽기

Swift에서 바이너리 (.hgt) 파일 읽기 (C ++에서 swift로 코드 마이그레이션)

python2-파일에서 서명 된 문자로 바이트 읽기

Python의 로그 파일에서 데이터 읽기