Print more than 20 posts from the Tumblr API


イジー

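Hello,

I'm new to Python, and I'm trying to write code that lets me download all of the posts (including the "notes") from a given Tumblr account to my computer.

Since I have little coding experience, I first looked for a ready-made script that would do this. I found several excellent scripts on GitHub, but none of them actually returns the notes from Tumblr posts (as far as I can see; please correct me if anyone knows of one!).

So I tried writing my own script. I've had some success with the code below. It prints the 20 most recent posts from the given Tumblr (albeit in a rather ugly format: essentially hundreds of lines of text all printed on a single line of a Notepad file).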

# This script prints all the posts (including tags, comments) and also the
# first 20 notes from all the Tumblr blogs.

import pytumblr

# Authenticate via API Key
client = pytumblr.TumblrRestClient('myapikey')

#offset = 0

# Make the request
client.posts('staff', limit=2000, offset=0, reblog_info=True, notes_info=True,
             filter='html')
# print out into a .txt file
with open('out.txt', 'w') as f:
    print >> f, client.posts('staff', limit=2000, offset=0, reblog_info=True,
                             notes_info=True, filter='html')

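However, I want the script to keep printing posts until it reaches the end of the specified blog.

I searched this site and found a very similar question (Getting only 20 posts returned through PyTumblr), which was answered by the stackoverflow user poke. However, I can't seem to actually implement poke's solution so that it works for my data. In fact, when I run the following script, no output is produced at all.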

import pytumblr

# Authenticate via API Key
client = pytumblr.TumblrRestClient('myapikey')
blog = ('staff')

def getAllPosts (client, blog):
    offset = 0
    while True:
        posts = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)
        if not posts:
            return

        for post in posts:
            yield post

        offset += 20

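Note that there are several posts on this site about Tumblr notes (for example, Getting more than 50 notes with Tumblr API), and most of them ask how to download more than 50 notes per post. I'm perfectly happy with just 50 notes per post; it is the number of posts that I want to increase.

Also, I've tagged this post as Python, but if there is a better way to get the data I need using another programming language, that would be more than fine.

Thanks in advance!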

wkl

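tl;dr: if you just want to see the answer, it's at the bottom, after the "Fixed version" heading.

The second code snippet is a generator that yields posts one at a time, so you have to use it as part of something like a loop and then do something with the output. Here is your code with some additional code that iterates over the generator and prints the data it gets back: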

import pytumblr

def getAllPosts (client, blog):
    offset = 0
    while True:
        posts = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)
        if not posts:
            return

        for post in posts:
            yield post

        offset += 20

# Authenticate via API Key
client = pytumblr.TumblrRestClient('myapikey')
blog = ('staff')

# use the generator getAllPosts
for post in getAllPosts(client, blog):
    print(post)

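However, that code has a couple of bugs. As you can see from this example, where I ran getAllPosts in an ipython shell, it iterates over the API response itself rather than yielding just each post, so it returns other things too.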

In [7]: yielder = getAllPosts(client, 'staff')

In [8]: next(yielder)
Out[8]: 'blog'

In [9]: next(yielder)
Out[9]: 'posts'

In [10]: next(yielder)
Out[10]: 'total_posts'

In [11]: next(yielder)
Out[11]: 'supply_logging_positions'

In [12]: next(yielder)
Out[12]: 'blog'

In [13]: next(yielder)
Out[13]: 'posts'

In [14]: next(yielder)
Out[14]: 'total_posts'

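This happens because the posts object in getAllPosts is a dictionary that contains not only each post from the staff blog but also items such as the number of posts the blog contains, the blog's description, and when it was last updated. Iterating over a dictionary yields its keys, which is why the generator produces strings such as 'blog' and 'posts'. Left as it is, the code can also end up in an infinite loop because of the following condition: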

if not posts:
    return

This condition will never be true because of the response structure: an empty Tumblr API response from pytumblr looks like this:

{'blog': {'ask': False,
  'ask_anon': False,
  'ask_page_title': 'Ask me anything',
  'can_send_fan_mail': False,
  'can_subscribe': False,
  'description': '',
  'followed': False,
  'is_adult': False,
  'is_blocked_from_primary': False,
  'is_nsfw': False,
  'is_optout_ads': False,
  'name': 'asdfasdf',
  'posts': 0,
  'reply_conditions': '3',
  'share_likes': False,
  'subscribed': False,
  'title': 'Untitled',
  'total_posts': 0,
  'updated': 0,
  'url': 'https://asdfasdf.tumblr.com/'},
 'posts': [],
 'supply_logging_positions': [],
 'total_posts': 0}

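The `if not posts` check is evaluated against that whole structure, not against its 'posts' field (an empty list here). Because the response dictionary itself is not empty, `not posts` is never true and the function never returns (see Truth Value Testing in the Python documentation).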
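To make both problems concrete, here is a small sketch that needs no network access. It uses a trimmed, hand-written dictionary shaped like the empty response above (the values are illustrative, not real API output) and shows what iterating over it and truth-testing it actually do.

```python
# A response shaped like the empty one shown above (trimmed; values are illustrative).
response = {
    'blog': {'name': 'asdfasdf', 'posts': 0},
    'posts': [],
    'supply_logging_positions': [],
    'total_posts': 0,
}

# Iterating over a dict yields its keys, which is why the generator
# produced strings such as 'blog' and 'posts' instead of post objects.
keys = list(response)

# The dict has keys, so it is truthy and `if not response` never fires.
outer_is_empty = not response

# The list stored under 'posts' is empty, so this check does fire.
posts_is_empty = not response['posts']

print(keys)
print(outer_is_empty, posts_is_empty)
```

The stop condition therefore has to test the 'posts' list rather than the response as a whole.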

Fixed version

Here is code (mostly tested/verified) that fixes the loop from the getAllPosts implementation, then uses the function to fetch the posts and dump them to a file named (BLOG_NAME)-posts.txt.

import pytumblr


def get_all_posts(client, blog):
    offset = 0
    while True:
        response = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)

        # Get the 'posts' field of the response        
        posts = response['posts']

        if not posts: return

        for post in posts:
            yield post

        # move to the next offset
        offset += 20


client = pytumblr.TumblrRestClient('secrety-secret')
blog = 'staff'

# use our function
with open('{}-posts.txt'.format(blog), 'w') as out_file:
    for post in get_all_posts(client, blog):
        # Python 3 syntax; if you're in python 2.x, use the following instead:
        # print >>out_file, post
        print(post, file=out_file)

This will just be a straight text dump of the API's post responses, so if you need it to look nicer, that's up to you.
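If a slightly tidier dump is wanted, one option is to write each post as a single JSON line, which most tools can read back later. The following is only a sketch under a few assumptions: `dump_posts` and `FakeClient` are hypothetical names introduced here, `FakeClient` is an offline stand-in that mimics only the response shape shown earlier, and `response.get('posts', [])` is a defensive choice for responses that lack a 'posts' key, not something the original answer does.

```python
import json


def get_all_posts(client, blog, page_size=20):
    """Yield every post of `blog`, requesting one page at a time."""
    offset = 0
    while True:
        response = client.posts(blog, limit=page_size, offset=offset,
                                reblog_info=True, notes_info=True)
        # Assumption: treat a response without a 'posts' key as "no more posts".
        posts = response.get('posts', [])
        if not posts:
            return
        for post in posts:
            yield post
        offset += page_size


class FakeClient(object):
    """Hypothetical stand-in for pytumblr.TumblrRestClient, for offline runs."""

    def __init__(self, total):
        self._posts = [{'id': i, 'note_count': 0} for i in range(total)]

    def posts(self, blog, limit=20, offset=0, **kwargs):
        # Mimics only the response shape: blog info plus one page of posts.
        return {'blog': {'name': blog},
                'posts': self._posts[offset:offset + limit],
                'total_posts': len(self._posts)}


def dump_posts(client, blog, path):
    """Write one JSON object per line and return the number of posts written."""
    count = 0
    with open(path, 'w') as out_file:
        for post in get_all_posts(client, blog):
            out_file.write(json.dumps(post, sort_keys=True) + '\n')
            count += 1
    return count


if __name__ == '__main__':
    # Offline demonstration with 45 fake posts spread over three pages.
    written = dump_posts(FakeClient(45), 'staff', 'staff-posts.jsonl')
    print(written)
```

With a real client, the call would be `dump_posts(pytumblr.TumblrRestClient('myapikey'), 'staff', 'staff-posts.jsonl')`; each line of the resulting file can then be loaded again with `json.loads`.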


Edited 2021-05-31
