Using a lambda function with Beautiful Soup

Posted on debugcn in Dev

Sem

I'm trying to match links that contain a specific piece of text. This is what I'm doing:

links = soup.find_all('a',href=lambda x: ".org" in x)

But it throws TypeError: argument of type 'NoneType' is not iterable.

Apparently the correct way to do it is:

links = soup.find_all('a',href=lambda x: x and ".org" in x)

Why is the extra "x and" needed here?

Aran-Fey

There's a simple reason: one of the <a> tags in your HTML doesn't have an href attribute.

Here is a minimal example that reproduces the exception:

from bs4 import BeautifulSoup

html = '<html><body><a>bar</a></body></html>'
soup = BeautifulSoup(html, 'html.parser')

links = soup.find_all('a', href=lambda x: ".org" in x)
# result:
# TypeError: argument of type 'NoneType' is not iterable

Now, if we add an href attribute, the exception goes away:

html = '<html><body><a href="foo.org">bar</a></body></html>'
soup = BeautifulSoup(html, 'html.parser')

links = soup.find_all('a', href=lambda x: ".org" in x)
# result:
# [<a href="foo.org">bar</a>]

What's happening is that BeautifulSoup tries to access the href attribute of the <a> tag, and None is returned when the attribute doesn't exist:

html = '<html><body><a>bar</a></body></html>'
soup = BeautifulSoup(html, 'html.parser')

print(soup.a.get('href'))
# output: None
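To see both cases side by side, here is a small sketch with a mixed document. The HTML and the helper name href_contains_org are made up for illustration; only find_all() and Tag.get() come from BeautifulSoup itself.

```python
from bs4 import BeautifulSoup

# Hypothetical document: one <a> without href, two with.
html = (
    '<html><body>'
    '<a>no href</a>'
    '<a href="foo.org">org link</a>'
    '<a href="bar.com">com link</a>'
    '</body></html>'
)
soup = BeautifulSoup(html, 'html.parser')

# Tag.get() works like dict.get(): a missing attribute yields None.
hrefs = [a.get('href') for a in soup.find_all('a')]

def href_contains_org(value):
    # value is None for the <a> tag that has no href attribute,
    # so handle that case before using the "in" operator.
    if value is None:
        return False
    return '.org' in value

org_links = soup.find_all('a', href=href_contains_org)
org_hrefs = [a['href'] for a in org_links]
```

The filter function is called for every <a> tag, including the one without href, which is why it has to cope with None.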

This is why the lambda has to allow for a None value. Since None is a falsy value, x and ... prevents the right-hand side of the and from being evaluated when x is None, as you can see here:

>>> None and 1/0
>>> 'foo.org' and 1/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

This is called short-circuiting.
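If the 1/0 trick feels too abrupt, the same behaviour can be observed without raising an exception. This is just a sketch; the right_side helper and the calls list are hypothetical names used to record whether the right operand was evaluated.

```python
calls = []

def right_side(label):
    # Record that the right-hand operand of "and" was actually evaluated.
    calls.append(label)
    return True

# Left operand is falsy: "and" returns it immediately, right_side() never runs.
first = None and right_side("after None")

# Left operand is truthy: "and" evaluates and returns the right operand.
second = "foo.org" and right_side("after 'foo.org'")
```

After running this, calls contains a single entry, because only the second expression reached its right-hand side.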

That said, x and ... checks the truthiness of x, and None is not the only value that is considered falsy. So it would be more correct to compare x to None explicitly, like this:

lambda x: x is not None and ".org" in x
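To make the difference concrete, here is a rough comparison of the two predicates on a few sample values. The function names and the sample values are only illustrative.

```python
def by_truthiness(x):
    # Returns x itself when x is falsy (None, "", ...), otherwise a bool.
    return x and ".org" in x

def by_none_check(x):
    # Only None is treated specially; the result is always a bool.
    return x is not None and ".org" in x

samples = (None, "", "foo.org", "foo.com")
rows = [(value, by_truthiness(value), by_none_check(value)) for value in samples]
```

For href strings both versions select the same tags, since an empty string can't contain ".org" anyway. The explicit check just states the intent more precisely and always returns True or False rather than passing None or "" through.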

Edited 2021-06-2
