我对如何从给定的xml页面中删除所有链接(仅包含字符串“ mp3”)感到困惑。以下代码仅返回空括号:
# Import required modules
from lxml import html
import requests
# Request the page
page = requests.get('https://feeds.megaphone.fm/darknetdiaries')
# Parsing the page
# (We need to use page.content rather than
# page.text because html.fromstring implicitly
# expects bytes as input.)
tree = html.fromstring(page.content)
# Get element using XPath
buyers = tree.xpath('//enclosure[@url="mp3"]/text()')
print(buyers)
我使用@url错误吗?
我正在寻找的链接:
任何帮助将不胜感激!
以下将xpath
无法正常工作,正如您提到的那样,它也是@url
和text()
//enclosure[@url="mp3"]/text()
url
任何属性//enclosure
都应包含mp3
然后返回/@url
更改此行:
buyers = tree.xpath('//enclosure[@url="mp3"]/text()')
至
buyers = tree.xpath('//enclosure[contains(@url,"mp3")]/@url')
输出量
['https://www.podtrac.com/pts/redirect.mp3/traffic.megaphone.fm/ADV9231072845.mp3?updated=1610644901',
'https://www.podtrac.com/pts/redirect.mp3/traffic.megaphone.fm/ADV2643452814.mp3?updated=1609788944',
'https://www.podtrac.com/pts/redirect.mp3/traffic.megaphone.fm/ADV5381316822.mp3?updated=1607279433',
'https://www.podtrac.com/pts/redirect.mp3/traffic.megaphone.fm/ADV9145504181.mp3?updated=1607280708',
'https://www.podtrac.com/pts/redirect.mp3/traffic.megaphone.fm/ADV4345070838.mp3?updated=1606110384',
'https://www.podtrac.com/pts/redirect.mp3/traffic.megaphone.fm/ADV8112097820.mp3?updated=1604866665',
'https://www.podtrac.com/pts/redirect.mp3/traffic.megaphone.fm/ADV2164178070.mp3?updated=1603781321',
'https://www.podtrac.com/pts/redirect.mp3/traffic.megaphone.fm/ADV1107638673.mp3?updated=1610220449',
...]
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句