Here is my code:
from bs4 import BeautifulSoup
import urllib.request
import re

url = urllib.request.urlopen("http://www.djmaza.info/Abhi-Toh-Party-Khubsoorat-Full-Song-MP3-2014-Singles.html")
content = url.read()
soup = BeautifulSoup(content)
for a in soup.findAll('a', href=True):
    if re.findall('http', a['href']):
        print("URL:", a['href'])
Output of this code:
URL: http://twitter.com/mp3khan
URL: http://www.facebook.com/pages/MP3KhanCom-Music-Updates/233163530138863
URL: https://plus.google.com/114136514767143493258/posts
URL: http://www.djhungama.com
URL: http://www.djhungama.com
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -190Kbps [DJMaza.Info].mp3
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -190Kbps [DJMaza.Info].mp3
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -320Kbps [DJMaza.Info].mp3
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -320Kbps [DJMaza.Info].mp3
URL: http://www.htmlcommentbox.com
URL: http://www.djmaza.com
URL: http://www.djhungama.com
I only need the .mp3 links.
How should I rewrite the code?
Thank you.
Change the findAll call so that it matches href against a regular expression, for example:
for a in soup.findAll('a', href=re.compile(r'http.*\.mp3')):
    print("URL:", a['href'])
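One caveat about this pattern: BeautifulSoup tests a compiled pattern against the attribute with its search() method, so http.*\.mp3 also accepts any href that merely contains ".mp3" somewhere after "http". The sketch below uses only the re module on a few sample hrefs to compare the pattern above with an anchored variant. The anchored pattern and the example.com URL are illustrative assumptions, not part of the original answer.

```python
import re

# Hypothetical sample hrefs, modelled on the output shown in the question.
hrefs = [
    "http://twitter.com/mp3khan",
    "http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -190Kbps [DJMaza.Info].mp3",
    "http://example.com/track.mp3.html",  # contains ".mp3" but is not an MP3 file
]

loose = re.compile(r'http.*\.mp3')          # pattern used in the answer
strict = re.compile(r'^https?://.*\.mp3$')  # anchored variant (an assumption)

# search() mirrors how a compiled pattern is applied to an attribute value.
loose_hits = [h for h in hrefs if loose.search(h)]
strict_hits = [h for h in hrefs if strict.search(h)]

print(len(loose_hits), len(strict_hits))
```

For the page in the question either pattern would probably give the same result; the anchored form only matters if some link has ".mp3" in the middle of its URL.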
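Update in response to a comment:
I need to store these links in an array so that I can download them. How can I do that?
You can use a list comprehension to build the list: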
links = [a['href'] for a in soup.find_all('a', href=re.compile(r'http.*\.mp3'))]
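Once the list exists, downloading could look roughly like the sketch below. It assumes the hrefs are unencoded, as in the output shown in the question (they contain spaces and brackets), so each one is percent-encoded before the request. The helper names prepare and download_all are hypothetical, and the list contains each link twice, so duplicates are skipped.

```python
import os
import urllib.parse
import urllib.request


def prepare(link):
    """Return (encoded_url, local_filename) for one href (hypothetical helper)."""
    # Percent-encode spaces, brackets, etc.; keep ':' and '/' so the URL shape survives.
    # Assumes the href is not already percent-encoded.
    encoded = urllib.parse.quote(link, safe=":/")
    # Use the text after the last '/' as the local file name.
    filename = link.rsplit("/", 1)[-1]
    return encoded, filename


def download_all(links, dest_dir="."):
    """Download each distinct link into dest_dir (hypothetical helper)."""
    seen = set()
    for link in links:
        if link in seen:  # the page lists every MP3 link twice
            continue
        seen.add(link)
        encoded, filename = prepare(link)
        urllib.request.urlretrieve(encoded, os.path.join(dest_dir, filename))
```

With the links list from above, calling download_all(links) would save the files into the current directory. Error handling and timeouts are left out to keep the sketch short.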