Using BeautifulSoup to loop and retrieve a specific URL

Posted by Edward Lin in Dev


I want to use BeautifulSoup to repeatedly retrieve the URL at a specific position. You can imagine four different lists of URLs, each containing 100 different links.

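I always need to get and print the third URL on each list. The URL retrieved previously (for example, the third URL on the first list) leads to the second list, where the third URL again has to be retrieved and printed, and so on until the fourth retrieval.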

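However, my loop only gets the first result (the third URL on list 1), and I don't know how to feed the new URL back into the while loop to continue the process.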

Here is my code:

import urllib.request
import json
import ssl
from bs4 import BeautifulSoup


num=int(input('enter count times: ' ))
position=int(input('enter position: ' ))

url='https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html'
print (url)

count=0
order=0
while count<num:
    context = ssl._create_unverified_context()
    htm=urllib.request.urlopen(url, context=context).read()
    soup=BeautifulSoup(htm)
    for i in soup.find_all('a'):
        order+=1
        if order ==position:
            x=i.get('href')
            print (x)
    count+=1
    url=x        
print ('done')

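The counter order is never reset between pages, so after the first page it can no longer equal position and x never changes. Simply use find_all() and take the link by index: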

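while count < num:
    context = ssl._create_unverified_context()
    htm = urllib.request.urlopen(url, context=context).read()

    # name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(htm, 'html.parser')
    # position is 1-based (3 means the third link); list indices are 0-based
    url = soup.find_all('a')[position - 1].get('href')
    print(url)

    count += 1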
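
As a rough, self-contained sketch of the same idea, the index lookup and the loop can be split into two small functions. The names nth_link and follow_links, the fetch parameter, and the example.test pages are all hypothetical and only serve the illustration. The sketch assumes position is 1-based, as in the question, and uses urljoin in case a page contains relative links.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def nth_link(html, position, base_url=""):
    """Return the href of the position-th <a> tag (1-based), or None."""
    anchors = BeautifulSoup(html, "html.parser").find_all("a")
    if not 1 <= position <= len(anchors):
        return None
    href = anchors[position - 1].get("href")
    # urljoin resolves relative hrefs against the page they came from
    return urljoin(base_url, href) if href else None


def follow_links(start_url, count, position, fetch):
    """Follow the position-th link up to `count` times; return the URLs found."""
    visited = []
    url = start_url
    for _ in range(count):
        url = nth_link(fetch(url), position, base_url=url)
        if url is None:  # page had too few links (or no href), so stop early
            break
        visited.append(url)
    return visited


# Offline demo: a dict of hypothetical pages stands in for the network.
PAGES = {
    "http://example.test/a.html": '<a href="x.html">x</a><a href="b.html">b</a>',
    "http://example.test/b.html": '<a href="y.html">y</a><a href="c.html">c</a>',
    "http://example.test/c.html": '<a href="z.html">z</a>',
}
print(follow_links("http://example.test/a.html", 3, 2, PAGES.__getitem__))
```

Against real pages, fetch could be any function that takes a URL and returns the HTML, for example one wrapping the urllib.request.urlopen(url, context=context).read() call used above.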
