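I'm having trouble getting my code to print all of the strings I want, and I'm not sure how to change it.
I'm trying to scrape all of the strings, including things like 460 hp @ 7000 rpm, which currently are not being scraped. Ideally, the strings inside the strong elements should stay separate. I tried adding another .next_sibling, and changing br to p and then to strong, but those attempts returned an error.
The HTML is as follows: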
<div class="specs-content">
<p>
<strong>Displacement:</strong>
" 307 cu in, 5038 "
<br>
<strong>Power:</strong>
" 460 hp @ 7000 rpm "
<br>
<strong>Torque:</strong>
" 420 lb-ft @ 4600 rpm "
</p>
<p>
<strong>TRANSMISSION:</strong>
" 10-speed automatic with manual shifting mode "
</p>
<p>
<strong>CHASSIS</strong>
<br>
" Suspension (F/R): struts/multilink "
<br>
" Brakes (F/R): 15.0-in vented disc/13.0-in vented disc "
<br>
" Tires: Michelin Pilot Sport 4S, F: 255/40ZR-19 (100Y) R: 275/40ZR-19 (105Y) "
</p>
</div>
So far I have written the following code:
import requests
from bs4 import BeautifulSoup
URL = requests.get('https://www.LinkeHere.com')
soup = BeautifulSoup(URL.text, 'html.parser')
FindClass = soup.find(class_='specs-content')
FindElement = FindClass.find_all('br')
for Specs in FindElement:
    Specs = Specs.next_sibling
    print(Specs.string)
Which returns:
Power:
Torque:
Suspension (F/R): struts/multilink
Brakes (F/R): 13.9-in vented disc/13.0-in vented disc
Tires: Michelin Pilot Sport 4S, 255/40ZR-19 (100Y)
You can pass a newline, \n, as the separator argument to the get_text() method:
from bs4 import BeautifulSoup
html = """THE ABOVE HTML SNIPPET"""
soup = BeautifulSoup(html, "html.parser")
for tag in soup.find_all(class_="specs-content"):
    print(tag.get_text(strip=True, separator="\n").replace('"', ""))
Output:
Displacement:
307 cu in, 5038
Power:
460 hp @ 7000 rpm
Torque:
420 lb-ft @ 4600 rpm
TRANSMISSION:
10-speed automatic with manual shifting mode
CHASSIS
Suspension (F/R): struts/multilink
Brakes (F/R): 15.0-in vented disc/13.0-in vented disc
Tires: Michelin Pilot Sport 4S, F: 255/40ZR-19 (100Y) R: 275/40ZR-19 (105Y)
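If the strong labels need to stay separate from their values, as the question asks, one possible approach is to walk each strong tag's following siblings and collect the text nodes until the next strong tag. The sketch below is only an illustration under some assumptions: the helper name collect_specs is hypothetical, the HTML is a shortened copy of the snippet from the question, and the literal double quotes around the values are assumed to be part of the page text (as the replace('"', "") call above also assumes). It also explains why the original loop missed the values: the text follows each strong tag, not each br tag.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Shortened copy of the HTML from the question (assumed representative).
html = """
<div class="specs-content">
<p>
<strong>Displacement:</strong>
" 307 cu in, 5038 "
<br>
<strong>Power:</strong>
" 460 hp @ 7000 rpm "
<br>
<strong>Torque:</strong>
" 420 lb-ft @ 4600 rpm "
</p>
<p>
<strong>CHASSIS</strong>
<br>
" Suspension (F/R): struts/multilink "
<br>
" Brakes (F/R): 15.0-in vented disc/13.0-in vented disc "
</p>
</div>
"""


def collect_specs(markup):
    """Return a list of (label, [values]) pairs, one per <strong> tag.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    container = soup.find(class_="specs-content")
    specs = []
    for strong in container.find_all("strong"):
        label = strong.get_text(strip=True)
        values = []
        for sibling in strong.next_siblings:
            # Stop when the next label starts.
            if isinstance(sibling, Tag) and sibling.name == "strong":
                break
            # Keep only text nodes; <br> tags are skipped.
            if isinstance(sibling, NavigableString):
                text = sibling.strip().strip('"').strip()
                if text:
                    values.append(text)
        specs.append((label, values))
    return specs


for label, values in collect_specs(html):
    print(label, values)
```

Each label is paired with a list because a heading such as CHASSIS is followed by several lines separated by br tags, while Power: is followed by only one.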