How to get to the next element in HTML


Spoon

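My code does not print all of the strings I want, and I'm not sure how to change it so that it does.

I'm trying to scrape every string, including values such as 460 hp @ 7000 rpm, which are currently not being scraped. Ideally, the strings inside the strong elements should stay separate from the rest. I tried adding another .next_sibling, and I tried changing br to p and then to strong, but those attempts returned an error.

The HTML looks like this: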

<div class="specs-content">
   <p>
     <strong>Displacement:</strong>
     " 307 cu in, 5038 "
     <br>
     <strong>Power:</strong> 
     " 460 hp @ 7000 rpm "
     <br>
     <strong>Torque:</strong>
     " 420 lb-ft @ 4600 rpm "
   </p>
   <p>
     <strong>TRANSMISSION:</strong> 
     " 10-speed automatic with manual shifting mode "
   </p>
   <p>
     <strong>CHASSIS</strong>
     <br>
     " Suspension (F/R): struts/multilink "
     <br>
     " Brakes (F/R): 15.0-in vented disc/13.0-in vented disc "
     <br>
     " Tires: Michelin Pilot Sport 4S, F: 255/40ZR-19 (100Y) R: 275/40ZR-19 (105Y) "
   </p>
</div>

This is the code I have written so far:

import requests
from bs4 import BeautifulSoup

URL = requests.get('https://www.LinkeHere.com')

soup = BeautifulSoup(URL.text, 'html.parser')

FindClass = soup.find(class_='specs-content')
FindElement = FindClass.find_all('br')

for Specs in FindElement:
    Specs = Specs.next_sibling
    print(Specs.string)

It currently prints:

Power:
Torque:
Suspension (F/R): struts/multilink
Brakes (F/R): 13.9-in vented disc/13.0-in vented disc
Tires: Michelin Pilot Sport 4S, 255/40ZR-19 (100Y)

Mendel

You can pass a newline, \n, as the separator argument of the get_text() method:

from bs4 import BeautifulSoup

html = """THE ABOVE HTML SNIPPET"""

soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all(class_="specs-content"):
    print(tag.get_text(strip=True, separator="\n").replace('"', ""))

Output:

Displacement:
 307 cu in, 5038 
Power:
 460 hp @ 7000 rpm 
Torque:
 420 lb-ft @ 4600 rpm 
TRANSMISSION:
 10-speed automatic with manual shifting mode 
CHASSIS
 Suspension (F/R): struts/multilink 
 Brakes (F/R): 15.0-in vented disc/13.0-in vented disc 
 Tires: Michelin Pilot Sport 4S, F: 255/40ZR-19 (100Y) R: 275/40ZR-19 (105Y)
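If the labels in the strong elements need to stay separate from their values, one option is to start from each strong tag and walk its following siblings until the next strong tag. The sketch below is only an illustration: the function name collect_specs is made up for this example, the HTML is a shortened copy of the snippet from the question, and it assumes the quote characters in the question come from the browser inspector rather than the page itself (they are stripped anyway if present).

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Shortened copy of the question's markup (assumption: the real page has
# no literal quote characters around the text nodes).
html = """
<div class="specs-content">
  <p>
    <strong>Displacement:</strong> 307 cu in, 5038
    <br>
    <strong>Power:</strong> 460 hp @ 7000 rpm
    <br>
    <strong>Torque:</strong> 420 lb-ft @ 4600 rpm
  </p>
  <p>
    <strong>CHASSIS</strong>
    <br>
    Suspension (F/R): struts/multilink
    <br>
    Brakes (F/R): 15.0-in vented disc/13.0-in vented disc
  </p>
</div>
"""


def collect_specs(markup):
    """Return a list of (label, [values]) pairs, one per <strong> tag."""
    soup = BeautifulSoup(markup, "html.parser")
    specs = []
    for strong in soup.select(".specs-content strong"):
        values = []
        # Walk everything after this <strong> inside the same parent.
        for sibling in strong.next_siblings:
            # Stop when the next label starts.
            if isinstance(sibling, Tag) and sibling.name == "strong":
                break
            # Keep non-empty text nodes; <br> tags are simply skipped.
            if isinstance(sibling, NavigableString):
                text = sibling.strip().strip('"').strip()
                if text:
                    values.append(text)
        specs.append((strong.get_text(strip=True), values))
    return specs


for label, values in collect_specs(html):
    print(label, values)
```

This also shows why looping over the br tags misses values: the sibling right after a br is often the next strong tag (or a whitespace-only string), while the value is the text node that follows the strong tag.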
