Create a new index when the loop finds a specific string sequence

Posted by debugcn in Dev

Spoon

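I have a list of strings containing car specifications. However, the different trims are mashed together, and I want the code to separate them automatically, using the year as the marker. The match must be exactly 4 digits, or fall within a range of values, because the data also contains 3-digit and 5-digit values, while the year is always 4 digits. What do I need to tell the code to look for so that a 4-digit value starts a new row and the loop then continues?

Here is the code: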

import re
import requests
import csv
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

# headers = {
#    'User-Agent': 'Mewspoon',
#    'From': '[email protected]'
#}

URL = requests.get('https://www.caranddriver.com/reviews/a24847025/2018-ford-mustang-automatic-transmission-performance/')

soup = BeautifulSoup(URL.text, 'html.parser')

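# Collect the whitespace-separated tokens of every specs block
DataList = []
for tag in soup.find_all(class_="specs-content"):
    DataList.extend(tag.get_text(strip=True, separator="\n").split())

# File format: a single row holding all tokens
df = pd.DataFrame(DataList).transpose()

# Create the file (df must exist before it is written)
df.to_excel('CarScrapeTest.xlsx', sheet_name='Car&Driver')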

Arthur Pereira

To answer your question: you can use re.match(r'.*([1-3][0-9]{3})', text) to check for a valid year. If it matches, you start writing to a new dataframe.
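One caveat: that pattern only requires four such digits to appear somewhere in the text, so it also fires on part of a longer number such as 12345. A minimal sketch of a stricter check follows. It assumes the year is a stand-alone run of exactly four digits between 1900 and 2099; that range and the helper name `find_year` are illustrative assumptions, not part of the original answer.

```python
import re

# Assumption: model years fall in 1900-2099 and are never part of a longer digit run.
# The lookarounds reject a digit immediately before or after the four digits.
YEAR_PATTERN = re.compile(r'(?<!\d)(?:19|20)\d{2}(?!\d)')

def find_year(text):
    """Return the first stand-alone 4-digit year in text as an int, or None."""
    match = YEAR_PATTERN.search(text)
    return int(match.group()) if match else None

print(find_year('2018 Ford Mustang GT'))  # a year followed by the model name
print(find_year('460 hp @ 7000 rpm'))     # 3-digit and out-of-range values
print(find_year('20185'))                 # a 5-digit value
```

A spec value that happens to look like a year (for example 2000 rpm) would still match, so real data may need an extra condition, such as the year appearing at the start of the paragraph.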

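I also noticed that you are trying to get the car specifications, so I wrote a little loop that adds the information to a dataframe and then writes it to a CSV file. I use the : character to separate attributes from values, then concatenate each car onto df.
Cheers.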

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

URL = requests.get('https://www.caranddriver.com/reviews/a24847025/2018-ford-mustang-automatic-transmission-performance/')

soup = BeautifulSoup(URL.text, 'html.parser')
specifications = soup.find(class_="specs-content")

cars_specs = dict()
df = pd.DataFrame()

for paragraph in specifications.find_all('p'):
    paragraph_text = paragraph.get_text(strip=True, separator="\n").strip()

    if paragraph_text == "Specifications":
        continue

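    # A paragraph containing a year marks the start of a new car
    year = re.match(r'.*([1-3][0-9]{3})', paragraph_text)
    if year:
        # Flush the previous car's specs as a new column before starting over
        if len(cars_specs) > 1:
            new_df = pd.DataFrame.from_dict(cars_specs, orient='index')
            df = pd.concat([df, new_df], axis=1, sort=False)

        cars_specs = {'Car': paragraph_text}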

    else:
        specs = paragraph_text.split('\n')
        for index in range(len(specs) - 1):

            if specs[index].find(':') == len(specs[index]) - 1:
                # "Name:" on its own line; the value is on the next line
                cars_specs[specs[index].replace(':','')] = specs[index + 1]
            elif specs[index].find(':') > 1:
                # "Name: value" on one line; split on the first colon only
                inline_specs = specs[index].split(':', 1)
                cars_specs[inline_specs[0]] = inline_specs[1]

# After the loop finishes, flush the last car's specs
new_df = pd.DataFrame.from_dict(cars_specs, orient='index')
df = pd.concat([df, new_df], axis=1, sort=False)

print(df)
df.to_csv('CarScrapeTest.csv', encoding='utf-8', header=False, sep=';')
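
The question as originally asked, splitting one flat list of strings into rows whenever a year appears, can also be sketched without any scraping. The following is only an illustration under stated assumptions, not the answer's code: the sample tokens are made up, and the helper name `split_on_year` and the 1900-2099 year range are hypothetical choices.

```python
import re

# Assumption: a year is a token consisting of exactly four digits in 1900-2099.
YEAR_TOKEN = re.compile(r'(?:19|20)\d{2}')

def split_on_year(tokens):
    """Group a flat token list into rows, starting a new row at each year token."""
    rows = []
    for token in tokens:
        # A year (or the very first token) opens a new row
        if YEAR_TOKEN.fullmatch(token) or not rows:
            rows.append([])
        rows[-1].append(token)
    return rows

# Made-up sample data standing in for the scraped list
tokens = ['2018', 'Ford', 'Mustang', 'GT', '460', 'hp',
          '2018', 'Ford', 'Mustang', 'EcoBoost', '310', 'hp']

rows = split_on_year(tokens)
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` would give one dataframe row per car. Because `fullmatch` is used, 3-digit and 5-digit tokens never start a new row, but a 4-digit spec value inside the assumed range would, so the year test may need tightening for real data.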
