Python pandas: removing characters

Posted by debugcn in Dev

Paul Carlson

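I'm working on a project where I need to remove the leftmost and rightmost characters from each value in my results. The data comes from craigslist, and the neighborhood values come back as '(####)', but what I need is ####. I'm using pandas and tried lstrip and rstrip. When I try this in the Python shell it works, but when I apply it to my data it doesn't.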

post_results['neighborhood'] = post_results['neighborhood'].str.lstrip('(')
post_results['neighborhood'] = post_results['neighborhood'].str.rstrip(')')

For some reason rstrip works and removes the ')', but lstrip does not remove the '('.

The full code is:

from bs4 import BeautifulSoup
import json
from requests import get
import numpy as np
import pandas as pd
import csv


print('hello world')
#get the initial page for the listings, to get the total count
response = get('https://washingtondc.craigslist.org/search/hhh?query=rent&availabilityMode=0&sale_date=all+dates')
html_result = BeautifulSoup(response.text, 'html.parser')
results = html_result.find('div', class_='search-legend')
total = int(results.find('span',class_='totalcount').text)
pages = np.arange(0,total+1,120)

neighborhood = []
bedroom_count =[]
sqft = []
price = []
link = []

for page in pages:
    #print(page)

    response = get('https://washingtondc.craigslist.org/search/hhh?s='+str(page)+'&query=rent&availabilityMode=0&sale_date=all+dates')
    html_result = BeautifulSoup(response.text, 'html.parser')

    posts = html_result.find_all('li', class_='result-row')
    for post in posts:
        if post.find('span',class_='result-hood') is not None:
            post_url = post.find('a',class_='result-title hdrlnk')
            post_link = post_url['href']
            link.append(post_link)
            post_neighborhood = post.find('span',class_='result-hood').text
            post_price = int(post.find('span',class_='result-price').text.strip().replace('$',''))
            neighborhood.append(post_neighborhood)
            price.append(post_price)
            if post.find('span',class_='housing') is not None:
                if 'ft2' in post.find('span',class_='housing').text.split()[0]:
                    post_bedroom = np.nan
                    post_footage = post.find('span',class_='housing').text.split()[0][:-3]
                    bedroom_count.append(post_bedroom)
                    sqft.append(post_footage)
                elif len(post.find('span',class_='housing').text.split())>2:
                    post_bedroom = post.find('span',class_='housing').text.replace("br","").split()[0]
                    post_footage = post.find('span',class_='housing').text.split()[2][:-3]
                    bedroom_count.append(post_bedroom)
                    sqft.append(post_footage)
                elif len(post.find('span',class_='housing').text.split())==2:
                    post_bedroom = post.find('span',class_='housing').text.replace("br","").split()[0]
                    post_footage = np.nan
                    bedroom_count.append(post_bedroom)
                    sqft.append(post_footage)
            else:
                post_bedroom = np.nan
                post_footage = np.nan
                bedroom_count.append(post_bedroom)
                sqft.append(post_footage)



#create results data frame
post_results = pd.DataFrame({'neighborhood':neighborhood,'footage':sqft,'bedroom':bedroom_count,'price':price,'link':link})
#clean up results
post_results = post_results.drop_duplicates(subset='link')
post_results['footage'] = post_results['footage'].replace(0,np.nan)
post_results['bedroom'] = post_results['bedroom'].replace(0,np.nan)
post_results['neighborhood'] = post_results['neighborhood'].str.lstrip('(')
post_results['neighborhood'] = post_results['neighborhood'].str.rstrip(')')
post_results = post_results.dropna(subset=['footage','bedroom'],how='all')
post_results.to_csv("rent_clean.csv",index=False)
print(len(post_results.index))

Beny

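This happens when the value has whitespace in front of the '('. lstrip only removes characters from the very start of the string, so a leading space stops it before it ever reaches the parenthesis.

For example: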

s=pd.Series([' (xxxx)','(yyyy) '])
s.str.strip('(|)')
0     (xxxx
1    yyyy) 
dtype: object
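A side note on the call above: the argument to str.strip is a set of characters, not a regular expression, so the '|' in '(|)' is just one more character to strip rather than an "or". The following is a minimal sketch of that behavior, reusing the same two sample strings; the variable names same and left_only are only for illustration.

```python
import pandas as pd

s = pd.Series([' (xxxx)', '(yyyy) '])

# tolist() prints the raw strings with quotes, so stray spaces become visible.
print(s.tolist())

# The argument is a set of characters, not a pattern: '(|)' and '()' give the
# same result here because no value starts or ends with '|'.
same = s.str.strip('(|)').equals(s.str.strip('()'))
print(same)

# lstrip stops at the first character that is not in the set, so the leading
# space in ' (xxxx)' protects the '(' behind it.
left_only = s.str.lstrip('(')
print(left_only.tolist())
```

Printing the column with tolist() (or repr on a single value) is usually the quickest way to confirm whether hidden whitespace is the cause.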

What we can do is strip twice: first the whitespace, then the parentheses.

s.str.strip().str.strip('(|)')
0    xxxx
1    yyyy
dtype: object
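Applied to the question's DataFrame, the same idea might look like the sketch below. The sample neighborhood values are made up for illustration, and it assumes the scraped text contains only plain spaces around the parentheses; two_pass and one_pass are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped 'neighborhood' column.
post_results = pd.DataFrame(
    {'neighborhood': [' (Capitol Hill)', '(Dupont Circle) ', ' (Arlington) ']}
)

# Option 1: strip whitespace first, then the parentheses.
two_pass = post_results['neighborhood'].str.strip().str.strip('()')

# Option 2: include the space in the character set so a single call handles
# both. This only covers plain spaces, not tabs or newlines.
one_pass = post_results['neighborhood'].str.strip(' ()')

post_results['neighborhood'] = two_pass
print(post_results['neighborhood'].tolist())
# → ['Capitol Hill', 'Dupont Circle', 'Arlington']
```

The two-pass form is a little more robust, because str.strip() with no argument removes any kind of surrounding whitespace before the parentheses are handled.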


Edited on 2021-04-2
