I want to build a DataFrame by fetching data from every page of an API (limited to 100 rows per page). The code below currently returns all of the data, but in the wrong structure.
There are 17 headers, so I need the data in 17 columns. However, it outputs a [100 rows x 1700 columns] DataFrame, and I need [10000 rows x 17 columns].
I'm not sure how to achieve this - any help would be much appreciated.
from ebaysdk.finding import Connection as finding
from bs4 import BeautifulSoup
import pandas as pd

x = []
for i in range(1, 101):
    print(type(i))
    api = finding(siteid='EBAY-GB', appid='some_id', config_file=None)
    response = api.execute('findItemsByKeywords',
                           {'keywords': 'phone', 'outputSelector': 'SellerInfo',
                            'paginationInput': {'entriesPerPage': '2', 'pageNumber': ' ' + str(i)}})
    soup = BeautifulSoup(response.content, 'lxml')
    items = soup.find_all('item')
    headers = ['itemid', 'title', 'categoryname', 'categoryid', 'postalcode', 'location',
               'sellerusername', 'feedbackscore', 'positivefeedbackpercent', 'topratedseller',
               'shippingservicecost', 'buyitnowavailable', 'currentprice', 'starttime',
               'endtime', 'watchcount', 'conditionid']
    for object in headers:
        values = [element.text for element in soup.find_all(object)]
        x.append(values)

df = pd.DataFrame(x)
df = df.T
print(x)
#[['152668959069', '252999725410'], ['Samsung GALAXY Ace GT-S5830i (Unlocked) Smartphone Android Phone- ALL COLOURS UK', '8GB 3G Unlocked Android 5.1 Quad Core Smartphone Mobile Phone 2 SIM GPS qHD'], ['Mobile & Smart Phones', 'Mobile & Smart Phones'], ['9355', '9355'], ['RM137PP'], ['Rainham,United Kingdom', 'United Kingdom'], ['deals4u_shop', 'smartlife2017'], ['15700', '456'], ['99.9', '98.5'], ['true', 'true'], ['0.0', '0.0'], ['false', 'false'], ['32.49', '48.9'], ['2017-08-18T18:36:28.000Z', '2017-06-19T09:04:40.000Z'], ['2017-12-16T18:36:28.000Z', '2017-12-16T09:04:40.000Z'], ['272', '134'], ['1000', '1000']]
print(df)
                       0                                                  1  \
0           152668959069  Samsung GALAXY Ace GT-S5830i (Unlocked) Smartp...
1           252999725410  8GB 3G Unlocked Android 5.1 Quad Core Smartpho...

                       2     3        4                       5  \
0  Mobile & Smart Phones  9355  RM137PP  Rainham,United Kingdom
1  Mobile & Smart Phones  9355     None          United Kingdom

               6      7     8     9  ...   24    25    26   27     28    29  \
0   deals4u_shop  15700  99.9  true  ...  456  98.5  true  0.0  false  48.9
1  smartlife2017    456  98.5  true  ...  456  98.5  true  0.0  false  48.9

                         30                        31   32    33
0  2017-06-19T09:04:40.000Z  2017-12-16T09:04:40.000Z  214  1000
1  2017-06-19T09:04:40.000Z  2017-12-16T09:04:40.000Z  182  1000
Edit: added more code above, the printed x for the first 2 entries of page one, and the printed df for the first 2 entries of 2 pages.
This should work better: build a dict keyed by the header names, so pandas maps each key to one column instead of appending 17 new lists per page.
Dict comprehension version:
data_dict = {obj: [element.text for element in soup.find_all(obj)] for obj in headers}
df = pd.DataFrame(data_dict)
Loop version:
data_dict = {}
for obj in headers:
    data_dict[obj] = [element.text for element in soup.find_all(obj)]
df = pd.DataFrame(data_dict)
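Note that either version builds the dict from a single page's soup; across all 100 pages the per-header lists still need to be accumulated before constructing the DataFrame. A minimal sketch of that accumulation, with fabricated page data standing in for the values parsed from each API response (in the real code, each page dict would come from soup.find_all as above):

```python
import pandas as pd

# Shortened header list for illustration; the real code uses all 17
headers = ['itemid', 'title', 'currentprice']

# Fabricated stand-ins for two parsed API pages, two items each
pages = [
    {'itemid': ['1', '2'], 'title': ['a', 'b'], 'currentprice': ['9.99', '19.99']},
    {'itemid': ['3', '4'], 'title': ['c', 'd'], 'currentprice': ['29.99', '39.99']},
]

# One growing list per header: rows accumulate, columns stay fixed
data = {h: [] for h in headers}
for page in pages:
    for h in headers:
        data[h].extend(page[h])

df = pd.DataFrame(data)
print(df.shape)  # (4, 3)
```

With 100 real pages of 100 items each, the same pattern yields the desired [10000 rows x 17 columns] shape, because each page extends the existing 17 lists rather than appending 17 new ones.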