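I am currently scraping this table with BeautifulSoup and want to split it into multiple DataFrames. I want a new DataFrame to start each time a green header row appears.
This is the web page: http://www.greyhound-data.com/d?page=stadia&st=1011&land=au&stadiummode=3
I couldn't work out how to do the split, so this is all I have for now. So far I have only managed to isolate the table: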
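import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "http://www.greyhound-data.com/d?page=stadia&st=1011&land=au&stadiummode=3"
req = requests.get(url).text
soup = BeautifulSoup(req, 'lxml')
# several tables share id="green"; the last one holds the data
table = soup.find_all("table", attrs={'id': "green"})
table = table[-1]
df = pd.read_html(str(table))[0]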
Output:
Year quarter ... Set on
Distance: 331 m / 362 y ... Distance: 331 m / 362 y
0 2020 2nd ... 15 JUN 2020
1 2020 1st ... 23 JAN 2020
2 2019 4th ... 6 OCT 2019
3 2019 3rd ... 1 SEP 2019
4 2019 2nd ... 28 APR 2019
.. ... ... ...
319 2002 3rd ... 5 SEP 2002
320 2002 2nd ... 6 JUN 2002
321 2001 4th ... 18 OCT 2001
322 2001 3rd ... 16 AUG 2001
323 2001 2nd ... 14 JUN 2001
[324 rows x 7 columns]
This script splits the table into multiple DataFrames:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "http://www.greyhound-data.com/d?page=stadia&st=1011&land=au&stadiummode=3"
req = requests.get(url).text
soup = BeautifulSoup(req, 'lxml')
table = soup.find_all("table", attrs={'id': "green"})[-1]

trs, dfs, all_data = table.select('tr'), [], []
# the first row holds the column names
header = [th.get_text(strip=True) for th in trs[0].select('th')]

# skip the column-name row and the first distance row
for tr in trs[2:]:
    if tr.td:
        # data row: collect the cell texts
        all_data.append([td.get_text(strip=True) for td in tr.select('td')])
    else:
        # green header row: close the current group and start a new one
        dfs.append(pd.DataFrame(all_data, columns=header))
        all_data = []

# the last group has no header row after it
dfs.append(pd.DataFrame(all_data, columns=header))

# print all DataFrames in list:
for df in dfs:
    print(df)
    print('-' * 160)
Prints:
Year quarter running dif.dogs average time avg win time best time Set by Set on
0 2020 2nd 226 19.63 19.18 18.79 Data Base 15 JUN 2020
1 2020 1st 255 19.68 19.14 18.58 Wazza Who 23 JAN 2020
.. ... ... ... ... ... ... ...
39 2010 3rd 286 19.85 19.34 18.90 Royal Surfer 15 SEP 2010
40 2010 2nd 92 20.01 19.57 19.28 Paw Form 16 JUN 2010
[41 rows x 7 columns]
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Year quarter running dif.dogs average time avg win time best time Set by Set on
0 2020 2nd 217 23.40 22.79 22.25 Canya Cruise 3 JUN 2020
1 2020 1st 285 23.35 22.85 22.47 Dawn's Dream 22 JAN 2020
.. ... ... ... ... ... ... ...
65 2004 1st 3 23.54 23.25 23.25 Seismic Shock 9 JAN 2004
66 2003 4th 16 23.67 23.33 23.29 Far Away Places 17 OCT 2003
[67 rows x 7 columns]
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Year quarter running dif.dogs average time avg win time best time Set by Set on
0 2020 2nd 264 30.68 30.13 29.56 Oh Mickey 23 APR 2020
1 2020 1st 224 30.70 30.12 29.41 Sennachie 10 JAN 2020
.. ... ... ... ... ... ... ...
76 2001 2nd 13 30.50 30.37 30.16 Korda 27 APR 2001
77 2001 1st 3 30.72 30.72 30.55 Fly Fast 0 MAR 2001
[78 rows x 7 columns]
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Year quarter running dif.dogs average time avg win time best time Set by Set on
0 2020 2nd 76 35.71 35.14 34.65 Frieda Las Vegas 28 MAY 2020
1 2020 1st 76 35.77 35.21 34.72 Velocity Bettina 23 JAN 2020
.. ... ... ... ... ... ... ...
73 2001 2nd 1 35.49 35.49 35.49 Kissin Bobbie 24 MAY 2001
74 2001 1st 1 36.10 36.10 36.10 Brampton Blues 23 MAR 2001
[75 rows x 7 columns]
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Year quarter running dif.dogs average time avg win time best time Set by Set on
0 2020 2nd 33 42.73 42.08 41.62 Rasheda 28 MAY 2020
1 2020 1st 16 42.38 41.93 41.83 What About It 20 FEB 2020
.. ... ... ... ... ... ... ...
57 2001 3rd 2 42.57 42.53 42.53 Universal Tears * 16 AUG 2001
58 2001 2nd 4 42.24 42.27 42.15 Hotshow Vintage 14 JUN 2001
[59 rows x 7 columns]
----------------------------------------------------------------------------------------------------------------------------------------------------------------
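The splitting logic can also be tried without hitting the site. The following is only a sketch: the HTML is a small hand-made stand-in that assumes the same layout as the page (one row of column names, then a `th`-only row before each distance group), and `split_on_header_rows` is a hypothetical helper name, not part of any library.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hand-made stand-in for the page's table (assumed layout, not the real markup).
html = """
<table id="green">
  <tr><th>Year</th><th>quarter</th><th>best time</th></tr>
  <tr><th colspan="3">Distance: 331 m / 362 y</th></tr>
  <tr><td>2020</td><td>2nd</td><td>18.79</td></tr>
  <tr><td>2020</td><td>1st</td><td>18.58</td></tr>
  <tr><th colspan="3">Distance: 395 m / 432 y</th></tr>
  <tr><td>2020</td><td>2nd</td><td>22.25</td></tr>
</table>
"""


def split_on_header_rows(table):
    """Return one DataFrame per run of data rows between th-only rows."""
    trs = table.select("tr")
    header = [th.get_text(strip=True) for th in trs[0].select("th")]
    groups, rows = [], []
    for tr in trs[1:]:
        if tr.td:
            # data row: collect the cell texts
            rows.append([td.get_text(strip=True) for td in tr.select("td")])
        elif rows:
            # header row after some data: close the current group
            groups.append(pd.DataFrame(rows, columns=header))
            rows = []
    if rows:
        # the last group has no header row after it
        groups.append(pd.DataFrame(rows, columns=header))
    return groups


table = BeautifulSoup(html, "html.parser").find("table", attrs={"id": "green"})
dfs = split_on_header_rows(table)
for df in dfs:
    print(df)
```

Unlike the script above, this variant starts at `trs[1:]` and only closes a group when it actually holds rows, so it does not depend on the first header row sitting at a fixed position.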
EDIT: To get the Distance column as well:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "http://www.greyhound-data.com/d?page=stadia&st=1011&land=au&stadiummode=3"
req = requests.get(url).text
soup = BeautifulSoup(req, 'lxml')
table = soup.find_all("table", attrs={'id': "green"})[-1]

# `distance` holds the text of the most recent green header row
trs, dfs, all_data, distance = table.select('tr'), [], [], ''
header = ['Distance'] + [th.get_text(strip=True) for th in trs[0].select('th')]

for tr in trs[1:]:
    if tr.td:
        # data row: prepend the current distance label
        all_data.append([distance] + [td.get_text(strip=True) for td in tr.select('td')])
    else:
        # green header row: remember its label and close the previous group
        distance = tr.th.get_text(strip=True)
        if all_data:
            dfs.append(pd.DataFrame(all_data, columns=header))
            all_data = []

# the last group has no header row after it
dfs.append(pd.DataFrame(all_data, columns=header))

# print all DataFrames in list:
for df in dfs:
    print(df)
    print('-' * 160)
Prints:
Distance Year quarter running dif.dogs average time avg win time best time Set by Set on
0 Distance: 331 m / 362 y 2020 2nd 226 19.63 19.18 18.79 Data Base 15 JUN 2020
1 Distance: 331 m / 362 y 2020 1st 255 19.68 19.14 18.58 Wazza Who 23 JAN 2020
.. ... ... ... ... ... ... ... ...
39 Distance: 331 m / 362 y 2010 3rd 286 19.85 19.34 18.90 Royal Surfer 15 SEP 2010
40 Distance: 331 m / 362 y 2010 2nd 92 20.01 19.57 19.28 Paw Form 16 JUN 2010
[41 rows x 8 columns]
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Distance Year quarter running dif.dogs average time avg win time best time Set by Set on
0 Distance: 395 m / 432 y 2020 2nd 217 23.40 22.79 22.25 Canya Cruise 3 JUN 2020
1 Distance: 395 m / 432 y 2020 1st 285 23.35 22.85 22.47 Dawn's Dream 22 JAN 2020
.. ... ... ... ... ... ... ... ...
65 Distance: 395 m / 432 y 2004 1st 3 23.54 23.25 23.25 Seismic Shock 9 JAN 2004
66 Distance: 395 m / 432 y 2003 4th 16 23.67 23.33 23.29 Far Away Places 17 OCT 2003
[67 rows x 8 columns]
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Distance Year quarter running dif.dogs average time avg win time best time Set by Set on
0 Distance: 520 m / 569 y 2020 2nd 264 30.68 30.13 29.56 Oh Mickey 23 APR 2020
1 Distance: 520 m / 569 y 2020 1st 224 30.70 30.12 29.41 Sennachie 10 JAN 2020
.. ... ... ... ... ... ... ... ...
76 Distance: 520 m / 569 y 2001 2nd 13 30.50 30.37 30.16 Korda 27 APR 2001
77 Distance: 520 m / 569 y 2001 1st 3 30.72 30.72 30.55 Fly Fast 0 MAR 2001
[78 rows x 8 columns]
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Distance Year quarter running dif.dogs average time avg win time best time Set by Set on
0 Distance: 600 m / 656 y 2020 2nd 76 35.71 35.14 34.65 Frieda Las Vegas 28 MAY 2020
1 Distance: 600 m / 656 y 2020 1st 76 35.77 35.21 34.72 Velocity Bettina 23 JAN 2020
.. ... ... ... ... ... ... ... ...
73 Distance: 600 m / 656 y 2001 2nd 1 35.49 35.49 35.49 Kissin Bobbie 24 MAY 2001
74 Distance: 600 m / 656 y 2001 1st 1 36.10 36.10 36.10 Brampton Blues 23 MAR 2001
[75 rows x 8 columns]
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Distance Year quarter running dif.dogs average time avg win time best time Set by Set on
0 Distance: 710 m / 776 y 2020 2nd 33 42.73 42.08 41.62 Rasheda 28 MAY 2020
1 Distance: 710 m / 776 y 2020 1st 16 42.38 41.93 41.83 What About It 20 FEB 2020
.. ... ... ... ... ... ... ... ...
57 Distance: 710 m / 776 y 2001 3rd 2 42.57 42.53 42.53 Universal Tears * 16 AUG 2001
58 Distance: 710 m / 776 y 2001 2nd 4 42.24 42.27 42.15 Hotshow Vintage 14 JUN 2001
[59 rows x 8 columns]
----------------------------------------------------------------------------------------------------------------------------------------------------------------
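Once every row carries its Distance label, the frames can be combined and looked up by distance if that is more convenient than a list. This is a minimal sketch, assuming the labels always look like `Distance: 331 m / 362 y`; the two small frames stand in for the real `dfs` list, and `distance_in_metres` and the `metres` column are names made up for the example.

```python
import re
import pandas as pd


def distance_in_metres(label):
    """Pull the metre value out of a label such as 'Distance: 331 m / 362 y'."""
    match = re.search(r"(\d+)\s*m\b", label)
    return int(match.group(1)) if match else None


# Stand-ins for the `dfs` list built by the script above (made-up subset of columns).
dfs = [
    pd.DataFrame({"Distance": ["Distance: 331 m / 362 y"] * 2, "Year": ["2020", "2020"]}),
    pd.DataFrame({"Distance": ["Distance: 395 m / 432 y"], "Year": ["2020"]}),
]

# One long table with a numeric distance column
combined = pd.concat(dfs, ignore_index=True)
combined["metres"] = combined["Distance"].map(distance_in_metres)

# Dictionary of DataFrames keyed by distance in metres
by_distance = {m: g.reset_index(drop=True) for m, g in combined.groupby("metres")}
print(by_distance[331])
```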