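I'm trying to convert the following website into a DataFrame so that I can work with the data: https://www.ifsqn.com/forum/index.php/rss/forums/4-food-safety-quality-discussion/
Everywhere I look online, I only find how to convert XML files into a DataFrame. I tried the following, but since this isn't an XML file, it doesn't work. I can handle the pandas part myself, but first I need to get hold of the data.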
import requests
import xml.etree.ElementTree as ET
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get("https://www.ifsqn.com/forum/index.php/rss/forums/4-food-safety-quality-discussion/",headers=headers)
c = r.content
root = ET.parse(r).getroot()
print(root)
What steps am I missing to turn the XML into a readable format so I can load the data into a pandas DataFrame?
Any input is appreciated!
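The XML you are trying to parse is an RSS feed, and because RSS has a well-defined format, you can use a Python library built for parsing RSS feeds, such as feedparser: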
import feedparser
import pandas as pd
parsed_rss = feedparser.parse('https://www.ifsqn.com/forum/index.php/rss/forums/4-food-safety-quality-discussion/')
pd.DataFrame(parsed_rss['entries'])
title title_detail ... id guidislink
0 Monitored vs Verifying Records {'type': 'text/plain', 'language': None, 'base... ... https://www.ifsqn.com/forum/index.php/topic/38... False
1 Is it necessary to follow the new ISO 22000 to... {'type': 'text/plain', 'language': None, 'base... ... https://www.ifsqn.com/forum/index.php/topic/38... False
2 usda inspector tagging product {'type': 'text/plain', 'language': None, 'base... ... https://www.ifsqn.com/forum/index.php/topic/38... False
3 Chocolate Liquor Discs {'type': 'text/plain', 'language': None, 'base... ... https://www.ifsqn.com/forum/index.php/topic/38... False
4 Multi-Pack Beef Sticks {'type': 'text/plain', 'language': None, 'base... ... https://www.ifsqn.com/forum/index.php/topic/38... False
.. ... ... ... ... ...
95 HACCP Pan for super critical fluid extraction ... {'type': 'text/plain', 'language': None, 'base... ... https://www.ifsqn.com/forum/index.php/topic/38... False
96 Illegal Drugs Pictured on Food Label {'type': 'text/plain', 'language': None, 'base... ... https://www.ifsqn.com/forum/index.php/topic/38... False
97 BRC metal can packaging compliance requirements {'type': 'text/plain', 'language': None, 'base... ... https://www.ifsqn.com/forum/index.php/topic/38... False
98 Codex Decision tree in ISO 22000:2018 - Clause... {'type': 'text/plain', 'language': None, 'base... ... https://www.ifsqn.com/forum/index.php/topic/38... False
99 BRC clause 4.3.4 - Battery Charging area {'type': 'text/plain', 'language': None, 'base... ... https://www.ifsqn.com/forum/index.php/topic/38... False
[100 rows x 10 columns]
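The resulting frame has 10 columns, several of which (such as title_detail) hold nested dicts. If you only need a few fields, a minimal sketch like the one below keeps a chosen subset. The helper name entries_to_frame and the default column list are assumptions for illustration; which keys actually exist depends on what the feed provides, so reindex() is used so that a missing key becomes a column of NaN instead of raising a KeyError.

```python
import pandas as pd

# Hypothetical default: fields commonly present in feedparser entries (depends on the feed).
DEFAULT_COLUMNS = ["title", "link", "published", "id"]


def entries_to_frame(entries, columns=DEFAULT_COLUMNS):
    """Build a DataFrame from feed entries (a list of dict-like objects) and keep only `columns`."""
    df = pd.DataFrame(list(entries))
    # reindex() keeps the requested columns in order; any that are absent are filled with NaN.
    return df.reindex(columns=list(columns))
```

Usage with the feedparser result from above would be entries_to_frame(parsed_rss['entries']).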
Another approach is to parse the XML yourself into a structure (for example, a list of dicts) that can be used to construct a DataFrame.
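A minimal sketch of that approach with only the standard library and pandas, assuming the feed is RSS 2.0 with one <item> element per topic. The function name rss_to_dataframe is hypothetical. Each direct child of an <item> becomes a column; namespaced tags keep ElementTree's "{namespace-uri}local" form, and if a tag repeats inside one item, the last occurrence wins.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def rss_to_dataframe(xml_bytes):
    """Parse an RSS 2.0 document (bytes or str) and return one DataFrame row per <item>."""
    # fromstring() parses in-memory content and returns the root element (<rss>).
    root = ET.fromstring(xml_bytes)
    rows = []
    for item in root.iter("item"):
        # One dict per item: tag name -> stripped text (empty string if the element has no text).
        rows.append({child.tag: (child.text or "").strip() for child in item})
    # Items missing a field get NaN in that column.
    return pd.DataFrame(rows)
```

With the request from the question, this would be called as rss_to_dataframe(requests.get(url, headers=headers).content).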
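Edit:
I now see that you passed r instead of c in the following line:
root = ET.parse(r).getroot()
Note that passing c would not work either: ET.parse() expects a file name or a file-like object, while c holds the raw bytes of the response. To parse bytes that are already in memory, use root = ET.fromstring(c), which returns the root element directly.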
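A small sketch of the two standard-library ways to get the root element from response bytes such as r.content. The helper names are hypothetical; both calls (ET.fromstring and ET.parse on an io.BytesIO wrapper) are part of xml.etree.ElementTree.

```python
import io
import xml.etree.ElementTree as ET


def root_from_bytes(content):
    # fromstring() parses bytes/str held in memory and returns the root Element directly.
    return ET.fromstring(content)


def root_from_bytes_via_parse(content):
    # parse() needs a file name or file-like object, so wrap the bytes in BytesIO first.
    return ET.parse(io.BytesIO(content)).getroot()
```

In the question's code, either root_from_bytes(c) or root_from_bytes_via_parse(c) would replace ET.parse(r).getroot().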