I'm trying to get news from an RSS feed, but I'm not getting the right information. I'm using requests and BeautifulSoup to do this. Each item in the feed looks like this:
<item>
<title>
US making very good headway in respect to Covid-19 vaccines: Donald Trump
</title>
<description>
<a href="https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms"><img border="0" hspace="10" align="left" style="margin-top:3px;margin-right:5px;" src="https://timesofindia.indiatimes.com/photo/76399892.cms" /></a>Washington, Jun 16 () The United States is making very good headway in respect to vaccines for the coronavirus pandemic and also therapeutically, President Donald Trump has said.
</description>
<link>
https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms
</link>
<guid>
https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms
</guid>
<pubDate>
Mon, 15 Jun 2020 22:11:06 PT
</pubDate>
</item>
Here is the code I'm using:
def timesofindiaNews():
    URL = 'https://timesofindia.indiatimes.com/rssfeeds_us/72258322.cms'
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, features = 'xml')
    # print(soup.prettify())
    news_elems = soup.find_all('item')
    news = []
    print(news_elems[0].prettify())
    for news_elem in news_elems:
        title = news_elem.title.text
        news_description = news_elem.description.text
        image = news_elem.description.img
        # news_date = news_elem.pubDate.text
        news_link = news_elem.link.text
I want only the description text from the description tag, but the description contains extra markup that I don't need. The code above gives the following output:
{
"image": null,
"news_description": "<a href=\"https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms\"><img border=\"0\" hspace=\"10\" align=\"left\" style=\"margin-top:3px;margin-right:5px;\" src=\"https://timesofindia.indiatimes.com/photo/76399892.cms\" /></a>Washington, Jun 16 () The United States is making very good headway in respect to vaccines for the coronavirus pandemic and also therapeutically, President Donald Trump has said.",
"news_link": "https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms",
"source": "trucknews",
"title": "US making very good headway in respect to Covid-19 vaccines: Donald Trump"
}
Expected output ===>
{
"image": "image/link/from/the/description",
"news_description": "Washington, Jun 16 () The United States is making very good headway in respect to vaccines for the coronavirus pandemic and also therapeutically, President Donald Trump has said.",
"news_link": "https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms",
"source": "trucknews",
"title": "US making very good headway in respect to Covid-19 vaccines: Donald Trump"
}
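The < and > characters inside the description were escaped to &lt; and &gt;, so BeautifulSoup sees the description as plain text rather than as tags. I used formatter=None to get the unescaped markup back, re-parsed it, and changed how news_description is read. This seems to give the result you want. You can try it: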
import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}


def timesofindiaNews():
    URL = 'https://timesofindia.indiatimes.com/rssfeeds_us/72258322.cms'
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.text, 'xml')
    # print(soup.prettify())
    news_elems = soup.find_all('item')
    news = []
    # print(news_elems[0].prettify())
    for news_elem in news_elems:
        title = news_elem.title.text
        n_description = news_elem.description
        # formatter=None leaves "<" and ">" unescaped so the markup can be re-parsed
        store = n_description.prettify(formatter=None)
        sp = BeautifulSoup(store, 'xml')
        # The text node right after the <a> element is the plain description
        news_description = sp.find("a").next_sibling
        print(news_description)
        # Read the image URL from the re-parsed description (None if there is no <img>)
        img = sp.find("img")
        image = img["src"] if img is not None else None
        # news_date = news_elem.pubDate.text
        news_link = news_elem.link.text


timesofindiaNews()
The output looks like this:
Washington, Jun 16 () The United States is making very good headway in respect to vaccines for the coronavirus pandemic and also therapeutically, President Donald Trump has said.
The proposed suspension could extend into the government's new fiscal year beginning October 1, when many new visas are issued, The Wall Street Journal reported on Thursday, quoting unnamed administration officials.
The team of researchers at the University of Georgia (UGA) in the US noted that the SARS-CoV-2 protein PLpro is essential for the replication and the ability of the virus to suppress host immune function.
After two weeks of protests over the death of George Floyd, hundreds of New Yorkers took to the streets again calling for reform in law enforcement and the withdrawal of police department funding.
Indian-origin California Senator Kamala Harris has joined former vice president and 2020 Democratic presidential nominee Joe Biden to raise USD 3.5 million for the upcoming November elections.
and so on....
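
As an alternative to re-parsing with prettify(formatter=None), here is a minimal sketch that splits a description string into its image URL and its plain text using only the Python standard library. It is not the method used in the answer above. The names DescriptionParser and split_description are hypothetical helpers introduced for illustration, and the check for escaped markup is a simple heuristic that assumes the string is either fully escaped or not escaped at all. The sample string is a shortened copy of the description from the question.

```python
from html import unescape
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Hypothetical helper: collects the first <img> src and all text nodes."""

    def __init__(self):
        super().__init__()
        self.image = None
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here.
        # Only the first image URL is kept.
        if tag == "img" and self.image is None:
            self.image = dict(attrs).get("src")

    def handle_data(self, data):
        self.text_parts.append(data)


def split_description(raw):
    """Return (image_url, plain_text) for an RSS description string."""
    # Heuristic (an assumption): if there is no literal "<", the markup
    # is probably escaped as &lt;...&gt;, so unescape it once first.
    if "<" not in raw:
        raw = unescape(raw)
    parser = DescriptionParser()
    parser.feed(raw)
    parser.close()  # flushes any text that is still buffered
    # Collapse runs of whitespace and trim both ends.
    text = " ".join("".join(parser.text_parts).split())
    return parser.image, text


sample = (
    '<a href="https://timesofindia.indiatimes.com/international/us/'
    'us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/'
    'articleshow/76399892.cms">'
    '<img border="0" hspace="10" align="left" '
    'src="https://timesofindia.indiatimes.com/photo/76399892.cms" /></a>'
    "Washington, Jun 16 () The United States is making very good headway "
    "in respect to vaccines for the coronavirus pandemic."
)

image, news_description = split_description(sample)
print(image)
print(news_description)
```

Inside the loop from the answer, this could be called as split_description(news_elem.description.text), since .text returns the description markup as a plain string.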