Scraping news from an RSS feed with Python 3.7: I'm not getting the right data. Please help me extract the proper data

Mehul Dhariyaparmar

I'm trying to get news from an RSS feed, but I'm not getting the right information. I'm using requests and BeautifulSoup to do this. Each item in the feed looks like this:

<item>
 <title>
  US making very good headway in respect to Covid-19 vaccines: Donald Trump
 </title>
 <description>
  <a href="https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms"><img border="0" hspace="10" align="left" style="margin-top:3px;margin-right:5px;" src="https://timesofindia.indiatimes.com/photo/76399892.cms" /></a>Washington, Jun 16 () The United States is making very good headway in respect to vaccines for the coronavirus pandemic and also therapeutically, President Donald Trump has said.
 </description>
 <link>
  https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms
 </link>
 <guid>
  https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms
 </guid>
 <pubDate>
  Mon, 15 Jun 2020 22:11:06 PT
 </pubDate>
</item>

Here is the code for the problem:

import requests
from bs4 import BeautifulSoup


def timesofindiaNews():
    URL = 'https://timesofindia.indiatimes.com/rssfeeds_us/72258322.cms'

    page = requests.get(URL)
    soup = BeautifulSoup(page.content, features = 'xml')

    # print(soup.prettify())

    news_elems = soup.find_all('item')
    news = []
    print(news_elems[0].prettify())
    for news_elem in news_elems:

        title = news_elem.title.text
        news_description = news_elem.description.text       
        image = news_elem.description.img
        # news_date = news_elem.pubDate.text
        news_link = news_elem.link.text

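I want only the description text from the <description> tag, but it also contains extra markup that I don't need, and the image comes back as null. The code above gives the following output: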

    {
      "image": null,
      "news_description": "<a href=\"https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms\"><img border=\"0\" hspace=\"10\" align=\"left\" style=\"margin-top:3px;margin-right:5px;\" src=\"https://timesofindia.indiatimes.com/photo/76399892.cms\" /></a>Washington, Jun 16 () The United States is making very good headway in respect to vaccines for the coronavirus pandemic and also therapeutically, President Donald Trump has said.",
      "news_link": "https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms",
      "source": "trucknews",
      "title": "US making very good headway in respect to Covid-19 vaccines: Donald Trump"
    }

Expected output ===>

    {
      "image": "image/link/from/the/description",
      "news_description": "Washington, Jun 16 () The United States is making very good headway in respect to vaccines for the coronavirus pandemic and also therapeutically, President Donald Trump has said.",
      "news_link": "https://timesofindia.indiatimes.com/international/us/us-making-very-good-headway-in-respect-to-covid-19-vaccines-donald-trump/articleshow/76399892.cms",
      "source": "trucknews",
      "title": "US making very good headway in respect to Covid-19 vaccines: Donald Trump"
    }

Humayun Ahmad Rajib

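Inside the description, < and > have been changed to &lt; and &gt;, so the <a> and <img> markup is read as plain text rather than as tags. I used formatter=None to control that escaping and changed how news_description is extracted. It seems to produce the result you want; give the code in this answer a try.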
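
To illustrate the escaping described above, here is a minimal sketch that uses only the standard library (`xml.etree.ElementTree`) instead of BeautifulSoup. The item and its `example.com` URLs are made up for illustration, and it assumes the feed escapes the markup the way this answer describes. It shows that an XML parser sees the description as one string with no child elements, which is why looking up `img` on it finds nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened item: the markup inside <description> is escaped,
# as this answer describes for the real feed.
SAMPLE_ITEM = (
    "<item><description>"
    "&lt;a href=\"https://example.com/story\"&gt;"
    "&lt;img src=\"https://example.com/photo.jpg\" /&gt;&lt;/a&gt;"
    "Washington, Jun 16 () Sample summary text."
    "</description></item>"
)

item = ET.fromstring(SAMPLE_ITEM)
description = item.find("description")

# The parser unescapes the entities, but the result is a single text string,
# not child elements, so there is no <img> node to look up.
child_count = len(description)
description_text = description.text

print(child_count)
print(description_text[:8])
```

The description therefore has to be parsed a second time, as markup, before the image and the summary text can be separated.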

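import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}


def timesofindiaNews():
    URL = 'https://timesofindia.indiatimes.com/rssfeeds_us/72258322.cms'

    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.text, 'xml')

    # print(soup.prettify())

    news_elems = soup.find_all('item')
    news = []
    # print(news_elems[0].prettify())
    for news_elem in news_elems:

        title = news_elem.title.text
        n_description = news_elem.description
        # formatter=None writes the description back out without escaping,
        # so the embedded <a>/<img> markup can be re-parsed as real tags.
        store = n_description.prettify(formatter=None)
        sp = BeautifulSoup(store, 'xml')
        # The summary is the text node that follows the <a> element.
        news_description = sp.find("a").next_sibling
        print(news_description)
        # Take the image from the re-parsed soup; the original element
        # holds only text, so news_elem.description.img is None.
        img = sp.find("img")
        image = img["src"] if img is not None else None
        # news_date = news_elem.pubDate.text
        news_link = news_elem.link.text
        news.append({
            "title": title,
            "news_description": str(news_description).strip(),
            "image": image,
            "news_link": news_link,
        })
    return news


timesofindiaNews()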

The output looks like this:

Washington, Jun 16 () The United States is making very good headway in respect to vaccines for the coronavirus pandemic and also therapeutically, President Donald Trump has said.

The proposed suspension could extend into the government's new fiscal year beginning October 1, when many new visas are issued, The Wall Street Journal reported on Thursday, quoting unnamed administration officials.

The team of researchers at the University of Georgia (UGA) in the US noted that the SARS-CoV-2 protein PLpro is essential for the replication and the ability of the virus to suppress host immune function.

After two weeks of protests over the death of George Floyd, hundreds of New Yorkers took to the streets again calling for reform in law enforcement and the withdrawal of police department funding.

Indian-origin California Senator Kamala Harris has joined former vice president and 2020 Democratic presidential nominee Joe Biden to raise USD 3.5 million for the upcoming November elections.


and so on....
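
As an alternative to re-serializing with `formatter=None`, the description string can be parsed as HTML directly. The following is a rough sketch using the standard library's `html.parser` rather than BeautifulSoup; the class `DescriptionSplitter`, the function `split_description`, and the `example.com` sample are hypothetical names and data introduced for illustration, not part of the original answer.

```python
from html.parser import HTMLParser


class DescriptionSplitter(HTMLParser):
    """Collects the first <img> src and the visible text of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.image = None
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing tags such as
        # <img ... /> are also routed here by the default implementation.
        if tag == "img" and self.image is None:
            self.image = dict(attrs).get("src")

    def handle_data(self, data):
        self.text_parts.append(data)


def split_description(description_html):
    """Return (image_url, plain_text) for an unescaped description string."""
    parser = DescriptionSplitter()
    parser.feed(description_html)
    parser.close()  # flush any text still buffered by the parser
    return parser.image, "".join(parser.text_parts).strip()


# Hypothetical description, shaped like the one in the question.
sample = (
    '<a href="https://example.com/story">'
    '<img border="0" src="https://example.com/photo.jpg" /></a>'
    "Washington, Jun 16 () Sample summary text."
)
image, text = split_description(sample)
print(image)
print(text)
```

In the scraper itself, the string to pass in would be `news_elem.description.text`. The same idea should also work with BeautifulSoup by parsing that string with the `"html.parser"` builder and reading the `img` tag's `src` and the result of `get_text()`.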
