Extracting with Beautiful Soup

Posted by RakeshKirola on Dev

RakeshKirola

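I want to fetch a stock price from the website http://www.bseindia.com/. For example, the stock price is displayed as "S&P BSE: 25,489.57". I want to get only the numeric part, "25489.57".

This is the code I have written so far. It fetches the entire div in which the amount is displayed, rather than the amount itself.

Here is the code: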

from bs4 import BeautifulSoup
from urllib.request import urlopen



page = "http://www.bseindia.com"

html_page = urlopen(page)

html_text = html_page.read()
soup = BeautifulSoup(html_text,"html.parser")
divtag = soup.find_all("div",{"class":"sensexquotearea"})
for oye in divtag:
    tdidTags = oye.find_all("div", {"class": "sensexvalue2"})

    for tag in tdidTags:
        tdTags = tag.find_all("div",{"class":"newsensexvaluearea"})
        for newtag in tdTags:
            tdnewtags = newtag.find_all("div",{"class":"sensextext"})
            for rakesh in tdnewtags:
                tdtdsp1 = rakesh.find_all("div",{"id":"tdsp"})
                for texts in tdtdsp1:
                    print(texts)

Keatinge

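I looked at what happens when the page loads its information, and I was able to replicate in Python what the JavaScript does.

I found that it references a page called IndexMovers.aspx?ln=en.

This page appears to be a comma-separated list of values. First comes the name, then the price, and then a few other fields you don't care about.

To simulate this in Python, we request the page, split it on commas, then walk through the list in steps of six: every 6th value is a name, and we add that value and the one after it (the price) to a new list called stockInformation.

Now we can loop through stockInformation and get the name with item[0] and the price with item[1].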

import requests

newUrl = "http://www.bseindia.com/Msource/IndexMovers.aspx?ln=en"
response = requests.get(newUrl).text
commaItems = response.split(",")


#create list of stocks, each one containing information
#index 0 is the name, index 1 is the price
#the last item is not included because for some reason it has no price info on indexMovers page
stockInformation = []
for i, item in enumerate(commaItems[:-1]):
    if i % 6 == 0:
        newList = [item, commaItems[i+1]]
        stockInformation.append(newList)


#print each item and its price from your list
for item in stockInformation:
    print(item[0], "has a price of", item[1])

Printed output:

S&P BSE SENSEX has a price of 25489.57
SENSEX#S&P BSE 100 has a price of 7944.50
BSE-100#S&P BSE 200 has a price of 3315.87
BSE-200#S&P BSE MidCap has a price of 11156.07
MIDCAP#S&P BSE SmallCap has a price of 11113.30
SMLCAP#S&P BSE 500 has a price of 10399.54
BSE-500#S&P BSE GREENEX has a price of 2234.30
GREENX#S&P BSE CARBONEX has a price of 1283.85
CARBON#S&P BSE India Infrastructure Index has a price of 152.35
INFRA#S&P BSE CPSE has a price of 1190.25
CPSE#S&P BSE IPO has a price of 3038.32
#and many more... (total of 40 items)

These evidently match the values displayed on the page.
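In the printed output, every name after the first carries a prefix such as "SENSEX#", which suggests (this is an inference from the output, not something the page documents) that records are separated by "#" and that the last field of each record is a short code. Under that assumption, a minimal sketch might split on "#" first and on commas second, which yields clean names and lets the price be converted to a number. The helper names `parse_index_movers` and `numeric_part` and the sample string are hypothetical; the "x" fields stand in for the values the answer says you don't care about.

```python
def parse_index_movers(text):
    """Return (name, price) pairs, assuming '#'-separated records
    whose first two comma-separated fields are name and price."""
    records = []
    for record in text.split("#"):
        fields = record.split(",")
        # Skip fragments that have no price field (e.g. a trailing item).
        if len(fields) < 2:
            continue
        try:
            price = float(fields[1])
        except ValueError:
            continue  # price field is not numeric; skip this record
        records.append((fields[0].strip(), price))
    return records


def numeric_part(label):
    """Convert text such as 'S&P BSE: 25,489.57' to a float by taking
    what follows the last colon and dropping thousands separators."""
    value = label.rsplit(":", 1)[-1]
    return float(value.replace(",", "").strip())


# Hypothetical sample: names and prices are taken from the output above,
# the "x" fields are placeholders for unknown columns.
sample = ("S&P BSE SENSEX,25489.57,x,x,x,x,SENSEX"
          "#S&P BSE 100,7944.50,x,x,x,x,BSE-100")

for name, price in parse_index_movers(sample):
    print(name, "has a price of", price)

print(numeric_part("S&P BSE: 25,489.57"))
```

With the live page, `parse_index_movers(requests.get(newUrl).text)` would take the place of the comma-splitting loop, provided the "#" assumption holds. `numeric_part` addresses the original question directly if the displayed text is already in hand as a string.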