Scraping HTML with XPath in Python 2

Posted by Catalin in Dev

Catalin

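Consider an HTML page that contains 3 tables.

I want to iterate over each table and print some of its content along the way (when the content matches what I am looking for).

I need to keep track of which table I am currently in.

As you can see in the code below, I have page, a variable that holds the HTML as a string.

I can already get the content of all the tables at once (as a list).

What I want is to iterate over the tables one by one.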

from __future__ import print_function
from lxml import html
import requests
from bs4 import BeautifulSoup

page = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>cv</title>
</head>
<body>

    <table>
        <tr>
            <td>table1 td1</td>
            <td>table1 td2</td>
        </tr>
    </table>

    <table>
        <tr>
            <td>table2 td1</td>
            <td>table2 td2</td>
        </tr>
    </table>

    <table>
        <tr>
            <td>table3 td1</td>
            <td>table3 td2</td>
        </tr>
    </table>

</body>
</html>
"""

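# Normalize the markup with BeautifulSoup, then hand the string to lxml.
soup = str(BeautifulSoup(page, 'html.parser'))

tree = html.fromstring(soup)

# This returns the text of every cell in every table as one flat list,
# so the information about which table a cell came from is lost.
tds = tree.xpath('//table/tr/td/text()')

for td in tds:
    print(td + '\n')

print('Ready !!')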

Eyes

Do you mean that you need to process each table separately?

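for table in tree.xpath(".//table"):
    print("---  new table: ---")
    # The leading "." restricts the search to the current table.
    for td in table.xpath(".//td"):
        print(td.text)  # print the cell text rather than the Element object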
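
To also keep track of which table each cell belongs to and print only matching content, the loop above could be extended roughly as follows. This is a minimal sketch rather than the answerer's code: the helper name find_cells, the search string "td2", and the shortened page string are assumptions made for illustration.

```python
from __future__ import print_function  # keeps print() consistent on Python 2

from lxml import html

# Shortened stand-in for the `page` string from the question (assumption).
page = """
<html><body>
  <table><tr><td>table1 td1</td><td>table1 td2</td></tr></table>
  <table><tr><td>table2 td1</td><td>table2 td2</td></tr></table>
  <table><tr><td>table3 td1</td><td>table3 td2</td></tr></table>
</body></html>
"""


def find_cells(page_source, wanted):
    """Return (table_number, cell_text) pairs for cells containing `wanted`.

    Hypothetical helper, not part of lxml.
    """
    tree = html.fromstring(page_source)
    matches = []
    # enumerate() supplies the table position, counting from 1.
    for table_number, table in enumerate(tree.xpath("//table"), start=1):
        # The leading "." keeps the search inside the current table.
        for td in table.xpath(".//td"):
            text = td.text_content().strip()
            if wanted in text:
                matches.append((table_number, text))
    return matches


for table_number, text in find_cells(page, "td2"):
    print("table {}: {}".format(table_number, text))
```

Returning the table number together with the text keeps the bookkeeping out of the printing code, so the same pairs can be filtered or stored later. The relative expression ".//td" matters here: "//td" would search the whole document again from inside every table.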

Edited on 2021-02-28
