Unable to achieve certain functionality with the Jsoup HTML parser in Java

Posted by Vishal Zanzrukia in Dev

Vishal Zanzrukia

I am unable to parse some text with the Jsoup Java library in the following cases.

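1: This is My Text some other text as well non empty tag1 other text.

Expected output: some other text as well

2: This is My Text some other text as well non empty tag2 other text.

Expected output: some other text as well

3: This is My Text some other text as well non empty tag2 other text non empty tag3.

Expected output: some other text as well

Note that the text "My Text" is fixed (static), while the value of the second non-empty B tag may vary (whitespace alone does not count as a value). A regular expression should be able to extract the text between "My Text" and the first non-empty tag that follows it.

I am using the Jsoup library but cannot produce the expected output above. Please make sure the solution is generic enough for every case, because my input is dynamic.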

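Pshemo

A simple solution could look like this:

find the element you are interested in (the one containing the text you are looking for)
iterate over the siblings placed after it and print them until you find a non-empty b element

The one thing to remember is that Jsoup uses the Node class to store everything in the document, including text that is not part of any tag, while the Element class (which extends Node) represents only actual tags.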

So, for example, the text

before <b>bold</b> after<i>italic</i>

will be represented as

<node>before </node>
<element tag="B">
   <node>bold</node>
</element>
<node> after</node>
<element tag="I">
   <node>italic</node>
</element>
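The structure above can be checked with a small sketch that parses the same text and lists the children of the body. The class name NodeTreeDemo and the helper describeChildren are hypothetical names used only for illustration. Note that Jsoup's HTML parser reports tag names in lowercase ("b", "i") and reports text nodes as "#text".

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

public class NodeTreeDemo {

    // Lists each direct child of <body> as "Element:<name>" or "Node:<name>".
    public static List<String> describeChildren(String html) {
        Element body = Jsoup.parse(html).body();
        List<String> out = new ArrayList<>();
        for (Node child : body.childNodes()) {
            // Tags are Element instances; plain text is a Node that is not an Element.
            String kind = (child instanceof Element) ? "Element" : "Node";
            out.add(kind + ":" + child.nodeName());
        }
        return out;
    }

    public static void main(String[] args) {
        describeChildren("before <b>bold</b> after<i>italic</i>")
                .forEach(System.out::println);
    }
}
```

The two text fragments show up as plain nodes between the two elements, which is why walking only elements would skip them.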

So, for example, if you call select("b") (which finds <element tag="B">) and then call nextElementSibling() on the result, you are moved to <element tag="I">. To reach <node> after</node> you need nextSibling(), which does not skip plain text nodes.
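As a rough illustration of the difference, the sketch below calls both methods on the same b element. SiblingDemo and siblingsOfB are hypothetical names, and the code assumes the input contains a b element followed by at least one text node and one element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

public class SiblingDemo {

    // Returns { name of next element sibling, name of next node sibling } of the first <b>.
    // Assumes the first <b> exists and has both kinds of sibling after it.
    public static String[] siblingsOfB(String html) {
        Element b = Jsoup.parse(html).select("b").first();
        Element nextElement = b.nextElementSibling(); // skips text nodes
        Node nextNode = b.nextSibling();              // stops at text nodes too
        return new String[] { nextElement.nodeName(), nextNode.nodeName() };
    }

    public static void main(String[] args) {
        String[] names = siblingsOfB("before <b>bold</b> after<i>italic</i>");
        System.out.println("nextElementSibling: " + names[0]);
        System.out.println("nextSibling:        " + names[1]);
    }
}
```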

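A possible problem with the Node class is that it does not provide a text() method that returns the text content of the current node (which would let us test whether the current node or element has any text). But nothing stops us from casting a Node that represents a tag to Element, which does provide such a method.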
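One way to wrap that cast is a small helper that checks the runtime type first. This is only a sketch: NodeTextDemo and textOf are hypothetical names, and the TextNode branch is an addition that goes slightly beyond the cast described above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class NodeTextDemo {

    // Returns the text of a node, or "" for node types that carry no text (e.g. comments).
    public static String textOf(Node node) {
        if (node instanceof Element) {
            return ((Element) node).text();   // text of the tag and its descendants
        }
        if (node instanceof TextNode) {
            return ((TextNode) node).text();  // plain text outside any tag
        }
        return "";
    }

    public static void main(String[] args) {
        Element b = Jsoup.parse("before <b>bold</b> after<i>italic</i>")
                .select("b").first();
        System.out.println(textOf(b));
        System.out.println(textOf(b.nextSibling()).trim());
    }
}
```

Checking with instanceof before casting avoids a ClassCastException when the sibling turns out to be a text node rather than a tag.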

So our solution could look like this:

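import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

public static String findFragment(String html, String fixedStart) {

    Document doc = Jsoup.parse(html);
    // Find the first <b> whose text is exactly the fixed start text.
    Element myBTag = doc
            .select("b:matches(^" + Pattern.quote(fixedStart) + "$)")
            .first();
    if (myBTag == null) {
        return ""; // no <b> with the fixed text, so there is nothing to extract
    }

    StringBuilder sb = new StringBuilder();
    boolean foundNonEmpty = false;

    // nextSibling() also visits plain text nodes, unlike nextElementSibling().
    Node currentSibling = myBTag.nextSibling();
    while (currentSibling != null && !foundNonEmpty) {
        if (currentSibling.nodeName().equals("b")) {
            Element b = (Element) currentSibling;
            if (!b.text().trim().isEmpty())
                foundNonEmpty = true;
        }
        // Appended after the check, so the closing non-empty <b> is included too.
        sb.append(currentSibling.toString());
        currentSibling = currentSibling.nextSibling();
    }

    return sb.toString();
}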
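If only the plain text is wanted, without markup and without the closing b element, a variant of the same sibling walk might look like the sketch below. It is not the method above: TextUntilNonEmptyB and textAfter are hypothetical names, and the sample markup in main is an assumption, since the cases in the question appear only as plain text.

```java
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class TextUntilNonEmptyB {

    // Collects the text that follows the fixed <b> and stops before the first non-empty <b>.
    public static String textAfter(String html, String fixedStart) {
        Element start = Jsoup.parse(html)
                .select("b:matches(^" + Pattern.quote(fixedStart) + "$)")
                .first();
        if (start == null) {
            return ""; // fixed text not found
        }
        StringBuilder sb = new StringBuilder();
        for (Node n = start.nextSibling(); n != null; n = n.nextSibling()) {
            if (n instanceof TextNode) {
                sb.append(((TextNode) n).text());
            } else if (n instanceof Element) {
                Element e = (Element) n;
                if (e.tagName().equals("b") && !e.text().trim().isEmpty()) {
                    break; // first non-empty <b>: stop without appending it
                }
                sb.append(e.text()); // empty <b> or any other tag: keep its text only
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumed markup for case 1; the real input may differ.
        String html = "This is <b>My Text</b> <b> </b> some other text as well "
                + "<b>non empty tag1</b> other text";
        System.out.println(textAfter(html, "My Text"));
    }
}
```

Breaking out of the loop before the append is what keeps the closing b element out of the result; the version above appends first and therefore includes it.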

Edited on 2021-03-1
