Removing HTML tags

Jessica Fura, posted in Dev

I have a professor who asked us to remove HTML tags (anything between < and >) without using the removeAll method.

I currently have this:

public static void main(String[] args)
        throws FileNotFoundException {
    Scanner input = new Scanner(new File("src/HTML_1.txt"));
    while (input.hasNext())
    {
        String html = input.next();
        System.out.println(stripHtmlTags(html));
    }

}

static String stripHtmlTags(String html)
{
    int i;
    String[] str = html.split("");
    String s = "";
    boolean tag = false;

    for (i = html.indexOf("<"); i < html.indexOf(">"); i++) 
    {
        tag = true;
    }

    if (!tag) 
    {
        for (i = 0; i < str.length; i++) 
        {
            s += str[i];
        }
    }
    return s;   
}

Here is what is inside the file:

<html>
<head>
<title>My web page</title>
</head>
<body>
<p>There are many pictures of my cat here,
as well as my <b>very cool</b> blog page,
which contains <font color="red">awesome
stuff about my trip to Vegas.</p>


Here's my cat now:<img src="cat.jpg">
</body>
</html>

The output should look like this:

My web page


There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.


Here's my cat now:

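Elliott Frisch

`String` is immutable in Java, and you never display anything.

I suggest you close your Scanner when you are done with it (as a best practice), and read HTML_1.txt from the user's home directory. The easiest way to close it is a try-with-resources, like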
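To make the immutability point concrete, here is a minimal sketch. The class name `ImmutabilityDemo` and the method `demo` are made up for illustration. It shows that calling a method on a `String` never changes the original object, whereas a `StringBuilder` is edited in place.

```java
class ImmutabilityDemo {
    // Returns "original,uppercased,builder" so the three values can be compared.
    static String demo() {
        String s = "cat";
        s.toUpperCase();                 // result is discarded; s is still "cat"
        String upper = s.toUpperCase();  // a new String object holding "CAT"

        StringBuilder sb = new StringBuilder(s);
        sb.delete(0, 1);                 // modifies sb itself, leaving "at"

        return s + "," + upper + "," + sb;
    }

    public static void main(String[] args) {
        System.out.println(demo());      // prints cat,CAT,at
    }
}
```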

import java.io.File;
import java.util.Scanner;

public static void main(String[] args) {
    // try-with-resources closes the Scanner automatically.
    try (Scanner input = new Scanner(new File(
            System.getProperty("user.home"), "HTML_1.txt"))) {
        while (input.hasNextLine()) {
            String html = stripHtmlTags(input.nextLine().trim());
            if (!html.isEmpty()) { // <-- removes empty lines.
                System.out.println(html);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
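One difference from the question's code is that this version reads with `hasNextLine()`/`nextLine()` rather than `hasNext()`/`next()`. The sketch below (the class `ScannerDemo` and its two helper methods are hypothetical names) reads from an in-memory string instead of a file, to show that `next()` splits on whitespace and so breaks a line into separate tokens, while `nextLine()` keeps each line intact.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerDemo {
    // Collects whitespace-delimited tokens, as input.next() does.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                out.add(in.next());
            }
        }
        return out;
    }

    // Collects whole lines, as input.nextLine() does.
    static List<String> lines(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNextLine()) {
                out.add(in.nextLine());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "as well as my <b>very cool</b> blog page,\nwhich contains";
        System.out.println(tokens(text).size()); // 10 tokens
        System.out.println(lines(text).size());  // 2 lines
    }
}
```

With token-by-token reading, a tag such as `<font color="red">` is split across two tokens, so neither token contains both `<` and `>`.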

Because String is immutable, I suggest using a StringBuilder to remove the HTML tags, for example

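static String stripHtmlTags(String html) {
    // String is immutable, so edit a mutable copy instead.
    StringBuilder sb = new StringBuilder(html);
    int open;
    while ((open = sb.indexOf("<")) != -1) {
        int close = sb.indexOf(">", open + 1);
        if (close == -1) {
            break; // unterminated tag on this line: leave the rest as-is
        }
        sb.delete(open, close + 1); // remove "<...>" inclusive
    }
    return sb.toString();
}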
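The `boolean tag` flag from the question can also be made to work. The following is an alternative sketch, not the method above: it walks the string once and copies only the characters that lie outside a `<...>` pair. The names `TagStripper` and `strip` are hypothetical, and the sketch assumes attribute values never contain a literal `>`.

```java
class TagStripper {
    // Copies every character that is not inside a <...> pair.
    static String strip(String html) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;            // start skipping
            } else if (c == '>' && inTag) {
                inTag = false;           // tag closed, resume copying
            } else if (!inTag) {
                out.append(c);           // ordinary text (including a stray '>')
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: as well as my very cool blog page,
        System.out.println(strip("as well as my <b>very cool</b> blog page,"));
    }
}
```

Note one design difference: if a `<` is never closed, this version drops everything after it, so it would need the flag carried across lines to handle tags that span a line break.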

When I run the above, I get

My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:
