My professor has asked us to remove HTML tags (anything between < and >) without using the removeAll method.
I currently have this:
public static void main(String[] args)
        throws FileNotFoundException {
    Scanner input = new Scanner(new File("src/HTML_1.txt"));
    while (input.hasNext())
    {
        String html = input.next();
        System.out.println(stripHtmlTags(html));
    }
}
static String stripHtmlTags(String html)
{
    int i;
    String[] str = html.split("");
    String s = "";
    boolean tag = false;
    for (i = html.indexOf("<"); i < html.indexOf(">"); i++)
    {
        tag = true;
    }
    if (!tag)
    {
        for (i = 0; i < str.length; i++)
        {
            s += str[i];
        }
    }
    return s;
}
This is the content of the file:
<html>
<head>
<title>My web page</title>
</head>
<body>
<p>There are many pictures of my cat here,
as well as my <b>very cool</b> blog page,
which contains <font color="red">awesome
stuff about my trip to Vegas.</p>
Here's my cat now:<img src="cat.jpg">
</body>
</html>
The output should look like this:
My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:
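String is immutable in Java, and you never display anything. I suggest you close your Scanner when you are done with it (as a best practice) and read HTML_1.txt from the user's home directory. The easiest way to close it is a try-with-resources, like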
import java.io.File;
import java.util.Scanner;

public static void main(String[] args) {
    // try-with-resources closes the Scanner automatically.
    try (Scanner input = new Scanner(new File(
            System.getProperty("user.home"), "HTML_1.txt"))) {
        while (input.hasNextLine()) {
            String html = stripHtmlTags(input.nextLine().trim());
            if (!html.isEmpty()) { // <-- removes empty lines.
                System.out.println(html);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Because String is immutable, I suggest using a StringBuilder to remove the HTML tags, for example
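static String stripHtmlTags(String html) {
    StringBuilder sb = new StringBuilder(html);
    int open;
    while ((open = sb.indexOf("<")) != -1) {
        int close = sb.indexOf(">", open + 1);
        if (close == -1) {
            break; // no closing '>' on this line; leave the rest untouched
        }
        sb.delete(open, close + 1); // remove everything from '<' through '>'
    }
    return sb.toString();
}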
When I run the above, I get
My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:
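For comparison, here is a rough sketch of a single-pass variant that is closer to the flag-based idea in the question. It is only one possible approach, not the one shown above: it walks the line once and copies only the characters that fall outside < and >. The class name TagStripper and the method name stripTags are made up for this illustration, and the two input lines are taken from the sample file, so no file is needed to try it.

```java
class TagStripper {
    // Hypothetical helper (not from the question): keeps only the
    // characters that are outside of <...> in a single pass.
    static String stripTags(String html) {
        StringBuilder sb = new StringBuilder();
        boolean inTag = false; // true while we are between '<' and '>'
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;       // a tag starts; stop copying
            } else if (c == '>' && inTag) {
                inTag = false;      // the tag ends; resume copying
            } else if (!inTag) {
                sb.append(c);       // ordinary text outside any tag
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("as well as my <b>very cool</b> blog page,"));
        System.out.println(stripTags("Here's my cat now:<img src=\"cat.jpg\">"));
    }
}
```

One design difference worth noting: if a line contains a < with no matching >, this sketch drops the rest of that line, so which behavior you want for unclosed tags is a choice you would have to make for the assignment.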