I have this code that implements ParserCallback and converts HTML emails to plain text. It works fine when I parse an email body like this:
"DO NOT REPLY TO THIS EMAIL MESSAGE. <br>---------------------------------------<br>\n" +
"nix<br>---------------------------------------<br> Esfghjdfkj\n" +
"</blockquote></div><br><br clear=\"all\"><div><br></div>-- <br><div dir=\"ltr\"><b>Regards <br>Nisj<br>Software Engineer<br></b><div><b>Bingo</b></div></div>\n" +
"</div>"
But when I parse this kind of email body, it returns null:
email = "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html charset=us-ascii\"></head><body style=\"word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;\">Got it...so pls send to customer now.<div><br><div style=\"\"><div>On Nov 8, 2013, at 12:31 PM, <a href=\"mailto:xxxxxxx.com\">xxxxxxx.com</a> wrote:</div><br class=\"Apple-interchange-newline\"><blockquote type=\"cite\">Forwarding test.<br>---------------------------------------<br> ABCD.</blockquote></div><br></div></body></html>";
Code:
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML.Attribute;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTMLEditorKit.Parser;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;

public class EmailBody {
    public static void main(String[] args) throws IOException
    {
        String email = "";
        class EmailCallback extends ParserCallback
        {
            private String body_;
            private boolean divStarted_;

            public String getBody()
            {
                return body_;
            }

            @Override
            public void handleStartTag(Tag t, MutableAttributeSet a, int pos)
            {
                if (t.equals(Tag.DIV) && "ltr".equals(a.getAttribute(Attribute.DIR)))
                {
                    divStarted_ = true;
                }
            }

            @Override
            public void handleEndTag(Tag t, int pos)
            {
                if (t.equals(Tag.DIV))
                {
                    divStarted_ = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos)
            {
                if (divStarted_)
                {
                    body_ = new String(data);
                }
            }
        }
        EmailCallback callback = new EmailCallback();
        Parser parser = new ParserDelegator();
        StringReader reader = new StringReader(email);
        parser.parse(reader, callback, true);
        reader.close();
        System.out.println(callback.getBody());
    }
}
Can you tell me why this is happening?
Your code only takes the element text from DIV elements that have a dir attribute with the value ltr. The handleText method only stores the text when the divStarted_ flag is true, and that flag becomes true only when handleStartTag sees such a DIV.
The first email example contains such an element; the second one does not, so body_ is never assigned and getBody() returns null.
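If the goal is the plain text of the whole message rather than only the text inside a DIV with dir="ltr", one possible approach is to drop the flag and collect every text node. The following is only a minimal sketch under that assumption: the class name PlainTextExtractor and the method toPlainText are made up for illustration, and the line-break handling (a newline for each BR and at the end of each block element) is a guess at what "plain text" should look like here. Note that it appends each text chunk instead of overwriting the previous one.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the original question or answer.
class PlainTextExtractor {

    // Collects every text node in the document, regardless of the enclosing tag.
    static String toPlainText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();

        ParserCallback callback = new ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Append instead of overwriting, so earlier chunks are kept.
                text.append(data);
            }

            @Override
            public void handleSimpleTag(Tag t, MutableAttributeSet a, int pos) {
                // Assumption: a <br> should become a line break in the output.
                if (t.equals(Tag.BR)) {
                    text.append('\n');
                }
            }

            @Override
            public void handleEndTag(Tag t, int pos) {
                // Assumption: the end of a block element (div, blockquote, ...) ends a line.
                if (t.isBlock()) {
                    text.append('\n');
                }
            }
        };

        try (StringReader reader = new StringReader(html)) {
            // true = ignore charset directives such as the <meta> tag in the second email
            new ParserDelegator().parse(reader, callback, true);
        }
        return text.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String email = "<html><body>Got it.<div>On Nov 8 someone wrote:</div>"
                + "<blockquote type=\"cite\">Forwarding test.<br>ABCD.</blockquote></body></html>";
        System.out.println(toPlainText(email));
    }
}
```

With this variant the second email no longer yields null, because no text depends on a particular DIV being present. If only the reply text is wanted and the quoted part should be skipped, a flag similar to divStarted_ could be kept, but keyed on an element that both kinds of email actually contain.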