Convert formatted email (HTML) to plain text?

user96546

I have this code that extends ParserCallback and converts HTML emails to plain text. It works fine when I parse an email body like this:

  "DO NOT REPLY TO THIS EMAIL MESSAGE.   <br>---------------------------------------<br>\n" +
                "nix<br>---------------------------------------<br> Esfghjdfkj\n" +
                "</blockquote></div><br><br clear=\"all\"><div><br></div>-- <br><div dir=\"ltr\"><b>Regards <br>Nisj<br>Software Engineer<br></b><div><b>Bingo</b></div></div>\n" +
                "</div>"

But when I parse this kind of email body, it returns null:

 email = "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html charset=us-ascii\"></head><body style=\"word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;\">Got it...so pls send to customer now.<div><br><div style=\"\"><div>On Nov 8, 2013, at 12:31 PM, <a href=\"mailto:xxxxxxx.com\">xxxxxxx.com</a> wrote:</div><br class=\"Apple-interchange-newline\"><blockquote type=\"cite\">Forwarding test.<br>---------------------------------------<br> ABCD.</blockquote></div><br></div></body></html>";

Code:

import java.io.IOException;
import java.io.StringReader;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML.Attribute;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTMLEditorKit.Parser;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;

public class EmailBody {
    public static void main(String[] args) throws IOException
    {
        String email = "";

        class EmailCallback extends ParserCallback
        {
            private String body_;
            private boolean divStarted_;

            public String getBody()
            {
                return body_;
            }

            @Override
            public void handleStartTag(Tag t, MutableAttributeSet a, int pos)
            {
                if (t.equals(Tag.DIV) && "ltr".equals(a.getAttribute(Attribute.DIR)))
                {
                    divStarted_ = true;
                }
            }

            @Override
            public void handleEndTag(Tag t, int pos)
            {
                if (t.equals(Tag.DIV))
                {
                    divStarted_ = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos)
            {
                if (divStarted_)
                {
                    body_ = new String(data);
                }
            }
        }
        EmailCallback callback = new EmailCallback();
        Parser parser = new ParserDelegator();
        StringReader reader = new StringReader(email);
        parser.parse(reader, callback, true);
        reader.close();
        System.out.println(callback.getBody());
    }
}

Can you tell me why this is happening?

Dror Bereznitsky

Your code only captures text inside DIV elements that have a dir attribute with the value ltr. The handleText method stores text only while the divStarted_ flag is true, and that flag is set only when handleStartTag sees such a DIV.
The first email example contains an element like that; the second one does not, so body_ is never assigned and getBody() returns null.
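
If the goal is the text of the whole message rather than just the dir="ltr" block, one option is to drop the flag and append every text chunk. Note that the original handleText also overwrites body_ on each call, so even for the first email only the last chunk inside the matching DIV survives. The following is a minimal sketch, not a definitive implementation: the class name PlainTextExtractor and the method toPlainText are made up for illustration, and emitting a newline for BR and at the end of DIV, P and BLOCKQUOTE is an assumption about how the output should be laid out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper; the class and method names are not from the original post.
class PlainTextExtractor {

    static String toPlainText(String html) {
        final StringBuilder text = new StringBuilder();

        ParserCallback callback = new ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Append every chunk instead of overwriting the previous one,
                // and do not depend on any particular DIV being present.
                text.append(data);
            }

            @Override
            public void handleSimpleTag(Tag t, MutableAttributeSet a, int pos) {
                // <br> has no end tag, so the parser reports it as a simple tag.
                if (t.equals(Tag.BR)) {
                    text.append('\n');
                }
            }

            @Override
            public void handleEndTag(Tag t, int pos) {
                // Assumption: treat the end of these block elements as a line break.
                if (t.equals(Tag.DIV) || t.equals(Tag.P) || t.equals(Tag.BLOCKQUOTE)) {
                    text.append('\n');
                }
            }
        };

        try (StringReader reader = new StringReader(html)) {
            // true = ignore the charset declared in a <meta> tag, as in the question.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        String email = "<html><body>Got it.<div><blockquote type=\"cite\">"
                + "Forwarding test.<br>ABCD.</blockquote></div></body></html>";
        System.out.println(toPlainText(email));
    }
}
```

With this version both email bodies from the question produce non-null output, because nothing depends on a dir="ltr" DIV being present. Whitespace in the result may still need tidying depending on how the text is going to be used.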
