如何阅读包含HTML的Lync对话文件？

debugcn 发表于 Dev

用户名

我在用C＃将本地文件读入字符串时遇到麻烦。

到目前为止，这是我想到的：

 string file = @"C:\script_test\{5461EC8C-89E6-40D1-8525-774340083829}.html";
 using (StreamReader reader = new StreamReader(file))
 {
      string line = "";
      while ((line = reader.ReadLine()) != null)
      {
           textBox1.Text += line.ToString();
      }
 }

这是唯一可行的解决方案。

我尝试了一些其他建议的方法来读取文件，例如：

string file = @"C:\script_test\{5461EC8C-89E6-40D1-8525-774340083829}.html";
string html = File.ReadAllText(file).ToString();
textBox1.Text += html;

但是它没有按预期工作。

这是我尝试读取的文件的前几行：

如您所见，它有一些时髦的字符，老实说，我不知道这是否是这种怪异行为的原因。

但是在第一种情况下，代码似乎跳过了这些行，仅打印“ Office Communicator生成的文档...”。

用户名

我不知道这是否是正确的答案，但是到目前为止，这是我设法做到的：

        string file = @"C:\script_test\{1C0365BC-54C6-4D31-A1C1-586C4575F9EA}.hist";
                    string outText = "";
        //Encoding iso = Encoding.GetEncoding("ISO-8859-1");
        Encoding utf8 = Encoding.UTF8;
        StreamReader reader = new StreamReader(file, utf8);
        char[] text = reader.ReadToEnd().ToCharArray();
        //skip first n chars
        /*
        for (int i = 250; i < text.Length; i++)
        {
            outText += text[i];
        }
        */
        for (int i = 0; i < text.Length; i++)
        {
            //skips non printable characters
            if (!Char.IsControl(text[i]))
            {
                outText += text[i];
            }
        }
        string source = "";
        source = WebUtility.HtmlDecode(outText);
        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(source);

        string html = "<html><style>";
        foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//style"))
        {
            html += node.InnerHtml+ Environment.NewLine;
        }
        html += "</style><body>";
        foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//body"))
        {
            html += node.InnerHtml + Environment.NewLine;
        }
        html += "</body></html>";
        richTextBox1.Text += html+Environment.NewLine;

        webBrowser1.DocumentText = html;