Get specific Tables with Html Agility Pack

Austin

I am having trouble getting some specific tables with HTML Agility Pack. I cannot change the actual HTML, so I can't rely on additional IDs or classes.

Can someone show me how I would access each individual table of the following?

<table class="newTable">
      //table 1 contents
    <table border="0" cellpadding="3" cellspacing="2" width="100%">
         //table 1 - A contents
    </table>
</table>
<table border="0" cellpadding="0" cellspacing="0" class="newTable">
     //table 2 contents
    <table width="100%" border="0" cellspacing="2" cellpadding="0">
        //table 2 - A contents
    </table>
    <table width="100%" border="0" cellspacing="2" cellpadding="0">
       //table 2 - B contents
    </table>
    <table width="100%" cellspacing="2" cellpadding="0">
       //table 2 - C contents
    </table>
</table>
<table>
     //table 3 contents
</table>

Right now, if I call the following:

HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
foreach (var cell in table.SelectNodes("//tr/td"))
{
     string someVariable = cell.InnerText;
}

it goes through every cell in the document. I want to access each table separately so that I can store its data in the right place.

I have tried looking at something like

doc.DocumentNode.SelectNodes("//table[1]");

but using an index does not seem to work: when I try to specify a table with it, it still reads in all of the tables or none of them.

The same applies to this; it either does not work at all or gets everything.

foreach (var cell in table.SelectNodes("//table").Skip(some_number))
{
     string someVariable = cell.InnerText;
}

I am using the NuGet package of HTML Agility Pack, version 1.4.9.

EDIT:

My attempts to get ONLY Table 1 - A's contents. Both either return null or throw an EncodingFoundException.

HtmlNode table = doc.DocumentNode.SelectSingleNode("//table/tr/td/table[1]");

HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]/tr/td/table[1]");

jessehouwing

The error is in your second call: an XPath expression that starts with "//", such as "//tr/td", is evaluated from the document root, not from the node you call it on. Your indexer is the correct solution for the first part of your problem; the second part can be fixed by making the path relative to the node you are currently at, using a leading dot:

HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
foreach (var cell in table.SelectNodes(".//tr/td")) // note the leading '.'
{
     string someVariable = cell.InnerText;
}
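
To make the difference concrete, here is a minimal sketch. The markup and the class name are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced. It runs the same cell query from the first table, once without and once with the leading dot:

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

class RelativeXPathDemo
{
    static void Main()
    {
        // Hypothetical sample markup, not the asker's real page:
        // two sibling tables with no IDs or classes to tell them apart.
        const string html =
            "<table><tr><td>a</td><td>b</td></tr></table>" +
            "<table><tr><td>c</td></tr></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode first = doc.DocumentNode.SelectSingleNode("//table[1]");

        // "//tr/td" starts at the document root, so it also finds
        // the cell that belongs to the second table.
        int fromRoot = first.SelectNodes("//tr/td").Count;

        // ".//tr/td" starts at the context node, so it only finds
        // cells that are descendants of the first table.
        int fromTable = first.SelectNodes(".//tr/td").Count;

        Console.WriteLine("//tr/td  matched " + fromRoot + " cell(s)");
        Console.WriteLine(".//tr/td matched " + fromTable + " cell(s)");
    }
}
```

Keep in mind that SelectNodes returns null rather than an empty collection when nothing matches, so real code should check for null before iterating.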

I'm not sure what else is going on, but when I extend your sample tables into a complete document, the code below works in my test. That may mean you need to share a little more context.

This is the Document I used for the tests:

<!DOCTYPE html>

<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta charset="utf-8" />
    <title></title>
</head>
<body>
    <table class="newTable">
        <tr>
            <td>
                <table border="0" cellpadding="3" cellspacing="2" width="100%">
                    <tr><td>
                        //table 1 - A contents
                    </td></tr>
                </table>
            </td>
        </tr>

    </table>
    <table border="0" cellpadding="0" cellspacing="0" class="newTable">
        <tr>
            <td>
                //table 2 contents
                <table width="100%" border="0" cellspacing="2" cellpadding="0">
                    <tr>
                        <td>
                            //table 2 - A contents
                        </td>
                    </tr>
                </table>
                <table width="100%" border="0" cellspacing="2" cellpadding="0">
                    <tr>
                        <td>
                            //table 2 - B contents
                        </td>
                    </tr>
                </table>
                <table width="100%" cellspacing="2" cellpadding="0">
                    <tr>
                        <td>
                            //table 2 - C contents
                        </td>
                    </tr>
                </table>
            </td>
        </tr>
    </table>
    <table>
        <tr>
            <td>
                //table 3 contents
            </td>
        </tr>
    </table>
</body>
</html>

And this is the code to extract the values you're after:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(text);

var node1A = doc.DocumentNode.SelectSingleNode("//table[1]//table[1]");
string content1A = node1A.InnerText;
Console.WriteLine(content1A);

var node2C = doc.DocumentNode.SelectSingleNode("//table[2]//table[3]");
string content2C = node2C.InnerText;
Console.WriteLine(content2C);

This prints the contents of table 1 - A and table 2 - C.
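
A side note on the positional predicates used above, since it may explain the "all tables or none" behaviour from the question. In XPath 1.0, "//table[1]" means "every table that is the first table child of its parent", which can match several nested tables at once. Wrapping the path in parentheses, as in "(//table)[1]", indexes into the whole result set in document order instead. The sketch below uses made-up markup and a made-up class name, and the "not(ancestor::table)" filter is just one possible way to restrict the match to top-level tables:

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

class PositionalPredicateDemo
{
    static void Main()
    {
        // Hypothetical markup: two top-level tables, the first one
        // containing a single nested table.
        const string html =
            "<table><tr><td>outer 1" +
                "<table><tr><td>nested 1-A</td></tr></table>" +
            "</td></tr></table>" +
            "<table><tr><td>outer 2</td></tr></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode root = doc.DocumentNode;

        // First table child of *each* parent: matches the first outer
        // table and also the nested table inside its cell.
        var firstPerParent = root.SelectNodes("//table[1]");

        // First table in the whole document, in document order.
        var firstInDocument = root.SelectNodes("(//table)[1]");

        // Second table that is not nested inside another table.
        HtmlNode secondTopLevel =
            root.SelectSingleNode("(//table[not(ancestor::table)])[2]");

        Console.WriteLine("//table[1] matched " + firstPerParent.Count + " node(s)");
        Console.WriteLine("(//table)[1] matched " + firstInDocument.Count + " node(s)");
        Console.WriteLine("Second top-level table: " + secondTopLevel.InnerText);
    }
}
```

With SelectSingleNode the difference is easy to miss, because only the first match in document order is returned either way.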

Update

OK, I took your actual HTML and I get a NullReferenceException as well. Something in it must badly confuse the Agility Pack, though I'm not sure what. After some experimentation, the LINQ API does seem to work, so I hope it can be an alternative for you:

var table = doc.DocumentNode.DescendantsAndSelf("table").Skip(1).First().Descendants("table").First();
var tds   = table.Descendants("td");
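
Building on the same LINQ calls, the following is a rough sketch of how the cells could be grouped per top-level table so each one can be stored separately. The sample markup and the class name are hypothetical, and I have not checked that the Ancestors(name) overload is available in 1.4.9 specifically, so treat that part as an assumption:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

class NestedTableReader
{
    static void Main()
    {
        // Hypothetical markup shaped like the test document above:
        // three top-level tables, the first two containing nested tables.
        const string html =
            "<table><tr><td>" +
                "<table><tr><td>1-A</td></tr></table>" +
            "</td></tr></table>" +
            "<table><tr><td>" +
                "<table><tr><td>2-A</td></tr></table>" +
                "<table><tr><td>2-B</td></tr></table>" +
            "</td></tr></table>" +
            "<table><tr><td>3</td></tr></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // A top-level table is one with no <table> among its ancestors.
        var topLevel = doc.DocumentNode
            .Descendants("table")
            .Where(t => !t.Ancestors("table").Any())
            .ToList();

        for (int i = 0; i < topLevel.Count; i++)
        {
            // Tables nested anywhere inside this top-level table.
            var nested = topLevel[i].Descendants("table").ToList();
            Console.WriteLine("Table " + (i + 1) + ": " + nested.Count + " nested table(s)");

            // Walk the cells of each nested table separately.
            foreach (HtmlNode inner in nested)
            {
                foreach (HtmlNode cell in inner.Descendants("td"))
                {
                    Console.WriteLine("  " + cell.InnerText.Trim());
                }
            }
        }
    }
}
```

Unlike SelectNodes, Descendants returns an empty sequence when nothing matches, so no null check is needed before the loops.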
