I'm writing some code to find the absolute URLs on a single webpage:
http://explore.bfi.org.uk/4ce2b69ea7ef3
So far I get all the links on that page and print their absolute URLs.
Here is part of the code:
    Elements hyperLinks = htmlDoc.select("a[href]");
    for (Element link : hyperLinks) {
        System.out.println(link.attr("abs:href"));
    }
This prints out a lot of URLs just like the one above. However, it seems to skip a few URLs as well, and the ones it skips are the ones I actually need.
This is one of the a[href] elements it's not turning into an absolute URL:
<div class="title"><a href="/4ce2b69ea7ef3">Royal Review</a><br /></div>
It will print this line if I just print "link", but when I use "abs:href", it just prints a blank line.
I am new to Java and appreciate any feedback!
You shouldn't use "a[href]"; use "a" instead, following this example:
    Document doc = Jsoup.connect("http://jsoup.org").get();
    Element link = doc.select("a").first();
    String relHref = link.attr("href");     // == "/"
    String absHref = link.attr("abs:href"); // == "http://jsoup.org/"
So in your case:
    Elements hyperLinks = htmlDoc.select("a");
    for (Element link : hyperLinks) {
        System.out.println(link.attr("abs:href"));
    }
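One more thing worth checking: jsoup returns an empty string for "abs:href" when it cannot build an absolute URL, which typically happens when the document has no base URI (for example, when the HTML was parsed from a string rather than fetched with Jsoup.connect(...).get()). The question doesn't show how htmlDoc was created, so this is only an assumption. The sketch below does not use jsoup; it just illustrates the resolution step with java.net.URI, and the names AbsoluteUrlSketch and resolve are made up for the example.

```java
import java.net.URI;

public class AbsoluteUrlSketch {

    // Hypothetical helper: resolves href against baseUri and returns ""
    // when no absolute URL can be formed (similar to the blank output
    // described in the question).
    static String resolve(String baseUri, String href) {
        if (baseUri == null || baseUri.isEmpty()) {
            return ""; // nothing to resolve a relative link against
        }
        try {
            URI resolved = URI.create(baseUri).resolve(href);
            return resolved.isAbsolute() ? resolved.toString() : "";
        } catch (IllegalArgumentException e) {
            return ""; // malformed base or href
        }
    }

    public static void main(String[] args) {
        String href = "/4ce2b69ea7ef3"; // the relative link from the question

        // With the page URL as the base, the relative link resolves.
        System.out.println(resolve("http://explore.bfi.org.uk/4ce2b69ea7ef3", href));

        // With no base, the result is an empty string.
        System.out.println("[" + resolve("", href) + "]");
    }
}
```

If a missing base URI does turn out to be the cause, jsoup accepts one as the second argument when parsing a string, e.g. Jsoup.parse(html, "http://explore.bfi.org.uk/"), after which "abs:href" should have something to resolve against.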