How can I get everything within a div using Nokogiri?

mtrmilk

I am using Nokogiri to scrape a site that looks like this:

<div class="BOX">
  <div class="apple">This is an apple.</div>
  <p>Apple a day, doctor away</p>
</div>

<div class="BOX">
  <div class="iphone">This is an iPhone.</div>
  <div class="android">This is an Android.</div>
  <a href="www.apple.com">Apple home page</a>
  <p>Snoop Lion has both. He's rich.</p>
</div>

I would like to scrape everything within the "BOX" div. Each "BOX" has its own unique divs and HTML tags, with no apparent patterns. How would I do this?

My first attempt looked like this:

require 'open-uri'
require 'nokogiri'

doc = Nokogiri::HTML(URI.open('http://www.examplesite.com'))
doc.css('BOX').each do |box|
  puts box.content
end

But it returns nothing. May I please have an explanation of what's going on?

Arup Rakshit

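Your CSS selector is wrong: 'BOX' is a type selector that looks for <BOX> elements, while '.BOX' (with a leading dot) matches elements whose class is BOX. That is why your loop prints nothing. Also, if you want the markup inside each div rather than just its text, use #inner_html instead of #content. The code should look like this: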

require 'nokogiri'

doc = Nokogiri::HTML::Document.parse <<-eot
<div class="BOX">
  <div class="apple">This is an apple.</div>
  <p>Apple a day, doctor away</p>
</div>

<div class="BOX">
  <div class="iphone">This is an iPhone.</div>
  <div class="android">This is an Android.</div>
  <a href="www.apple.com">Apple home page</a>
  <p>Snoop Lion has both. He's rich.</p>
</div>
eot

doc.css('.BOX').each do |n|
  puts n.inner_html
end

output:

  <div class="apple">This is an apple.</div>
  <p>Apple a day, doctor away</p>

  <div class="iphone">This is an iPhone.</div>
  <div class="android">This is an Android.</div>
  <a href="www.apple.com">Apple home page</a>
  <p>Snoop Lion has both. He's rich.</p>

#content, on the other hand, strips the HTML tags and gives you only the text inside each div node. See below:

doc.css('.BOX').each do |n|
  puts n.content
end

output:

  This is an apple.
  Apple a day, doctor away

  This is an iPhone.
  This is an Android.
  Apple home page
  Snoop Lion has both. He's rich.
