cheat sheets.

$ command line ruby cheat sheets
require 'nokogiri'
require 'open-uri'

# Get a Nokogiri::HTML:Document for the page we're interested in...

doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

# Do funky things with it using Nokogiri::XML::Node methods...

####
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
    puts link.content
end

doc.at_css('h3').content

####
# Search for nodes by xpath
doc.xpath('//h3/a[@class="l"]').each do |link|
    puts link.content
end

####
# Or mix and match.
doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
    puts link.content
end

####
# Work with attributes
xml = "<foo wam='bam'>bar</foo>"

doc = Nokogiri::XML(xml)
doc.at_css("foo").content => "bar"
doc.at_css("foo")["wam"].content => "bam"

####
# Work with elements
el = doc.at_css("foo")
el.children  # => array of elements

####

So for example if we wanted to know all the names of the food items in our
document we simply say:
> doc.xpath("//name").collect(&:text)
=> ["carrot", "tomato", "corn", "grapes", "orange", "pear", "apple"]

If we were interested in the entire node we could leave off the
.collect(&:text). What if we wanted to select all the names of food items that
were best baked?  This requires us to use what’s called an axis – we will
first need to find the element “baked” but then go back up our XML elements to
find which food the item is inside.
> doc.xpath("//tag[text()='baked']/ancestor::node()/name").collect(&:text)
=> ["pear", "apple"]

What if we were only interested in vegetables that were good for roasting? 
Just add //veggies:
>
doc.xpath("//veggies//tag[text()='roasted']/ancestor::node()/name").collect(&:t
xt)
=> ["carrot", "tomato"]

What about if we wanted to know all the tags ‘corn’ had?  Again this is very
easy:
> doc.xpath("//name[text()='corn']/../tags/tag").collect(&:text)
=> ["raw", "boiled", "grilled"]

We can even do searches matching the first character.  Let’s say we wanted to
know all the food items that started with the letter ‘c’:
> doc.xpath("//name[starts-with(text(),'c')]").collect(&:text)
=> ["carrot", "corn"]

You could also use [contains(text(),'rot'] and get back just carrot, useful
when you want to do a partial match.

####
# Traversion

node.ancestors                        # Ancestors for <node>
node.at('xpath')                      # Returns node at given XPATH
node.at_css('selector')               # Returns node at given CSS selector

node.xpath('xpath')                   # Returns nodes at given XPATH
node.css('selector')                  # Returns nodes at given selector

node.child                            # Returns the child node
node.children                         # Returns child nodes
node.parent

####
# Data manipulation

node.name                             # Element name
node.node_type

node.content                          # Returns text as string
                                      # (aka: .inner_text, .text)
node.content = '...'

node.inner_html
node.inner_html = '...'

node.attribute_nodes                  # Returns attributes as nodes
node.attributes                       # Returns attributes as hash

####
# Tree manipulation

node.add_next_sibling(other)          # Place <other> after <node>
node.add_previous_sibling(other)      # Place <other> before <node>
node.add_child(other)                 # Put <other> inside <node>

node.after(data)                      # Put a new node after <node>
node.before(data)                     # Put a new node before <node>

node.parent = other                   # Reparents <node> inside <other>