How can we use Ruby to interact with, filter, or traverse HTML? Let's play around with Nokogiri.
gem install nokogiri
require 'nokogiri'
require 'open-uri'
html = URI.open('https://www.turing.io') # URI.open comes from open-uri; plain open() stopped fetching URLs in Ruby 3.0
doc = Nokogiri::HTML(html)
What is doc? What does it represent? What information does it include?
require 'nokogiri'
require 'open-uri'
html = URI.open('https://www.turing.io')
doc = Nokogiri::HTML(html)
images = doc.css('img')
What does the images variable represent? How many images are there? What information does Nokogiri give you about these images? Can you write a loop that gathers the src of each image? Use the example below, which collects the href of every link, as a reference:
doc.css('a').map do |a|
  a['href']
end
require 'nokogiri'
require 'open-uri'
html = URI.open('https://www.turing.io')
doc = Nokogiri::HTML(html)
div = doc.at_css('div')
divs = doc.css('div')
What is the difference between .at_css and .css?
require 'nokogiri'
require 'open-uri'
html = URI.open('https://www.turing.io')
doc = Nokogiri::HTML(html)
var1 = doc.css('.field-type-text-with-summary')
var2 = doc.css('.field-type-text-with-summary p')
var3 = doc.css('.field-type-text-with-summary p').text
What is the difference between var1, var2, and var3? What do '.field-type-text-with-summary' and '.field-type-text-with-summary p' refer to?
What else can you do with Nokogiri?
Check out the Nokogiri section of The Bastards Book of Ruby.