An HTML, XML, SAX, & Reader parser with the ability to search documents via XPath or CSS3 selectors… and much more

Nokogiri

Class Nokogiri::HTML::SAX::Parser inherits from Nokogiri::XML::SAX::Parser

This class lets you perform SAX-style parsing on HTML, with HTML error correction applied to malformed input.

Here is a basic usage example:

class MyDoc < Nokogiri::XML::SAX::Document
  def start_element name, attributes = []
    puts "found a #{name}"
  end
end

parser = Nokogiri::HTML::SAX::Parser.new(MyDoc.new)
parser.parse(File.read(ARGV[0], mode: 'rb'))

For more information on SAX parsers, see Nokogiri::XML::SAX

Public Instance Methods

parse_file(filename, encoding = 'UTF-8')

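Parse the HTML file at filename, reading it as encoding. Raises ArgumentError if filename is nil, Errno::ENOENT if the file does not exist, and Errno::EISDIR if it is a directory. If a block is given, the parser context is yielded to it before parsing starts.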

# File lib/nokogiri/html/sax/parser.rb, line 41
def parse_file filename, encoding = 'UTF-8'
  raise ArgumentError unless filename
  raise Errno::ENOENT unless File.exists?(filename)
  raise Errno::EISDIR if File.directory?(filename)
  ctx = ParserContext.file(filename, encoding)
  yield ctx if block_given?
  ctx.parse_with self
end

parse_memory(data, encoding = 'UTF-8')

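Parse HTML stored in data using encoding. Raises ArgumentError if data is nil and returns without parsing if data is empty. If a block is given, the parser context is yielded to it before parsing starts.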

# File lib/nokogiri/html/sax/parser.rb, line 31
def parse_memory data, encoding = 'UTF-8'
  raise ArgumentError unless data
  return unless data.length > 0
  ctx = ParserContext.memory(data, encoding)
  yield ctx if block_given?
  ctx.parse_with self
end