Web Scraping Script in Ruby

Web scraping is possible with Ruby. Here, I create a small library that scrapes RubyGems and gets the latest version of the gem searched for.

I was working on a project and had to scrape a web page, so I looked into the options and found Nokogiri.

Nokogiri is an HTML, XML, SAX, and Reader parser. Among Nokogiri's many features is the ability to search documents via XPath or CSS3 selectors.
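
For example, here is a minimal sketch of parsing a small HTML string and picking out an element with a CSS selector (the markup and the greeting class are invented for illustration):

require 'nokogiri'

html = '<html><body><p class="greeting">Hello, world!</p></body></html>'
doc = Nokogiri::HTML(html)

# Select the paragraph by its CSS class and read its text
puts doc.css('p.greeting').first.text # => "Hello, world!"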

To get the document, I used HTTParty.

HTTParty makes HTTP fun! Moreover, it makes consuming RESTful web services dead easy.
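
A quick sketch of fetching a page with HTTParty (the URL is the RubyGems search endpoint used later in this post):

require 'httparty'

response = HTTParty.get('https://rubygems.org/search?query=nokogiri')
puts response.code          # HTTP status, e.g. 200
puts response.body[0, 200]  # first 200 characters of the returned HTML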

For this example, I will be scraping https://rubygems.org/search?query=%s.
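
The %s placeholder is where the gem name goes. Interpolating a query into the URL looks like this (a tiny sketch; encoding the query is optional here but guards against special characters in the search term):

require 'uri'

gem_name = 'rails'
url = "https://rubygems.org/search?query=#{URI.encode_www_form_component(gem_name)}"
# url == "https://rubygems.org/search?query=rails"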

Script

The final script is given below:

require 'httparty'
require 'nokogiri'

class RubygemsScrapper
  attr_accessor :parse_page

  # Initialize the scrapper with the gem name (query string) to search for
  def initialize(q)
    doc = HTTParty.get("https://rubygems.org/search?query=#{q}")
    @parse_page = Nokogiri::HTML(doc.body)
  end

  # Get the first result's version, or -1 if no result was found
  def get_latest_version
    parse_page.css('.gems__gem').css('.gems__gem__version').children[0].text
  rescue StandardError
    -1 # not found
  end

  # Get the first result's link on rubygems.org, or -1 if no result was found
  def get_link
    "https://rubygems.org" + parse_page.css('.gems__gem').attribute('href').value
  rescue StandardError
    -1 # not found
  end
end

# Calling the scrapper
scrapper = RubygemsScrapper.new('yiya')
p scrapper.get_latest_version
p scrapper.get_link

This class takes the name of the gem to be searched for and returns the first result's latest version and a link to it.
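
For instance, searching for a real gem and handling the not-found case might look like this (the actual version printed depends on whatever is currently published on rubygems.org):

# assuming the RubygemsScrapper class above has already been loaded
scrapper = RubygemsScrapper.new('nokogiri')
version = scrapper.get_latest_version

if version == -1
  puts 'Gem not found'
else
  puts "Latest version: #{version}"
  puts "Link: #{scrapper.get_link}"
end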