A Ruby wrapper for the Boilerpipe API.
Boilerpipe definition:
The boilerpipe library provides algorithms to detect and remove the surplus “clutter” (boilerplate, templates) around the main textual content of a web page.
For more information: http://code.google.com/p/boilerpipe/
The Boilerpipe module has a single method, extract. It takes two parameters: the URL of the page to process, and an options hash.
The hash accepts three options:
- output => :html, :htmlFragment, :text, :json, :debug
- extractor => :ArticleExtractor, :DefaultExtractor, :LargestContentExtractor, :KeepEverythingExtractor, :CanolaExtractor
- api => The API URL
None of these options is mandatory. To find out more about them, check out the Boilerpipe API: http://boilerpipe-web.appspot.com/
  require "boilerpipe"
  Boilerpipe.extract("http://techcrunch.com/2011/05/12/karma-is-a-bitch/", {:output => :json})
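
To make the role of the three options more concrete, here is a minimal sketch of how a URL and an options hash could be turned into a request against the web API. It is not the gem's actual implementation. The helper name build_extract_url, the constant ASSUMED_API, the "/extract" path and the query parameter names (url, output, extractor) are all assumptions made for illustration. Only options that are supplied are sent, which reflects that none of them is mandatory.

```ruby
require "uri"

# Assumed endpoint of the Boilerpipe web API; the "/extract" path is a guess
# for illustration and is overridden when the :api option is given.
ASSUMED_API = "http://boilerpipe-web.appspot.com/extract"

# Hypothetical helper (not part of the gem): builds the request URL from
# the page URL and the optional :output, :extractor and :api settings.
def build_extract_url(url, opts = {})
  params = { "url" => url }
  # Only add the options the caller actually passed.
  params["output"] = opts[:output].to_s if opts[:output]
  params["extractor"] = opts[:extractor].to_s if opts[:extractor]
  base = opts[:api] || ASSUMED_API
  "#{base}?#{URI.encode_www_form(params)}"
end

puts build_extract_url("http://example.com/article",
                       :output => :json,
                       :extractor => :ArticleExtractor)
```

Passing :api swaps the base URL, which would be useful when pointing at a self-hosted instance of the service.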