CSV parsing performance is poor #3348

@chuckremes

JRuby 9.0.1.0 is roughly 25 to 30 times slower than MRI 2.2.3 and RBX 2.5.8 in a simple CSV parsing experiment (about 227 seconds versus 8 to 9 seconds of wall-clock time). The benchmark script:

require 'benchmark'
require 'csv'

FILE = "./testfile.csv"

Benchmark.bmbm(22) do |x|
  x.report('parse CSV') do
    io = File.open(FILE)
    csv = CSV.new(io, :converters => :all)

    while line = csv.gets
    end

    io.close
  end
end

On my system (OS X 10.10.5, 24GB RAM, 1TB SSD), here are the results of the benchmarks for each runtime:

GuestOSX:options_database cremes$ ruby -v
ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-darwin14]
GuestOSX:options_database cremes$ ruby benchmarks.rb 
Rehearsal ----------------------------------------------------------
parse CSV                7.910000   0.010000   7.920000 (  7.924954)
------------------------------------------------- total: 7.920000sec

                             user     system      total        real
parse CSV                8.040000   0.020000   8.060000 (  8.053098)

GuestOSX:options_database cremes$ chruby jruby
GuestOSX:options_database cremes$ ruby -v
jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 Java HotSpot(TM) 64-Bit Server VM 25.60-b23 on 1.8.0_60-b27 +jit [darwin-x86_64]
GuestOSX:options_database cremes$ ruby -J-Xmx2g benchmarks.rb 
Rehearsal ----------------------------------------------------------
parse CSV              238.740000   1.480000 240.220000 (233.007993)
----------------------------------------------- total: 240.220000sec

                             user     system      total        real
parse CSV              229.210000   1.080000 230.290000 (227.212325)

GuestOSX:options_database cremes$ chruby rbx
GuestOSX:options_database cremes$ ruby -v
rubinius 2.5.8 (2.1.0 bef51ae3 2015-09-24 3.5.1 JI) [x86_64-darwin14.5.0]
GuestOSX:options_database cremes$ ruby benchmarks.rb 
Rehearsal ----------------------------------------------------------
parse CSV               16.264571   0.161624  16.426195 ( 10.562584)
------------------------------------------------ total: 16.426195sec

                             user     system      total        real
parse CSV                9.084859   0.033108   9.117967 (  9.010402)

The JRuby runtime never went above 250MB of RAM usage, so it doesn't appear to be memory pressure.

An earlier profiling attempt to find the cause showed Kernel.Integer and Kernel.Float as the heaviest hitters in the profile.
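One possible explanation, not confirmed here: with :converters => :all, the CSV library tries Integer() and Float() on every field and falls back to the original string when they raise, so non-numeric fields pay for one or two exceptions each. The sketch below isolates that pattern from CSV itself. The sample fields, the method names, and the regex pre-check are all hypothetical and are not taken from the CSV library or from testfile.csv.

```ruby
require 'benchmark'

# Hypothetical sample fields; most are non-numeric, so Integer()/Float() raise often.
SAMPLE_FIELDS = %w[AAPL 2015-09-24 call 12 3.5 n/a].freeze

# Rescue-based conversion: try Integer, then Float, else keep the string.
def convert_with_rescue(field)
  Integer(field)
rescue ArgumentError
  begin
    Float(field)
  rescue ArgumentError
    field
  end
end

INTEGER_RE = /\A[+-]?\d+\z/
FLOAT_RE   = /\A[+-]?\d+\.\d+\z/

# Pre-check conversion: only call Integer/Float when the text looks numeric,
# so no exception is raised for ordinary strings.
def convert_with_precheck(field)
  case field
  when INTEGER_RE then Integer(field, 10)
  when FLOAT_RE   then Float(field)
  else field
  end
end

ITERATIONS = 50_000

Benchmark.bm(22) do |x|
  x.report('rescue-based') do
    ITERATIONS.times { SAMPLE_FIELDS.each { |f| convert_with_rescue(f) } }
  end
  x.report('regex pre-check') do
    ITERATIONS.times { SAMPLE_FIELDS.each { |f| convert_with_precheck(f) } }
  end
end
```

If the gap between the two reports is much larger on JRuby than on MRI, exception cost is a likely contributor. The pre-check is not a drop-in replacement: it forces base 10, so a field such as "012" becomes 12 where a bare Integer() call would read it as octal, and it ignores forms such as "1e5" or "0x1A" that Integer() and Float() accept.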

The test file is available at this link:
https://www.dropbox.com/s/l5lze28kpd7dx8u/testfile.zip?dl=0
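
For anyone who cannot download the file, a rough way to see how much of the time goes to converters is to run the same gets loop over in-memory data with and without :converters => :all. This is only a sketch: the row contents and the row count below are made up and do not reflect the real testfile.csv, and count_rows is a hypothetical helper.

```ruby
require 'benchmark'
require 'csv'

# Hypothetical stand-in for testfile.csv; the real file's columns are not known here.
SAMPLE_ROW  = "AAPL,call,12,3.5,n/a\n"
SAMPLE_DATA = SAMPLE_ROW * 20_000

# Drain a CSV reader with the same gets loop as the original benchmark
# and return the number of rows read.
def count_rows(data, **options)
  csv = CSV.new(data, **options)
  count = 0
  count += 1 while csv.gets
  count
end

Benchmark.bm(22) do |x|
  x.report('no converters')      { count_rows(SAMPLE_DATA) }
  x.report('converters => :all') { count_rows(SAMPLE_DATA, :converters => :all) }
end
```

Running this under each runtime shows whether the slowdown tracks the converters or the parser itself.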
