Distribution sampling for Zipf and Binomial is too slow #82

@aalexandrov

The commons-math3 distributions used in the reference data generator in the archetypes are really slow.

During a local test of an experiment suite that I am working on with @ggevay, I observed the following run times for generating dataset.A with a key cardinality of 100000:

  • with Uniform key distribution, the job takes ~5 seconds
  • with Binomial key distribution, the job takes ~25 seconds
  • with Zipfian key distribution, the datagen job exceeded the allowed time limit of 600 seconds.

The fix should be pushed to the peel-wordcount repository (see peelframework/peel-wordcount#1).
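
One possible direction for the Zipfian case, as a minimal sketch only: precompute the cumulative distribution once and draw each sample with a binary search over that table. This assumes the slowdown comes from recomputing cumulative probabilities on every sample, which I have not verified against the archetype code. `TableZipfSampler` and its constructor parameters are hypothetical names for illustration and are not part of Peel or commons-math3.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical helper (not part of Peel or commons-math3).
 * Samples from a Zipf distribution over {1, ..., numberOfElements}
 * using a precomputed CDF table and binary search.
 */
final class TableZipfSampler {
    // cdf[i] holds P(X <= i + 1); built once in the constructor.
    private final double[] cdf;
    private final Random rng;

    TableZipfSampler(int numberOfElements, double exponent, long seed) {
        if (numberOfElements <= 0) {
            throw new IllegalArgumentException("numberOfElements must be positive");
        }
        cdf = new double[numberOfElements];
        double sum = 0.0;
        // Running sum of the unnormalized weights 1 / k^exponent.
        for (int k = 1; k <= numberOfElements; k++) {
            sum += 1.0 / Math.pow(k, exponent);
            cdf[k - 1] = sum;
        }
        // Normalize so that the table ends at exactly 1.0.
        for (int i = 0; i < numberOfElements; i++) {
            cdf[i] /= sum;
        }
        cdf[numberOfElements - 1] = 1.0;
        rng = new Random(seed);
    }

    /** Draws one key in [1, numberOfElements] in O(log n) time. */
    int sample() {
        double u = rng.nextDouble(); // uniform in [0, 1)
        int idx = Arrays.binarySearch(cdf, u);
        // When u is not in the table, binarySearch returns -(insertionPoint) - 1.
        if (idx < 0) {
            idx = -idx - 1;
        }
        return idx + 1; // keys are 1-based
    }
}
```

With a key cardinality of 100000 the table costs about 800 KB of memory and each draw needs roughly 17 comparisons. The same table-based approach would also work for the Binomial case, although I have only sketched the Zipfian one here.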
