The commons-math3 distributions used by the reference data generator in the archetypes are very slow.
During a local test of an experiment suite that I am working on with @ggevay, I observed the following runtimes for generating dataset.A with a key cardinality of 100000:
- with a Uniform key distribution, the job takes ~5 seconds
- with a Binomial key distribution, the job takes ~25 seconds
- with a Zipfian key distribution, the datagen job exceeded the allowed limit of 600 seconds
The fix should be pushed to the peel-wordcount repository (see peelframework/peel-wordcount#1).
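
One possible direction for the fix, sketched below in plain Java: precompute the Zipf CDF once and draw each key with a binary search over that table. This assumes the slowdown comes from the per-sample inverse-CDF computation in the commons-math3 samplers, which has not been profiled here. `TableZipfSampler` is a hypothetical helper, not an existing class in Peel or commons-math3.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical table-based Zipf sampler (illustration only, not Peel or commons-math3 API). */
final class TableZipfSampler {
    // cdf[i] holds P(K <= i + 1) for keys 1..numberOfElements
    private final double[] cdf;
    private final Random rng;

    TableZipfSampler(int numberOfElements, double exponent, long seed) {
        if (numberOfElements <= 0) {
            throw new IllegalArgumentException("numberOfElements must be positive");
        }
        if (exponent <= 0.0) {
            throw new IllegalArgumentException("exponent must be positive");
        }
        cdf = new double[numberOfElements];
        double sum = 0.0;
        // Accumulate the unnormalized weights 1 / k^exponent once, in O(n).
        for (int k = 1; k <= numberOfElements; k++) {
            sum += 1.0 / Math.pow(k, exponent);
            cdf[k - 1] = sum;
        }
        // Normalize so that the table is a proper CDF.
        for (int i = 0; i < numberOfElements; i++) {
            cdf[i] /= sum;
        }
        // Guard against rounding so that every u in [0, 1) maps to a valid key.
        cdf[numberOfElements - 1] = 1.0;
        rng = new Random(seed);
    }

    /** Draws one key in [1, numberOfElements] in O(log n) time. */
    int sample() {
        double u = rng.nextDouble(); // uniform in [0, 1)
        int idx = Arrays.binarySearch(cdf, u);
        // On a miss, binarySearch returns -(insertionPoint) - 1.
        int pos = idx >= 0 ? idx : -idx - 1;
        return pos + 1;
    }
}
```

With a key cardinality of 100000, the table costs one pass over 100000 doubles at construction, after which each key is drawn with roughly 17 comparisons. Whether this is fast enough for the datagen job would still need to be measured.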