[RF] Ideas for RooFit #6557

Open
2 of 24 tasks
hageboeck opened this issue Oct 5, 2020 · 9 comments

Comments

@hageboeck
Member

hageboeck commented Oct 5, 2020

  • Implement batch eval for Chi2 test stat
  • Implement recovery from disallowed regions for batch eval ([RF] Improve recovery from invalid function values #6401)
  • Implement getWeightBatch() and getBatches() for RooDataHist
  • Implement getBatch for RooTreeDataStore?
  • Don't clear all intermediate values in batch fits between fit cycles. Only the ones that changed.
  • Disable recalculateCache etc belonging to Lvl2 optimisation.
  • Use batch evaluation & inverted CDF for toys
  • Continue to improve interface with variadic templates
  • Profile and optimise new Batch interface
  • Modernise proxyList member of RooSimultaneous
  • Investigate if retrieving batch data with category states is better for batch evaluations. (vs. splitting composite datasets into components, and creating one NLL for each.)
  • Continue modernisation of RooSimultaneous. Requires rebasing and fixing an index bug in https://github.com/hageboeck/root/tree/updateRooSimultaneous
  • Implement analytical integration of RooJohnson.
  • Correct interface of RooAbsData and derived classes to use e.g. std::size_t for indexing events. int doesn't make sense.
  • Always have a debug version of RooFit around with -DROOFIT_CHECK_CACHED_VALUES.
  • Use analytic integrals in RooBinSamplingPdf when available.
  • Check that different integrator settings are honoured in RooBinSamplingPdf.
  • https://sft.its.cern.ch/jira/browse/ROOT-8304
  • Implement evaluateSpan() in classes relevant for HistFactory fits.
  • Throw Gaussian & Poisson constraints into dedicated fast class.
  • Switch on FastEvaluations topic in RooFit message streams, and use it to trace down PDFs that don't implement the faster interface.
  • [RF] Implement checking of parameter ranges #7210: slowly augment PDFs with checks of the definition range of parameters. This prevents evaluation errors and can stabilise fits.
  • [RF] Pythonisations for RooFit #7217: add pythonisations for RooFit.
  • Vectorized generation of events. Unless specialised generator functions are implemented, RooFit employs accept/reject sampling. Since this has to evaluate the PDF many times, one could think about using the batch interface to generate e.g. 2x the requested number of events, and do accept/reject on those. Repeat until enough events have been generated, and throw away the rest.
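
The batched accept/reject idea from the last item can be sketched as follows. This is only an illustration in plain Python, not RooFit code: `pdf_batch`, `generate_accept_reject` and the `oversample` parameter are hypothetical names, and the toy PDF is an unnormalised unit Gaussian whose maximum of 1.0 is assumed to be known.

```python
import math
import random

def pdf_batch(xs):
    """Evaluate an unnormalised toy PDF (unit Gaussian shape) for a whole batch."""
    return [math.exp(-0.5 * x * x) for x in xs]

def generate_accept_reject(n_events, lo, hi, pdf_max, rng, oversample=2):
    """Generate n_events by accept/reject, evaluating the PDF one batch per round."""
    accepted = []
    while len(accepted) < n_events:
        # Propose e.g. 2x the requested number of events in one go.
        candidates = [rng.uniform(lo, hi) for _ in range(oversample * n_events)]
        values = pdf_batch(candidates)  # single batch evaluation for this round
        for x, value in zip(candidates, values):
            if rng.uniform(0.0, pdf_max) < value:
                accepted.append(x)
    return accepted[:n_events]  # throw away the surplus

rng = random.Random(42)
events = generate_accept_reject(1000, -5.0, 5.0, 1.0, rng)
```

The loop repeats whole rounds until enough events are accepted, so the PDF is always evaluated on full batches rather than on single points.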
@guitargeek
Contributor

guitargeek commented Dec 2, 2022

Roo(Stats,Fit) - Performance Improvements

Just for completeness, here are some more ideas listed by John Harvey on JIRA (ROOT-8647):

  • Improvement of runtime performance in presence of large models starting from real use cases, e.g. from CMS
  • Use multiprocess for RooStats calculators
  • Take advantage of vectorisation, parallelisation on CPUs and/or GPU where it makes sense
  • Removal of virtual functions (define models at compile time)

Except for the last one, these ideas are all covered by current development efforts.

@guitargeek
Contributor

guitargeek commented Apr 4, 2023

Warning message on repeated named arguments

Emit a warning message in RooCmdConfig when multiple named arguments of the same type are encountered for which no chaining behavior is defined.

Originally suggested by Wouter Verkerke in ROOT-2784
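
A minimal sketch of such a check, written in plain Python rather than against the real RooCmdConfig interface: the function `check_repeated_args`, the `chainable` set and the argument names used below are all hypothetical placeholders.

```python
import warnings
from collections import Counter

def check_repeated_args(arg_names, chainable=()):
    """Warn about names that occur more than once and have no chaining behaviour.

    Returns the sorted list of offending names.
    """
    counts = Counter(arg_names)
    repeated = sorted(
        name for name, n in counts.items() if n > 1 and name not in chainable
    )
    for name in repeated:
        warnings.warn(
            f"named argument '{name}' was passed {counts[name]} times, "
            "but no chaining behaviour is defined for it"
        )
    return repeated

# "Import" is declared chainable here, so only "Range" triggers a warning.
offending = check_repeated_args(
    ["Range", "Save", "Range", "Import", "Import"], chainable={"Import"}
)
```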

@guitargeek
Contributor

More efficient datasets

RooDataSet is very inefficient when loading values, as it loads them one at a time. RooDataSet could in principle adopt memory from a std::vector, making importing a zero-copy operation.
Further, RDataFrame snapshots or numpy arrays could be imported.

Originally suggested by Stephan Hageboeck in ROOT-10366
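
To illustrate what "adopting memory" means, here is a small sketch using only the Python standard library. It does not use any RooFit API: `ColumnView` is a hypothetical class that wraps an existing buffer in a `memoryview` instead of copying the values.

```python
from array import array

class ColumnView:
    """Hypothetical dataset column that adopts an existing buffer without copying."""

    def __init__(self, buffer):
        self._view = memoryview(buffer)  # shares the storage, copies no element

    def __len__(self):
        return len(self._view)

    def __getitem__(self, i):
        return self._view[i]

values = array("d", [1.0, 2.5, 4.0])
column = ColumnView(values)
values[1] = 3.5  # modify the original storage after the "import"
shared = column[1]  # the column sees the change because the memory is shared
```

The trade-off of such a design is that the dataset no longer owns its data, so the original container has to outlive the view.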

@guitargeek
Contributor

guitargeek commented Apr 4, 2023

RooFit should be able to plot unbinned data with TGraph[Errors]

When RooFit plots unbinned data, it automatically creates a histogram. However, the data points might be scattered such that some bins are empty while others are filled much more often (see attached plot), so that the plot and the normalisation of the curve might look wrong.

[Attached plot]

Originally suggested by Stephan Hageboeck in ROOT-9878

@guitargeek
Contributor

RooFit should be able to read data from a TGraph[Errors]

RooFit should be able to make an unbinned and weighted fit to data coming from a TGraph[Errors]. If errors are set, RooFit should automatically take care of weighting the data points correctly.

This would make problems like this one much easier:
https://root-forum.cern.ch/t/max-likelihood-fit-with-a-tgraphasymmerrors-basically-how-to-do-a-unbinned-likelihood-fit/31903/9

Originally suggested by Stephan Hageboeck in ROOT-9877

@guitargeek
Contributor

RooMultivariateGaussian doesn't allow the covariance matrix to be fit

The covariance matrix in RooMultivariateGaussian is represented as a TMatrixDSym and not as RooRealVar or RooListProxy, so it cannot be fit.

This is not a bug but a feature request. In this case, there is always the possibility to write your own pdf.

This feature was requested only once, in 2017, and the request ticket did not mention a use case. Without a clear use case, we are not going to blindly implement features.

Originally suggested by Albert Bursche in ROOT-9052
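
As a rough idea of what "your own pdf" could look like in the two-dimensional case, the sketch below writes the bivariate Gaussian density with the covariance expressed through two widths and a correlation coefficient, all of which could then be floated. It is plain Python, not a RooFit class, and the function name and parameterisation are assumptions made for this illustration.

```python
import math

def bivariate_gauss(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Normalised 2D Gaussian density with covariance given by sigma_x, sigma_y, rho."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho * rho  # requires |rho| < 1
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_rho2)
    exponent = -(zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * one_minus_rho2)
    return math.exp(exponent) / norm

# Density at the mean for unit widths and no correlation: 1 / (2 * pi).
peak = bivariate_gauss(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0)
```

Parameterising with widths and a correlation keeps the matrix positive definite as long as the widths are positive and |rho| < 1, which is easier to enforce with parameter ranges than constraints on raw matrix elements.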

@guitargeek
Contributor

Make the Offset() option the default in createNLL()

As discussed here in the forum:
https://root-forum.cern.ch/t/failing-chi2-fit/56309/7?u=jonas

@guitargeek
Contributor

Multi-threaded generation/evaluation of toys

When generating toys to estimate uncertainties on parameters, each toy is independent of the others, so the toys can be generated and evaluated in separate threads.

This might interfere with multi-threaded likelihood evaluations.

Originally suggested by Stephan Hageboeck in ROOT-9822
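
The structure of such a toy study can be sketched with the Python standard library. This is not RooFit code: `run_toy` and `run_toys` are hypothetical helpers, the "fit" is just the sample mean of Gaussian toy data, and in standard CPython the threads will not speed up this pure-Python work. The point is only that each toy gets its own seeded generator, so no state is shared between threads.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

def run_toy(seed, n_events=200, true_mean=1.0, true_sigma=2.0):
    """Generate one toy dataset and return its fitted mean (here: the sample mean)."""
    rng = random.Random(seed)  # private generator: no shared state between toys
    data = [rng.gauss(true_mean, true_sigma) for _ in range(n_events)]
    return statistics.fmean(data)

def run_toys(n_toys, n_workers=4):
    """Run independent toys in a thread pool; results are ordered by seed."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_toy, range(n_toys)))

estimates = run_toys(100)
spread = statistics.stdev(estimates)  # toy-based estimate of the uncertainty on the mean
```

Seeding per toy also makes the study reproducible regardless of how the toys are scheduled across threads.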

@guitargeek
Contributor

Clarify and improve interface for multi-ranged simultaneous fits

See the following forum post:
https://root-forum.cern.ch/t/problem-with-simultaneous-fit-in-two-subranges-on-multiple-variables/
