@josephmckenna
Contributor

Add scaling VarTransform functionality to TMVA preprocessing (like normalisation, it linearly scales the data, but the sign of the input and output data is retained).

I have extended the VariableNormalizeTransform class, in the style of the VariableGaussTransform class, to transform data so that it remains in the range [-1,1]. There is no offset, so the sign of the input data is unchanged by the transformation.

This is proving essential for my neural network analyses, which treat detector hit data as an image classification problem and use ReLU activation functions at the beginning of the network.

I have also added a description to the TMVA documentation.
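
The difference between the scaling described above and an offset-based normalisation can be sketched in a few lines of plain Python. This is only an illustration, not the C++ code from this PR: dividing by the largest absolute value is an assumed reading of "range [-1,1] with no offset", and the function names and sample values are hypothetical.

```python
def scale_no_offset(values):
    """Divide by the largest absolute value so results lie in [-1, 1].

    Assumed reading of the PR description; no offset is applied,
    so every value keeps its sign and zeros stay zero.
    """
    largest = max(abs(v) for v in values)
    if largest == 0:
        return list(values)  # all zeros: nothing to scale
    return [v / largest for v in values]


def normalise_with_offset(values):
    """Min-max map onto [-1, 1]; the offset can change the sign of a value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


hits = [0.0, 0.5, 2.0, 4.0]  # hypothetical non-negative detector hits

scaled = scale_no_offset(hits)            # [0.0, 0.125, 0.5, 1.0]
normalised = normalise_with_offset(hits)  # [-1.0, -0.75, 0.0, 1.0]
```

In this sketch an empty pixel (0.0) stays at 0.0 after scaling but becomes -1.0 after offset-based normalisation, which is presumably why the sign-preserving variant matters when the first layer uses ReLU.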

@phsft-bot

Can one of the admins verify this patch?

@josephmckenna josephmckenna changed the title from "[TMVA][Preprocessing]" to "[TMVA][Preprocessing] - Additional normalisation method" on Aug 2, 2019
@lmoneta
Member

lmoneta commented Sep 2, 2019

Very nice PR. Thank you very much for your contribution!
Could you also please provide a simple test, to be sure the transformation is doing the right thing ?
Thank you

@josephmckenna
Contributor Author

Hi lmoneta,

I modified tutorials/tmva/keras/ClassificationKeras.py to add an 'S' transformation.
Lines 28-29 become:

factory = TMVA.Factory('TMVAClassification', output,
                       '!V:!Silent:Color:DrawProgressBar:Transformations=D,G,S:AnalysisType=Classification')

Lines 63-66 become:

factory.BookMethod(dataloader, TMVA.Types.kFisher, 'Fisher',
                   '!H:!V:Fisher:VarTransform=D,G,S')
factory.BookMethod(dataloader, TMVA.Types.kPyKeras, 'PyKeras',
                   'H:!V:VarTransform=D,G,S:FilenameModel=model.h5:NumEpochs=20:BatchSize=32')

Updated script attached:
ClassificationKerasScale.zip

The output before the changes comes from running:
cd $ROOTSYS/tutorials/tmva/keras
python ClassificationKeras.py &> DG.log
Attached: DG.log
The output after the changes comes from running:
python ClassificationKerasScale.py &> DGS.log
Attached: DGS.log

We can see that the training sample transformation is limited to be between -1 and 1:

TFHandler_PyKeras : Variable        Mean        RMS   [        Min        Max ]
                  : -----------------------------------------------------------
                  :    var1:   0.0015578    0.17520   [   -0.54435     1.0000 ]
                  :    var2:   0.0013889    0.17448   [   -0.54435     1.0000 ]
                  :    var3:   0.0013901    0.17452   [   -0.54435     1.0000 ]
                  :    var4:   0.0012939    0.17410   [   -0.54435     1.0000 ]
                  : -----------------------------------------------------------
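
A check like the one done by eye above could be automated when turning this into a test. The following is a rough sketch that parses the table rows with a regular expression and flags any variable whose range leaves [-1, 1]. The rows are copied from the log above; the helper name and the regular expression are hypothetical and assume the row layout shown here.

```python
import re

# Rows copied from the training-sample table above.
LOG = """\
: var1: 0.0015578 0.17520 [ -0.54435 1.0000 ]
: var2: 0.0013889 0.17448 [ -0.54435 1.0000 ]
: var3: 0.0013901 0.17452 [ -0.54435 1.0000 ]
: var4: 0.0012939 0.17410 [ -0.54435 1.0000 ]
"""

# Matches ": <name>: <mean> <rms> [ <min> <max> ]" with flexible spacing.
ROW = re.compile(
    r":\s*(?P<name>\w+):\s*(?P<mean>\S+)\s+(?P<rms>\S+)\s+"
    r"\[\s*(?P<min>\S+)\s+(?P<max>\S+)\s*\]"
)


def parse_ranges(text):
    """Return {variable name: (min, max)} for every table row in text."""
    ranges = {}
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            ranges[match.group("name")] = (
                float(match.group("min")),
                float(match.group("max")),
            )
    return ranges


ranges = parse_ranges(LOG)
# Collect any variable whose range falls outside [-1, 1].
out_of_bounds = {
    name: (lo, hi) for name, (lo, hi) in ranges.items() if lo < -1.0 or hi > 1.0
}
```

For the rows above, `out_of_bounds` is empty, which matches the observation that the training sample stays between -1 and 1.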

Scaling is working, but is it being saved and loaded again correctly? We can check the 'Test' phase in the same script, since TMVA saves transformations to file and then loads them to re-apply to the testing data:

TFHandler_PyKeras : Variable        Mean        RMS   [        Min        Max ]
                  : -----------------------------------------------------------
                  :    var1:   0.0041504    0.17586   [   -0.52983     1.0000 ]
                  :    var2:   0.0048056    0.17568   [   -0.52290     1.0000 ]
                  :    var3:   0.0039114    0.17501   [    -1.0000    0.70855 ]
                  :    var4: -0.00083735    0.17310   [    -1.0000     1.0000 ]
                  : -----------------------------------------------------------

The limits are no longer exactly -0.54435 to 1.0, but they linearly match the output of the D,G transformation (see the DG.log file). With a larger data sample, the training and test transformations would have more similar ranges.

We can also see at the end of the training that the training and test data classification accuracy match each other, also showing the transformation is working:

: Testing efficiency compared to training efficiency (overtraining check)
: ------------------------------------------------------------------------------
: DataSet   MVA        Signal efficiency: from test sample (from training sample)
: Name:     Method:    @b=0.01          @b=0.10          @b=0.30
: ------------------------------------------------------------------------------
: dataset   PyKeras  : 0.263 (0.228)    0.680 (0.673)    0.904 (0.908)
: dataset   Fisher   : 0.229 (0.192)    0.645 (0.640)    0.893 (0.896)
: ------------------------------------------------------------------------------
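
To put numbers on how closely the test and training efficiencies agree, the gaps can be computed directly from the table above. This is a small illustrative sketch; the efficiencies are copied from the log, while the dictionary layout and function name are hypothetical.

```python
# (test, training) signal efficiencies copied from the overtraining table above,
# keyed by method and by background efficiency working point.
EFFICIENCIES = {
    "PyKeras": {"0.01": (0.263, 0.228), "0.10": (0.680, 0.673), "0.30": (0.904, 0.908)},
    "Fisher": {"0.01": (0.229, 0.192), "0.10": (0.645, 0.640), "0.30": (0.893, 0.896)},
}


def efficiency_gaps(table):
    """Return |test - training| per method and working point, rounded to 3 places."""
    return {
        method: {
            point: round(abs(test - train), 3)
            for point, (test, train) in points.items()
        }
        for method, points in table.items()
    }


gaps = efficiency_gaps(EFFICIENCIES)
largest_gap = max(gap for points in gaps.values() for gap in points.values())
```

With these numbers the gaps are at most 0.007 at @b=0.10 and @b=0.30, and largest at @b=0.01 (0.035 for PyKeras and 0.037 for Fisher).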

If you have any questions, please ask.

Thank you

@couet couet removed their request for review February 12, 2021 07:29
@lmoneta
Member

lmoneta commented Feb 12, 2021

@phsft-bot build

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@phsft-bot

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-02-12T20:19:31.987Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1049 (ctest_start):

@josephmckenna
Contributor Author

@phsft-bot build

@josephmckenna
Contributor Author

@phsft-bot build

Ah, I don't have that power. I saw that the build failed and that this branch was way behind the master branch, so I simply merged in the current version of root-project/master.

Contributor

@sitongan sitongan left a comment

LGTM!

@josephmckenna
Contributor Author

josephmckenna commented Jun 23, 2021

Is there anything I can do to help expedite this pull request?

@ferdymercury
Collaborator

ferdymercury commented Feb 17, 2025

so I simply merged in the current version of root-project/master

Please rebase onto the current master instead (e.g. with git rebase --interactive).

Is there anything I can do to help expedite this pull request?

Based on your tutorial above, create a test in either the roottest or the tmva/test folder.

@guitargeek
Copy link
Contributor

Thank you very much! Given that this PR got a positive review by a TMVA expert, I don't think it should be held back from merging.

@guitargeek guitargeek merged commit ffdab66 into root-project:master Oct 20, 2025
21 of 26 checks passed
{
TString UseOffsetOrNot;

gTools().ReadAttr(trfnode, "UseOffsetOrNot", UseOffsetOrNot );
Contributor

I am fairly certain this breaks the reading of existing TMVA files that have not been written with the UseOffsetOrNot tag.

I currently get messages like:

<FATAL>                          : Trying to read non-existing attribute 'UseOffsetOrNot' from xml node 'Transform'

I am currently trying a simple fix locally and will open a PR once I have validated that it works.
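
The failure described above is the usual backward-compatibility problem with a newly required attribute: files written before the change do not contain it. A common remedy is to treat the attribute as optional and fall back to the old behaviour when it is absent. The sketch below shows that pattern in Python with xml.etree.ElementTree. It is not TMVA code and not the fix mentioned above: only the attribute name 'UseOffsetOrNot' and the node name 'Transform' come from the thread, while the attribute values, the Name attribute and the default are assumptions.

```python
import xml.etree.ElementTree as ET


def read_use_offset(node, default="true"):
    """Return the UseOffsetOrNot attribute, or a default for legacy files.

    The default "true" is an assumption: files written before the attribute
    existed are taken to use the original offset-based normalisation.
    """
    return node.get("UseOffsetOrNot", default)


# Hypothetical XML fragments: one written before the attribute existed,
# one written afterwards.
legacy = ET.fromstring('<Transform Name="Normalize"/>')
current = ET.fromstring('<Transform Name="Normalize" UseOffsetOrNot="false"/>')

legacy_setting = read_use_offset(legacy)    # falls back to the default
current_setting = read_use_offset(current)  # uses the stored value
```

Reading with a default keeps old files loadable while new files still round-trip the stored setting.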

Contributor

Thanks a lot! That is very kind.
