6.26 cannot read file written with 6.30.4 #14793

Closed
1 task done
chrisburr opened this issue Feb 22, 2024 · 17 comments · Fixed by #15006
Labels: bug, experiment (Affects an experiment / reported by its software & computing experts), priority:high

Comments

@chrisburr
Member

chrisburr commented Feb 22, 2024

Check duplicate issues.

  • Checked for duplicates

Description

See also #14504 (comment)

I have a file, written in LHCb with 6.30.4, that fails to read with 6.24/06 (and 6.24/08).

cc @pikacic @wdconinc

ROOT Version: 6.30/04
Attaching file /scratching/lhcb-data/lhcb/MC/Dev/TUPLE.ROOT/00213804/0000/00213804_00000038_3.tuple.root as _file0...
(TFile *) 0x5608393b7e20


ROOT Version: 6.28/12
Attaching file /scratching/lhcb-data/lhcb/MC/Dev/TUPLE.ROOT/00213804/0000/00213804_00000038_3.tuple.root as _file0...
(TFile *) 0x55bca8869180


ROOT Version: 6.26/10
Attaching file /scratching/lhcb-data/lhcb/MC/Dev/TUPLE.ROOT/00213804/0000/00213804_00000038_3.tuple.root as _file0...
(TFile *) 0x560486f2d900


ROOT Version: 6.24/06
Attaching file /scratching/lhcb-data/lhcb/MC/Dev/TUPLE.ROOT/00213804/0000/00213804_00000038_3.tuple.root as _file0...
Error in <TList::Clear>: A list is accessing an object (0x558b19a500b0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19b74080) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19b76bf0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19b80a80) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19b80df0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19b81320) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19b817c0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19b81b90) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19b82030) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19b82260) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19bacc50) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19bb6580) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19bb6f70) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19bb73d0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19bbad50) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19bc5b80) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19bc5eb0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x558b19bc8710) already deleted (list name = TList)
(TFile *) 0x558b199f9480

Reproducer

Using: https://cburr.web.cern.ch/root-crash.root

Opening the file is enough to see the error messages:

root -l -b -q myfile.root

Using hadd results in a segfault:

hadd -f505 /tmp/test.root myfile.root

The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum https://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  0x00007f2d879f62ee in TList::FindObject(TObject const*) const () from /home/cburr/miniconda3/envs/test4/bin/../lib/libCore.so.6.24
#7  0x00007f2d879f354a in THashTable::FindObject(TObject const*) const () from /home/cburr/miniconda3/envs/test4/bin/../lib/libCore.so.6.24
#8  0x00007f2d879f1739 in THashList::Remove(TObject*) () from /home/cburr/miniconda3/envs/test4/bin/../lib/libCore.so.6.24
#9  0x00007f2d883c69c8 in TTree::~TTree() () from /home/cburr/miniconda3/envs/test4/bin/../lib/libTree.so.6.24
#10 0x00007f2d883c712a in TTree::~TTree() () from /home/cburr/miniconda3/envs/test4/bin/../lib/libTree.so.6.24
#11 0x00007f2d87ec9eb0 in (anonymous namespace)::WriteOneAndDelete(TString const&, TClass*, TObject*, bool, bool, TDirectory*) () from /home/cburr/miniconda3/envs/test4/bin/../lib/libRIO.so.6.24
#12 0x00007f2d87ecc318 in TFileMerger::MergeOne(TDirectory*, TList*, int, TFileMergeInfo&, TString&, THashList&, bool&, bool&, TString const&, TDirectory*, TFile*, TKey*, TObject*, TIter&) () from /home/cburr/miniconda3/envs/test4/bin/../lib/libRIO.so.6.24
#13 0x00007f2d87ece757 in TFileMerger::MergeRecursive(TDirectory*, TList*, int) () from /home/cburr/miniconda3/envs/test4/bin/../lib/libRIO.so.6.24
#14 0x00007f2d87ecdae0 in TFileMerger::PartialMerge(int) () from /home/cburr/miniconda3/envs/test4/bin/../lib/libRIO.so.6.24
#15 0x000056019ca7ba3d in main ()
===========================================================

ROOT version

See above.

Installation method

Both LCG and conda builds exhibit the issue.

Operating system

Linux

Additional context

No response

chrisburr added the bug label Feb 22, 2024
hahnjo added the priority:high and experiment labels Feb 22, 2024
@iarspider
Contributor

iarspider commented Feb 22, 2024

@hahnjo In CMS we saw this issue as well and fixed it by rebuilding old ROOT versions with commit a434281.

@wdconinc
Contributor

wdconinc commented Mar 7, 2024

@iarspider Did your tests indicate that this is a regression introduced by that commit specifically, or that this issue was simply not there yet at the time of that commit and it could be any commit since then?

@iarspider
Contributor

@wdconinc Sorry, I pinged the wrong person. The mentioned commit (from PR #12845) contains the fix, but I don't know when the incompatibility was introduced.

@wdconinc
Contributor

wdconinc commented Mar 7, 2024

Thanks, @iarspider. If I can summarize then:

  • the change in 6.30 was intentional,
  • the change broke forward compatibility in older ROOT versions,
  • in some cases we are waiting for new bugfix releases on older versions (i.e. 6.28),
  • in other cases we are waiting for the bugfix to be merged in older versions as well (i.e. 6.26 and earlier).

Is that an accurate summary? If so, do we have an estimate of when the next 6.28 patch release, which will include this fix, will be out? Does this issue merit being summarized in a pinned issue at the top of the GitHub issues?

@iarspider
Contributor

@wdconinc Disclaimer: I am not part of the ROOT team, so my interpretation could be wrong.

  • the change in 6.30 was intentional,
  • the change broke forward compatibility in older ROOT versions,

Yes

  • in some cases we are waiting for new bugfix releases on older versions (i.e. 6.28),
  • in other cases we are waiting for the bugfix to be merged in older versions as well (i.e. 6.26 and earlier).

I don't know if there is a plan to update older versions with this fix; in CMS we just cherry-picked the fix into our fork of ROOT.

@wdconinc
Contributor

wdconinc commented Mar 8, 2024

In CMS we just cherry-picked the fix into our fork of ROOT.

But that still forces everyone to use the CMS environment all the time. At our earlier stage of development, many of our users work in computing environments on clusters where we have no control over the installed ROOT version, let alone over which patches are applied.

@sethrj

sethrj commented Mar 14, 2024

Yeah, this would mean updating all CI versions, cluster environments, etc. if anyone wants to use a new ROOT version... aren't there "compatibility" flags ROOT can use to ensure backward compatibility?

@pcanal
Member

pcanal commented Mar 14, 2024

Unfortunately, this one time there is no such flag. Files were previously written incorrectly, which could lead to spurious crashes. The new files, when read with an unpatched old version, can lead to 'just' spurious error messages (which can be suppressed by installing a custom ROOT error handler).

@pikacic

pikacic commented Mar 15, 2024

@pcanal, it's not just spurious error messages: @chrisburr mentioned that hadd (as an example) segfaults, while I do not remember reports of the "spurious crashes" you mention in our productions.

I was discussing with @dpiparo an option to temporarily enable compatibility of the written files with ROOT 6.24, along the lines of @sethrj's suggestion.
This would save us from a deployment nightmare. We can easily add a ROOT option for the jobs that produce files we have to read with old versions of ROOT, but rebuilding legacy versions of the experiment software stack on a patched ROOT 6.24 is much more problematic.

@pcanal
Member

pcanal commented Mar 15, 2024

it's not just spurious error messages:

Then I am misremembering/missing something. Let me check a few things.

@pcanal
Member

pcanal commented Mar 15, 2024

So indeed, there is a path that leads to the new files inadvertently disabling the cleanup mechanism, so we will need to add an option.

We can easily add a ROOT option for the jobs that produce files we have to read with old versions of ROOT,

@pikacic What type of option is convenient/easy to add to your use cases? (rootrc or compilation flag or function call or ...?)

@pikacic

pikacic commented Mar 18, 2024

The most practical way for us is a function call, followed by an option passed to TFile::Open. The idea is that we write the correct format by default, but for some specific grid jobs we switch to the legacy format (so rootrc and compilation flags are not applicable).

@pcanal
Member

pcanal commented Mar 18, 2024

With #15006, you can call the following on the file that needs to be forward compatible:

file->SetBit(TFile::k630forwardCompatibility);

(of course before storing anything :) ).

@wdconinc
Copy link
Contributor

On our end, the function in #15006 is not an optimal solution since we use external libraries (podio, edm4hep) that use ROOT to open files: this change would require changes in multiple places, including in community projects that may not be able to provide a single solution for their entire user base.

A rootrc entry or a static TFile::Set630ForwardCompatibility() would therefore be preferred. Presumably, during one session, one would not intentionally write both compatible and incompatible files.

@pcanal
Member

pcanal commented Mar 20, 2024

Fair enough. #15006 was updated to add the following to the rootrc:

# Force the production of files forward compatible with (unpatched) versions
# of ROOT older than v6.30 by recording the internal bits kIsOnHeap and
# kNotDeleted; older releases did not explicitly set those bits to the
# correct value but instead used the value stored in the file verbatim.
# TFile.v630forwardCompatibility: no
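The comment above describes the mechanism. The following is a simplified, hypothetical model of it in plain C++ (no ROOT needed). The two bit values match the definitions in ROOT's `TObject.h`, but `StoredBits`, `ReadUnpatched`, `ReadPatched` and `LooksDeleted` are illustrations, not ROOT code. The assumption that files written without the option store both bits cleared is inferred from the comment, not taken from the ROOT sources.

```cpp
#include <cstdint>

// Internal TObject status bits, with the values used in ROOT's TObject.h.
constexpr std::uint32_t kIsOnHeap = 0x01000000;
constexpr std::uint32_t kNotDeleted = 0x02000000;
constexpr std::uint32_t kInternalBits = kIsOnHeap | kNotDeleted;

// Hypothetical writer model. Assumption: by default the internal bits are not
// recorded (stored as 0); with TFile.v630forwardCompatibility they are stored as set.
std::uint32_t StoredBits(std::uint32_t userBits, bool forwardCompatible)
{
   std::uint32_t stored = userBits & ~kInternalBits;
   if (forwardCompatible)
      stored |= kInternalBits;
   return stored;
}

// Unpatched (< 6.30) reader, as described above: uses the stored value verbatim.
std::uint32_t ReadUnpatched(std::uint32_t stored) { return stored; }

// Patched reader: an object just created while reading is on the heap and
// alive, so the internal bits are set explicitly, whatever the file contains.
std::uint32_t ReadPatched(std::uint32_t stored) { return stored | kInternalBits; }

// Roughly what a check such as the one behind the "already deleted" error tests.
bool LooksDeleted(std::uint32_t bits) { return (bits & kNotDeleted) == 0; }
```

In this model, `LooksDeleted(ReadUnpatched(StoredBits(0, false)))` is true, which matches the "already deleted" errors reported in the issue description. Recording the bits (`StoredBits(0, true)`) or reading with a patched release both make it false.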

@pikacic

pikacic commented Mar 22, 2024

Thanks a lot @pcanal, that's perfect.


Hi @pcanal, @dpiparo,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

tmadlener pushed a commit to key4hep/Gaudi that referenced this issue Aug 28, 2024
…mode

This allows files written with ROOT 6.30, although technically incorrect, to be read by unpatched ROOT 6.24.

See root-project/root#14793