Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Particular combination of function overload and std::runtime_error leads to segfault #17497

Closed
1 task
vepadulano opened this issue Jan 22, 2025 · 4 comments · Fixed by #17536
Closed
1 task

Comments

@vepadulano
Copy link
Member

Check duplicate issues.

  • Checked for duplicates

Description

I can see a segfault in a seemingly innocent situation with an overload set that includes std::string_view and std::vector<std::string> in which at least one of the overloads throws a runtime error. The gdb stacktrace appears to be recursive, see attachment

gdb_stacktrace.txt

Note that this happens only if I pass an empty Python list [] that is then converted implicitly to the std::vector. If I pass manually an empty cppyy.gbl.std.vector[cppyy.gbl.std.string], it works. See the reproducer.

This issue is the direct cause of part 1 of #17485

Reproducer

import cppyy

cppyy.cppdef(r"""

void fun(std::string_view, std::string_view){throw std::runtime_error("std::string_view overload");}
void fun(std::string_view, const std::vector<std::string> &){throw std::runtime_error("const std::vector<std::string> & overload");}

""")

# This throws the correct runtime error
cppyy.gbl.fun("", cppyy.gbl.std.vector[cppyy.gbl.std.string]())

# Uncomment the following line to get a segfault
# cppyy.gbl.fun("", [])

ROOT version

master

Installation method

Any

Operating system

Linux Fedora 41

Additional context

No response

@vepadulano
Copy link
Member Author

vepadulano commented Jan 22, 2025

An interesting thing to note. In InstanceConverter::SetArg, even when the current argument is the std::vector<std::string>, i.e. the working case, I see the stacktrace going through STLStringViewConverter, which looks sketchy. Same for the non-working case, every recursion iteration shows the same stacktrace.

#0  CPyCppyy::(anonymous namespace)::InstanceConverter::SetArg (this=0x55555e25b160, pyobject=0x7fffd91bba10, para=..., ctxt=0x7fffffffcb90)
    at /home/vpadulan/Programs/rootproject/rootsrc/bindings/pyroot/cppyy/CPyCppyy/src/Converters.cxx:2206
#1  0x00007fffd5427e55 in CPyCppyy::(anonymous namespace)::STLStringViewConverter::SetArg (this=0x55555e25b160, pyobject=0x7fffd91bba10, para=..., 
    ctxt=0x7fffffffcb90) at /home/vpadulan/Programs/rootproject/rootsrc/bindings/pyroot/cppyy/CPyCppyy/src/Converters.cxx:2016
#2  0x00007fffd54848cc in CPyCppyy::CPPMethod::ConvertAndSetArgs (this=0x55555d688750, args=0x7ffff7fb1078, nargsf=9223372036854775810, ctxt=0x7fffffffcb90)
    at /home/vpadulan/Programs/rootproject/rootsrc/bindings/pyroot/cppyy/CPyCppyy/src/CPPMethod.cxx:949
#3  0x00007fffd547c10f in CPyCppyy::CPPFunction::Call (this=0x55555d688750, self=@0x7fffffffccd0: 0x0, args=0x7ffff7fb1078, nargsf=9223372036854775810, 
    kwds=0x0, ctxt=0x7fffffffcb90) at /home/vpadulan/Programs/rootproject/rootsrc/bindings/pyroot/cppyy/CPyCppyy/src/CPPFunction.cxx:86
#4  0x00007fffd548a817 in CPyCppyy::(anonymous namespace)::mp_vectorcall (pymeth=0x7fffe9797000, args=0x7ffff7fb1078, nargsf=9223372036854775810, kwds=0x0)
    at /home/vpadulan/Programs/rootproject/rootsrc/bindings/pyroot/cppyy/CPyCppyy/src/CPPOverload.cxx:699

I put some printouts in the SetArg function just to get some more info. Here is the diff

--- a/bindings/pyroot/cppyy/CPyCppyy/src/Converters.cxx
+++ b/bindings/pyroot/cppyy/CPyCppyy/src/Converters.cxx
@@ -45,6 +45,8 @@
 #endif
 #endif // py2
 
+#include <iostream>
+
 
 //- data _____________________________________________________________________
 namespace CPyCppyy {
@@ -2202,8 +2204,12 @@ bool CPyCppyy::InstanceConverter::SetArg(
 {
 // convert <pyobject> to C++ instance, set arg for call
     CPPInstance* pyobj = GetCppInstance(pyobject, fClass);
+    auto *pytype = Py_TYPE(pyobject);
+    auto *pytype_name = PyType_GetName(pytype);
+    std::cout << "InstanceConverter::SetArg: pyboject=" << pyobject << ",classname=" << Cppyy::GetScopedFinalName(fClass) << ",pytype=" << pytype << ",pytype_name=" << CPyCppyy_PyText_AsString(pytype_name) << "\n";
     if (pyobj) {
         auto oisa = pyobj->ObjectIsA();
+        std::cout << "InstanceConverter::SetArg: object is a " << Cppyy::GetScopedFinalName(oisa) << "\n";
         if (oisa && (oisa == fClass || Cppyy::IsSubtype(oisa, fClass))) {
         // calculate offset between formal and actual arguments
             para.fValue.fVoidp = pyobj->GetObject();
@@ -2218,6 +2224,8 @@ bool CPyCppyy::InstanceConverter::SetArg(
             para.fTypeCode = 'V';
             return true;
         }

And here the printouts

InstanceConverter::SetArg: pyboject=0x7fffd91bba10,classname=basic_string_view<char,char_traits<char> >,pytype=0x55555b8a5fd0,pytype_name=vector<string>
InstanceConverter::SetArg: object is a std::vector<string>

@wlav
Copy link
Contributor

wlav commented Jan 23, 2025

Not sketchy: that's simply the 2nd argument conversion to the trial of the first overload. It doesn't fail until it hits the predicate in the next line, so after the printouts, then returns false. The program then moves on to the next overload.

Anyway, looks to me like you have a stack overflow: ConvertImplicit should only ever be called once, yet it appears multiple times in the stack trace you posted. Normally, implicit conversion is stopped by a flag to disallow implicit conversions when doing implicit conversions. However, there's an exemption for list and tuple objects, as these can be passed as *args.

Not sure where things run afoul. My first suspicion is/was on Pythonize.cxx and yes, I found significant differences there between ROOT and cppyy proper: a whole soup of classes with and without std:: fly by in the ROOT case, whereas cppyy is nicely consistent. Of note, I see basic_string_view<char,char_traits<char> >, meaning you don't get to StringViewInit in ROOT whereas in cppyy, the class name is std::basic_string_view<char>, so you do. That's not the problem, though. Or at least, not directly: changing the name to match what's given by ROOT/meta to have StringViewInit called still crashes.

However, I cleaned up the using namespace std mess b/c I didn't want to deal with that any longer, hence I'm not following the white rabbit any further, but perhaps it helps you (unless it's a red herring instead, but there's definitely a bug there with the string_view pythonization, so that's something).

vepadulano added a commit to vepadulano/root that referenced this issue Jan 27, 2025
The check `klass == ctxt->fCurScope` in ConvertImplicit was always returning
false due to a bad construction of the `ctxt` variable of type
`CPyCppyy::CallContext`.

In particular, the `fCurScope` of the context is taken as the `fScope` data
member of the instance converter. This value is usually created at construction
time, for example in the `STLStringViewConverter` constructor, by calling
`Cppyy::GetScope`.

The `GetScope` implementation of ROOT's clingwrapper was outdated w.r.t. the
cppyy-backend implementation. This commit aligns the two implementations, thus
fixing issues such as the one reported at
root-project#17497.
vepadulano added a commit to vepadulano/root that referenced this issue Jan 27, 2025
vepadulano added a commit to vepadulano/root that referenced this issue Jan 27, 2025
@aaronj0
Copy link
Contributor

aaronj0 commented Jan 27, 2025

Not sketchy: that's simply the 2nd argument conversion to the trial of the first overload. It doesn't fail until it hits the predicate in the next line, so after the printouts, then returns false. The program then moves on to the next overload.

Anyway, looks to me like you have a stack overflow: ConvertImplicit should only ever be called once, yet it appears multiple times in the stack trace you posted. Normally, implicit conversion is stopped by a flag to disallow implicit conversions when doing implicit conversions. However, there's an exemption for list and tuple objects, as these can be passed as *args.

Not sure where things run afoul. My first suspicion is/was on Pythonize.cxx and yes, I found significant differences there between ROOT and cppyy proper: a whole soup of classes with and without std:: fly by in the ROOT case, whereas cppyy is nicely consistent. Of note, I see basic_string_view<char,char_traits<char> >, meaning you don't get to StringViewInit in ROOT whereas in cppyy, the class name is std::basic_string_view<char>, so you do. That's not the problem, though. Or at least, not directly: changing the name to match what's given by ROOT/meta to have StringViewInit called still crashes.

However, I cleaned up the using namespace std mess b/c I didn't want to deal with that any longer, hence I'm not following the white rabbit any further, but perhaps it helps you (unless it's a red herring instead, but there's definitely a bug there with the string_view pythonization, so that's something).

Thanks for the insights @wlav!

Based on some debugging I could do along with @vepadulano 's help, the infinite loop is the fact is not only the InstanceConverter, setting the arg repeatedly, but also the fact that we end up rebuilding the python shadow class for basic_string_view<char,char_traits<char> > over and over again:

CreateScopeProxy-> BuildScopeProxyDict: KLASS - 8 NAME: basic_string_view<char,char_traits<char> > LOOKUP: basic_string_view<char,char_traits<char> >
StringViewConverter -> InstanceConverter::SetArg: pyboject=0x7f47854e9240,classname=basic_string_view<char,char_traits<char> >,pytype=0xa3b8a0,pytype_name=list
CreateScopeProxy-> BuildScopeProxyDict: KLASS - 8 NAME: basic_string_view<char,char_traits<char> > LOOKUP: basic_string_view<char,char_traits<char> >
StringViewConverter -> InstanceConverter::SetArg: pyboject=0x7f47854e9240,classname=basic_string_view<char,char_traits<char> >,pytype=0xa3b8a0,pytype_name=list
CreateScopeProxy-> BuildScopeProxyDict: KLASS - 8 NAME: basic_string_view<char,char_traits<char> > LOOKUP: basic_string_view<char,char_traits<char> >

As mentioned, ConvertImplicit should stop by the second round at which the condition:

if (IsConstructor(ctxt->fFlags) && klass == ctxt->fCurScope && ctxt->GetSize() == 1)

Should evaluate to true. The point at which they differ is the klass not matching the ctxt->fCurScope, but the problem seems to be deeper than that. The CreateScopeProxy is called everytime for the exact same scope(8) that we get.

The actual creation of every member of the std::class happens like so upstream:

StringViewConverter -> InstanceConverter::SetArg: pyboject=0x7ffff75ad240,classname=std::basic_string_view<char>,pytype=0xa3b8a0,pytype_name=list
ConvertImplicit -> CreateScopeProxy-> BuildScopeProxyDict: KLASS - 9 NAME: std::basic_string_view<char> SCNAME:  LOOKUP: std::basic_string_view<char>

Created CPPMethod-basic_string_view Scope:9
Created CPPMethod-basic_string_view Scope:9
Created CPPMethod-basic_string_view Scope:9
Created CPPMethod-basic_string_view Scope:9
Created CPPMethod-operator= Scope:9
Created CPPMethod-begin Scope:9
Created CPPMethod-end Scope:9
Created CPPMethod-cbegin Scope:9

And on ROOT the same way with scope 8 instead. We should get the same value by doing Cppyy::GetScope("std::string_view") which is the case on master.

On ROOT, even though we still take the StLStringViewConverter, it is initialised like this:

CPyCppyy::STLStringViewConverter::STLStringViewConverter(bool keepControl) :
    InstanceConverter(Cppyy::GetScope("std::string_view"), keepControl) {}

And here lies our problem.

Cppyy::GetScope("std::string_view") gives 10 instead of 8, and if we patch the constructor as so:

CPyCppyy::STLStringViewConverter::STLStringViewConverter(bool keepControl) :
    InstanceConverter(Cppyy::GetScope("basic_string_view<char,char_traits<char> >"), keepControl) {}

The code path followed is the same as upstream. I believe #17536 fixes the lookup on meta to provide the same scope for both, and should be a viable fix, and also reduce the divergence in clingwrapper between ROOT and upstream cppyy

@vepadulano
Copy link
Member Author

vepadulano commented Jan 27, 2025

Dear @wlav ,

Thank you for taking the time to analyse this issue! Your insights proved useful, indeed the fact that ConverImplicit was being called multiple times was wrong and the flag that should have stopped was the check klass == ctxt->fCurScope. Also thanks to a pair debugging session with @aaronj0 I followed the thread of where the fCurScope was being set to finally find out that the ROOT implementation of Cppyy::GetScope was imperfect in this scenario. I took the liberty of aligning it as much as possible with the implementation at https://github.com/wlav/cppyy-backend/blob/master/clingwrapper/src/clingwrapper.cxx and I opened the linked PR. We will discuss internally to understand what is the best course of action to integrate the better implementation into ROOT, further aligning the two implementations.

vepadulano added a commit to vepadulano/root that referenced this issue Jan 29, 2025
Disabled on MacOS ARM and Windows because LLVM JIT fails to catch exceptions.
vepadulano added a commit to vepadulano/root that referenced this issue Jan 29, 2025
Disabled on MacOS ARM and Windows because LLVM JIT fails to catch exceptions.
vepadulano added a commit to vepadulano/root that referenced this issue Jan 29, 2025
Disabled on MacOS ARM and Windows because LLVM JIT fails to catch exceptions.
vepadulano added a commit that referenced this issue Jan 31, 2025
The check `klass == ctxt->fCurScope` in ConvertImplicit was always returning
false due to a bad construction of the `ctxt` variable of type
`CPyCppyy::CallContext`.

In particular, the `fCurScope` of the context is taken as the `fScope` data
member of the instance converter. This value is usually created at construction
time, for example in the `STLStringViewConverter` constructor, by calling
`Cppyy::GetScope`.

The `GetScope` implementation of ROOT's clingwrapper was outdated w.r.t. the
cppyy-backend implementation. This commit aligns the two implementations, thus
fixing issues such as the one reported at
#17497.
vepadulano added a commit that referenced this issue Jan 31, 2025
Disabled on MacOS ARM and Windows because LLVM JIT fails to catch exceptions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

4 participants