Add missing "is_alive" checks before acquiring the GIL in destructors. #894

smurfix · 2025-01-30T05:26:12Z

cf. #891

src/nb_ndarray.cpp

docs/api_extra.rst

wjakob · 2025-01-30T05:40:24Z

Could you add a changelog as well?

smurfix · 2025-01-30T05:46:41Z

Done.

We'll see whether that is sufficient …

wjakob · 2025-01-30T07:36:21Z

After some more thought, I don't think that this really fixes the issue.

While nb::is_alive() might return true when the condition is checked, there is no guarantee that this is still the case by the time the next instruction runs. Basically this is a race condition between the check and the use. If you want to avoid undefined behavior, you will need to prevent this problem in another way.
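The window described above can be modelled without any Python at all. The following is a minimal sketch in standard C++, not nanobind code: `FakeRuntime`, `release_checked`, `release_locked`, and `shut_down` are hypothetical names invented for illustration, and the `state_lock` mutex is an assumption of the model. The sketch makes no claim that the real interpreter exposes an equivalent lock to extension code.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for the interpreter state (illustration only).
struct FakeRuntime {
    std::atomic<bool> alive{true};
    std::mutex state_lock;  // assumed lock that guards changes to `alive`
    int releases = 0;       // number of cleanups that ran while alive
};

// Check-then-act: the flag may flip between the check and the use.
bool release_checked(FakeRuntime &rt) {
    if (!rt.alive.load())
        return false;
    // Another thread could call shut_down() right here, so the line
    // below may run against a runtime that is already gone.
    ++rt.releases;
    return true;
}

// Check and use under the same lock that shut_down() takes, so the
// flag cannot change in between.
bool release_locked(FakeRuntime &rt) {
    std::lock_guard<std::mutex> guard(rt.state_lock);
    if (!rt.alive.load())
        return false;
    ++rt.releases;
    return true;
}

// Shutdown takes the same lock before flipping the flag.
void shut_down(FakeRuntime &rt) {
    std::lock_guard<std::mutex> guard(rt.state_lock);
    rt.alive.store(false);
}
```

In a single thread both variants behave the same. The difference only shows when `shut_down` runs concurrently: `release_checked` can pass its check and then act on a dead runtime, while `release_locked` cannot.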

smurfix · 2025-01-30T07:50:57Z

That is true in general. But what happens when the Python program throws an uncaught exception or otherwise ends abnormally, or simply is signalled to end?

I'm not concerned with preventing a 0.1% chance of a hard coredump and whatnot due to a race condition in this case. I'm concerned with preventing a 100% chance of getting one.

wjakob · 2025-01-30T10:43:34Z

By having a condition variable, mutex, or similar synchronization mechanism, you can guarantee that the right order is enforced during shutdown. If the application is killed, then the kernel will shut things down and none of this code will run at all.

A crash that happens in 0.1% of runs is super annoying because it is so hard to reproduce. I would rather have software fail spectacularly than accumulate lots of issues in the long tail.
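As an illustration of the ordering argument, here is a minimal sketch in standard C++ of a condition variable that forces every worker's cleanup to finish before teardown proceeds. It is not taken from nanobind or from this pull request: `ShutdownGate`, `run_ordered_shutdown`, and the log strings are hypothetical, and "runtime finalized" merely stands in for interpreter shutdown.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical gate that makes teardown wait for all registered workers.
struct ShutdownGate {
    std::mutex m;
    std::condition_variable cv;
    int active_workers = 0;
    std::vector<std::string> log;  // records the order of events

    void worker_enter() {
        std::lock_guard<std::mutex> guard(m);
        ++active_workers;
    }

    // Runs the worker's cleanup and signals that it is done.
    void worker_exit() {
        std::lock_guard<std::mutex> guard(m);
        log.push_back("worker cleanup");
        --active_workers;
        cv.notify_all();
    }

    // Blocks until no worker is active, then performs the teardown step.
    void wait_then_finalize() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return active_workers == 0; });
        log.push_back("runtime finalized");
    }
};

std::vector<std::string> run_ordered_shutdown(int n_workers) {
    ShutdownGate gate;
    std::vector<std::thread> threads;
    for (int i = 0; i < n_workers; ++i) {
        // Register before the thread starts so teardown cannot miss it.
        gate.worker_enter();
        threads.emplace_back([&gate] { gate.worker_exit(); });
    }
    gate.wait_then_finalize();
    for (auto &t : threads)
        t.join();
    return gate.log;
}
```

Whatever the thread scheduling, the teardown entry is always the last one in the returned log, because `wait_then_finalize` cannot proceed while any registered worker is still active.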

smurfix · 2025-01-30T13:00:36Z

On the other hand, in a program that does have mutexes and whatnot, this issue rears its ugly head only when an abnormal situation happens. In that case it transforms a reasonably clean stack trace and debug dump into an inconsistent, ugly mess. Been there, done that; with this patch applied I can at least get out of my debugging session without crashing or having to resort to "kill -9".

wjakob · 2025-01-30T13:27:25Z

-> https://discuss.python.org/t/safely-using-the-c-api-when-python-might-shut-down/78850

Add missing "is_alive" checks before acquiring the GIL in destructors.

9e679ad

smurfix mentioned this pull request Jan 30, 2025

Added a comment about GIL acquisition vs. multithreading vs. program exit. #891

Open

wjakob reviewed Jan 30, 2025

src/nb_ndarray.cpp (outdated)

wjakob reviewed Jan 30, 2025

src/nb_ndarray.cpp (outdated)

Update the documentation accordingly

917d056

wjakob reviewed Jan 30, 2025

docs/api_extra.rst (outdated)

smurfix added 2 commits January 30, 2025 08:10

Unify formatting

7e2d20d

Added changelog entry

b7b485d

smurfix force-pushed the alive branch from 971d91b to b7b485d on January 30, 2025 07:11
