free-threading: Argument-level locking
Adapting C++ to handle parallelism due to free-threaded Python can be
tricky, especially when the original code is given as-is. This commit
adds a tentative API to retrofit locking onto existing code by locking the
arguments of function calls.
wjakob committed Sep 20, 2024
1 parent 43fcf4d commit e6e094a
Showing 5 changed files with 218 additions and 67 deletions.
6 changes: 6 additions & 0 deletions docs/api_core.rst
@@ -1638,6 +1638,12 @@ parameter of :cpp:func:`module_::def`, :cpp:func:`class_::def`,
       explain it in docstrings and stubs (``str(value)``) does not produce
       acceptable output.

+   .. cpp:function:: arg &lock(bool value = true)
+
+      Set a flag noting that this argument must be locked when dispatching a
+      function call in free-threaded Python extensions. It does nothing in
+      regular GIL-protected extensions.
+
 .. cpp:struct:: is_method

    Indicate that the bound function is a method.
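
The ``lock()`` setter documented in this hunk follows the same chaining pattern as the other ``arg`` setters. The sketch below is a reduced stand-in, not the real nanobind class: it keeps only the ``none_`` and ``lock_`` fields visible in this commit, and ``make_locked_optional`` is a hypothetical helper added purely for illustration.

```cpp
// Stand-in for nanobind's `arg` builder, reduced to two of the fields shown
// in this commit. This is NOT the real class; it only mirrors the chaining.
struct arg {
    explicit arg(const char *name = nullptr) : name_(name) {}

    // Each setter stores its flag and returns *this so calls can be chained.
    arg &none(bool value = true) { none_ = value; return *this; }
    arg &lock(bool value = true) { lock_ = value; return *this; }

    const char *name_;
    bool none_{ false };
    bool lock_{ false };  // off by default, as in the diff
};

// Hypothetical helper: annotation for an argument that may be None and that
// should be locked while the bound function runs.
inline arg make_locked_optional(const char *name) {
    return arg(name).none().lock();
}
```

In an actual binding, the annotation would presumably be passed alongside the function, along the lines of ``nb::arg("x").lock()``; the nanobind documentation is the authority on the exact spelling.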
42 changes: 42 additions & 0 deletions docs/functions.rst
@@ -545,3 +545,45 @@ The following interactive session shows how to call them from Python.
 C++ libraries (e.g. GUI libraries, asynchronous networking libraries,
 etc.).

+.. _binding-overheads:
+
+Minimizing binding overheads
+----------------------------
+
+The code that dispatches function calls from Python to C++ is in general
+:ref:`highly optimized <benchmarks>`. When it is important to further reduce
+binding overheads to an absolute minimum, consider removing annotations for
+:ref:`keyword and default arguments <keyword_and_default_args>` along with
+other advanced binding annotations.
+
+In the snippet below, ``f1`` has lower binding overheads compared to ``f2``.
+
+.. code-block:: cpp
+
+   NB_MODULE(my_ext, m) {
+       m.def("f1", [](int) { /* no-op */ });
+       m.def("f2", [](int) { /* no-op */ }, "arg"_a);
+   }
+
+This is because ``f1``:
+
+- does not accept keyword arguments and does not specify :ref:`default argument
+  values <keyword_and_default_args>`.
+
+- has no :cpp:class:`nb::keep_alive\<Nurse, Patient\>() <keep_alive>` or
+  :ref:`argument locking <argument-locks>` annotations.
+
+- takes no variable-length positional (:cpp:class:`nb::args <args>`) or keyword
+  (:cpp:class:`nb::kwargs <kwargs>`) arguments.
+
+- has 8 or fewer arguments.
+
+If all of the above conditions are satisfied, nanobind switches to a
+specialized dispatcher that is optimized to handle a small number of positional
+arguments. Otherwise, it uses the default dispatcher that works in any
+situation. It is also worth noting that functions with many overloads generally
+execute more slowly, since nanobind must first select a suitable one.
+
+These differences are mainly of interest when a function that does *very
+little* is called at a *very high rate*, in which case binding overheads can
+become noticeable.
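
The four conditions listed in the new documentation can be read as a single predicate. The sketch below models that reading only; ``BindingTraits`` and ``uses_specialized_dispatcher`` are hypothetical names, and nanobind's real dispatcher selection is not shown in this commit and may be implemented differently.

```cpp
#include <cstddef>

// Hypothetical summary of a binding's annotations; not a nanobind type.
struct BindingTraits {
    std::size_t nargs = 0;
    bool has_named_or_default_args = false;  // e.g. "arg"_a or a default value
    bool has_keep_alive = false;             // nb::keep_alive<Nurse, Patient>()
    bool has_arg_locks = false;              // argument locking annotations
    bool has_var_args = false;               // nb::args
    bool has_var_kwargs = false;             // nb::kwargs
};

// True when every condition from the documentation holds, i.e. when the
// specialized small-positional-argument dispatcher would be eligible.
inline bool uses_specialized_dispatcher(const BindingTraits &t) {
    return !t.has_named_or_default_args && !t.has_keep_alive &&
           !t.has_arg_locks && !t.has_var_args && !t.has_var_kwargs &&
           t.nargs <= 8;
}

// Models `f1` from the snippet: one positional argument, no annotations.
inline BindingTraits traits_f1() {
    BindingTraits t;
    t.nargs = 1;
    return t;
}

// Models `f2` from the snippet: the "arg"_a annotation names the argument.
inline BindingTraits traits_f2() {
    BindingTraits t = traits_f1();
    t.has_named_or_default_args = true;
    return t;
}
```

Under this model ``f1`` qualifies and ``f2`` does not, which matches the claim in the documentation. Note that adding an argument lock also disqualifies a binding, so argument locking trades some dispatch speed for safety.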
63 changes: 51 additions & 12 deletions include/nanobind/nb_attr.h
@@ -32,6 +32,11 @@ struct arg {
         return *this;
     }

+    NB_INLINE arg &lock(bool value = true) {
+        lock_ = value;
+        return *this;
+    }
+
     NB_INLINE arg &sig(const char *value) {
         signature_ = value;
         return *this;
@@ -40,6 +45,7 @@ struct arg {
     const char *name_, *signature_;
     uint8_t convert_{ true };
     bool none_{ false };
+    bool lock_{ false };
 };

 struct arg_v : arg {
@@ -62,6 +68,7 @@ struct is_flag {};
 struct is_final {};
 struct is_generic {};
 struct kw_only {};
+struct lock_self {};

 template <size_t /* Nurse */, size_t /* Patient */> struct keep_alive {};
 template <typename T> struct supplement {};
@@ -126,16 +133,38 @@ enum class func_flags : uint32_t {
     /// Does this overload specify a custom function signature (for docstrings, typing)
     has_signature = (1 << 16),
     /// Does this function have one or more nb::keep_alive() annotations?
-    has_keep_alive = (1 << 17)
+    has_keep_alive = (1 << 17),
+    /// Free-threaded Python: does the binding lock the 'self' argument
+    lock_self = (1 << 18)
 };

+enum cast_flags : uint8_t {
+    // Enable implicit conversions (code assumes this has value 1, don't reorder..)
+    convert = (1 << 0),
+
+    // Passed to the 'self' argument in a constructor call (__init__)
+    construct = (1 << 1),
+
+    // Indicates that the function dispatcher should accept 'None' arguments
+    accepts_none = (1 << 2),
+
+    // Indicates that a function argument must be locked before dispatching a call
+    lock = (1 << 3),
+
+    // Indicates that this cast is performed by nb::cast or nb::try_cast.
+    // This implies that objects added to the cleanup list may be
+    // released immediately after the caster's final output value is
+    // obtained, i.e., before it is used.
+    manual = (1 << 4)
+};
+
+
 struct arg_data {
     const char *name;
     const char *signature;
     PyObject *name_py;
     PyObject *value;
-    bool convert;
-    bool none;
+    uint8_t flag;
 };

 template <size_t Size> struct func_data_prelim {
@@ -266,27 +295,37 @@ NB_INLINE void func_extra_apply(F &, std::nullptr_t, size_t &) { }

 template <typename F>
 NB_INLINE void func_extra_apply(F &f, const arg &a, size_t &index) {
-    arg_data &arg = f.args[index++];
+    uint8_t flag = 0;
+    if (a.none_)
+        flag |= (uint8_t) cast_flags::accepts_none;
+    if (a.convert_)
+        flag |= (uint8_t) cast_flags::convert;
+    if (a.lock_)
+        flag |= (uint8_t) cast_flags::lock;
+
+    arg_data &arg = f.args[index];
+    arg.flag = flag;
     arg.name = a.name_;
     arg.signature = a.signature_;
     arg.value = nullptr;
-    arg.convert = a.convert_;
-    arg.none = a.none_;
+    index++;
 }

 template <typename F>
 NB_INLINE void func_extra_apply(F &f, const arg_v &a, size_t &index) {
-    arg_data &arg = f.args[index++];
-    arg.name = a.name_;
-    arg.signature = a.signature_;
-    arg.value = a.value.ptr();
-    arg.convert = a.convert_;
-    arg.none = a.none_;
+    arg_data &ad = f.args[index];
+    func_extra_apply(f, (const arg &) a, index);
+    ad.value = a.value.ptr();
 }

 template <typename F>
 NB_INLINE void func_extra_apply(F &, kw_only, size_t &) {}

+template <typename F>
+NB_INLINE void func_extra_apply(F &f, lock_self, size_t &) {
+    f.flags |= (uint32_t) func_flags::lock_self;
+}
+
 template <typename F, typename... Ts>
 NB_INLINE void func_extra_apply(F &, call_guard<Ts...>, size_t &) {}
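
The change to ``arg_data`` replaces two ``bool`` members with a single ``uint8_t flag`` built from ``cast_flags`` bits. The following self-contained sketch reproduces that packing outside of nanobind. The enumerator values are copied from the diff above; the ``sketch`` namespace and the helpers ``pack_arg_flags`` and ``must_lock`` are hypothetical names introduced here for illustration.

```cpp
#include <cstdint>

namespace sketch {

// Bit values copied from the cast_flags enum added to nb_attr.h in this commit.
enum cast_flags : std::uint8_t {
    convert      = (1 << 0),
    construct    = (1 << 1),
    accepts_none = (1 << 2),
    lock         = (1 << 3),
    manual       = (1 << 4)
};

// Hypothetical helper mirroring the flag assembly in func_extra_apply():
// each per-argument setting contributes one bit to the packed byte.
inline std::uint8_t pack_arg_flags(bool convert_, bool none_, bool lock_) {
    std::uint8_t flag = 0;
    if (none_)
        flag |= (std::uint8_t) cast_flags::accepts_none;
    if (convert_)
        flag |= (std::uint8_t) cast_flags::convert;
    if (lock_)
        flag |= (std::uint8_t) cast_flags::lock;
    return flag;
}

// Hypothetical query a dispatcher could use to decide whether an argument
// has to be locked before the call.
inline bool must_lock(std::uint8_t flag) {
    return (flag & (std::uint8_t) cast_flags::lock) != 0;
}

} // namespace sketch
```

With the defaults from ``struct arg`` (``convert_`` true, ``none_`` and ``lock_`` false) the packed value is 1, and calling ``.lock()`` on such an argument raises it to 9. Because ``convert`` stays at bit 0, code that relies on it having the value 1 keeps working, which is what the comment in the enum asks for.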

14 changes: 0 additions & 14 deletions include/nanobind/nb_cast.h
@@ -32,20 +32,6 @@
 NAMESPACE_BEGIN(NB_NAMESPACE)
 NAMESPACE_BEGIN(detail)

-enum cast_flags : uint8_t {
-    // Enable implicit conversions (impl. assumes this is 1, don't reorder..)
-    convert = (1 << 0),
-
-    // Passed to the 'self' argument in a constructor call (__init__)
-    construct = (1 << 1),
-
-    // Indicates that this cast is performed by nb::cast or nb::try_cast.
-    // This implies that objects added to the cleanup list may be
-    // released immediately after the caster's final output value is
-    // obtained, i.e., before it is used.
-    manual = (1 << 2),
-};
-
 /**
  * Type casters expose a member 'Cast<T>' which users of a type caster must
  * query to determine what the caster actually can (and prefers) to produce.