From 24c05e2d5738c07ed94132ace189322162b0ef64 Mon Sep 17 00:00:00 2001
From: "Gregory P. Smith"
Date: Fri, 20 Mar 2026 16:01:17 +0000
Subject: [PATCH 1/2] Document reusing a thread state across repeated foreign-thread calls

Add a subsection under "Non-Python created threads" explaining the
performance cost of creating/destroying a PyThreadState on every
Ensure/Release cycle and showing how to keep one alive for the thread's
lifetime instead.
---
 Doc/c-api/threads.rst | 55 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/Doc/c-api/threads.rst b/Doc/c-api/threads.rst
index 3b761d0c657cbd..7e45b7070dbea4 100644
--- a/Doc/c-api/threads.rst
+++ b/Doc/c-api/threads.rst
@@ -227,6 +227,61 @@ For example::
 If the interpreter finalized before ``PyThreadState_Swap`` was called, then
 ``interp`` will be a dangling pointer!
 
+Reusing a thread state across repeated calls
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Creating and destroying a :c:type:`PyThreadState` is not free, and is
+more expensive on a :term:`free-threaded build`. If a non-Python thread
+calls into the interpreter many times, creating a fresh thread state on
+every entry and destroying it on every exit is a performance
+anti-pattern. Instead, create the thread state once (when the native
+thread starts, or lazily on its first call into Python), attach and
+detach it around each call, and destroy it when the native thread
+exits::
+
+   /* Thread startup: create the state once. */
+   PyThreadState *tstate = PyThreadState_New(interp);
+
+   /* Per-call: attach, run Python, detach. */
+   PyEval_RestoreThread(tstate);
+   result = CallSomeFunction();
+   PyEval_SaveThread();
+
+   /* ... many more calls ... */
+
+   /* Thread shutdown: destroy the state once. */
+   PyEval_RestoreThread(tstate);
+   PyThreadState_Clear(tstate);
+   PyThreadState_DeleteCurrent();
+
+The equivalent with the :ref:`PyGILState API <gilstate>` keeps an *outer*
+:c:func:`PyGILState_Ensure` outstanding for the thread's lifetime, so
+nested Ensure/Release pairs never drop the internal nesting counter to
+zero::
+
+   /* Thread startup: create and pin the state. */
+   PyGILState_STATE outer = PyGILState_Ensure();
+   PyThreadState *saved = PyEval_SaveThread();
+
+   /* Per-call: the thread state already exists. */
+   PyGILState_STATE inner = PyGILState_Ensure();
+   result = CallSomeFunction();
+   PyGILState_Release(inner);
+
+   /* ... many more calls ... */
+
+   /* Thread shutdown: unpin and destroy the state. */
+   PyEval_RestoreThread(saved);
+   PyGILState_Release(outer);
+
+The embedding code must arrange for the shutdown sequence to run before
+the native thread exits, and before :c:func:`Py_FinalizeEx` is called.
+If interpreter finalization begins first, the shutdown
+:c:func:`PyEval_RestoreThread` call will hang the thread (see
+:c:func:`PyEval_RestoreThread` for details) rather than return. If the
+native thread exits without running the shutdown sequence, the thread
+state is leaked for the remainder of the process.
+
 .. _gilstate:
 
 Legacy API

From 9d7365dda508b5e7b8dbf300bacabfb5be2af773 Mon Sep 17 00:00:00 2001
From: "Gregory P. Smith"
Date: Sat, 21 Mar 2026 00:09:58 +0000
Subject: [PATCH 2/2] Clarify the PyGILState variant in the docs...

as being an init, do things, finalizer trio where lots of stuff without
the GIL held can happen in between. There might not be a GIL, but in
builds where there is one you don't want to hold it.

There's an internal recursion counter within the PyGILState APIs; if it
goes to 0 on Release, any Python thread state that it created is
destroyed. We're working around that. Should I mention that internals
detail?
---
 Doc/c-api/threads.rst | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/Doc/c-api/threads.rst b/Doc/c-api/threads.rst
index 7e45b7070dbea4..ab76971ba97780 100644
--- a/Doc/c-api/threads.rst
+++ b/Doc/c-api/threads.rst
@@ -257,20 +257,28 @@ exits::
 The equivalent with the :ref:`PyGILState API <gilstate>` keeps an *outer*
 :c:func:`PyGILState_Ensure` outstanding for the thread's lifetime, so
 nested Ensure/Release pairs never drop the internal nesting counter to
-zero::
+zero.
+
+In thread startup, pin the state and immediately detach so the thread
+does not hold the GIL while off doing non-Python work. Stash ``outer``
+and ``saved`` somewhere that survives for the thread's lifetime (for
+example, in thread-local storage)::
 
-   /* Thread startup: create and pin the state. */
    PyGILState_STATE outer = PyGILState_Ensure();
-   PyThreadState *saved = PyEval_SaveThread();
+   PyThreadState *saved = PyEval_SaveThread();
+
+Each subsequent call into Python from this thread reuses the pinned
+state; the inner Release decrements the nesting counter but does not
+destroy the thread state because the outer Ensure is still
+outstanding::
 
-   /* Per-call: the thread state already exists. */
    PyGILState_STATE inner = PyGILState_Ensure();
    result = CallSomeFunction();
    PyGILState_Release(inner);
 
-   /* ... many more calls ... */
+At thread shutdown, re-attach and drop the outer reference to destroy
+the thread state::
 
-   /* Thread shutdown: unpin and destroy the state. */
    PyEval_RestoreThread(saved);
    PyGILState_Release(outer);
 
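
On the question in the 2/2 commit message about mentioning the nesting counter: the effect is easy to see in a toy model. The sketch below is not CPython code and calls no CPython API. `toy_thread`, `toy_ensure`, `toy_release`, `run_unpinned` and `run_pinned` are hypothetical names that only mimic the behaviour described above, under the assumption of a foreign thread that has no thread state of its own until its first Ensure.

```c
#include <assert.h>

/* Toy bookkeeping for ONE foreign thread. Hypothetical model only;
   this is not how CPython stores or manages thread states. */
typedef struct {
    int depth;      /* outstanding Ensure calls on this thread */
    int created;    /* thread states created so far */
    int destroyed;  /* thread states destroyed so far */
} toy_thread;

/* Models Ensure: a state is created only when none is outstanding. */
static void toy_ensure(toy_thread *t)
{
    if (t->depth == 0) {
        t->created++;
    }
    t->depth++;
}

/* Models Release: the state is destroyed when the counter hits zero. */
static void toy_release(toy_thread *t)
{
    assert(t->depth > 0);
    t->depth--;
    if (t->depth == 0) {
        t->destroyed++;
    }
}

/* Anti-pattern: each call is a complete Ensure/Release cycle. */
static toy_thread run_unpinned(int calls)
{
    toy_thread t = {0, 0, 0};
    for (int i = 0; i < calls; i++) {
        toy_ensure(&t);
        /* ... the call into Python would happen here ... */
        toy_release(&t);
    }
    return t;
}

/* Pinned: an outer Ensure stays outstanding for the thread's lifetime. */
static toy_thread run_pinned(int calls)
{
    toy_thread t = {0, 0, 0};
    toy_ensure(&t);          /* thread startup */
    for (int i = 0; i < calls; i++) {
        toy_ensure(&t);
        /* ... the call into Python would happen here ... */
        toy_release(&t);     /* depth drops to 1, never to 0 */
    }
    toy_release(&t);         /* thread shutdown */
    return t;
}
```

In this model, `run_unpinned(1000)` records 1000 creations and 1000 destructions, while `run_pinned(1000)` records one of each. The model leaves out the detach/re-attach step (`PyEval_SaveThread` / `PyEval_RestoreThread`) that the real pinned pattern needs around the outer Ensure.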