Skip to content

gh-135953: Properly obtain main thread identifier in Gecko Collector#146045

Open
flowln wants to merge 6 commits intopython:mainfrom
flowln:fix_tachyon_main_thread
Open

gh-135953: Properly obtain main thread identifier in Gecko Collector#146045
flowln wants to merge 6 commits intopython:mainfrom
flowln:fix_tachyon_main_thread

Conversation

@flowln
Copy link

@flowln flowln commented Mar 17, 2026

Since running a profiler via CLI (python -m profiling.sampling run) spawns a new subprocess where the actual user-specified code will run, a call to threading.main_thread() in the collector's process will not return the profiled process's main thread.

To combat this, we add a new attribute to the returned InterpreterState for the sampled stack frames, which takes the id of the thread_main member from the original sampled interpreter's state.

This allows the main thread to be highlighted on Firefox Profiler:
image

Since running a profiler via CLI (python -m profiling.sampling run)
spawns a new subprocess where the actual user-specified code will run,
a call to threading.main_thread() in the collector's process will not
return the profiled process's main thread.

To combat this, we rely on the fact that thread objects are inserted
in such a way that the first object in the list represents the oldest
ThreadState object [1], which corresponds to a ThreadState associated
with the main thread.

[1] - https://github.com/python/cpython/blob/1b118353bb0a9d816de6ef673f3b11775de5bec5/Include/internal/pycore_interp_structs.h#L831

Signed-off-by: Sofia Donato Ferreira <flowlnlnln@gmail.com>
@flowln flowln requested a review from pablogsal as a code owner March 17, 2026 02:12
@bedevere-app
Copy link

bedevere-app bot commented Mar 17, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@python-cla-bot
Copy link

python-cla-bot bot commented Mar 17, 2026

All commit authors signed the Contributor License Agreement.

CLA signed

The ordering is actually newest -> oldest, since the _remote_debugging
code traverses the ThreadState linked list in the intepreter state, appending
to a list of threads in-order.

Signed-off-by: Sofia Donato Ferreira <flowlnlnln@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Mar 17, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

It seems to sometimes happen that some samples (possibly at the very start / end)
do not have any active threads in the interpreter state.

It is OK to make main_tid maybe None here, since if it is, the threads list is
empty, and so the for loop will not execute anyways.

Signed-off-by: Sofia Donato Ferreira <flowlnlnln@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Mar 17, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@pablogsal
Copy link
Member

Thanks for the PR!

I think we should do this but I find the approach a bit flaky. We have enough information in the C layer to know what thread is the main thread so we can surface that in the data that we emit (or alternatively we can add to the unwinder a new method that tells us the tid of the main thread).

Do you feel comfortable to do this in the C code?

…tate

Signed-off-by: Sofia Donato Ferreira <flowlnlnln@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Mar 21, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@flowln
Copy link
Author

flowln commented Mar 21, 2026

@pablogsal I've made the minimum set of changes in the _remote_debugging module to get the main thread id information to the Python side. Since I don't have much of an idea how this module relates to other functionality within the project, please tell me if you see anything that needs changing!

… in Gecko test

Signed-off-by: Sofia Donato Ferreira <flowlnlnln@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Mar 21, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Copy link
Member

@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Moving the main-thread identification into C is the right direction, but I don't think this version is ready to merge yet.

There are two correctness issues with the current shape:

  1. main_thread_id can remain unset in the live unwinder path. In module.c it is only populated if the loop actually visits the interpreter's main PyThreadState, but in filtered modes (CPU, GIL, EXCEPTION) unwind_stack_for_thread() can skip that thread entirely, and the only_active_thread / targeted-thread paths can avoid it as well.

  2. The new InterpreterInfo.main_thread_id field is not wired through binary replay. InterpreterInfo now has 3 fields, but the binary writer does not serialize main_thread_id, and the binary reader only fills items 0 and 2. That means replay ... --format gecko will still lose the main-thread marker.

I think the cleanest fix is to model this per thread rather than per interpreter: either add an is_main_thread field on ThreadInfo or, better, a new THREAD_STATUS_* bit. That avoids depending on the main thread being present in a given sample, and it should flow through the existing binary status byte naturally.

If you prefer to keep the current main_thread_id approach, then I think we at least need to: use Py_None when the value is unknown, extend the binary format reader/writer for the new field, and add tests for binary-to-Gecko replay plus a filtered-mode case.

@bedevere-app
Copy link

bedevere-app bot commented Mar 22, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@pablogsal
Copy link
Member

I can give this a go, don't worry

…terInfo field

Replace the main_thread_id field in InterpreterInfo with a
THREAD_STATUS_MAIN_THREAD status flag set directly in the thread
unwinding code. This simplifies the struct from 3 to 2 fields and
removes the need to compare thread IDs at the Python level.
@bedevere-app
Copy link

bedevere-app bot commented Mar 22, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants