Skip to content

fix(cmd/docker): prevent race between force-exit goroutine and plugin wait#6878

Open
zampani-docker wants to merge 1 commit intodocker:masterfrom
zampani-docker:zampani/fix-plugin-force-exit-race
Open

fix(cmd/docker): prevent race between force-exit goroutine and plugin wait#6878
zampani-docker wants to merge 1 commit intodocker:masterfrom
zampani-docker:zampani/fix-plugin-force-exit-race

Conversation

@zampani-docker
Copy link

Summary

  • Fixes a race condition in tryPluginRun where two goroutines could both call os.Exit with different exit codes when force-killing a plugin after 3 SIGINT/SIGTERM signals
  • Moves os.Exit(1) ownership to the main goroutine; the signal goroutine now only closes a channel and calls Kill()
  • Adds a return after the force-kill to prevent additional signals from causing a close on an already-closed channel (panic)

Root cause: when a plugin ignored context cancellation and 3 SIGINTs were sent, the signal goroutine called plugincmd.Process.Kill() then os.Exit(1). On a loaded runner, plugincmd.Run() in the main goroutine could return first — the SIGKILL'd plugin has ws.ExitStatus() = -1, leading to os.Exit(-1) = exit code 255 instead of 1.

Fix: forceExitCh is closed before Kill(). Since the plugin can only die after SIGKILL is delivered and Run() only returns after the process is reaped, forceExitCh is guaranteed to be closed before Run() returns in the force-kill path. The main goroutine checks the channel and owns os.Exit(1).

Test plan

  • Existing test TestPluginSocketCommunication/detached/the_main_CLI_exits_after_3_signals covers the fixed race — was flaky (exit code 255) on loaded CI runners with RC Docker on Alpine; should now be reliable
  • No new tests required: the fix is a concurrency correctness change to existing behaviour with existing test coverage

🤖 Generated with Claude Code

@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
cmd/docker/docker.go 0.00% 10 Missing ⚠️

📢 Thoughts on this report? Let us know!

… wait

When a plugin ignores context cancellation and the user sends 3 SIGINTs,
the CLI kills the plugin with SIGKILL. Previously the signal goroutine
called os.Exit(1) directly; a race existed where plugincmd.Run() could
return first (plugin was SIGKILL'd, so ws.ExitStatus() = -1) and the
main goroutine would call os.Exit(-1) = exit code 255 before the
goroutine reached os.Exit(1).

Fix by moving exit-code ownership to the main goroutine. The signal
goroutine closes forceExitCh before calling Kill(), guaranteeing the
channel is closed before plugincmd.Run() returns (the plugin can only
die after Kill() delivers SIGKILL; Run() only returns after the process
is reaped). The main goroutine checks forceExitCh after Run() returns
and performs the print + os.Exit(1) itself.

Also return from the signal goroutine after the force-kill to prevent
further loop iterations from calling close(forceExitCh) a second time
(which would panic), in case additional signals arrive while the kill
is in flight.

Fixes a flaky failure in TestPluginSocketCommunication/detached/
the_main_CLI_exits_after_3_signals where exit code 255 was observed
instead of 1 on loaded CI runners (RC Docker on Alpine).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Michael Zampani <michael.zampani@docker.com>
@zampani-docker zampani-docker force-pushed the zampani/fix-plugin-force-exit-race branch from c52c101 to 2bc4307 Compare March 21, 2026 22:31
@zampani-docker zampani-docker marked this pull request as ready for review March 21, 2026 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants