Skip to content

Fix concurrent sendMessage race in StdioClientTransport#876

Open
AbbasNS wants to merge 2 commits intomodelcontextprotocol:mainfrom
AbbasNS:as/fix-concurrent-sendmessage-race
Open

Fix concurrent sendMessage race in StdioClientTransport#876
AbbasNS wants to merge 2 commits intomodelcontextprotocol:mainfrom
AbbasNS:as/fix-concurrent-sendmessage-race

Conversation

@AbbasNS
Copy link

@AbbasNS AbbasNS commented Mar 20, 2026

Problem

StdioClientTransport.sendMessage() uses tryEmitNext() on a unicast Reactor sink (Sinks.many().unicast().onBackpressureBuffer()). When two threads call sendMessage() concurrently on the same transport, one fails with "Failed to enqueue message".

Root cause

Reactor wraps the unicast sink in SinkManySerialized, which uses a CAS-based guard in tryEmitNext(). When two threads race on the CAS, the loser immediately gets FAIL_NON_SERIALIZED — the method does not retry. The current code checks only .isSuccess() and returns Mono.error for any failure, discarding the retryable FAIL_NON_SERIALIZED result.

The existing TODO at line 230-235 acknowledges this:

"we delegate the retry and the backpressure onto the caller. This might be enough for most cases."

It is not enough when an MCP server proxies concurrent tool calls to a downstream server via stdio — each call dispatches on a separate reactor thread and both call sendMessage() on the same transport.

Fix

Replace tryEmitNext with emitNext + Sinks.EmitFailureHandler.busyLooping(Duration.ofMillis(100)).

emitNext with busyLooping spin-retries on FAIL_NON_SERIALIZED until the competing thread finishes its CAS (microseconds in practice). The 100ms duration is just a generous upper bound for pathological cases like GC pauses. This is the approach recommended by Reactor's own emitNext Javadoc:

"It would be possible for an EmitFailureHandler to busy-loop and optimistically wait for the contention to disappear"

For non-retryable failures (FAIL_TERMINATED, FAIL_CANCELLED), emitNext throws EmissionException which is caught and converted to Mono.error, preserving the existing error behavior.

Test

StdioClientTransportConcurrencyTest reproduces the race with two threads sending through the same transport concurrently (20 repetitions):

  • Before fix: 19/20 fail
  • After fix: 20/20 pass

Closes #875

AbbasNS added 2 commits March 20, 2026 15:11
When two threads call sendMessage() concurrently on the same
StdioClientTransport, the unicast sink's SinkManySerialized wrapper
returns FAIL_NON_SERIALIZED via its CAS guard, causing
"Failed to enqueue message".

This test reproduces the race: 19/20 repetitions fail.

Closes modelcontextprotocol#875
Replace tryEmitNext (fail-fast) with emitNext + busyLooping(100ms)
in StdioClientTransport.sendMessage().

The unicast sink's SinkManySerialized wrapper returns FAIL_NON_SERIALIZED
when two threads call tryEmitNext concurrently. busyLooping retries the
CAS spin instead of immediately failing, making concurrent sends safe.
The contention window is microseconds (single CAS operation), so the
100ms duration is just a generous upper bound for pathological cases
like GC pauses.

Before: 19/20 test repetitions fail
After:  20/20 pass

Closes modelcontextprotocol#875
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

StdioClientTransport.sendMessage() fails with concurrent calls due to unicast sink race

1 participant