
fix #3231 prevent snapshot corruption #3232

Open
nowheresly wants to merge 1 commit into moby:master from nowheresly:fix3231

Conversation

@nowheresly

- What I did

Fixed snapshot corruption that occurs when the swarm state is large enough
that the raft message struct overhead exceeds GRPCMaxMsgSize.

- How I did it

Two fixes in manager/state/raft/transport/peer.go:

  1. Clamped raftMessagePayloadSize to a minimum of 64 KiB. When a large cluster
    state caused the raft message struct overhead to exceed GRPCMaxMsgSize, the
    computed payload size went negative, corrupting data during snapshot
    splitting. The payload size is now floored at 64 KiB to guarantee forward
    progress.
  2. Copied the Snapshot struct before slicing chunk data. The splitSnapshotData
    loop was assigning sub-slices of Snapshot.Data directly through a shared
    pointer, which mutated the original message's snapshot data across iterations.
    Each chunk now gets its own copy of the Snapshot struct so the original data
    remains intact for correct sub-slicing.
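The two fixes above can be sketched together in a minimal form. Note that the types and constants below (the simplified Snapshot struct, the GRPCMaxMsgSize value, the function shapes) are illustrative stand-ins, not the actual swarmkit definitions in peer.go:

```go
package main

import "fmt"

// Illustrative gRPC message size limit (not the real constant's value).
const GRPCMaxMsgSize = 4 << 20 // 4 MiB

// The 64 KiB floor introduced by fix 1.
const minPayloadSize = 64 << 10

// Snapshot is a simplified stand-in for the raft snapshot struct.
type Snapshot struct {
	Metadata string
	Data     []byte
}

// raftMessagePayloadSize subtracts the message struct overhead from the
// gRPC limit, clamping the result to 64 KiB so it can never go negative
// when the overhead alone exceeds GRPCMaxMsgSize.
func raftMessagePayloadSize(overhead int) int {
	size := GRPCMaxMsgSize - overhead
	if size < minPayloadSize {
		size = minPayloadSize
	}
	return size
}

// splitSnapshotData cuts the snapshot data into payload-sized chunks.
// Each chunk gets its own copy of the Snapshot struct (fix 2), so only
// the copy's Data field is re-sliced and the original data stays intact
// across iterations.
func splitSnapshotData(snap Snapshot, payloadSize int) []Snapshot {
	var chunks []Snapshot
	data := snap.Data
	for i := 0; i < len(data); i += payloadSize {
		end := i + payloadSize
		if end > len(data) {
			end = len(data)
		}
		chunk := snap // struct copy, not a shared pointer
		chunk.Data = data[i:end]
		chunks = append(chunks, chunk)
	}
	return chunks
}

func main() {
	// Overhead larger than the limit: the size is floored, not negative.
	fmt.Println(raftMessagePayloadSize(GRPCMaxMsgSize + 1)) // 65536

	snap := Snapshot{Metadata: "m", Data: []byte("abcdefgh")}
	for _, c := range splitSnapshotData(snap, 3) {
		fmt.Printf("%s ", c.Data) // abc def gh
	}
	fmt.Println()
}
```

Copying the struct per chunk is what makes the sub-slicing safe: the loop always slices the original, untouched `data` slice rather than a field that a previous iteration already shrank.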

- How to test it

  • go test ./manager/state/raft/transport/ -run TestSplitSnapshotData —
    verifies both normal snapshots and the large-metadata scenario that previously
    panicked/corrupted data.
  • go test ./manager/state/raft/transport/ -run
    TestRaftMessagePayloadSizeMinimum — verifies the payload size floor at 64 KiB.

- Description for the changelog

Fixes #3231 — prevent corruption of snapshot when swarm state is large. When
the raft message struct overhead exceeds the gRPC max message size (e.g.
clusters with many objects), snapshot splitting now clamps the chunk payload
to a minimum of 64 KiB and correctly copies the snapshot struct per chunk to
avoid data corruption.

Signed-off-by: Sylvere Richard <sylvere.richard@gmail.com>


Successfully merging this pull request may close these issues.

dockerd panic: slice bounds out of range in raft transport.splitSnapshotData (snapshot send)
