Skip validator duties while syncing#246
Skip validator duties while syncing#246thinktanktom wants to merge 2 commits intolambdaclass:mainfrom
Conversation
During sync the node lacks a complete view of the chain. Proposing blocks or producing attestations in this state would cast uninformed votes and potentially harm the network. Add is_syncing field to BlockChainServer (starts true) and check it on every tick by comparing store.head_slot() against the wall-clock slot. If we're more than SYNC_TOLERANCE_SLOTS (2) behind, duties are suppressed. The transition is logged once in each direction. Also adds lean_is_syncing Prometheus gauge so the sync state is visible on the dashboard from node startup. Closes lambdaclass#236
Greptile SummaryThis PR adds sync-awareness to Key changes:
Issue:
Confidence Score: 3/5
|
| Filename | Overview |
|---|---|
| crates/blockchain/src/lib.rs | Adds is_syncing field and sync-detection logic in on_tick. Core logic is correct, but the lean_is_syncing metric is not initialised to 1 at startup, so it misreports the initial sync state. |
| crates/blockchain/src/metrics.rs | Adds LEAN_IS_SYNCING IntGauge, forces it in init(), and exposes set_is_syncing(). Implementation mirrors the existing LEAN_IS_AGGREGATOR pattern cleanly. |
| docs/metrics.md | Documents lean_is_syncing in the Validator Metrics table; a missing blank line before ## Network Metrics may affect rendering in some Markdown parsers. |
Sequence Diagram
sequenceDiagram
participant T as Ticker
participant BCS as BlockChainServer
participant S as Store
participant M as Metrics
participant P2P as P2P Layer
Note over BCS: is_syncing = true (startup)
Note over M: lean_is_syncing = 0 (BUG: should be 1)
loop Every 800 ms tick
T->>BCS: on_tick(timestamp_ms)
BCS->>S: head_slot()
S-->>BCS: head_slot
BCS->>BCS: behind_by = slot - head_slot
BCS->>BCS: now_syncing = behind_by > SYNC_TOLERANCE_SLOTS (2)
alt Sync state changed
BCS->>M: set_is_syncing(now_syncing)
BCS->>BCS: self.is_syncing = now_syncing
end
alt !is_syncing && interval == 0 && slot > 0
BCS->>BCS: get_our_proposer(slot)
end
BCS->>S: on_tick(...)
S-->>BCS: new_aggregates
alt has new aggregates
BCS->>P2P: publish_aggregated_attestation(aggregate)
end
alt proposer_validator_id is Some
BCS->>BCS: propose_block(slot, validator_id)
BCS->>P2P: publish block
end
alt !is_syncing && interval == 1
BCS->>BCS: produce_attestations(slot)
BCS->>P2P: publish attestations
end
BCS->>M: update_safe_target_slot / update_head_slot
end
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/blockchain/src/lib.rs
Line: 58
Comment:
**`lean_is_syncing` metric starts at 0 despite `is_syncing = true`**
`is_syncing` is initialized to `true` here, but `metrics::set_is_syncing(true)` is never called at this point. Prometheus initialises `IntGauge` to `0` by default, and `metrics::set_is_syncing` is only called inside the `if now_syncing != self.is_syncing` branch, which fires on *transitions*. Because the field starts at `true`, the very first tick where the node is still syncing finds `now_syncing == self.is_syncing` (both `true`) and skips the branch — so the metric stays at `0` (not-syncing) for the entire initial sync.
The fix is to initialize the metric at startup, just as `set_is_aggregator` is called in `BlockChain::spawn`:
```rust
// In BlockChain::spawn(), alongside set_is_aggregator:
metrics::set_is_syncing(true);
```
```suggestion
is_syncing: true, // assume syncing until on_tick proves otherwise
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: docs/metrics.md
Line: 71-72
Comment:
**Missing blank line before section heading**
The blank line that previously separated the Validator Metrics table from the `## Network Metrics` heading was removed when the new row was inserted. Without it, some Markdown renderers merge the heading into the table or render it incorrectly.
```suggestion
|`lean_is_syncing`| Gauge | Whether the node is currently syncing. True=1, False=0 | On sync state change | | ✅ |
## Network Metrics
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Skip validator duties while syncing (#23..." | Re-trigger Greptile
| pending_blocks: HashMap::new(), | ||
| is_aggregator, | ||
| pending_block_parents: HashMap::new(), | ||
| is_syncing: true, // assume syncing until on_tick proves otherwise |
There was a problem hiding this comment.
lean_is_syncing metric starts at 0 despite is_syncing = true
is_syncing is initialized to true here, but metrics::set_is_syncing(true) is never called at this point. Prometheus initialises IntGauge to 0 by default, and metrics::set_is_syncing is only called inside the if now_syncing != self.is_syncing branch, which fires on transitions. Because the field starts at true, the very first tick where the node is still syncing finds now_syncing == self.is_syncing (both true) and skips the branch — so the metric stays at 0 (not-syncing) for the entire initial sync.
The fix is to initialize the metric at startup, just as set_is_aggregator is called in BlockChain::spawn:
// In BlockChain::spawn(), alongside set_is_aggregator:
metrics::set_is_syncing(true);| is_syncing: true, // assume syncing until on_tick proves otherwise | |
| is_syncing: true, // assume syncing until on_tick proves otherwise |
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/src/lib.rs
Line: 58
Comment:
**`lean_is_syncing` metric starts at 0 despite `is_syncing = true`**
`is_syncing` is initialized to `true` here, but `metrics::set_is_syncing(true)` is never called at this point. Prometheus initialises `IntGauge` to `0` by default, and `metrics::set_is_syncing` is only called inside the `if now_syncing != self.is_syncing` branch, which fires on *transitions*. Because the field starts at `true`, the very first tick where the node is still syncing finds `now_syncing == self.is_syncing` (both `true`) and skips the branch — so the metric stays at `0` (not-syncing) for the entire initial sync.
The fix is to initialize the metric at startup, just as `set_is_aggregator` is called in `BlockChain::spawn`:
```rust
// In BlockChain::spawn(), alongside set_is_aggregator:
metrics::set_is_syncing(true);
```
```suggestion
is_syncing: true, // assume syncing until on_tick proves otherwise
```
How can I resolve this? If you propose a fix, please make it concise.| |`lean_is_syncing`| Gauge | Whether the node is currently syncing. True=1, False=0 | On sync state change | | ✅ | | ||
| ## Network Metrics |
There was a problem hiding this comment.
Missing blank line before section heading
The blank line that previously separated the Validator Metrics table from the ## Network Metrics heading was removed when the new row was inserted. Without it, some Markdown renderers merge the heading into the table or render it incorrectly.
| |`lean_is_syncing`| Gauge | Whether the node is currently syncing. True=1, False=0 | On sync state change | | ✅ | | |
| ## Network Metrics | |
| |`lean_is_syncing`| Gauge | Whether the node is currently syncing. True=1, False=0 | On sync state change | | ✅ | | |
| ## Network Metrics |
Prompt To Fix With AI
This is a comment left during a code review.
Path: docs/metrics.md
Line: 71-72
Comment:
**Missing blank line before section heading**
The blank line that previously separated the Validator Metrics table from the `## Network Metrics` heading was removed when the new row was inserted. Without it, some Markdown renderers merge the heading into the table or render it incorrectly.
```suggestion
|`lean_is_syncing`| Gauge | Whether the node is currently syncing. True=1, False=0 | On sync state change | | ✅ |
## Network Metrics
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
During sync, the node doesn't have a complete view of the chain. Proposing blocks or producing attestations in this state would cast uninformed votes and potentially harm the network.
This PR adds sync detection to
BlockChainServerand suppresses validator duties while the node is catching up.Changes
crates/blockchain/src/lib.rsSYNC_TOLERANCE_SLOTS = 2constant: the number of slots our head may lag the wall-clock slot before duties are suppressedis_syncing: boolfield toBlockChainServer, initializedtrueon_tick, comparestore.head_slot()against the current slot on every tick; flipis_syncingand log once per transition to avoid spamproposer_validator_idcomputation andproduce_attestationsbehind!self.is_syncingcrates/blockchain/src/metrics.rslean_is_syncingIntGaugeinit()so it appears in/metricsfrom startupset_is_syncing()public functiondocs/metrics.mdlean_is_syncingin the Validator Metrics tableHow to verify
Run a local devnet and observe the logs on startup:
Verify
lean_is_syncingappears in the/metricsendpoint and transitions from1to0once the node catches up.Closes #236