feat: add active ballooning reclaim controller #160
sjmiller609 wants to merge 20 commits into main from …
✱ Stainless preview builds
This PR will update the … Edit this comment to update it. It will appear in the SDK's changelogs.
✅ hypeman-openapi
✅ hypeman-go
✅ hypeman-typescript
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Validated the feature end to end on …
What I checked:
The main issue I had to fix was that the manual Linux path could reuse a stale non-Linux embedded …
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
config.example.darwin.yaml (outdated)
protected_floor_percent: 50
protected_floor_min_bytes: 512MB
min_adjustment_bytes: 64MB
per_vm_max_step_bytes: 256MB
Config defaults and example YAML use incompatible byte units
Medium Severity
The example YAML files use 512MB, 64MB, and 256MB, while the Go code defaults use raw byte strings "536870912", "67108864", and "268435456" (which are 512, 64, and 256 MiB respectively). The c2h5oh/datasize library treats MB as SI megabytes (1,000,000 bytes), not mebibytes (1,048,576 bytes). Anyone copying the example config would get ~4.6% less memory for each threshold than the Go defaults intend, causing subtle behavioral differences between default and YAML-configured deployments.
Additional Locations (2)
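The size of the discrepancy can be checked with plain arithmetic. The Go sketch below takes the finding's premise that MB means 10^6 bytes (whether a particular parser, c2h5oh/datasize included, reads MB as 10^6 or 2^20 is worth confirming in its documentation) and compares each threshold with its binary value. Under that premise the shortfall is about 4.6%; a figure of about 4.9% would instead describe how much larger 2^20 is than 10^6. The constant names and shortfallPercent are illustrative, not part of the PR.

```go
package main

import "fmt"

// Binary (IEC) and decimal (SI) megabyte sizes in bytes.
const (
	MiB int64 = 1 << 20   // 1,048,576 bytes
	MB  int64 = 1_000_000 // 1,000,000 bytes
)

// shortfallPercent reports how much smaller n decimal megabytes are than
// n binary mebibytes, as a percentage of the mebibyte value.
func shortfallPercent(n int64) float64 {
	bin := n * MiB
	dec := n * MB
	return float64(bin-dec) / float64(bin) * 100
}

func main() {
	// The three thresholds from the example config, compared in both unit systems.
	for _, n := range []int64{512, 64, 256} {
		fmt.Printf("%d MiB = %d bytes, %d MB = %d bytes, shortfall %.2f%%\n",
			n, n*MiB, n, n*MB, shortfallPercent(n))
	}
}
```

The shortfall is the same for every threshold because it depends only on the ratio 10^6 / 2^20, which is why writing raw byte counts sidesteps the question entirely.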
} else {
	appliedTarget = candidate.currentTargetGuestBytes - minInt64(-delta, c.config.PerVMMaxStepBytes)
}
}
Step-size clamping uses stale delta after adjustment suppression
Medium Severity
After the min-adjustment or cooldown checks reset appliedTarget to candidate.currentTargetGuestBytes, the step-size clamping block that follows still uses the original delta (computed from plannedTarget). This is currently benign: when appliedTarget == candidate.currentTargetGuestBytes, the step-size branch condition appliedTarget != candidate.currentTargetGuestBytes is false, so the clamp never runs. The flow is fragile, though: if either check ever resets appliedTarget to a value other than currentTargetGuestBytes, the clamp would operate on a stale delta.
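A minimal Go sketch of the more robust ordering, under stated assumptions: suppress small or cooling-down changes first, then recompute delta from the applied target before clamping the step. clampTarget and abs64 are hypothetical helpers, the parameters stand in for min_adjustment_bytes and per_vm_max_step_bytes, and the PR's real cooldown logic may be direction-aware in ways this ignores.

```go
package main

import "fmt"

// clampTarget (hypothetical) turns a planned balloon target into the target
// actually applied this tick. current and planned are guest memory targets
// in bytes.
func clampTarget(current, planned, minAdjustment, maxStep int64, inCooldown bool) int64 {
	applied := planned

	// Suppress changes that are too small or fall inside a cooldown window.
	if abs64(applied-current) < minAdjustment || inCooldown {
		applied = current
	}

	// Recompute delta from applied, not planned, so the step clamp can never
	// act on a stale value left over from before suppression.
	delta := applied - current
	if delta > maxStep {
		applied = current + maxStep
	} else if delta < -maxStep {
		applied = current - maxStep
	}
	return applied
}

func abs64(x int64) int64 {
	if x < 0 {
		return -x
	}
	return x
}

func main() {
	const MiB = int64(1 << 20)
	current := 4096 * MiB
	// Large shrink: clamped to a single 256 MiB step.
	fmt.Println(clampTarget(current, 2048*MiB, 64*MiB, 256*MiB, false) / MiB) // 3840
	// Change smaller than the 64 MiB minimum: suppressed.
	fmt.Println(clampTarget(current, 4064*MiB, 64*MiB, 256*MiB, false) / MiB) // 4096
	// Cooldown active: suppressed regardless of size.
	fmt.Println(clampTarget(current, 2048*MiB, 64*MiB, 256*MiB, true) / MiB) // 4096
}
```

Recomputing delta costs one subtraction and keeps the clamp correct no matter what value the suppression step leaves in the applied target.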
lib/guestmemory/pressure_parse.go (outdated)
parts := strings.Fields(line)
for i := 0; i < len(parts); i++ {
	if parts[i] == "of" && i+1 < len(parts) {
		n, err := strconv.ParseInt(strings.TrimSuffix(parts[i+1], " bytes)"), 10, 64)
Darwin vm_stat page size parsing fails on real output
Medium Severity
The parseDarwinVMStatOutput page-size parser uses strings.TrimSuffix(parts[i+1], " bytes)") but since parts comes from strings.Fields, it never contains spaces. The actual token for a line like "(page size of 16384 bytes)" will be "16384" followed by "bytes)" as separate fields. Trimming " bytes)" (with a leading space) from "16384" is a no-op, so the parse succeeds coincidentally, but the suffix removal is dead code. On systems where the token format differs slightly, the fallback to 4096 could silently produce wrong available-memory values.
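Because strings.Fields already separates "16384" from "bytes)", the number can be parsed directly from the token after "of". The sketch below is an illustrative rewrite, not the PR's code: parseVMStatPageSize is a hypothetical name, and it returns a boolean so a caller can log or surface the 4096 fallback instead of applying it silently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultPageSize is the fallback used when no page size can be parsed
// (4096, matching the fallback described in the finding).
const defaultPageSize int64 = 4096

// parseVMStatPageSize extracts the page size from the vm_stat header line,
// e.g. "Mach Virtual Memory Statistics: (page size of 16384 bytes)".
// strings.Fields splits on whitespace, so the number is its own token right
// after "of" and needs no suffix trimming.
func parseVMStatPageSize(output string) (int64, bool) {
	for _, line := range strings.Split(output, "\n") {
		if !strings.Contains(line, "page size of") {
			continue
		}
		parts := strings.Fields(line)
		for i := 0; i+1 < len(parts); i++ {
			if parts[i] != "of" {
				continue
			}
			if n, err := strconv.ParseInt(parts[i+1], 10, 64); err == nil && n > 0 {
				return n, true
			}
		}
	}
	// ok=false tells the caller the fallback is in use.
	return defaultPageSize, false
}

func main() {
	out := "Mach Virtual Memory Statistics: (page size of 16384 bytes)\nPages free:                               12345.\n"
	size, ok := parseVMStatPageSize(out)
	fmt.Println(size, ok) // 16384 true

	size, ok = parseVMStatPageSize("unexpected output")
	fmt.Println(size, ok) // 4096 false
}
```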
}
return HostPressureStatePressure
default:
if availablePercent <= highWatermark || sample.Stressed {
Pressure hysteresis uses <= where < matches docs
Low Severity
In nextPressureState, the healthy→pressure transition fires when availablePercent <= highWatermark, so a host whose available memory sits exactly at the high watermark (e.g., 10%) enters pressure state. The pressure→healthy exit condition uses >= lowWatermark. This asymmetry can cause unnecessary pressure transitions on hosts hovering at the boundary, which contradicts the hysteresis goal of avoiding flapping at thresholds.
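A hedged Go sketch of the hysteresis the finding has in mind, with strict < on entry and >= on exit so that a host sitting exactly at the entry threshold stays healthy. HostPressureStatePressure appears in the snippet; HostPressureStateHealthy, the type definition, the enterPercent/exitPercent parameters (the roles the finding assigns to highWatermark and lowWatermark), and how stressed is treated on exit are all assumptions.

```go
package main

import "fmt"

// HostPressureState is an illustrative two-state type, not the PR's definition.
type HostPressureState int

const (
	HostPressureStateHealthy HostPressureState = iota
	HostPressureStatePressure
)

// nextPressureState (hypothetical) enters pressure only when available memory
// drops strictly below enterPercent, and leaves it only after recovering to at
// least exitPercent (> enterPercent). The gap between them prevents flapping.
func nextPressureState(state HostPressureState, availablePercent, enterPercent, exitPercent float64, stressed bool) HostPressureState {
	switch state {
	case HostPressureStatePressure:
		if availablePercent >= exitPercent && !stressed {
			return HostPressureStateHealthy
		}
		return HostPressureStatePressure
	default:
		if availablePercent < enterPercent || stressed {
			return HostPressureStatePressure
		}
		return HostPressureStateHealthy
	}
}

func main() {
	s := HostPressureStateHealthy
	// Walk available memory down and back up with enter=10%, exit=15%.
	for _, avail := range []float64{20, 10, 9, 12, 15} {
		s = nextPressureState(s, avail, 10, 15, false)
		fmt.Println(avail, s == HostPressureStatePressure)
	}
}
```

Run as written, the host stays healthy at exactly 10%, enters pressure at 9%, remains there at 12%, and recovers at 15%.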
- Change example YAML byte-size values from MB (decimal SI, 10^6) to MiB (binary, 2^20) so they match the Go default config, which uses raw binary byte counts (e.g. 536870912 = 512 MiB).
- Remove the dead strings.TrimSuffix call in parseDarwinVMStatOutput; the " bytes)" suffix is never present after strings.Fields splits on whitespace.

Addresses remaining Cursor Bugbot review findings on PR #160.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The c2h5oh/datasize library does not support MiB (binary IEC) suffixes. Use raw byte counts (e.g. 536870912 = 512*1024*1024) to match the Go default config and avoid the ~5% discrepancy from using MB (decimal SI).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


Summary
- lib/guestmemory with pressure sampling, proportional reclaim, protected floors, and manual reclaim holds
- POST /resources/memory/reclaim, wire the controller through the API startup path, and document/configure the new hypervisor.memory.active_ballooning settings

Validation
- go test ./lib/guestmemory -count=1
- go test ./cmd/api/api -run 'TestReclaimMemory_' -count=1
- make test-guestmemory-vz
- make test-guestmemory-linux on deft-kernel-dev from /home/sjmiller609/codex-active-ballooning-plan/hypeman

Note
High Risk
Introduces a new background control loop that actively adjusts VM balloon targets and extends the core hypervisor.Hypervisor interface across all backends, which can affect VM stability and host memory behavior if misconfigured or buggy.

Overview
Adds an active guest-memory ballooning controller (lib/guestmemory) that samples host pressure (Linux /proc + PSI, macOS vm_stat/memory_pressure), computes reclaim targets with hysteresis and protected floors, and applies per-VM balloon target changes with rate limits plus metrics/tracing/logging.

Exposes manual reclaim via POST /resources/memory/reclaim (new API handler + tests), wires a GuestMemoryController through DI and API startup, and adds the config surface hypervisor.memory.active_ballooning with defaults, validation, and example YAML updates.

Extends hypervisor backends (Cloud Hypervisor, QEMU, Firecracker, VZ/vz-shim) with runtime balloon Get/SetTargetGuestMemoryBytes, adds socket-based cache keys and Linux PID resolution to stabilize runtime control, and updates guest-memory integration tests/Makefile targets for better determinism and CI robustness.

Written by Cursor Bugbot for commit 400c9c7. This will update automatically on new commits.