v0.7.0

This release contains bugfixes, new features, major public-facing API changes, *and* inter-component API changes. Live-upgrading is possible but must be done carefully. Read the "Upgrade path from v0.6.0" section at the end for more info.

Breaking API changes:

- Upgraded to Kubernetes 1.24 (#132)
- VMs may have fractional CPU values (#172)

Features:

- Improve scaling bounds validation (#190)
- Make api.ScalingBounds (for scaling annotations) public (#181)
- informant: Respect max file cache size (#182)
- agent: Add runner panics metrics (#180)
- agent: Rework (improve!) scaling algorithm (#195)
  - In general, scaling should be much smoother now. There's still some work to do in this area (particularly around downscaling), but overall this step should be fairly impactful.
- agent->informant health checks (#203)
- Support for fractional CPU (#172)
  - !!!
- NeonVM: Add current usage annotation to runner pod (#231)
- NeonVM: Allow disabling service links (#235)

Fixes:

- VirtualMachineSpec.PodResources now sets the pod's resources (#138)
- autoscaler-agents no longer produce logs about VM updates that aren't on their node (#186)
- Fix NeonVM CRD still including VirtualMachineSpec.ServiceAccountName (#188)
- plugin: Fix Unreserve verdict format string in logs (#206)
- agent: Stop informant server when context canceled (#214)
  - This was the cause of a pretty notable goroutine leak that should now be fixed. See #196
- agent: Fix log for /unregister response (#224)
- agent: Fix inverted 'ErrServerClosed' check (#225)
  - This may have been causing spurious error logs and silencing actual errors.
- Add node affinity to NeonVM's kube-multus-ds DaemonSet (#236)
- agent: Fix deadlock on invalid plugin response (#237)

Protocol changes:

- agent->informant health checks are now supported, but not required (#203)
- NeonVM CRD now supports fractional CPU - all of min/use/max. (#172)
- NeonVM controller -> runner makes requests to /cpu_current and /cpu_change endpoints to get/set fractional CPU via the runner's cgroup manipulations. (#172)
- agent->plugin resource requests can now request fractional CPU (#172)
- plugin->agent permits can now return fractional CPU (#172)
  - note: plugin does not return fractional CPU unless the agent supports it. This makes it possible to do upgrades without significant downtime. (#238)

Other changes:

- Upgraded to Go 1.20 (#130)
- agent/metrics: Make request error labels self-consistent (#193)
- Mark scheduler with `priorityClassName: system-cluster-critical` (#227)

Upgrade path from v0.6.0:

note: each step produces a "valid" state - the system will operate successfully. It is not recommended to stay in a partially upgraded state for long, because these intermediate states have not been tested as much.

1. Upgrade NeonVM controllers v0.6.0 -> v0.7.0
2. Upgrade autoscale-scheduler v0.6.0 -> v0.7.0
   - note: it is ok to change to a compute unit with fractional CPU at this step! Old autoscaler-agents will be given a multiplied CU so that it has an integer number of CPUs.
3. Upgrade autoscaler-agent v0.6.0 -> v0.7.0

note: Upgrading the vm-informant can be done at any point. Its protocol changes are opt-in.
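
To illustrate the "multiplied CU" mentioned in step 2 of the upgrade path, here is a minimal sketch of the arithmetic involved. It is not taken from the scheduler plugin's code: the helper name `integerCPUMultiplier` and the choice to express CPU in milli-CPU are assumptions made for this example only. The idea is to find the smallest factor that turns a fractional compute unit into a whole number of CPUs, which is what an agent that only understands integer CPU would need.

```go
package main

import "fmt"

// gcd returns the greatest common divisor of a and b.
func gcd(a, b uint32) uint32 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// integerCPUMultiplier is a hypothetical helper (not part of the project's API).
// Given a compute unit's CPU in milli-CPU, it returns the smallest factor k >= 1
// such that k * milliCPU is a whole number of CPUs (a multiple of 1000).
func integerCPUMultiplier(milliCPU uint32) uint32 {
	return 1000 / gcd(milliCPU, 1000)
}

func main() {
	// Example compute units: 0.25, 0.5, 1 and 1.5 CPUs.
	for _, cu := range []uint32{250, 500, 1000, 1500} {
		k := integerCPUMultiplier(cu)
		fmt.Printf("CU of %dm CPU -> multiplier %d -> %d whole CPU(s)\n", cu, k, cu*k/1000)
	}
}
```

Under this assumption, a compute unit with 0.25 CPU would be presented to an old agent as four compute units' worth of resources (1 CPU), with the other resources in the compute unit presumably scaled by the same factor.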