This project is mirrored from https://github.com/neondatabase/autoscaling.
-
v0.13.6 · 0d335208
Hotfix release fixing the plugin's incorrect usage of fun/pubsub.Queue that caused some events to be dropped.
-
v0.13.5 · 3ff49732
Hotfix release fixing the plugin's Score method so that it takes into account actual resource usage. This release only contains a backport of the fix from #426.
-
v0.13.4 · 95b83f30
Hotfix release to fix two issues:
1. Scheduler plugin's Filter logic was incorrectly including Completed pods in the usage calculations.
2. Scheduler plugin's node state 'Buffer' field was always underflowing.

The fixes were in #423 and #424, respectively.
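Both fixes concern resource accounting. The following is a minimal sketch of the two ideas, written in Python for brevity rather than in the plugin's own language. The `Pod` record, the choice of terminal phases, and all function names are hypothetical illustrations, not the plugin's actual types or API.

```python
from dataclasses import dataclass


# Hypothetical, simplified pod record; the real plugin tracks far more state.
@dataclass
class Pod:
    name: str
    phase: str      # Kubernetes pod phase, e.g. "Running" or "Succeeded"
    cpu_milli: int  # requested CPU, in millicores


# Assumption: pods in these phases have terminated and hold no node resources.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def node_cpu_usage(pods):
    """Sum CPU only over pods that have not run to completion."""
    return sum(p.cpu_milli for p in pods if p.phase not in TERMINAL_PHASES)


def saturating_sub(a, b):
    """Subtract without going below zero, so an unsigned field cannot wrap."""
    return a - b if a >= b else 0


def wrapping_sub_u64(a, b):
    """What unchecked subtraction on a 64-bit unsigned integer would produce."""
    return (a - b) % 2**64
```

With an unsigned counter, subtracting a larger value from a smaller one wraps to a very large number instead of going negative, which is why a guarded subtraction like `saturating_sub` is a common remedy for this class of bug.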
-
v0.13.3 · a010f282
Another small release, with a minor improvement to the plugin's method call metrics, so we can avoid tripping alerts for overprovisioning pods. Change was in #422, nothing else included.
-
v0.13.2 · 076db28d
Another small release, primarily to fix the scheduler's handling of overprovisioning pods. Also contains a minor improvement.

Fixes:
- plugin: Fix handling of ignored namespaces during Filter (#416, #418)

Other changes:
- plugin: Emit k8s Event on failed ExtractVmInfo (#408)
  - Should help with observability for certain failures.

Upgrade path from v0.12.x / v0.13.x:
- No ordering requirements.
-
v0.13.1 · f8330227
Small release, primarily to fix a leak in the scheduler plugin. Also contains other bugfixes.

Fixes:
- plugin: Memory leak (#415)
- plugin: Missing node metrics during initial load (#410)
- neonvm/runner: Missing error logs (#401)
- neonvm/runner: Various log.Printf calls with unnecessary trailing newline (#401)
- neonvm/controller, informant: Typos in error messages (#407)

Upgrade path from v0.12.x / v0.13.0:
- No ordering requirements.
-
v0.13.0 · 909ae5ed
This relatively small release contains significant changes to existing behavior in both the autoscaler-agent and scheduler plugin.

No breaking API changes (technically).

Features:
- agent: Memory-based scaling (#393)
  - Currently implemented in a similar manner to our load average-based scaling, via total memory usage, including the kernel.
- plugin: Allow ignoring resource usage from namespaces (#399)
  - Carveout for 'overprovisioning' pods now that we're tracking everything.

Fixes:
- plugin: Improve plugin method logs (#405)
  - Previously, some notable metrics were being increased without suitable accompanying log messages.

No protocol changes.

Other changes:
- plugin: Track all pods (#399)
  - Should make our accounting & metrics reporting much more accurate.
- plugin: Remove 'System' reserved resources (#399)
  - No longer necessary, because we're tracking everything.
-
v0.12.2 · 87af5b1a
Small release, just containing #395 - a fix for #234, where the autoscaler-agent's per-VM Runner will panic when the scaling bounds decrease below the current usage.

This was fast-tracked for release because of the impact on VM pools. It's not hard-blocking, but is significant enough that it's worth fixing beforehand.
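The failure mode above arises when a value that was valid under the old bounds falls outside the new ones. One common way to handle that case without crashing is to clamp the current value into the new range. The sketch below illustrates that general technique only; `clamp_to_bounds` is a hypothetical helper and is not taken from #395.

```python
def clamp_to_bounds(current, minimum, maximum):
    """Return `current` moved into [minimum, maximum].

    Illustrative only: a value above a newly lowered maximum is pulled down
    to that maximum rather than treated as an impossible state.
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(current, maximum))
```

For example, a VM using 8 units whose bounds shrink to 1..4 would be treated as targeting 4, while a value already inside the bounds is left unchanged.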
-
v0.12.1 · 129b29df
This release contains bugfixes and new metrics (along with some changes to existing ones).

No breaking API changes.

Features:
- plugin: New migration-related metrics (#387):
  - autoscaling_plugin_migrations_created_total
  - autoscaling_plugin_migrations_deleted_total
  - autoscaling_plugin_migration_create_fails_total
  - autoscaling_plugin_migration_delete_fails_total
- plugin: Include node group in node resource metrics (#382)
- agent: agent->informant request metrics now include the endpoint (#380)

Fixes:
- Add vmscrape.yaml to release assets (#392)
- plugin: Fix spurious "updated scaling bounds" logs (#391)
  - Incidentally, this *also* entirely fixes our handling of scaling bounds changes.
- plugin: Migration handling reliability improvements (#387)
- informant: Fix parent process stall when child dies quickly (#389)
- agent: Fix NeonVM downscaling not showing up in metrics (#381)

No protocol changes.

No other changes.

Upgrade path from v0.12.0:
- No ordering requirements.
-
v0.12.0 · 0cd3cdbb
This release contains bugfixes (lots of them!), new metrics, and BREAKING CHANGES TO OLD METRICS.

No breaking API changes.

Features:
- neonvm: Propagate label/annotation changes to runner pod(s) (#279)
- agent: Add scaling metrics! (#334)
  - All of:
    - autoscaling_agent_scheduler_plugin_{requested,approved}_{cpu,mem}_change_total
    - autoscaling_agent_informant_{requested,approved}_{cpu,mem}_change_total
    - autoscaling_agent_neonvm_requested_{cpu,mem}_change_total
    - autoscaling_agent_neonvm_outbound_requests_total
- plugin: Add per-node resource metrics (#363)
  - Two new metrics:
    - autoscaling_plugin_node_cpu_resources_current
    - autoscaling_plugin_node_mem_resources_current

Fixes:
- Add whereabouts.yaml to release assets (#348)
- neonvm: Don't propagate kubectl's last-applied-configuration annotation (#344)
- agent: Reset Runner endState on restart (#349)
  - This bug caused the agent's metrics to never show a previously-panicked Runner as recovered, even when it was.
- agent/schedwatch: Fix spurious close (#352)
  - This bug was causing agents to be unable to recognize new schedulers.
- plugin/watch: Remove redundant error wrapping (#358)
- plugin: Fix filter cycle metrics (#356)
  - This REMOVES two metrics:
    - autoscaling_plugin_filter_cycle_successes_total
    - autoscaling_plugin_filter_cycle_rejections_total
  - See the PR for more details.
- README: fix make commands to reflect kind/k3d (#365)
- plugin: Cleanup state for deleted k8s Nodes (#361)
  - Should *hopefully* fix a particular memory leak, but it's not clear.
- informant/filecache: Close DB connections (#367)
  - This was causing some users to be unable to connect to their database because the informant took all the connections.
  - This was already released as v0.11.1
- agent/billing: Move push logic into separate thread (#368)
  - This was preventing us from having more reasonable request timeouts (like... anything above 2s)

No protocol changes.

Other changes:
- util/watch: More logs! (#351)
- agent: Record neon/endpoint-id for each Runner if/when assigned (#353)
- agent: Improve help message for autoscaling_agent_tracked_vms_current (#354)
- agent/billing: Log IdempotencyKey of events (#366)
- billing: Add x-trace-id header to requests (#372)

Upgrade path from v0.11.0:
- No ordering requirements, but considering the fixes to the agent's scheduler detection, it's probably worthwhile to update any agents first.
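The scaling metrics in this release are written in brace shorthand, where each `{a,b}` group stands for one name per alternative. When updating dashboards or alerts it can help to list the concrete names. The helper below is a small illustrative sketch; `expand_braces` is a hypothetical name and is not part of the project.

```python
import re
from itertools import product


def expand_braces(pattern):
    """Expand every {a,b,...} group in `pattern` into all combinations."""
    # re.split with one capture group alternates literal text (even indexes)
    # with the contents of each brace group (odd indexes).
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = [p.split(",") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in product(*choices)]


for name in expand_braces(
    "autoscaling_agent_scheduler_plugin_{requested,approved}_{cpu,mem}_change_total"
):
    print(name)
```

Applied to the first pattern in the list, this yields four metric names, one for each pairing of requested/approved with cpu/mem. A name without braces is returned unchanged.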
-
v0.11.1 · 0c17ef26
Hotfix release, backporting #367 to fix a bug in the informant that caused it to never close DB connections when the file cache integration is enabled.
-
v0.11.0 · da7e37b0
This release contains bugfixes, new features, and large changes to the NeonVM controller.

Breaking API changes:
- neonvm: VirtualMachine .spec.extraNetwork fields changed (#256)
  - Removed multusNetworkNoIP
  - Made multusNetwork omitempty
- neonvm: VirtualMachineMigrations no longer have post-copy enabled by default (#256)

Features:
- neonvm: Two new VmPhase types: "PreMigrating" and "Scaling" (#256)
- neonvm: Migration source runner pod now has an ownerref pointing back to the migration (#332)
- ci: Added support for k3d (#340)
- plugin: new metrics
  - autoscaling_plugin_filter_cycle_successes_total (#346)
  - autoscaling_plugin_filter_cycle_rejections_total (#346)
  - autoscaling_plugin_extension_call_fails_total (#347)

Fixes:
- scheduler: Fixed agent-handler log keys explosion (#338)
  - NB: this was already released as v0.10.1
- scheduler: Fixed missing `continue` when skipping completed pods (#342)
  - NB: this was already released as v0.10.2
- scheduler: Fixed outdated log line (#343)
  - Removed "[autoscale-enforcer] load state: " prefix from the message
- agent: Do informant health checks even when suspended (#341)

No protocol changes.

Other changes:
- ci: kind and kubectl versions tweaked (#336)
- k8s deps upgraded to 1.25.11 (#339)
- plugin: Capitalize pluginCalls metric labels (#345)

There are even more changes to the NeonVM controller that aren't listed here. For more, see #256.

Upgrade path from v0.10.x:
- No ordering requirements.
-
v0.10.2 · 4cb6e477
Hotfix release, backporting #342 to fix the scheduler plugin's handling of completed pods on startup.
-
v0.10.1 · 353499fb
Hotfix release, backporting #338 to fix scheduler plugin logs for agent requests.
-
v0.10.0 · 9e8b63d6
This release contains bugfixes, ???, and a breaking change to the agent<->informant protocol.

Breaking API changes:
- agent<->informant: Include AgentID in informant /downscale and /upscale (#316)
  - This bumps the agent<->informant protocol to v2.
  - The agent currently supports both versions, and will for the immediate future.

Features:
- neonvm/builder: Make output prettier (#280)
- Start switch from klog -> zap [agent/plugin/informant] (#323)
  - All kinds of dashboards need updating. It's for the best.

Fixes:
- agent/informant: Fix inverted condition for logs (#315)
- plugin: Handle usage updates for non-autoscaling VMs (#312)
- plugin: Fix Unreserve condition (#317)
- util/watch: Set failingCurrent gauge to zero so it shows up (#320)
- neonvm: Fix default ports from Go client (#257)

Protocol changes:
- See above, re: informant agent<->informant changes.

Other changes:
- deploy: Change metrics scrape interval 10s -> 60s (#321)
- neonvm/runner: Set AutomountServiceAccountToken = false (#298)
- agent/billing: Use NeonVM .status.cpus, not .spec.guest.cpus.use (#325)

Upgrade path from v0.9.0:
- All autoscaler-agents must be upgraded before any vm-informants
- No other requirements.
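The upgrade ordering follows from how version ranges overlap. The sketch below is a generic illustration of range-based protocol negotiation, not the project's actual handshake. The function name is hypothetical, and it is an assumption (not stated in these notes) that a new informant speaks only v2 while an old agent speaks only v1.

```python
def negotiate_version(agent_min, agent_max, informant_min, informant_max):
    """Return the highest protocol version both sides support, or None."""
    lowest_common = max(agent_min, informant_min)
    highest_common = min(agent_max, informant_max)
    return highest_common if lowest_common <= highest_common else None


# New agent (v1..v2) with an old informant (v1 only): they settle on v1.
print(negotiate_version(1, 2, 1, 1))
# New agent (v1..v2) with a new informant (assumed v2 only): they settle on v2.
print(negotiate_version(1, 2, 2, 2))
# Old agent (v1 only) with a new informant (assumed v2 only): no common version.
print(negotiate_version(1, 1, 2, 2))
```

Under those assumptions, upgrading the agents first keeps every pairing on a shared version throughout the rollout, whereas upgrading an informant first could leave it with an agent it cannot talk to.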
-
v0.9.0 · 3577fe6c
This release contains bugfixes and upgrades to Kubernetes 1.25.

Breaking API changes:
- Upgrading to K8s 1.25. NB: Autoscaling requires K8s control planes with a version equal or +1; i.e. K8s 1.25 OR 1.26 is now required.

Features:
- New metrics! (#306, #310)
  - Too many to cover here; refer to those PRs instead.

Fixes:
- util/watch: Fix race condition on k8s watch.Update events (#295)
- agent/informant: Fix informant server exit logs (#286)
- api: Fix ExtractVmInfo disallowing min > use or use > max (#303)
  - This one may be counterintuitive at first. See #249 for context.
- agent: Fix vmEvent formatting (#307)
- informant: Suspend old agent *before* new one (#308)
- util/watch: Fix racy behavior with InitModeDefer (#305)
  - This was causing billing events to not be generated for VMs until an event *after* startup occurs for them.
- plugin: Allow overcommitted nodes on startup (#313)
- agent: Stop SchedulerWatch when Runner finishes (#314)
  - This was preventing the switchover to a new scheduler on upgrade or restart.

Other changes:
- Fix yaml formatting for autoscaler-agent config deploy (#300)

No protocol changes.

Upgrade path from v0.8.0:
- No ordering requirements.
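Reading the note above as requiring a control plane at the same minor version as the release (1.25) or one minor version newer (1.26), a pre-upgrade check could look like the sketch below. `control_plane_supported` is a hypothetical helper, not something shipped with the project.

```python
def control_plane_supported(version, built_for=(1, 25)):
    """True if `version` is the built-for minor version or exactly one newer.

    `version` is a string such as "1.25.11" or "v1.26.0"; only the major and
    minor components are considered.
    """
    major, minor = (int(x) for x in version.lstrip("v").split(".")[:2])
    return major == built_for[0] and minor - built_for[1] in (0, 1)


for v in ("1.24.9", "1.25.11", "v1.26.0", "1.27.1"):
    print(v, control_plane_supported(v))
```

Only the 1.25 and 1.26 inputs pass; older and newer minor versions are rejected.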
-
v0.8.0 · 01b88843
This release contains bugfixes, a new component, minor public-facing API changes, and significant changes to the deployed services, but no inter-component API changes.

Breaking API changes:
- NeonVM: restart policy no longer applies directly to the pod (#293)

Features:
- Add patch for cluster-autoscaler compatibility with VMs (#232)
- NeonVM: implement RestartPolicy (#293)
- NeonVM security and networking redesign (#245)
  - Runner pod no longer has Privileged: true
  - QEMU in the runner pod runs under its own user
  - Adapted generic-device-plugin for NeonVM, to give access to /dev/kvm and /dev/vhost-*
  - Switch from neonvm-vxlan-ipam to Whereabouts CNI -> Allows using overlay IP addresses in normal pods as well as VMs
  - Reconcile cycles improved
- NeonVM/vm-builder: Add --enable-file-cache flag (default: off) (#265)
- NeonVM: user RBAC roles (#284):
  - neonvm-virtualmachine-viewer-role
  - neonvm-virtualmachine-editor-role
  - neonvm-virtualmachinemigration-viewer-role
  - neonvm-virtualmachinemigration-editor-role
- More logs for autoscaler-agent (#290, #291)
- More autoscaler-agent metrics:
  - autoscaling_agent_runner_starts (#273)
  - autoscaling_agent_runner_restarts (#273)
  - autoscaling_agent_runner_fatal_errors_total (#274)
  - autoscaling_errored_vm_runners_current (#274)

Fixes:
- NeonVM/vm-builder: Fix command passthrough (#263)
- NeonVM/vm-builder: Fix cgexec being ignored (#281)
- NeonVM/vm-builder: Build without cgo (#255)
  - This removes the dependency on a dynamically loaded libc.
- informant: Fix cgroup memory.high throttling (#223)
- agent: Various logs fixes (#242, #267, #271, #272)
- agent: Restart panicked/errored runners (#273)
- agent/billing: Don't count VMs that aren't running (#278)
- agent, sched: Add ports to pod spec for metrics (#282)
- agent, sched: Fix logging of MilliCPU (#261)
- sched: Don't output command help on error (#253)
- plugin: Handle completed pods as if deleted (#260)

No protocol changes.

Other changes:
- Many unused RBAC (and other) items removed:
  - Namespace autoscaler-config (#245)
  - ClusterRole vm-view (#284)
  - ClusterRole vm-patcher (#284)
  - ClusterRoleBinding kube-system/autoscaler-vm-view (#284)
  - ClusterRoleBinding kube-system/autoscale-scheduler-as-vm-patcher (#284)
  - Role kube-system/autoscale-scheduler-config-reader (#284)
  - RoleBinding kube-system/autoscale-scheduler-config-reader (#284)
- NeonVM: Rename 'runner' container to 'neonvm-runner' (#277)
- agent: Network error metrics include root cause (#287)

Upgrade path from v0.7.2:
- No ordering requirements.
- You may wish to remove old items as mentioned above.
-
v0.7.3-alpha3 · a008e8e8
This is a pre-release just for building and distributing images. Do not deploy anything from this release.