Tags

This project is mirrored from https://github.com/neondatabase/autoscaling. Pull mirroring updated Jan 16, 2025.

v0.13.6

0d335208 · BACKPORT: plugin: Fix event queue usage (#430) · Jul 20, 2023

Hotfix release fixing the plugin's incorrect usage of fun/pubsub.Queue
that caused some events to be dropped.

v0.13.5

3ff49732 · BACKPORT: plugin: Fix scoring to use current resources (#426) · Jul 19, 2023

Hotfix release fixing the plugin's Score method so that it takes into
account actual resource usage. This release only contains a backport of
the fix from #426.

list

c5d6577a · Update versioning with informant/monitor protocol · Jul 18, 2023

litst

c5d6577a · Update versioning with informant/monitor protocol · Jul 18, 2023

v0.13.4

95b83f30 · BACKPORT: plugin: Ignore completed pods in Filter (#423) · Jul 18, 2023

Hotfix release to fix two issues:

1. Scheduler plugin's Filter logic was incorrectly counting Completed
   pods into the usage calculations.
2. Scheduler plugin's node state 'Buffer' field was always underflowing.

The fixes were in #423 and #424, respectively.
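The 'Buffer' underflow in item 2 is the classic unsigned-subtraction wraparound. Below is a minimal Go sketch, assuming the field is an unsigned integer. `saturatingSub` is a hypothetical helper, and the plugin's real types and the actual change in #424 are not shown in these notes.

```go
package main

import "fmt"

// saturatingSub returns a - b, clamped at zero. Plain a - b on unsigned
// integers wraps around to a huge value when b > a, which is how an
// unsigned buffer value can appear to "underflow".
func saturatingSub(a, b uint64) uint64 {
	if b > a {
		return 0
	}
	return a - b
}

func main() {
	var reserved, used uint64 = 3, 5
	fmt.Println(reserved - used)               // wraps around: 18446744073709551614
	fmt.Println(saturatingSub(reserved, used)) // clamped: 0
}
```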

v0.13.3

a010f282 · Bump: v0.13.2 -> v0.13.3 · Jul 17, 2023

Another small release, with a minor improvement to the plugin's method
call metrics, so we can avoid tripping alerts for overprovisioning pods.

The change was in #422; nothing else is included.

v0.13.2

076db28d · Bump version: v0.13.1 -> v0.13.2 · Jul 17, 2023

Another small release, primarily to fix the scheduler's handling of
overprovisioning pods. Also contains a minor improvement.

Fixes:

- plugin: Fix handling of ignored namespaces during Filter (#416, #418)

Other changes:

- plugin: Emit k8s Event on failed ExtractVmInfo (#408)
  - Should help with observability for certain failures.

Upgrade path from v0.12.x / v0.13.x:

- No ordering requirements.

v0.13.1

f8330227 · Bump version: v0.13.0 -> v0.13.1 · Jul 15, 2023

Small release, primarily to fix a leak in the scheduler plugin.
Also contains other bugfixes.

Fixes:

- plugin: Memory leak (#415)
- plugin: Missing node metrics during initial load (#410)
- neonvm/runner: Missing error logs (#401)
- neonvm/runner: Various log.Printf calls with unnecessary trailing newline (#401)
- neonvm/controller, informant: Typos in error messages (#407)

Upgrade path from v0.12.x / v0.13.0:

- No ordering requirements.

v0.13.0

909ae5ed · Bump version: v0.12.2 -> v0.13.0 · Jul 13, 2023

This relatively small release contains significant changes to existing
behavior in both the autoscaler-agent and scheduler plugin.

No breaking API changes (technically).

Features:

- agent: Memory-based scaling (#393)
  - Currently implemented in a similar manner to our load average-based
    scaling, via total memory usage, including the kernel.
- plugin: Allow ignoring resource usage from namespaces (#399)
  - Carveout for 'overprovisioning' pods now that we're tracking
    everything.
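Once every pod is tracked, ignoring a namespace amounts to skipping its pods when summing a node's usage. Below is a minimal Go sketch of that idea. `Pod`, `Usage`, `nodeUsage`, and the `overprovisioning` namespace name are hypothetical simplifications, not the plugin's real types or configuration.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified stand-in for a Kubernetes pod.
type Pod struct {
	Namespace string
	MilliCPU  int64
	MemBytes  int64
}

// Usage holds summed resource usage for one node.
type Usage struct {
	MilliCPU int64
	MemBytes int64
}

// nodeUsage sums usage over every pod on a node, skipping pods whose
// namespace is in the ignored set (e.g. overprovisioning pods).
func nodeUsage(pods []Pod, ignored map[string]bool) Usage {
	var u Usage
	for _, p := range pods {
		if ignored[p.Namespace] {
			continue
		}
		u.MilliCPU += p.MilliCPU
		u.MemBytes += p.MemBytes
	}
	return u
}

func main() {
	pods := []Pod{
		{Namespace: "default", MilliCPU: 500, MemBytes: 1 << 30},
		{Namespace: "overprovisioning", MilliCPU: 2000, MemBytes: 4 << 30},
	}
	ignored := map[string]bool{"overprovisioning": true}
	fmt.Printf("%+v\n", nodeUsage(pods, ignored))
}
```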

Fixes:

- plugin: Improve plugin method logs (#405)
  - Previously, some notable metrics were being increased without
    suitable accompanying log messages.

No protocol changes.

Other changes:

- plugin: Track all pods (#399)
  - Should make our accounting & metrics reporting much more accurate.
- plugin: Remove 'System' reserved resources (#399)
  - No longer necessary, because we're tracking everything.

v0.12.2

87af5b1a · Bump version: v0.12.1 -> v0.12.2 · Jul 11, 2023

Small release, containing only #395: a fix for #234, where the
autoscaler-agent's per-VM Runner would panic when the scaling bounds
decreased below the current usage.

This was fast-tracked for release because of the impact on VM pools.
It's not hard-blocking, but is significant enough that it's worth
fixing beforehand.
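One way to tolerate bounds that shrink below the current allocation is to clamp instead of asserting. Below is a minimal Go sketch of that approach. `Bounds` and `clampToBounds` are hypothetical and are not the agent's code, and #395 may resolve the situation differently.

```go
package main

import "fmt"

// Bounds is a hypothetical inclusive [Min, Max] range of compute units.
type Bounds struct {
	Min, Max uint32
}

// clampToBounds moves current back inside b instead of treating
// current > b.Max (bounds shrunk below current usage) as impossible.
func clampToBounds(current uint32, b Bounds) uint32 {
	if current > b.Max {
		return b.Max
	}
	if current < b.Min {
		return b.Min
	}
	return current
}

func main() {
	// Bounds shrank from [1, 4] to [1, 2] while the VM was using 4 units.
	fmt.Println(clampToBounds(4, Bounds{Min: 1, Max: 2})) // 2
}
```

Whether to clamp, downscale, or wait for usage to drop is a policy choice. This sketch only shows the state being handled rather than treated as a panic.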

v0.12.1

129b29df · Bump version: v0.12.0 -> v0.12.1 · Jul 10, 2023

This release contains bugfixes and new metrics (along with some changes
to existing ones).

No breaking API changes.

Features:

- plugin: New migration-related metrics (#387):
  - autoscaling_plugin_migrations_created_total
  - autoscaling_plugin_migrations_deleted_total
  - autoscaling_plugin_migration_create_fails_total
  - autoscaling_plugin_migration_delete_fails_total
- plugin: Include node group in node resource metrics (#382)
- agent: agent->informant request metrics now include the endpoint (#380)

Fixes:

- Add vmscrape.yaml to release assets (#392)
- plugin: Fix spurious "updated scaling bounds" logs (#391)
  - Incidentally, this *also* entirely fixes our handling of scaling
    bounds changes.
- plugin: Migration handling reliability improvements (#387)
- informant: Fix parent process stall when child dies quickly (#389)
- agent: Fix NeonVM downscaling not showing up in metrics (#381)

No protocol changes.

No other changes.

Upgrade path from v0.12.0:

- No ordering requirements.

v0.12.0

0cd3cdbb · Bump version: v0.11.0 -> v0.12.0 · Jul 03, 2023

This release contains bugfixes (lots of them!), new metrics, and
BREAKING CHANGES TO OLD METRICS.

No breaking API changes.

Features:

- neonvm: Propagate label/annotation changes to runner pod(s) (#279)
- agent: Add scaling metrics! (#334)
  - All of:
    - autoscaling_agent_scheduler_plugin_{requested,approved}_{cpu,mem}_change_total
    - autoscaling_agent_informant_{requested,approved}_{cpu,mem}_change_total
    - autoscaling_agent_neonvm_requested_{cpu,mem}_change_total
    - autoscaling_agent_neonvm_outbound_requests_total
- plugin: Add per-node resource metrics (#363)
  - Two new metrics:
    - autoscaling_plugin_node_cpu_resources_current
    - autoscaling_plugin_node_mem_resources_current
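The #334 metric names above use shell-style brace notation as shorthand. The Go sketch below expands them into concrete names. `expand` is a hypothetical helper and not part of the project, and it handles only the non-nested form used here.

```go
package main

import (
	"fmt"
	"strings"
)

// expand expands shell-style brace groups such as "a_{x,y}_{p,q}" into
// every combination: a_x_p, a_x_q, a_y_p, a_y_q. Nested braces are not
// supported, and an unmatched "{" is left in place.
func expand(pattern string) []string {
	start := strings.Index(pattern, "{")
	if start < 0 {
		return []string{pattern}
	}
	rel := strings.Index(pattern[start:], "}")
	if rel < 0 {
		return []string{pattern}
	}
	end := start + rel
	prefix, suffix := pattern[:start], pattern[end+1:]
	var out []string
	for _, alt := range strings.Split(pattern[start+1:end], ",") {
		out = append(out, expand(prefix+alt+suffix)...)
	}
	return out
}

func main() {
	patterns := []string{
		"autoscaling_agent_scheduler_plugin_{requested,approved}_{cpu,mem}_change_total",
		"autoscaling_agent_informant_{requested,approved}_{cpu,mem}_change_total",
		"autoscaling_agent_neonvm_requested_{cpu,mem}_change_total",
		"autoscaling_agent_neonvm_outbound_requests_total",
	}
	count := 0
	for _, p := range patterns {
		for _, name := range expand(p) {
			fmt.Println(name)
			count++
		}
	}
	fmt.Println(count, "metrics") // 11 metrics
}
```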

Fixes:

- Add whereabouts.yaml to release assets (#348)
- neonvm: Don't propagate kubectl's last-applied-configuration annotation (#344)
- agent: Reset Runner endState on restart (#349)
  - This bug caused the agent's metrics to never show a
    previously-panicked Runner as recovered, even when it was.
- agent/schedwatch: Fix spurious close (#352)
  - This bug was causing agents to be unable to recognize new
    schedulers.
- plugin/watch: Remove redundant error wrapping (#358)
- plugin: Fix filter cycle metrics (#356)
  - This REMOVES two metrics:
    - autoscaling_plugin_filter_cycle_successes_total
    - autoscaling_plugin_filter_cycle_rejections_total
  - See the PR for more details.
- README: fix make commands to reflect kind/k3d (#365)
- plugin: Cleanup state for deleted k8s Nodes (#361)
  - Should *hopefully* fix a particular memory leak, but it's not clear.
- informant/filecache: Close DB connections (#367)
  - This was causing some users to be unable to connect to their
    database because the informant took all the connections.
  - This was already released as v0.11.1
- agent/billing: Move push logic into separate thread (#368)
  - This was preventing us from having more reasonable request timeouts
    (like... anything above 2s)

No protocol changes.

Other changes:

- util/watch: More logs! (#351)
- agent: Record neon/endpoint-id for each Runner if/when assigned (#353)
- agent: Improve help message for autoscaling_agent_tracked_vms_current (#354)
- agent/billing: Log IdempotencyKey of events (#366)
- billing: Add x-trace-id header to requests (#372)

Upgrade path from v0.11.0:

- No ordering requirements, but considering the fixes to the agent's
  scheduler detection, it's probably worthwhile to update any agents
  first.

v0.11.1

0c17ef26 · BACKPORT: informant/filecache: Close DB connections (#367) · Jun 30, 2023

Hotfix release, backporting #367 to fix a bug in the informant that
caused it to never close DB connections when the file cache integration
is enabled.

v0.11.0

da7e37b0 · Bump version: v0.10.0 -> v0.11.0 · Jun 20, 2023

This release contains bugfixes, new features, and large changes to the
NeonVM controller.

Breaking API changes:

- neonvm: VirtualMachine .spec.extraNetwork fields changed (#256)
  - Removed multusNetworkNoIP
  - Made multusNetwork omitempty
- neonvm: VirtualMachineMigrations no longer have post-copy enabled by default (#256)

Features:

- neonvm: Two new VmPhase types: "PreMigrating" and "Scaling" (#256)
- neonvm: Migration source runner pod now has an ownerref pointing back
  to the migration (#332)
- ci: Added support for k3d (#340)
- plugin: new metrics
  - autoscaling_plugin_filter_cycle_successes_total (#346)
  - autoscaling_plugin_filter_cycle_rejections_total (#346)
  - autoscaling_plugin_extension_call_fails_total (#347)

Fixes:

- scheduler: Fixed agent-handler log keys explosion (#338)
  - NB: this was already released as v0.10.1
- scheduler: Fixed missing `continue` when skipping completed pods (#342)
  - NB: this was already released as v0.10.2
- scheduler: Fixed outdated log line (#343)
  - Removed "[autoscale-enforcer] load state: " prefix from the message
- agent: Do informant health checks even when suspended (#341)

No protocol changes.

Other changes:

- ci: kind and kubectl versions tweaked (#336)
- k8s deps upgraded to 1.25.11 (#339)
- plugin: Capitalize pluginCalls metric labels (#345)

There are even more changes to the NeonVM controller that aren't listed
here. For more, see #256.

Upgrade path from v0.10.x:

- No ordering requirements.

v0.10.2

4cb6e477 · BACKPORT: plugin: Fix missing `continue` when skipping completed pods (#342) · Jun 19, 2023

Hotfix release, backporting #342 to fix the scheduler plugin's handling
of completed pods on startup.

v0.10.1

353499fb · BACKPORT: plugin: Fix agent-handler log keys explosion (#338) · Jun 16, 2023

Hotfix release, backporting #338 to fix scheduler plugin logs for agent
requests.

v0.10.0

9e8b63d6 · Bump version: v0.9.0 -> v0.10.0 · Jun 15, 2023

This release contains bugfixes, ???, and a breaking change to the
agent<->informant protocol.

Breaking API changes:

- agent<->informant: Include AgentID in informant /downscale and /upscale (#316)
  - This bumps the agent<->informant protocol to v2.
  - The agent currently supports both versions, and will for the
    immediate future.
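Supporting both protocol versions lets an upgraded agent talk to either old or new informants. Below is a minimal Go sketch of picking the highest version both sides support. `VersionRange` and `negotiate` are hypothetical, and the real agent<->informant handshake may work differently.

```go
package main

import (
	"errors"
	"fmt"
)

// VersionRange is a hypothetical inclusive [Min, Max] range of protocol versions.
type VersionRange struct {
	Min, Max int
}

// negotiate returns the highest protocol version supported by both
// sides, or an error if their ranges don't overlap.
func negotiate(a, b VersionRange) (int, error) {
	lo, hi := a.Min, a.Max
	if b.Min > lo {
		lo = b.Min
	}
	if b.Max < hi {
		hi = b.Max
	}
	if lo > hi {
		return 0, errors.New("no common protocol version")
	}
	return hi, nil
}

func main() {
	agent := VersionRange{Min: 1, Max: 2} // upgraded agent: speaks v1 and v2
	for _, informant := range []VersionRange{{Min: 1, Max: 1}, {Min: 2, Max: 2}} {
		v, err := negotiate(agent, informant)
		fmt.Println(v, err)
	}
}
```

The same check illustrates the upgrade ordering. An old agent limited to {1, 1} would find no common version with an informant that requires v2.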

Features:

- neonvm/builder: Make output prettier (#280)
- Start switch from klog -> zap [agent/plugin/informant] (#323)
  - All kinds of dashboards need updating. It's for the best.

Fixes:

- agent/informant: Fix inverted condition for logs (#315)
- plugin: Handle usage updates for non-autoscaling VMs (#312)
- plugin: Fix Unreserve condition (#317)
- util/watch: Set failingCurrent gauge to zero so it shows up (#320)
- neonvm: Fix default ports from Go client (#257)

Protocol changes:

- See above, re: agent<->informant changes.

Other changes:

- deploy: Change metrics scrape interval 10s -> 60s (#321)
- neonvm/runner: Set AutomountServiceAccountToken = false (#298)
- agent/billing: Use NeonVM .status.cpus, not .spec.guest.cpus.use (#325)

Upgrade path from v0.9.0:

- All autoscaler-agents must be upgraded before any vm-informants
- No other requirements.

v0.9.0

3577fe6c · Bump version: v0.8.0 -> v0.9.0 · Jun 02, 2023

This release contains bugfixes and upgrades to Kubernetes 1.25.

Breaking API changes:

- Upgrading to K8s 1.25. NB: Autoscaling requires K8s control planes
  with a version equal or +1 (one minor version newer); i.e. K8s 1.25
  OR 1.26 is required.
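The "equal or +1" rule above is a minor-version skew check. Below is a minimal Go sketch, assuming version strings look like `1.25` or `v1.25.11`. `minorVersion` and `controlPlaneCompatible` are hypothetical helpers, not part of the project.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorVersion parses the minor component of a "1.25" or "v1.25.11" string.
func minorVersion(v string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 || parts[0] != "1" {
		return 0, fmt.Errorf("unexpected version %q", v)
	}
	return strconv.Atoi(parts[1])
}

// controlPlaneCompatible reports whether the control plane's minor version
// equals the built-against minor version or is exactly one above it.
func controlPlaneCompatible(builtAgainst, controlPlane string) (bool, error) {
	want, err := minorVersion(builtAgainst)
	if err != nil {
		return false, err
	}
	got, err := minorVersion(controlPlane)
	if err != nil {
		return false, err
	}
	return got == want || got == want+1, nil
}

func main() {
	for _, cp := range []string{"v1.24.9", "v1.25.11", "v1.26.3", "v1.27.0"} {
		ok, err := controlPlaneCompatible("1.25", cp)
		fmt.Println(cp, ok, err)
	}
}
```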

Features:

- New metrics! (#306, #310)
  - Too many to cover here; refer to those PRs instead.

Fixes:

- util/watch: Fix race condition on k8s watch.Update events (#295)
- agent/informant: Fix informant server exit logs (#286)
- api: Fix ExtractVmInfo disallowing min > use or use > max (#303)
  - This one may be counterintuitive at first; see #249 for context.
- agent: Fix vmEvent formatting (#307)
- informant: Suspend old agent *before* new one (#308)
- util/watch: Fix racy behavior with InitModeDefer (#305)
  - This was causing billing events to not be generated for VMs until an
    event *after* startup occurs for them.
- plugin: Allow overcommitted nodes on startup (#313)
- agent: Stop SchedulerWatch when Runner finishes (#314)
  - This was preventing the switchover to a new scheduler on upgrade or
    restart

Other changes:

- Fix yaml formatting for autoscaler-agent config deploy (#300)

No protocol changes.

Upgrade path from v0.8.0:

- No ordering requirements.

v0.8.0

01b88843 · release workflow: Fix kustomize render step (#294) · May 24, 2023

This release contains bugfixes, a new component, minor public-facing API
changes, and significant changes to the deployed services, but no
inter-component API changes.

Breaking API changes:

- NeonVM: restart policy no longer applies directly to the pod (#293)

Features:

- Add patch for cluster-autoscaler compatibility with VMs (#232)
- NeonVM: implement RestartPolicy (#293)
- NeonVM security and networking redesign (#245)
  - Runner pod no longer has Privileged: true
  - QEMU in the runner pod runs under its own user
  - Adapted generic-device-plugin for NeonVM, to give access to /dev/kvm
    and /dev/vhost-*
  - Switch from neonvm-vxlan-ipam to Whereabouts CNI
    -> Allows using overlay IP addresses in normal pods as well as VMs
  - Reconcile cycles improved
- NeonVM/vm-builder: Add --enable-file-cache flag (default: off) (#265)
- NeonVM: user RBAC roles (#284):
  - neonvm-virtualmachine-viewer-role
  - neonvm-virtualmachine-editor-role
  - neonvm-virtualmachinemigration-viewer-role
  - neonvm-virtualmachinemigration-editor-role
- More logs for autoscaler-agent (#290, #291)
- More autoscaler-agent metrics:
  - autoscaling_agent_runner_starts   (#273)
  - autoscaling_agent_runner_restarts (#273)
  - autoscaling_agent_runner_fatal_errors_total (#274)
  - autoscaling_errored_vm_runners_current      (#274)

Fixes:

- NeonVM/vm-builder: Fix command passthrough (#263)
- NeonVM/vm-builder: Fix cgexec being ignored (#281)
- NeonVM/vm-builder: Build without cgo (#255)
  - This removes the dependency on a dynamically loaded libc.
- informant: Fix cgroup memory.high throttling (#223)
- agent: Various logs fixes (#242, #267, #271, #272)
- agent: Restart panicked/errored runners (#273)
- agent/billing: Don't count VMs that aren't running (#278)
- agent, sched: Add ports to pod spec for metrics (#282)
- agent, sched: Fix logging of MilliCPU (#261)
- sched: Don't output command help on error (#253)
- plugin: Handle completed pods as if deleted (#260)

No protocol changes.

Other changes:

- Many unused RBAC (and other) items removed:
  - Namespace autoscaler-config (#245)
  - ClusterRole vm-view (#284)
  - ClusterRole vm-patcher (#284)
  - ClusterRoleBinding kube-system/autoscaler-vm-view (#284)
  - ClusterRoleBinding kube-system/autoscale-scheduler-as-vm-patcher (#284)
  - Role kube-system/autoscale-scheduler-config-reader (#284)
  - RoleBinding kube-system/autoscale-scheduler-config-reader (#284)
- NeonVM: Rename 'runner' container to 'neonvm-runner' (#277)
- agent: Network error metrics include root cause (#287)

Upgrade path from v0.7.2:

- No ordering requirements.
- You may wish to remove old items as mentioned above.

v0.7.3-alpha3

a008e8e8 · release workflow: Fix cluster-autoscaler build + tag, take 2 · May 22, 2023

This is a pre-release just for building and distributing images.
Do not deploy anything from this release.