This project is mirrored from https://github.com/neondatabase/autoscaling.
- Feb 16, 2024
-
-
Em Sharnoff authored
-
Em Sharnoff authored
Similar in spirit to the "dump state" endpoints exposed by the autoscaler-agent and scheduler plugin. This one is on port 7778 (+1 from the pprof server on 7777). The state can be fetched with: `kubectl port-forward -n neonvm-system pod/<pod-name> 7778:7778 & curl http://localhost:7778/`
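For illustration only, a minimal sketch of what a "dump state" HTTP endpoint like this can look like in Go (the state struct and its fields are assumptions, not the actual neonvm-runner types):

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// runnerState is a stand-in for whatever internal state the runner tracks;
// the field names here are assumptions, for illustration only.
type runnerState struct {
	VMName      string `json:"vmName"`
	QEMURunning bool   `json:"qemuRunning"`
}

func main() {
	state := &runnerState{VMName: "example-vm", QEMURunning: true}

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		// Encode a snapshot of the current state as the response body.
		if err := json.NewEncoder(w).Encode(state); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	})

	// Port 7778, one above the pprof server on 7777.
	log.Fatal(http.ListenAndServe(":7778", mux))
}
```

With a server like this running in the pod, the port-forward + curl combination above returns the JSON-encoded snapshot.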
-
Em Sharnoff authored
Adds the '--qemu-disk-cache-settings' flag to neonvm-controller and neonvm-runner. The default is 'cache=none', but the one we'd like to use is 'cache.writeback=on,cache.direct=on,cache.no-flush=on'. The neonvm-controller flag is directly propagated to all new VM runner pods, as they're created. Resolves #775, refer there for more information. Co-authored-by: Oleg Vasilev <oleg@neon.tech>
-
Oleg Vasilev authored
Previously, if QEMU failed to start up, the log output contained either:
1. a panic, because of `reader==nil`, or
2. `"error":"dial unix /vm/log.sock: connect: no such file or directory","stacktrace":"main.forwardLogs.func1\n\t/workspace/neonvm/runner/main.go:897\nmain.forwardLogs\n\t/workspace/neonvm/runner/main.go:921"`

which was confusing. Now it will have: `{"level":"warn","ts":1707993260.548491,"logger":"neonvm-runner","caller":"runner/main.go:897","msg":"QEMU shut down too soon to start forwarding logs"}` The panic was found by the Discord user @ido6668. Fixes #791 --------- Signed-off-by: Oleg Vasilev <oleg@neon.tech> Co-authored-by: Em Sharnoff <sharnoff@neon.tech>
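As a rough sketch of the kind of check that produces that warning (the function, channel, and wiring here are assumptions, not the actual runner code): bail out with a warning if QEMU has already exited before trying to dial the log socket.

```go
package main

import (
	"net"

	"go.uber.org/zap"
)

// forwardLogs is a simplified stand-in for the runner's log-forwarding loop.
// qemuExited is assumed to be closed once the QEMU process has shut down.
func forwardLogs(logger *zap.Logger, qemuExited <-chan struct{}) {
	select {
	case <-qemuExited:
		// QEMU is already gone, so /vm/log.sock will never appear; warn
		// instead of panicking or reporting a confusing dial error.
		logger.Warn("QEMU shut down too soon to start forwarding logs")
		return
	default:
	}

	conn, err := net.Dial("unix", "/vm/log.sock")
	if err != nil {
		logger.Error("failed to dial log socket", zap.Error(err))
		return
	}
	defer conn.Close()
	// ... copy log output from conn to the runner's logger ...
}

func main() {
	logger, _ := zap.NewProduction()
	exited := make(chan struct{})
	close(exited) // simulate QEMU exiting before log forwarding starts
	forwardLogs(logger.Named("neonvm-runner"), exited)
}
```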
-
Oleg Vasilev authored
In an effort to reduce pressure on containerd, this reduces the number of init containers. This might be a preliminary step on the path to get rid of init containers entirely. Part of #747 --------- Signed-off-by: Oleg Vasilev <oleg@neon.tech>
-
Alexander Bayandin authored
[0] moved the `vmlinuz` directory from `neonvm/hack` to `neonvm/hack/kernel`. This PR deletes the old path from the code. Ref: [1]
- [0] https://github.com/neondatabase/autoscaling/pull/715
- [1] https://github.com/neondatabase/autoscaling/pull/715#discussion_r1446239895
-
Oleg Vasilev authored
This allows running an arbitrary script on startup inside the main container in the pod. Meant to replace the init container that performs iptables initialization. Part of #747 --------- Signed-off-by: Oleg Vasilev <oleg@neon.tech>
-
- Feb 15, 2024
-
-
Em Sharnoff authored
Noticed while reviewing #782. Maybe we want to go through the list more thoroughly; I think there are a handful of fields missing.
-
Shayan Hosseini authored
Making sure that once SSH is enabled, we create the SSH secret and can SSH into the VM from the runner pod.
-
Em Sharnoff authored
We should have this for dev at least. We'll probably need to be careful not to prematurely enable it in staging/prod, but that should be straightforward enough.
-
Shayan Hosseini authored
Finish the sentence that ends prematurely
-
Em Sharnoff authored
We noticed higher than desired reconcile latency in prod because of CPU throttling. Bumping CPU limits from 4 to 8 in larger regions significantly reduced latency, without increasing average CPU usage. ref https://neondb.slack.com/archives/C03TN5G758R/p1707686417925039
-
- Feb 14, 2024
-
-
Heikki Linnakangas authored
The swap disk needs to be configured in the VM spec. This also sets the size of /dev/shm to match the size of swap (if it's larger than 1/2 of the initial memory size, which is the Linux default). See https://github.com/neondatabase/autoscaling/issues/800. This doesn't implement the "autoscale if swapping" behavior yet. Co-authored-by: Em Sharnoff <sharnoff@neon.tech>
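To make the sizing rule concrete, here is a small sketch of the arithmetic under the stated assumptions (helper and parameter names are illustrative, not the controller's actual code): /dev/shm stays at the Linux default of half the initial memory unless swap is larger, in which case it matches the swap size.

```go
package main

import "fmt"

// shmSize returns the size to use for /dev/shm, given the VM's initial memory
// and configured swap size, both in bytes. The Linux default is half of
// memory; if swap is larger than that, match the swap size instead.
func shmSize(initialMemory, swap int64) int64 {
	linuxDefault := initialMemory / 2
	if swap > linuxDefault {
		return swap
	}
	return linuxDefault
}

func main() {
	const gib = int64(1) << 30
	fmt.Println(shmSize(4*gib, 1*gib) / gib) // 2: swap is smaller, keep the default
	fmt.Println(shmSize(4*gib, 8*gib) / gib) // 8: swap is larger, match it
}
```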
-
- Feb 13, 2024
-
-
Heikki Linnakangas authored
Saw this when I did "kubectl describe pod compute-..." on a VM pod:

Events:
  Type     Reason            Age    From                 Message
  ----     ------            ----   ----                 -------
  Warning  FailedScheduling  4m53s  autoscale-scheduler  0/1 nodes are available: 1 Not enough resources for pod.
  Warning  FailedScheduling  4m51s  autoscale-scheduler  running Reserve plugin "AutoscaleEnforcer": Not enough resources to reserve non-VM pod

That message is wrong, because this was a VM. Fix the message to not specify whether it's a VM or non-VM pod.
-
- Feb 12, 2024
-
-
Em Sharnoff authored
Closes #760. AFAIK this hasn't been an issue in the past, but as we're trying to improve reliability, it's good to get this out of the way before it becomes an issue. Note that this PR is quite minimal - expanding the existing tech debt we have around how the scheduler plugin handles HTTP requests. It's probably ok *enough* for now. I don't expect we'll be making too many changes to it in the near future. See also: #13.

Tested locally by forcing it to panic on every request:

diff --git a/pkg/plugin/run.go b/pkg/plugin/run.go
index 007554a..6da7728 100644
--- a/pkg/plugin/run.go
+++ b/pkg/plugin/run.go
@@ -262,8 +262,10 @@ func (e *AutoscaleEnforcer) handleAgentRequest(
 		}
 	}
-	pod.vm.mostRecentComputeUnit = &e.state.conf.ComputeUnit
-	return &resp, 200, nil
+	panic(errors.New("test panic!"))
+
+	// pod.vm.mostRecentComputeUnit = &e.state.conf.ComputeUnit
+	// return &resp, 200, nil
 }
 // getComputeUnitForResponse tries to return compute unit that the agent supports

The change appears to work as intended.
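Generically, surviving a panic like the one forced above comes down to a recover() wrapper around the request handler. A hedged sketch of that standard net/http pattern (not the scheduler plugin's actual code):

```go
package main

import (
	"log"
	"net/http"
	"runtime/debug"
)

// recoverPanics wraps an http.Handler so that a panic in the handler is
// logged and turned into a 500 response instead of crashing the process.
func recoverPanics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if err := recover(); err != nil {
				log.Printf("panic while handling %s: %v\n%s", r.URL.Path, err, debug.Stack())
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		panic("test panic!") // mirrors the local test described above
	})
	log.Fatal(http.ListenAndServe(":8080", recoverPanics(mux)))
}
```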
-
- Feb 09, 2024
-
-
Shayan Hosseini authored
Also added an e2e test for the runner pod. Fixes #794 --------- Co-authored-by: Em Sharnoff <sharnoff@neon.tech>
-
Em Sharnoff authored
The previous logic would:
1. Try to update the status
2. If we got a conflict, overwrite the current VirtualMachine object with the state on the API server
3. Then, retry with the updated object (without changing anything!)

This is basically doing extra work for nothing! We discussed potentially changing this to overwrite the status on conflict, but this could result in issues when operating on stale data (e.g., overwriting .status.podName back to "" after it was already created). ref https://www.notion.so/neondatabase/Autoscaling-Team-Internal-Sync-179c7d597dbb4fe5b565d9c482d4d166 ref https://neondb.slack.com/archives/C03TN5G758R/p1707414998235459
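For comparison, the usual client-go pattern for conflicts is to re-fetch the object and re-apply the mutation before retrying; a sketch of that pattern is below (the `vmClient` interface and `VirtualMachine` type are hypothetical stand-ins, and as noted above, blindly re-applying a status change can still clobber newer data, so this is not necessarily what this change does):

```go
package main

import (
	"context"

	"k8s.io/client-go/util/retry"
)

// VirtualMachine and vmClient are hypothetical, minimal stand-ins for the
// real types, used only to show the shape of the retry loop.
type VirtualMachine struct {
	Status struct{ Phase string }
}

type vmClient interface {
	Get(ctx context.Context, name string) (*VirtualMachine, error)
	UpdateStatus(ctx context.Context, vm *VirtualMachine) error
}

// updateStatus re-fetches the latest object on every attempt and re-applies
// the intended change, so a conflict retry actually operates on fresh data.
func updateStatus(ctx context.Context, c vmClient, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		vm, err := c.Get(ctx, name)
		if err != nil {
			return err
		}
		vm.Status.Phase = "Running" // the mutation being retried
		return c.UpdateStatus(ctx, vm)
	})
}

func main() {
	// Wiring up a real client is omitted; updateStatus just shows the pattern.
}
```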
-
- Feb 07, 2024
-
-
Em Sharnoff authored
After the changes by #748 (adding separate image build workflows), releasing hit some issues because `true != 'true'` and sometimes workflow inputs are the string "true", whereas other times, they're the *value* `true` [1] [2]. Hopefully this solution works. It's quite hacky though, so it'd be good to find a clean solution (e.g., maybe `fromJSON` works?) [1]: https://github.com/neondatabase/autoscaling/actions/runs/7818714553 [2]: actions/runner#1483 Co-authored-by: Alexander Bayandin <alexander@neon.tech>
-
Em Sharnoff authored
-
Shayan Hosseini authored
Implemented VM startup latency metrics. The following metrics have been added:
1. `vm_creation_to_runner_creation_duration_seconds`: VM creation timestamp to runner pod creation timestamp
2. `vm_runner_creation_to_vm_running_duration_seconds`: runner pod creation timestamp to the moment when the NeonVM controller changes the VM's status from Pending to Running
3. `vm_creation_to_vm_running`: VM creation timestamp to the moment when the NeonVM controller changes the VM's status from Pending to Running

Related to #759
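A minimal sketch of how one of these duration metrics could be defined and recorded with prometheus/client_golang (help text and bucket layout are assumptions; only the first metric is shown):

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var vmCreationToRunnerCreation = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "vm_creation_to_runner_creation_duration_seconds",
	Help:    "Time from VM creation to runner pod creation.",
	Buckets: prometheus.ExponentialBuckets(0.25, 2, 10), // bucket choice is illustrative
})

func init() {
	// Registering is what makes the metric actually show up on /metrics.
	prometheus.MustRegister(vmCreationToRunnerCreation)
}

// observeStartup records the latency between the two creation timestamps.
func observeStartup(vmCreated, runnerPodCreated time.Time) {
	vmCreationToRunnerCreation.Observe(runnerPodCreated.Sub(vmCreated).Seconds())
}

func main() {
	vmCreated := time.Now().Add(-3 * time.Second)
	observeStartup(vmCreated, time.Now()) // e.g. a 3-second VM-to-runner-pod delay
}
```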
-
Shayan Hosseini authored
Using the interface allows other implementations to be used as well.
-
- Feb 06, 2024
-
-
Em Sharnoff authored
While working on the "billing" section of the new autoscaler-agents dashboard [1], I noticed that this metric is recorded but never actually registered, hence not *actually* available. [1]: https://neonprod.grafana.net/d/bdbt33ngwqc5cb
-
Heikki Linnakangas authored
When testing on my laptop, launching the VM sometimes failed with: dnsmasq: failed to create inotify: No file descriptors available. That can be fixed by raising the inotify limits, which are apparently quite low by default. We did that in commit a17be2b0 for e2e tests, although that was later removed because we changed the settings in the runners instead. But to avoid having to change those settings, we can use the --no-resolv flag. When --no-resolv is given, and none of the dhcp-hostsdir, dhcp-optsdir, or hostsdir options are used, dnsmasq doesn't use inotify at all. It's a bit silly that --no-resolv has that effect when DNS is disabled (--port=0), because dnsmasq doesn't actually read resolv.conf when DNS is disabled. I think that could be improved in dnsmasq, but in the meantime this works. See the dnsmasq code where inotify is initialized: https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=blob;f=src/dnsmasq.c;h=ce897ae43483aba99bf59a90e67debfe08ed135d;hb=HEAD#l431
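Assuming dnsmasq is launched from the runner via exec (a simplification; only --port=0 and --no-resolv are taken from the description above, the other flags are illustrative), the relevant invocation looks roughly like this:

```go
package main

import (
	"log"
	"os/exec"
)

func main() {
	// --port=0 disables DNS; --no-resolv keeps dnsmasq from watching
	// resolv.conf with inotify (given that none of dhcp-hostsdir,
	// dhcp-optsdir, or hostsdir are used), avoiding the
	// "failed to create inotify" error on hosts with low inotify limits.
	cmd := exec.Command("dnsmasq",
		"--port=0",
		"--no-resolv",
		"--keep-in-foreground",
	)
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```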
-
- Feb 02, 2024
-
-
Em Sharnoff authored
NB: This PR is conditionally enabled via the --enable-container-mgr flag on neonvm-controller. There are no effects without that.
---
We recently realized[^1] that under cgroups v2, kubernetes uses cgroup namespaces, which has a few effects:
1. The output of /proc/self/cgroup shows as if the container were at the root of the hierarchy
2. It's very difficult for us to determine the actual cgroup that the container corresponds to on the host
3. We still can't directly create a cgroup in the container's namespace, because /sys/fs/cgroup is mounted read-only

So, neonvm-runner currently *does not* work as expected with cgroups v2; it creates a new cgroup for the VM, at the top of the hierarchy, and doesn't clean it up on exit. How do we fix this? The aim of this PR is to remove the special cgroup handling entirely, and "just" go through the Container Runtime Interface (CRI) exposed by containerd to modify the existing container we're running in. This requires access to /run/containerd/containerd.sock, which a malicious user could use to perform privileged operations on the host (or in any other container on the host). Obviously we'd like to prevent that as much as possible, so the CPU handling now runs alongside neonvm-runner as a separate container. neonvm-runner does not have access to the containerd socket. On the upside, one key benefit we get from this is being able to set cpu shares, the abstraction underlying container resources.requests. The other options weren't looking so great[^2], so if this works, this would be a nice compromise. [^1]: https://neondb.slack.com/archives/C03TN5G758R/p1705092611188719 [^2]: https://github.com/neondatabase/autoscaling/issues/591
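A heavily hedged sketch of what adjusting a container's CPU shares through the CRI can look like (container-ID discovery, error handling, and the share value are simplified, and this is not the actual neonvm code):

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// Connect to containerd's CRI endpoint over the unix socket mounted into
	// the sidecar container (not into neonvm-runner itself).
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	client := runtimeapi.NewRuntimeServiceClient(conn)

	// containerID would normally be discovered by listing containers and
	// matching on pod/container labels; elided here for brevity.
	containerID := "<container-id>"

	// cpu shares are the abstraction underlying Kubernetes CPU requests.
	_, err = client.UpdateContainerResources(context.Background(), &runtimeapi.UpdateContainerResourcesRequest{
		ContainerId: containerID,
		Linux: &runtimeapi.LinuxContainerResources{
			CpuShares: 2048, // example value only
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```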
-
Em Sharnoff authored
-
Em Sharnoff authored
Summary of changes:
- Add `.status.restartCount`, type *int32
- restartCount is non-nil when .status.phase != "", and is incremented every subsequent time the VM enters the "Pending" phase
- `(*VirtualMachine).Cleanup()` no longer modifies `.status.phase`
- VirtualMachine restart handling sets .status.phase to "Pending" on restart, not ""
- In e2e tests, add `restartCount: 0` to all VM object assertions
- Add a `restart-counted` e2e test

This is a pre-req for backwards-compatibility testing (#580), both so that we can ensure the VM doesn't slip to a newer neonvm-runner version by restarting, and so that we don't end up with newer versions triggering restarts.
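A rough sketch of the increment rule described above (the types and surrounding reconcile logic are simplified stand-ins, not the actual API types):

```go
package main

import "fmt"

type VirtualMachineStatus struct {
	Phase        string // "", "Pending", "Running", ...
	RestartCount *int32 // non-nil once Phase != ""
}

// enterPendingPhase moves the VM into the "Pending" phase, incrementing
// .status.restartCount on every subsequent entry into "Pending".
func enterPendingPhase(status *VirtualMachineStatus) {
	if status.RestartCount == nil {
		zero := int32(0)
		status.RestartCount = &zero // first time: starts at 0
	} else {
		*status.RestartCount++ // later entries into "Pending" count as restarts
	}
	status.Phase = "Pending"
}

func main() {
	var s VirtualMachineStatus
	enterPendingPhase(&s)
	enterPendingPhase(&s)
	fmt.Println(*s.RestartCount) // 1
}
```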
-
Em Sharnoff authored
We noticed issues at 8, then 16, then 32, and even 64 for some large regions. 128 appears to be stable in prod (even though it's overcommitting controller CPU limits 32:1). ref https://neondb.slack.com/archives/C03TN5G758R/p1706725735037289?thread_ts=1706160071.213319
-
- Feb 01, 2024
-
-
Em Sharnoff authored
Extracted from #738, which adds a second container to the runner pods. Because of that second container, if only one container exits, the pod will still have `.status.phase = Running`, so we need to proactively notice that one of the containers has stopped and propagate that status to the VM itself. This also introduces some funky logic around how we handle restarts: because the `Succeeded` and `Failed` phases no longer imply that QEMU itself has stopped, we need to explicitly wait until either the pod is gone or the neonvm-runner container has stopped; otherwise we could end up with >1 instance of the VM running at a time.
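A simplified sketch of checking the runner container's state directly, rather than relying on the pod's overall phase (the container name and helper function are illustrative):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// runnerContainerStopped reports whether the "neonvm-runner" container in the
// pod has terminated, even if a sidecar keeps the pod's phase at "Running".
func runnerContainerStopped(pod *corev1.Pod) bool {
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.Name == "neonvm-runner" {
			return cs.State.Terminated != nil
		}
	}
	return false
}

func main() {
	pod := &corev1.Pod{}
	fmt.Println(runnerContainerStopped(pod)) // false: no container statuses yet
}
```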
-
- Jan 31, 2024
-
-
Em Sharnoff authored
-
Em Sharnoff authored
We had the same thing implemented in a few places, with some TODOs to unify them eventually - so, here it is. Summary of changes:
- Merge all handling of pod/VM "start" / "reserve" logic to go through (*AutoscaleEnforcer).reserveResources()
- Combined handlePodStarted/handleVMStarted -> handleStarted
- Merge all handling of pod/VM "deletion" / "unreserve" logic to go through (*AutoscaleEnforcer).unreserveResources()
- Pass around *corev1.Pod objects

This also fixes the issue behind #435 ("Handle bypassed Reserve ...").
-
- Jan 30, 2024
-
-
Em Sharnoff authored
We noticed a couple prod regions are close to all capacity used w.r.t. the existing concurrency limits. This PR intends to provide a low-risk way to allow tweaking the precise limit more quickly. ref https://neondb.slack.com/archives/C03TN5G758R/p1706641824349959?thread_ts=1706160071.213319 ref https://neondb.slack.com/archives/C03TN5G758R/p1706646482943109
-
Oleg Vasilev authored
A last-minute logging change in 6addbdc5 (runner: pass logs through virtio-serial (#724), 2024-01-24) resulted in flooding the logs with the following error message: "msg":"failed to read from log serial", "error":"read unix @->/vm/log.sock: i/o timeout". Filtering for os.ErrDeadlineExceeded before logging fixes it. Related to neondatabase/cloud#8602. Signed-off-by: Oleg Vasilev <oleg@neon.tech>
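A short sketch of that kind of filter (the read loop is generic, not the runner's actual code): treat os.ErrDeadlineExceeded as the expected quiet case and only log other errors.

```go
package main

import (
	"errors"
	"log"
	"net"
	"os"
	"time"
)

// readWithTimeout reads from the log socket with a deadline, logging real
// errors but staying quiet when the deadline simply expired.
func readWithTimeout(conn net.Conn, buf []byte) (int, bool) {
	_ = conn.SetReadDeadline(time.Now().Add(time.Second))
	n, err := conn.Read(buf)
	if err != nil {
		if errors.Is(err, os.ErrDeadlineExceeded) {
			// Expected when no log output arrived within the deadline;
			// don't flood the logs with this.
			return n, true
		}
		log.Printf("failed to read from log serial: %v", err)
		return n, false
	}
	return n, true
}

func main() {
	// Usage would sit inside the runner's log-forwarding loop; omitted here.
}
```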
-
- Jan 26, 2024
-
-
Em Sharnoff authored
ref https://github.com/neondatabase/autoscaling/actions/runs/7673566092 ${{ github.event.pull_request }} doesn't exist for 'push' events
-
Em Sharnoff authored
Brief summary of changes:
1. Add a new workflow `build-images.yaml` that builds images and pushes them to dockerhub (e.g. neonvm-controller, autoscaler-agent, etc.)
2. Add a new workflow `build-test-vm.yaml` that builds vm-builder and makes the postgres:15-bullseye VM image.
   - Also uploads vm-builder as an artifact if requested
3. In `e2e-test.yaml`, use images from (1) and (2)
   - Also uploads the rendered manifests as an artifact if requested
4. In `release.yaml`, use images from (1) and (2), run tests with (3), and use vm-builder and manifests from (1) and (3).
5. Add `make load-example-vms` and equivalents, which load images without building

Refer to the PR description for more info.
-
- Jan 25, 2024
-
-
Em Sharnoff authored
This isn't the cause of any problems we're observing, but it should make things a bit clearer in the future.
-
Em Sharnoff authored
Noticed while working on #738. In short, because the runner API version was part of labelsForVirtualMachine, any update to the runner version would be applied to *all* VM pods, not just new ones. This is (probably) not an issue in prod right now, but could be an issue for future changes. This PR fixes the behavior by adding the runner API version as an explicit argument to labelsForVirtualMachine and ignoring the label in updatePodMetadataIfNecessary.
-
Shayan Hosseini authored
-
Em Sharnoff authored
-
Em Sharnoff authored
-
- Jan 24, 2024
-
-
Shayan Hosseini authored
Providing custom metrics for reconciler objects.
- `reconcile_failing_objects` represents the number of objects that are failing to reconcile, for each specific controller.

Fixes #247 (along with #739).
-