This project is mirrored from https://github.com/neondatabase/autoscaling.
- Feb 16, 2024
-
-
Em Sharnoff authored
-
Em Sharnoff authored
Similar in spirit to the "dump state" endpoints exposed by the autoscaler-agent and scheduler plugin. This one is on port 7778 (+1 from the pprof server on 7777). The state can be fetched with: `kubectl port-forward -n neonvm-system pod/<pod-name> 7778:7778 & curl http://localhost:7778/`
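For illustration only, a minimal sketch of what a "dump state" HTTP endpoint like this can look like in Go (the state struct and its fields are assumptions, not the actual neonvm-runner types):

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// runnerState is a stand-in for whatever internal state the runner tracks;
// the field names here are assumptions, for illustration only.
type runnerState struct {
	VMName      string `json:"vmName"`
	QEMURunning bool   `json:"qemuRunning"`
}

func main() {
	state := &runnerState{VMName: "example-vm", QEMURunning: true}

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		// Encode a snapshot of the current state as the response body.
		if err := json.NewEncoder(w).Encode(state); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	})

	// Port 7778, one above the pprof server on 7777.
	log.Fatal(http.ListenAndServe(":7778", mux))
}
```

With a server like this running in the pod, the port-forward + curl combination above returns the JSON-encoded snapshot.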
-
Em Sharnoff authored
Adds the '--qemu-disk-cache-settings' flag to neonvm-controller and neonvm-runner. The default is 'cache=none', but the one we'd like to use is 'cache.writeback=on,cache.direct=on,cache.no-flush=on'. The neonvm-controller flag is directly propagated to all new VM runner pods, as they're created. Resolves #775, refer there for more information. Co-authored-by: Oleg Vasilev <oleg@neon.tech>
-
Oleg Vasilev authored
Previously, if QEMU failed to start up, the log output contained either:
1. a panic, because of `reader==nil`, or
2. `"error":"dial unix /vm/log.sock: connect: no such file or directory","stacktrace":"main.forwardLogs.func1\n\t/workspace/neonvm/runner/main.go:897\nmain.forwardLogs\n\t/workspace/neonvm/runner/main.go:921"`

which was confusing. Now it will have: `{"level":"warn","ts":1707993260.548491,"logger":"neonvm-runner","caller":"runner/main.go:897","msg":"QEMU shut down too soon to start forwarding logs"}` The panic was found by the Discord user @ido6668. Fixes #791 --------- Signed-off-by: Oleg Vasilev <oleg@neon.tech> Co-authored-by: Em Sharnoff <sharnoff@neon.tech>
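As a rough sketch of the kind of check that produces that warning (the function, channel, and wiring here are assumptions, not the actual runner code): bail out with a warning if QEMU has already exited before trying to dial the log socket.

```go
package main

import (
	"net"

	"go.uber.org/zap"
)

// forwardLogs is a simplified stand-in for the runner's log-forwarding loop.
// qemuExited is assumed to be closed once the QEMU process has shut down.
func forwardLogs(logger *zap.Logger, qemuExited <-chan struct{}) {
	select {
	case <-qemuExited:
		// QEMU is already gone, so /vm/log.sock will never appear; warn
		// instead of panicking or reporting a confusing dial error.
		logger.Warn("QEMU shut down too soon to start forwarding logs")
		return
	default:
	}

	conn, err := net.Dial("unix", "/vm/log.sock")
	if err != nil {
		logger.Error("failed to dial log socket", zap.Error(err))
		return
	}
	defer conn.Close()
	// ... copy log output from conn to the runner's logger ...
}

func main() {
	logger, _ := zap.NewProduction()
	exited := make(chan struct{})
	close(exited) // simulate QEMU exiting before log forwarding starts
	forwardLogs(logger.Named("neonvm-runner"), exited)
}
```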
-
Oleg Vasilev authored
In an effort to reduce pressure on containerd, this reduces the number of init containers. This might be a preliminary step on the path to get rid of init containers entirely. Part of #747 --------- Signed-off-by: Oleg Vasilev <oleg@neon.tech>
-
Alexander Bayandin authored
[0] moved the `vmlinuz` directory from `neonvm/hack` to `neonvm/hack/kernel`. This PR deletes the old path from the code. Ref: [1]
- [0] https://github.com/neondatabase/autoscaling/pull/715
- [1] https://github.com/neondatabase/autoscaling/pull/715#discussion_r1446239895
-
Oleg Vasilev authored
This allows running an arbitrary script on startup inside the main container in the pod. Meant to replace the init container that performs iptables initialization. Part of #747 --------- Signed-off-by: Oleg Vasilev <oleg@neon.tech>
-
- Feb 15, 2024
-
-
Em Sharnoff authored
Noticed while reviewing #782. Maybe we want to go through the list more thoroughly; I think there are a handful of fields missing.
-
Shayan Hosseini authored
Making sure that once SSH is enabled, we create the SSH secret and can SSH into the VM from the runner pod.
-
Em Sharnoff authored
We should have this for dev at least. We'll probably need to be careful not to prematurely enable it in staging/prod, but that should be straightforward enough.
-
Shayan Hosseini authored
Finish the sentence that ends prematurely
-
Em Sharnoff authored
We noticed higher than desired reconcile latency in prod because of CPU throttling. Bumping CPU limits from 4 to 8 in larger regions significantly reduced latency, without increasing average CPU usage. ref https://neondb.slack.com/archives/C03TN5G758R/p1707686417925039
-
- Feb 14, 2024
-
-
Heikki Linnakangas authored
The swap disk needs to be configured in the VM spec. This also sets the size of /dev/shm to match the size of swap (if it's larger than 1/2 of the initial memory size, which is the Linux default). See https://github.com/neondatabase/autoscaling/issues/800. This doesn't implement the "autoscale if swapping" behavior yet. Co-authored-by: Em Sharnoff <sharnoff@neon.tech>
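To make the sizing rule concrete, here is a small sketch of the arithmetic under the stated assumptions (helper and parameter names are illustrative, not the controller's actual code): /dev/shm stays at the Linux default of half the initial memory unless swap is larger, in which case it matches the swap size.

```go
package main

import "fmt"

// shmSize returns the size to use for /dev/shm, given the VM's initial memory
// and configured swap size, both in bytes. The Linux default is half of
// memory; if swap is larger than that, match the swap size instead.
func shmSize(initialMemory, swap int64) int64 {
	linuxDefault := initialMemory / 2
	if swap > linuxDefault {
		return swap
	}
	return linuxDefault
}

func main() {
	const gib = int64(1) << 30
	fmt.Println(shmSize(4*gib, 1*gib) / gib) // 2: swap is smaller, keep the default
	fmt.Println(shmSize(4*gib, 8*gib) / gib) // 8: swap is larger, match it
}
```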
-
- Feb 13, 2024
-
-
Heikki Linnakangas authored
Saw this when I did "kubectl describe pod compute-..." on a VM pod:

Events:
  Type     Reason            Age    From                 Message
  ----     ------            ----   ----                 -------
  Warning  FailedScheduling  4m53s  autoscale-scheduler  0/1 nodes are available: 1 Not enough resources for pod.
  Warning  FailedScheduling  4m51s  autoscale-scheduler  running Reserve plugin "AutoscaleEnforcer": Not enough resources to reserve non-VM pod

That message is wrong, because this was a VM. Fix the message to not specify whether it's a VM or non-VM pod.
-
- Feb 12, 2024
-
-
Em Sharnoff authored
Closes #760. AFAIK this hasn't been an issue in the past, but as we're trying to improve reliability, it's good to get this out of the way before it becomes an issue. Note that this PR is quite minimal - expanding the existing tech debt we have around how the scheduler plugin handles HTTP requests. It's probably ok *enough* for now. I don't expect we'll be making too many changes to it in the near future. See also: #13.

Tested locally by forcing it to panic on every request:

diff --git a/pkg/plugin/run.go b/pkg/plugin/run.go
index 007554a..6da7728 100644
--- a/pkg/plugin/run.go
+++ b/pkg/plugin/run.go
@@ -262,8 +262,10 @@ func (e *AutoscaleEnforcer) handleAgentRequest(
 		}
 	}
-	pod.vm.mostRecentComputeUnit = &e.state.conf.ComputeUnit
-	return &resp, 200, nil
+	panic(errors.New("test panic!"))
+
+	// pod.vm.mostRecentComputeUnit = &e.state.conf.ComputeUnit
+	// return &resp, 200, nil
 }
 // getComputeUnitForResponse tries to return compute unit that the agent supports

The change appears to work as intended.
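Generically, surviving a panic like the one forced above comes down to a recover() wrapper around the request handler. A hedged sketch of that standard net/http pattern (not the scheduler plugin's actual code):

```go
package main

import (
	"log"
	"net/http"
	"runtime/debug"
)

// recoverPanics wraps an http.Handler so that a panic in the handler is
// logged and turned into a 500 response instead of crashing the process.
func recoverPanics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if err := recover(); err != nil {
				log.Printf("panic while handling %s: %v\n%s", r.URL.Path, err, debug.Stack())
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		panic("test panic!") // mirrors the local test described above
	})
	log.Fatal(http.ListenAndServe(":8080", recoverPanics(mux)))
}
```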
-
- Feb 09, 2024
-
-
Shayan Hosseini authored
Also added an e2e test for the runner pod. Fixes #794 --------- Co-authored-by: Em Sharnoff <sharnoff@neon.tech>
-
Em Sharnoff authored
The previous logic would:
1. Try to update the status
2. If we got a conflict, overwrite the current VirtualMachine object with the state on the API server
3. Then, retry with the updated object (without changing anything!)

This is basically doing extra work for nothing! We discussed potentially changing this to overwrite the status on conflict, but this could result in issues when operating on stale data (e.g., overwriting .status.podName back to "" after it was already created). ref https://www.notion.so/neondatabase/Autoscaling-Team-Internal-Sync-179c7d597dbb4fe5b565d9c482d4d166 ref https://neondb.slack.com/archives/C03TN5G758R/p1707414998235459
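For comparison, the usual client-go pattern for conflicts is to re-fetch the object and re-apply the mutation before retrying; a sketch of that pattern is below (the `vmClient` interface and `VirtualMachine` type are hypothetical stand-ins, and as noted above, blindly re-applying a status change can still clobber newer data, so this is not necessarily what this change does):

```go
package main

import (
	"context"

	"k8s.io/client-go/util/retry"
)

// VirtualMachine and vmClient are hypothetical, minimal stand-ins for the
// real types, used only to show the shape of the retry loop.
type VirtualMachine struct {
	Status struct{ Phase string }
}

type vmClient interface {
	Get(ctx context.Context, name string) (*VirtualMachine, error)
	UpdateStatus(ctx context.Context, vm *VirtualMachine) error
}

// updateStatus re-fetches the latest object on every attempt and re-applies
// the intended change, so a conflict retry actually operates on fresh data.
func updateStatus(ctx context.Context, c vmClient, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		vm, err := c.Get(ctx, name)
		if err != nil {
			return err
		}
		vm.Status.Phase = "Running" // the mutation being retried
		return c.UpdateStatus(ctx, vm)
	})
}

func main() {
	// Wiring up a real client is omitted; updateStatus just shows the pattern.
}
```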
-
- Feb 07, 2024
-
-
Em Sharnoff authored
After the changes by #748 (adding separate image build workflows), releasing hit some issues because `true != 'true'` and sometimes workflow inputs are the string "true", whereas other times, they're the *value* `true` [1] [2]. Hopefully this solution works. It's quite hacky though, so it'd be good to find a clean solution (e.g., maybe `fromJSON` works?) [1]: https://github.com/neondatabase/autoscaling/actions/runs/7818714553 [2]: actions/runner#1483 Co-authored-by: Alexander Bayandin <alexander@neon.tech>
-
Em Sharnoff authored
-
Shayan Hosseini authored
Implemented VM startup latency metrics. The following metrics have been added:
1. `vm_creation_to_runner_creation_duration_seconds`: VM creation timestamp to runner pod creation timestamp
2. `vm_runner_creation_to_vm_running_duration_seconds`: runner pod creation timestamp to the moment when the NeonVM controller changes the VM's status from Pending to Running
3. `vm_creation_to_vm_running`: VM creation timestamp to the moment when the NeonVM controller changes the VM's status from Pending to Running

Related to #759
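A minimal sketch of how one of these duration metrics could be defined and recorded with prometheus/client_golang (help text and bucket layout are assumptions; only the first metric is shown):

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var vmCreationToRunnerCreation = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "vm_creation_to_runner_creation_duration_seconds",
	Help:    "Time from VM creation to runner pod creation.",
	Buckets: prometheus.ExponentialBuckets(0.25, 2, 10), // bucket choice is illustrative
})

func init() {
	// Registering is what makes the metric actually show up on /metrics.
	prometheus.MustRegister(vmCreationToRunnerCreation)
}

// observeStartup records the latency between the two creation timestamps.
func observeStartup(vmCreated, runnerPodCreated time.Time) {
	vmCreationToRunnerCreation.Observe(runnerPodCreated.Sub(vmCreated).Seconds())
}

func main() {
	vmCreated := time.Now().Add(-3 * time.Second)
	observeStartup(vmCreated, time.Now()) // e.g. a 3-second VM-to-runner-pod delay
}
```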
-
Shayan Hosseini authored
Using the interface allows other implementations to be used as well.
-
- Feb 06, 2024
-
-
Em Sharnoff authored
While working on the "billing" section of the new autoscaler-agents dashboard [1], I noticed that this metric is recorded but never actually registered, hence not *actually* available. [1]: https://neonprod.grafana.net/d/bdbt33ngwqc5cb
-
Heikki Linnakangas authored
When testing on my laptop, launching the VM sometimes failed with: dnsmasq: failed to create inotify: No file descriptors available. That can be fixed by raising the inotify limits, which are apparently quite low by default. We did that in commit a17be2b0 for e2e tests, although that was later removed because we changed the settings in the runners instead. But to avoid having to change those settings, we can use the --no-resolv flag. When --no-resolv is given, and none of the dhcp-hostsdir, dhcp-optsdir, or hostsdir options are used, dnsmasq doesn't use inotify at all. It's a bit silly that --no-resolv has that effect when DNS is disabled (--port=0), because dnsmasq doesn't actually read resolv.conf when DNS is disabled. I think that could be improved in dnsmasq, but in the meantime this works. See the dnsmasq code where inotify is initialized: https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=blob;f=src/dnsmasq.c;h=ce897ae43483aba99bf59a90e67debfe08ed135d;hb=HEAD#l431
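Assuming dnsmasq is launched from the runner via exec (a simplification; only --port=0 and --no-resolv are taken from the description above, the other flags are illustrative), the relevant invocation looks roughly like this:

```go
package main

import (
	"log"
	"os/exec"
)

func main() {
	// --port=0 disables DNS; --no-resolv keeps dnsmasq from watching
	// resolv.conf with inotify (given that none of dhcp-hostsdir,
	// dhcp-optsdir, or hostsdir are used), avoiding the
	// "failed to create inotify" error on hosts with low inotify limits.
	cmd := exec.Command("dnsmasq",
		"--port=0",
		"--no-resolv",
		"--keep-in-foreground",
	)
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```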
-
- Feb 02, 2024
-
-
Em Sharnoff authored
NB: This PR is conditionally enabled via the --enable-container-mgr flag on neonvm-controller. There are no effects without that.
---
We recently realized[^1] that under cgroups v2, kubernetes uses cgroup namespaces, which has a few effects:
1. The output of /proc/self/cgroup shows as if the container were at the root of the hierarchy
2. It's very difficult for us to determine the actual cgroup that the container corresponds to on the host
3. We still can't directly create a cgroup in the container's namespace, because /sys/fs/cgroup is mounted read-only

So, neonvm-runner currently *does not* work as expected with cgroups v2; it creates a new cgroup for the VM, at the top of the hierarchy, and doesn't clean it up on exit. How do we fix this? The aim of this PR is to remove the special cgroup handling entirely, and "just" go through the Container Runtime Interface (CRI) exposed by containerd to modify the existing container we're running in. This requires access to /run/containerd/containerd.sock, which a malicious user could use to perform privileged operations on the host (or in any other container on the host). Obviously we'd like to prevent that as much as possible, so the CPU handling now runs alongside neonvm-runner as a separate container. neonvm-runner does not have access to the containerd socket. On the upside, one key benefit we get from this is being able to set cpu shares, the abstraction underlying container resources.requests. The other options weren't looking so great[^2], so if this works, this would be a nice compromise. [^1]: https://neondb.slack.com/archives/C03TN5G758R/p1705092611188719 [^2]: https://github.com/neondatabase/autoscaling/issues/591
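A heavily hedged sketch of what adjusting a container's CPU shares through the CRI can look like (container-ID discovery, error handling, and the share value are simplified, and this is not the actual neonvm code):

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// Connect to containerd's CRI endpoint over the unix socket mounted into
	// the sidecar container (not into neonvm-runner itself).
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	client := runtimeapi.NewRuntimeServiceClient(conn)

	// containerID would normally be discovered by listing containers and
	// matching on pod/container labels; elided here for brevity.
	containerID := "<container-id>"

	// cpu shares are the abstraction underlying Kubernetes CPU requests.
	_, err = client.UpdateContainerResources(context.Background(), &runtimeapi.UpdateContainerResourcesRequest{
		ContainerId: containerID,
		Linux: &runtimeapi.LinuxContainerResources{
			CpuShares: 2048, // example value only
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```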
-
Em Sharnoff authored
-
Em Sharnoff authored
Summary of changes:
- Add `.status.restartCount`, type *int32
- restartCount is non-nil when .status.phase != "", and is incremented every subsequent time the VM enters the "Pending" phase
- `(*VirtualMachine).Cleanup()` no longer modifies `.status.phase`
- VirtualMachine restart handling sets .status.phase to "Pending" on restart, not ""
- In e2e tests, add `restartCount: 0` to all VM object assertions
- Add a `restart-counted` e2e test

This is a pre-req for backwards-compatibility testing (#580), both so that we can ensure the VM doesn't slip to a newer neonvm-runner version by restarting, and so that we don't end up with newer versions triggering restarts.
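A rough sketch of the increment rule described above (the types and surrounding reconcile logic are simplified stand-ins, not the actual API types):

```go
package main

import "fmt"

type VirtualMachineStatus struct {
	Phase        string // "", "Pending", "Running", ...
	RestartCount *int32 // non-nil once Phase != ""
}

// enterPendingPhase moves the VM into the "Pending" phase, incrementing
// .status.restartCount on every subsequent entry into "Pending".
func enterPendingPhase(status *VirtualMachineStatus) {
	if status.RestartCount == nil {
		zero := int32(0)
		status.RestartCount = &zero // first time: starts at 0
	} else {
		*status.RestartCount++ // later entries into "Pending" count as restarts
	}
	status.Phase = "Pending"
}

func main() {
	var s VirtualMachineStatus
	enterPendingPhase(&s)
	enterPendingPhase(&s)
	fmt.Println(*s.RestartCount) // 1
}
```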
-
Em Sharnoff authored
We noticed issues at 8, then 16, then 32, and even 64 for some large regions. 128 appears to be stable in prod (even though it's overcommitting controller CPU limits 32:1). ref https://neondb.slack.com/archives/C03TN5G758R/p1706725735037289?thread_ts=1706160071.213319
-
- Feb 01, 2024
-
-
Em Sharnoff authored
Extracted from #738, which adds a second container to the runner pods. Because of that second container, if only one container exits, the pod will still have `.status.phase = Running`, so we need to proactively notice that one of the containers has stopped and propagate that status to the VM itself. This also introduces some funky logic around how we handle restarts: because the `Succeeded` and `Failed` phases no longer imply that QEMU itself has stopped, we need to explicitly wait until either the pod is gone or the neonvm-runner container has stopped; otherwise we could end up with >1 instance of the VM running at a time.
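A simplified sketch of checking the runner container's state directly, rather than relying on the pod's overall phase (the container name and helper function are illustrative):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// runnerContainerStopped reports whether the "neonvm-runner" container in the
// pod has terminated, even if a sidecar keeps the pod's phase at "Running".
func runnerContainerStopped(pod *corev1.Pod) bool {
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.Name == "neonvm-runner" {
			return cs.State.Terminated != nil
		}
	}
	return false
}

func main() {
	pod := &corev1.Pod{}
	fmt.Println(runnerContainerStopped(pod)) // false: no container statuses yet
}
```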
-
- Jan 31, 2024
-
-
Em Sharnoff authored
-
Em Sharnoff authored
We had the same thing implemented in a few places, with some TODOs to unify them eventually - so, here it is. Summary of changes:
- Merge all handling of pod/VM "start" / "reserve" logic to go through (*AutoscaleEnforcer).reserveResources()
- Combined handlePodStarted/handleVMStarted -> handleStarted
- Merge all handling of pod/VM "deletion" / "unreserve" logic to go through (*AutoscaleEnforcer).unreserveResources()
- Pass around *corev1.Pod objects

This also fixes the issue behind #435 ("Handle bypassed Reserve ...").
-
- Jan 30, 2024
-
-
Em Sharnoff authored
We noticed a couple prod regions are close to all capacity used w.r.t. the existing concurrency limits. This PR intends to provide a low-risk way to allow tweaking the precise limit more quickly. ref https://neondb.slack.com/archives/C03TN5G758R/p1706641824349959?thread_ts=1706160071.213319 ref https://neondb.slack.com/archives/C03TN5G758R/p1706646482943109
-
Oleg Vasilev authored
A last-minute logging change in 6addbdc5 (runner: pass logs through virtio-serial (#724), 2024-01-24) resulted in flooding the logs with the following error message: "msg":"failed to read from log serial", "error":"read unix @->/vm/log.sock: i/o timeout". Filtering for os.ErrDeadlineExceeded before logging fixes it. Related to neondatabase/cloud#8602. Signed-off-by: Oleg Vasilev <oleg@neon.tech>
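A short sketch of that kind of filter (the read loop is generic, not the runner's actual code): treat os.ErrDeadlineExceeded as the expected quiet case and only log other errors.

```go
package main

import (
	"errors"
	"log"
	"net"
	"os"
	"time"
)

// readWithTimeout reads from the log socket with a deadline, logging real
// errors but staying quiet when the deadline simply expired.
func readWithTimeout(conn net.Conn, buf []byte) (int, bool) {
	_ = conn.SetReadDeadline(time.Now().Add(time.Second))
	n, err := conn.Read(buf)
	if err != nil {
		if errors.Is(err, os.ErrDeadlineExceeded) {
			// Expected when no log output arrived within the deadline;
			// don't flood the logs with this.
			return n, true
		}
		log.Printf("failed to read from log serial: %v", err)
		return n, false
	}
	return n, true
}

func main() {
	// Usage would sit inside the runner's log-forwarding loop; omitted here.
}
```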
-
- Jan 26, 2024
-
-
Em Sharnoff authored
ref https://github.com/neondatabase/autoscaling/actions/runs/7673566092 ${{ github.event.pull_request }} doesn't exist for 'push' events
-
Em Sharnoff authored
Brief summary of changes:
1. Add a new workflow `build-images.yaml` that builds images and pushes them to dockerhub (e.g. neonvm-controller, autoscaler-agent, etc.)
2. Add a new workflow `build-test-vm.yaml` that builds vm-builder and makes the postgres:15-bullseye VM image.
   - Also uploads vm-builder as an artifact if requested
3. In `e2e-test.yaml`, use images from (1) and (2)
   - Also uploads the rendered manifests as an artifact if requested
4. In `release.yaml`, use images from (1) and (2), run tests with (3), and use vm-builder and manifests from (1) and (3).
5. Add `make load-example-vms` and equivalents, which load images without building

Refer to the PR description for more info.
-
- Jan 25, 2024
-
-
Em Sharnoff authored
This isn't the cause of any problems we're observing, but it should make things a bit clearer in the future.
-
Em Sharnoff authored
Noticed while working on #738. In short, because the runner API version was part of labelsForVirtualMachine, any update to the runner version would be applied to *all* VM pods, not just new ones. This is (probably) not an issue in prod right now, but could be an issue for future changes. This PR fixes the behavior by adding the runner API version as an explicit argument to labelsForVirtualMachine and ignoring the label in updatePodMetadataIfNecessary.
-
Shayan Hosseini authored
-
Em Sharnoff authored
-
Em Sharnoff authored
-
- Jan 24, 2024
-
-
Shayan Hosseini authored
Providing custom metrics for reconciler objects.
- `reconcile_failing_objects` represents the number of objects that are failing to reconcile, for each specific controller.

Fixes #247 (along with #739).
-