Em Sharnoff
authored
With the upcoming compute pool changes, we're going to end up with a lot of VMs where the informant is unable to start up. The initial file cache connection fails until postgres is alive, and postgres only comes up once the pooled VM is bound to a particular endpoint. So on staging, we currently report a lot of "autoscaling stuck" VMs that are in reality just part of the pool.

Having a separate value for the number of stuck VMs that are actually running something will ensure our metrics continue to be useful. Since the endpoint ID has to be passed through anyway to build that metric, it's also worth storing and logging it, so that the information is more easily available (without having to cross-reference the console DB).
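As a rough illustration of the split described above, here is a minimal sketch of counting stuck VMs separately by whether they are bound to an endpoint. It is not the code from this commit: `VMInfo`, its fields, and `count_stuck` are hypothetical names chosen for the example, and it assumes that a pooled VM not yet bound to an endpoint simply has no endpoint ID.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class VMInfo:
    # Hypothetical record; the field names are assumptions for this sketch.
    name: str
    stuck: bool
    endpoint_id: Optional[str] = None  # None: pooled VM, not yet bound


def count_stuck(vms: Iterable[VMInfo]) -> Tuple[int, int]:
    """Return (all stuck VMs, stuck VMs that are bound to an endpoint)."""
    total = 0
    bound = 0
    for vm in vms:
        if not vm.stuck:
            continue
        total += 1
        if vm.endpoint_id is not None:
            bound += 1
    return total, bound


vms = [
    VMInfo("vm-a", stuck=True),                        # pooled, unbound
    VMInfo("vm-b", stuck=True, endpoint_id="ep-1"),    # stuck while in use
    VMInfo("vm-c", stuck=False, endpoint_id="ep-2"),   # healthy
]
print(count_stuck(vms))  # (2, 1)
```

Only the second number would need attention under this scheme, since the difference between the two is made up of pooled VMs that are expected to look stuck.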