This project is mirrored from https://github.com/neondatabase/neon. Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer or owner.
  1. Sep 12, 2023
  2. Sep 11, 2023
    • Arpad Müller's avatar
      Make File opening in VirtualFile async-compatible (#5280) · a18d6d9a
      Arpad Müller authored
      ## Problem
      
      Previously, we were using `observe_closure_duration` in `VirtualFile`
      file opening code, but this doesn't support async open operations, which
      we want to use as part of #4743.
      
      ## Summary of changes
      
      * Move the duration measurement from the `with_file` macro into an
      `observe_duration` macro.
      * Some smaller drive-by fixes to replace the old strings with the new
      variant names introduced by #5273
      
      Part of #4743, follow-up of #5247.
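
A rough Python analogy of the problem described above (not the Rust code from this PR): a helper that times a synchronous closure only measures how long it takes to create the awaitable, not the awaited operation itself. The function names mirror the ones in the message, but their bodies, `slow_open`, and the sleep duration are illustrative assumptions.

```python
import asyncio
import time


def observe_closure_duration(closure):
    """Time a synchronous call; returns (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = closure()
    return result, time.perf_counter() - start


async def observe_duration(awaitable):
    """Time an awaitable until it completes; returns (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = await awaitable
    return result, time.perf_counter() - start


async def slow_open():
    # Hypothetical stand-in for an async file open.
    await asyncio.sleep(0.05)
    return "fd"


async def main():
    # Calling an async function synchronously only builds the coroutine object,
    # so the measured time does not include the sleep.
    coro, sync_elapsed = observe_closure_duration(slow_open)
    coro.close()  # discard the coroutine that was never awaited
    result, async_elapsed = await observe_duration(slow_open())
    return sync_elapsed, result, async_elapsed


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The second helper has to be awaited itself, which is why the measurement cannot stay inside a plain closure-based wrapper.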
      a18d6d9a
    • Arpad Müller's avatar
      Use tokio locks in VirtualFile and turn with_file into macro (#5247) · 76cc8739
      Arpad Müller authored
      ## Problem
      
      For #4743, we want to convert everything up to the actual I/O operations
      of `VirtualFile` to `async fn`.
      
      ## Summary of changes
      
      This PR is the last change in a series of changes to `VirtualFile`:
      #5189, #5190, #5195, #5203, and #5224.
      
      It does the last preparations before the I/O operations are actually
      made async. We are doing the following things:
      
      * First, we change the locks for the file descriptor cache to tokio's
      locks that support Send. This is important when one wants to hold locks
      across await points (which we want to do), otherwise the Future won't be
      Send. Also, one shouldn't generally block in async code as executors
      don't like that.
      * Due to the lock change, we now take an approach for the `VirtualFile`
      destructors similar to the one proposed by #5122 for the page cache, to
      use `try_write`. Similarly to the situation in the linked PR, one can
      make an argument that if we are in the destructor and the slot has not
      been reused yet, we are the only user accessing the slot due to owning
      the lock mutably. It is still possible that we are not obtaining the
      lock, but the only cause for that is the clock algorithm touching the
      slot, which should be quite an unlikely occurrence. For the instance of
      `try_write` failing, we spawn an async task to destroy the lock. As just
      argued however, most of the time the code path where we spawn the task
      should not be visited.
      * Lastly, we split `with_file` into a macro part, and a function part
      that contains most of the logic. The function part returns a lock
      object, that the macro uses. The macro exists to perform the operation
      in a more compact fashion, saving code from putting the lock into a
      variable and then doing the operation while measuring the time to run
      it. We take the locks approach because Rust has no support for async
      closures. One can make normal closures return a future, but that
      approach gets into lifetime issues the moment you want to pass data to
      these closures via parameters that have a lifetime (captures work). For
      details, see
      [this](https://smallcultfollowing.com/babysteps/blog/2023/03/29/thoughts-on-async-closures/)
      and
      [this](https://users.rust-lang.org/t/function-that-takes-an-async-closure/61663)
      link. In #5224, we ran into a similar problem with the `test_files`
      function, and we ended up passing the path and the `OpenOptions`
      by-value instead of by-ref, at the expense of a few extra copies. This
      can be done as the data is cheaply copyable, and we are in test code.
      But here, we are not, and while `File::try_clone` exists, it [issues
      system calls
      internally](https://github.com/rust-lang/rust/blob/1e746d7741d44551e9378daf13b8797322aa0b74/library/std/src/os/fd/owned.rs#L94-L111).
      Also, it would allocate an entirely new file descriptor, something that
      the fd cache was built to prevent.
      * We change the `STORAGE_IO_TIME` metrics to support async.
      
      Part of #4743.
      76cc8739
    • bojanserafimov's avatar
    • duguorong009's avatar
      fix(pageserver): update the `STORAGE_IO_TIME` metrics to avoid expensive operations (#5273) · d7fa2dba
      duguorong009 authored
      
      Introduce the `StorageIoOperation` enum, `StorageIoTime` struct, and
      `STORAGE_IO_TIME_METRIC` static which provides lockless access to
      histograms consumed by `VirtualFile`.
      
      Closes #5131
      
      Co-authored-by: Joonas Koivunen <joonas@neon.tech>
      d7fa2dba
    • Joonas Koivunen's avatar
      Misc test flakiness fixes (#5233) · a55a78a4
      Joonas Koivunen authored
      Assorted flakiness fixes from #5198, might not be flaky on `main`.
      
      Migrate some tests using neon_simple_env to just neon_env_builder and
      using initial_tenant to make the flakiness easier to understand. (Did not
      understand the flakiness of
      `test_timeline_create_break_after_uninit_mark`.)
      
      `test_download_remote_layers_api` is flaky because we have no atomic
      "wait for WAL, checkpoint, wait for upload and do not receive any more
      WAL".
      
      `test_tenant_size` fixes are just boilerplate which should have always
      existed; we should wait for the tenant to be active. Similarly for
      `test_timeline_delete`.
      
      `test_timeline_size_post_checkpoint` fails often for me with reading
      zero from metrics. Give it a few attempts.
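
"Give it a few attempts" could look roughly like the helper below. This is a minimal sketch, not the code from the PR: the `retry` name, its parameters, and the choice to retry only on `AssertionError` are assumptions made for illustration.

```python
import time


def retry(check, attempts=5, delay=0.0):
    """Call `check` until it stops raising AssertionError, at most `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return check()
        except AssertionError as err:
            # Remember the failure and try again after a short pause.
            last_error = err
            time.sleep(delay)
    # Every attempt failed: surface the last assertion failure.
    raise last_error
```

A test would wrap the metric read and its non-zero assertion in a function and pass that function to `retry`.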
      a55a78a4
  3. Sep 10, 2023
  4. Sep 09, 2023
    • Alexander Bayandin's avatar
      Create GitHub release from release tag (#5246) · 1ea93af5
      Alexander Bayandin authored
      ## Problem
      
      This PR creates a GitHub release from a release tag with an
      autogenerated changelog: https://github.com/neondatabase/neon/releases
      
      ## Summary of changes
      - Call GitHub API to create a release
      1ea93af5
    • Konstantin Knizhnik's avatar
      Ignore DISK_FULL error when performing availability check for client (#5010) · f64b338c
      Konstantin Knizhnik authored
      
      See #5001
      
      No space is what's expected if we're at the size limit.
      Of course, if the SK incorrectly returned "no space", the availability check
      wouldn't fire.
      But users would notice such a bug quite soon anyway.
      So ignoring "no space" is the right trade-off.
      
      
      
      Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
      Co-authored-by: Joonas Koivunen <joonas@neon.tech>
      f64b338c
    • Konstantin Knizhnik's avatar
      Fix issues with reenabling LFC (#5209) · ba06ea26
      Konstantin Knizhnik authored
      Refer to #5208
      
      ## Problem
      
      See
      https://neondb.slack.com/archives/C03H1K0PGKH/p1693938336062439?thread_ts=1693928260.704799&cid=C03H1K0PGKH
      
      
      
      #5208 disables the LFC forever in case of an error. This is not good, because
      the problem causing the error (for example ENOSPC) can be resolved, and it
      would be nice to reenable the LFC after fixing it.
      
      Also, #5208 disables the LFC locally, in one backend only, so other backends
      may still see corrupted data.
      This should not cause problems right now with the "permission denied" error,
      because no backend should be able to open the LFC normally.
      But in case of an out-of-disk-space error, other backends can read corrupted
      data.
      
      ## Summary of changes
      
      1. Clean up the hash table after an error to prevent access to stale or
      corrupted data
      2. Perform the disk write under an exclusive lock (hoping it will not affect
      performance, because a write usually just copies data from user to system
      space)
      3. Use generations to prevent access to stale data in lfc_read
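
The generation idea in item 3 can be illustrated with a small sketch. This is not the LFC code (which is C inside the Postgres extension); the class and method names are hypothetical and only show the general pattern: remember the generation when a read starts, and discard the result if the cache was reset in the meantime.

```python
class GenerationCache:
    """Toy cache whose reads are invalidated by a generation bump."""

    def __init__(self):
        self.generation = 0
        self.entries = {}

    def put(self, key, value):
        self.entries[key] = value

    def reset_after_error(self):
        # Drop every entry and bump the generation so that reads which
        # started earlier can detect that their data may be stale.
        self.entries.clear()
        self.generation += 1

    def begin_read(self, key):
        # Capture the generation together with the looked-up value.
        return self.generation, self.entries.get(key)

    def finish_read(self, generation, value):
        # The value is only trusted if no reset happened during the read.
        return value if generation == self.generation else None
```

A reader that sees `None` from `finish_read` would fall back to fetching the page from the authoritative source instead of the cache.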
      
      
      Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
      ba06ea26
  5. Sep 08, 2023
    • Joonas Koivunen's avatar
      fix: LocalFs root in test_compatibility is PosixPath('...') (#5261) · 6f28da17
      Joonas Koivunen authored
      I forgot a `str(...)` conversion in #5243. This led to log lines such
      as:
      
      ```
      Using fs root 'PosixPath('/tmp/test_output/test_backward_compatibility[debug-pg14]/compatibility_snapshot/repo/local_fs_remote_storage/pageserver')' as a remote storage
      ```
      
      This surprisingly works, creating a hierarchy of directories under the
      current working directory (`repo_dir` for tests):
      - `PosixPath('`
        - `tmp` .. up until .. `local_fs_remote_storage`
          - `pageserver')`
      
      It should not work, but right now the test_compatibility.py tests find local
      metadata and layers, which end up being used. After #5172, when remote
      storage is the source of truth, it will no longer work.
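
The directory hierarchy described above follows directly from how a path string is split on `/`. The sketch below reproduces it with `PurePosixPath` so that it runs on any platform; the `mangled_root_parts` helper and the shortened example path are illustrative, not code from the test suite.

```python
from pathlib import PurePosixPath


def mangled_root_parts(root: str) -> tuple:
    """Split the repr-style string that ends up in the config when str() is forgotten."""
    # This is what repr() of a PosixPath looks like once it is embedded in a string.
    mangled = f"PosixPath('{root}')"
    # The string no longer starts with "/", so it is treated as a relative path.
    return PurePosixPath(mangled).parts


if __name__ == "__main__":
    print(mangled_root_parts("/tmp/repo/local_fs_remote_storage/pageserver"))
```

The first component is `PosixPath('` and the last is `pageserver')`, matching the directories listed in the message.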
      6f28da17
    • Heikki Linnakangas's avatar
      Update rdkit to version 2023_03_03. (#5260) · 60050212
      Heikki Linnakangas authored
      It includes PostgreSQL 16 support.
      60050212
    • Joonas Koivunen's avatar
      rust-toolchain: use 1.72.0, same as CI (#5256) · 66633ef2
      Joonas Koivunen authored
      Switches everyone without a `rustup override` to 1.72.0.
      
      The required code changes were already made in #5255.
      Depends on https://github.com/neondatabase/build/pull/65.
      66633ef2
    • Alexander Bayandin's avatar
      Miscellaneous fixes for tests-related things (#5259) · 028fbae1
      Alexander Bayandin authored
      
      ## Problem
      
      A bunch of fixes for different test-related things 
      
      ## Summary of changes
      - Fix test_runner/pg_clients (`subprocess_capture` return value has
      changed)
      - Do not run create-test-report if check-permissions failed for not
      cancelled jobs
      - Fix Code Coverage comment layout after flaky tests. Add another
      healing "\n"
      - test_compatibility: add an instruction for local run
      
      
      Co-authored-by: Joonas Koivunen <joonas@neon.tech>
      028fbae1
    • John Spray's avatar
      tests: enable multiple pageservers in `neon_local` and `neon_fixture` (#5231) · 7b6337db
      John Spray authored
      ## Problem
      
      Currently our testing environment only supports running a single
      pageserver at a time. This is insufficient for testing failover and
      migrations.
      - Dependency of writing tests for #5207 
      
      ## Summary of changes
      
      - `neon_local` and `neon_fixture` now handle multiple pageservers
      - This is a breaking change to the `.neon/config` format: any local
      environments will need recreating
      - Existing tests continue to work unchanged:
        - The default number of pageservers is 1
      - `NeonEnv.pageserver` is now a helper property that retrieves the first
      pageserver if there is only one, else throws.
      - Pageserver data directories are now at `.neon/pageserver_{n}` where n
      is 1,2,3...
      - Compatibility tests get some special casing to migrate neon_local
      configs: these are not meant to be backward/forward compatible, but they
      were treated that way by the test.
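
The "helper property" behaviour and the directory naming can be sketched as follows. This is a simplified stand-in, not the real fixture code: `Env`, `Pageserver`, `make_env`, and the choice of `RuntimeError` are assumptions; only the `pageserver` property semantics and the `.neon/pageserver_{n}` layout come from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pageserver:
    id: int

    def data_dir(self) -> str:
        # Data directories are numbered starting from 1.
        return f".neon/pageserver_{self.id}"


@dataclass
class Env:
    pageservers: List[Pageserver] = field(default_factory=list)

    @property
    def pageserver(self) -> Pageserver:
        # Convenience accessor for single-pageserver tests; ambiguous otherwise.
        if len(self.pageservers) != 1:
            raise RuntimeError(
                f"expected exactly one pageserver, found {len(self.pageservers)}"
            )
        return self.pageservers[0]


def make_env(num_pageservers: int = 1) -> Env:
    """Build an environment with pageservers numbered 1..num_pageservers."""
    return Env([Pageserver(i) for i in range(1, num_pageservers + 1)])
```

Existing tests that only use `env.pageserver` keep working with the default of one pageserver, while multi-pageserver tests must pick an entry from `env.pageservers` explicitly.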
      7b6337db
    • Konstantin Knizhnik's avatar
      Perform throttling for concurrent build index which is done outside transaction (#5048) · 499d0707
      Konstantin Knizhnik authored
      See 
      https://neondb.slack.com/archives/C03H1K0PGKH/p1692550646191429
      
      
      
      ## Problem
      
      Building an index concurrently writes WAL outside a transaction.
      `backpressure_throttling_impl` doesn't perform throttling for read-only
      transactions (those with no assigned XID).
      This causes a huge write lag, which can cause a large delay in accessing the
      table.
      
      ## Summary of changes
      
      Look at `PROC_IN_SAFE_IC` in the process state, which is set during a
      concurrent index build.
       
      
      Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
      Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
      499d0707
    • Joonas Koivunen's avatar
      rust-1.72.0 changes (#5255) · 720d5973
      Joonas Koivunen authored
      Prepare to upgrade rust version to latest stable.
      
      - `rustfmt` has learned to format `let irrefutable = $expr else { ...
      };` blocks
      - There's a new warning about virtual (workspace) crate resolver, picked
      the latest resolver as I suspect everyone would expect it to be the
      latest; should not matter anyways
      - Some new clippies, which seem alright
      720d5973
    • Joonas Koivunen's avatar
      test: Remote storage refactorings (#5243) · ff87fc56
      Joonas Koivunen authored
      
      Remote storage cleanup split from #5198:
      - pageserver, extensions, and safekeepers now have their separate remote
      storage
      - RemoteStorageKind has the configuration code
      - S3Storage has the cleanup code
      - with MOCK_S3, pageserver, extensions, safekeepers use different
      buckets
      - with LOCAL_FS, `repo_dir / "local_fs_remote_storage" / $user` is used
      as path, where $user is `pageserver`, `safekeeper`
      - no more `NeonEnvBuilder.enable_xxx_remote_storage` but one
      `enable_{pageserver,extensions,safekeeper}_remote_storage`
      
      Should not have any real changes. These will allow us to default to
      `LOCAL_FS` for pageserver on the next PR, remove
      `RemoteStorageKind.NOOP`, work towards #5172.
      
      Co-authored-by: Alexander Bayandin <alexander@neon.tech>
      ff87fc56
    • Heikki Linnakangas's avatar
      Update pg_cron to version 1.6.0 (#5252) · cdc65c18
      Heikki Linnakangas authored
      This includes PostgreSQL 16 support. There are no catalog changes, so
      this is a drop-in replacement, no need to run "ALTER EXTENSION UPDATE".
      cdc65c18
    • Heikki Linnakangas's avatar
      Update plpgsql_check extension to version v2.4.0 (#5249) · dac995e7
      Heikki Linnakangas authored
      This brings v16 support.
      dac995e7
    • Alexander Bayandin's avatar
      test_startup: increase timeout (#5238) · b80740bf
      Alexander Bayandin authored
      ## Problem
      
      `test_runner/performance/test_startup.py::test_startup` started to fail
      more frequently because of the timeout.
      Let's increase the timeout to see the failures on the perf dashboard.
      
      ## Summary of changes
      - Increase timeout for `test_startup` from 600 to 900 seconds
      b80740bf
    • Heikki Linnakangas's avatar
      Update hypopg extension to version 1.4.0 (#5245) · 57c1ea49
      Heikki Linnakangas authored
      The v1.4.0 includes changes to make it compile with PostgreSQL 16. The
      commit log doesn't call it out explicitly, but I tested it manually.
      
      v1.4.0 includes some new functions, but I tested manually that the
      v1.3.1 functionality works with the v1.4.0 version of the library. That
      means that this doesn't break existing installations. Users can do
      "ALTER EXTENSION hypopg UPDATE" if they want to use the new v1.4.0
      functionality, but they don't have to.
      57c1ea49
  6. Sep 07, 2023
    • Heikki Linnakangas's avatar
      Upgrade prefix extension to version 1.2.10 (#5244) · 6c31a2d3
      Heikki Linnakangas authored
      This version includes trivial changes to make it compile with PostgreSQL
      16. No functional changes.
      6c31a2d3
    • Heikki Linnakangas's avatar
      Upgrade postgresql-hll to version 2.18. (#5241) · 252b953f
      Heikki Linnakangas authored
      This includes PostgreSQL 16 support. No other changes, really.
      
      The extension version in the upstream was changed from 2.17 to 2.18,
      however, there is no difference between the catalog objects. So if you
      had installed 2.17 previously, it will continue to work. You can run
      "ALTER EXTENSION hll UPDATE", but all it will do is update the version
      number in the pg_extension table.
      252b953f
    • Heikki Linnakangas's avatar
      Upgrade ip4r to version 2.4.2 (#5242) · b414360a
      Heikki Linnakangas authored
      Includes PostgreSQL v16 support. No functional changes.
      b414360a
    • Arpad Müller's avatar
      Make VirtualFile::{open, open_with_options, create,sync_all,with_file} async fn (#5224) · d206655a
      Arpad Müller authored
      ## Problem
      
      Once we use async file system APIs for `VirtualFile`, these functions
      will also need to be async fn.
      
      ## Summary of changes
      
      Makes the functions `open, open_with_options, create, sync_all, with_file`
      of `VirtualFile` async fn, including all functions that call them. Like in
      the prior PRs, the actual I/O operations are not using async APIs yet,
      as per request in the #4743 epic.
      
      We switch towards not using `VirtualFile` in the par_fsync module,
      hopefully this is only temporary until we can actually do fully async
      I/O in `VirtualFile`. This might cause us to exhaust fd limits in the
      tests, but it should only be an issue for the local developer as we have
      high ulimits in prod.
      
      This PR is a follow-up of #5189, #5190, #5195, and #5203. Part of #4743.
      d206655a
    • Heikki Linnakangas's avatar
      Upgrade h3-pg to version 4.1.3. (#5237) · e5adc4ef
      Heikki Linnakangas authored
      This includes v16 support.
      e5adc4ef
    • Heikki Linnakangas's avatar
      Update PostGIS to version 3.3.3 (#5236) · c202f0ba
      Heikki Linnakangas authored
      It's a good idea to keep up-to-date in general. One noteworthy change is
      that PostGIS 3.3.3 adds support for PostgreSQL v16. We'll need that.
      
      PostGIS 3.4.0 has already been released, and we should consider
      upgrading to that. However, it's a major upgrade and requires running
      "SELECT postgis_extensions_upgrade();" in each database, to upgrade the
      catalogs. I don't want to deal with that right now.
      c202f0ba
    • Alexander Bayandin's avatar
      d15563f9
    • Rahul Modpur's avatar
      Fix pg_config version parsing (#5200) · 485a2cfd
      Rahul Modpur authored
      ## Problem
      Fix pg_config version parsing
      
      ## Summary of changes
      Use regex to capture major version of postgres
      #5146
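
A regex-based major-version parse might look like the sketch below. The actual fix is not shown on this page, so the function name, the pattern, and the sample strings are assumptions; the point is only that matching the leading digits handles outputs with a minor version as well as suffixes such as `beta` or `rc`.

```python
import re


def parse_major_version(version_output: str) -> int:
    """Extract the major version from `pg_config --version`-style output."""
    # Capture only the first run of digits after the product name.
    match = re.search(r"PostgreSQL (\d+)", version_output)
    if match is None:
        raise ValueError(f"unrecognized pg_config output: {version_output!r}")
    return int(match.group(1))
```

Splitting on `.` alone would fail for a string like `PostgreSQL 16rc1`, which is the kind of input a capture group on the digits handles.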
      485a2cfd
    • Alexander Bayandin's avatar
      Update `plv8` to 3.1.8 (#5230) · 1fee6937
      Alexander Bayandin authored
      ## Problem
      
      We likely need this to support Postgres 16
      It's also been asked by a user
      https://github.com/neondatabase/neon/discussions/5042
      
      The latest version is 3.2.0, but it requires some changes in the build
      script (which I haven't checked, but it didn't work right away)
      
      ## Summary of changes
      ```
      3.1.8       2023-08-01
                  - force v8 to compile in release mode
      
      3.1.7       2023-06-26
                  - fix byteoffset issue with arraybuffers
                  - support postgres 16 beta
      
      3.1.6       2023-04-08
                  - fix crash issue on fetch apply
                  - fix interrupt issue
      ```
      From https://github.com/plv8/plv8/blob/v3.1.8/Changes
      1fee6937
    • Alexander Bayandin's avatar
      Even better handling of `approved-for-ci-run` label (#5227) · f8a91e79
      Alexander Bayandin authored
      ## Problem
      
      We've got `approved-for-ci-run` to work :tada: 
      But it's still a bit rough, this PR should improve the UX for external
      contributors.
      
      ## Summary of changes
      - `build_and_test.yml`: add `check-permissions` job, which fails if PR is
      created from a fork. Make all jobs in the workflow dependent on
      `check-permissions` to fail fast
      - `approved-for-ci-run.yml`: add `cleanup` job to close `ci-run/pr-*`
      PRs and delete linked branches when the parent PR is closed
      - `approved-for-ci-run.yml`: fix the layout for the `ci-run/pr-*` PR
      description
      - GitHub Autocomment: add a comment with tests result to the original PR
      (instead of a PR from `ci-run/pr-*` )
      f8a91e79
    • duguorong009's avatar
      fix(pageserver): add the walreceiver state to tenant timeline GET api endpoint (#5196) · 706977fb
      duguorong009 authored
      
      Add a `walreceiver_state` field to `TimelineInfo` (response of `GET /v1/tenant/:tenant_id/timeline/:timeline_id`) and while doing that, refactor out a common `Timeline::walreceiver_state(..)`. No OpenAPI changes, because this is an internal debugging addition.
      
      Fixes #3115.
      
      Co-authored-by: Joonas Koivunen <joonas.koivunen@gmail.com>
      706977fb