This project is mirrored from https://github.com/scylladb/scylladb.
  1. Dec 21, 2016
  2. Dec 20, 2016
    • tests: commitlog: Fix assumption about write visibility · 0d0e53c5
      Tomasz Grabiec authored
      The test assumed that mutations added to the commitlog are visible to
      reads as soon as a new segment is opened. That's not true because
      buffers are written back in the background, and a new segment may be
      active while the previous one is still being written or not yet
      synced.
      
      Fix the test so that it expects the number of mutations read this
      way to be <= the number of mutations added, and that after all
      segments are synced, the two numbers are equal.
      
      Message-Id: <1481630481-19395-1-git-send-email-tgrabiec@scylladb.com>
      (cherry picked from commit fe6a70db)
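
The two properties the fixed test checks can be sketched with a toy model. This is a minimal illustration, not the actual test: `toy_commitlog` and `visibility_invariant_holds` are hypothetical names, and the background write-back is reduced to an explicit `sync()` call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a commitlog: entries sit in an in-memory buffer until
// sync() writes them back. All names here are hypothetical.
class toy_commitlog {
    std::vector<int> _buffered;  // added, not yet written back
    std::vector<int> _on_disk;   // visible to readers
public:
    void add(int mutation) { _buffered.push_back(mutation); }
    // Models the background write-back completing for every segment.
    void sync() {
        _on_disk.insert(_on_disk.end(), _buffered.begin(), _buffered.end());
        _buffered.clear();
    }
    std::size_t readable() const { return _on_disk.size(); }
};

// Returns true when both properties checked by the fixed test hold.
inline bool visibility_invariant_holds(std::size_t n_mutations) {
    toy_commitlog log;
    for (std::size_t i = 0; i < n_mutations; ++i) {
        log.add(static_cast<int>(i));
    }
    bool before = log.readable() <= n_mutations;  // reads may lag behind writes
    log.sync();
    bool after = log.readable() == n_mutations;   // reads must catch up after sync
    return before && after;
}
```

Before `sync()` the assertion is deliberately an inequality, since nothing guarantees how much has reached disk at that point.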
  3. Dec 19, 2016
    • commitlog: correctly report requests blocked · 99d9b4e7
      Glauber Costa authored
      
      The semaphore future may be unavailable for many reasons. Specifically,
      if the task quota is depleted right between sem.wait() and the .then()
      clause in get_units() the resulting future won't be available.
      
      That is particularly visible if we decrease the task quota, since those
      events will be more frequent: we can in those cases clearly see this
      counter going up, even though there aren't more requests pending than
      usual.
      
      This patch improves the situation by replacing that check. We now verify
      whether or not there are waiters in the semaphore.
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      Message-Id: <113c0d6b43cd6653ce972541baf6920e5765546b.1481222621.git.glauber@scylladb.com>
      (cherry picked from commit 9b5e6d6b)
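
The difference between the two checks can be sketched without Seastar. Everything below is a hypothetical toy model: `toy_semaphore`, `blocked_stats`, and `account` are illustrative names, and the `scheduler_preempted` flag stands in for the case where the task quota runs out between `sem.wait()` and the continuation.

```cpp
#include <cassert>
#include <cstddef>

// Toy semaphore: tracks available units and the number of queued waiters.
struct toy_semaphore {
    std::size_t available;
    std::size_t waiting = 0;
    explicit toy_semaphore(std::size_t n) : available(n) {}
    // Returns true if units were granted immediately, otherwise queues the caller.
    bool acquire(std::size_t units) {
        if (waiting == 0 && available >= units) {
            available -= units;
            return true;
        }
        ++waiting;
        return false;
    }
    std::size_t waiters() const { return waiting; }
};

struct blocked_stats {
    std::size_t by_future_check = 0;  // old criterion
    std::size_t by_waiter_check = 0;  // new criterion
};

// `scheduler_preempted` models a future that is not ready even though the
// units were granted (an assumption made for this sketch).
inline void account(toy_semaphore& sem, std::size_t units,
                    bool scheduler_preempted, blocked_stats& st) {
    bool granted = sem.acquire(units);
    bool future_ready = granted && !scheduler_preempted;
    if (!future_ready) {
        ++st.by_future_check;   // over-counts: also fires on preemption
    }
    if (sem.waiters() > 0) {
        ++st.by_waiter_check;   // fires only when someone is actually queued
    }
}
```

With a semaphore of 10 units, a 1-unit request that is merely preempted bumps only the old counter, while a 100-unit request that really has to queue bumps both.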
  4. Dec 18, 2016
  5. Dec 16, 2016
    • Merge branch 'virtual-dirty-fixes-1.5-backport' from... · e82324fb
      Tomasz Grabiec authored
      Merge branch 'virtual-dirty-fixes-1.5-backport' from git@github.com:glommer/scylla.git into branch-1.5
      
      Rework dirty memory hierarchy from Glauber.
    • config: get rid of memtable_total_space · 1ae62678
      Glauber Costa authored
      
      Those values are now statically set.
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      (cherry picked from commit 2aa65146)
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
    • database: rework dirty memory hierarchy · 09a463fd
      Glauber Costa authored
      
      Issue #1918 describes a problem, in which we are generating smaller
      memtables than we could, and therefore not respecting the flush
      criteria.
      
      That happens because group sizes (and limits) for pressure purposes, and
      the soft threshold is currently at 40 %. This causes the system group's
      soft threshold to be way below regular's virtual dirty limit and close
      to the regular group's soft threshold. The system group was very likely
      to come under soft pressure when regular was, because writes to the
      regular group are not yet throttled when they cross both soft thresholds.
      
      This is a direct consequence of the linear hierarchy between the regions,
      and to guarantee that it won't happen we would have to acquire the
      semaphore of all ancestor regions when flushing from a child region.
      While that works, it can lead to problems of its own, like priority
      inversion if the regions have different priorities (like streaming and
      regular), and groups lower in the hierarchy, like user, blocking explicit
      flushes from their ancestors.
      
      To fix that, this patch reorganizes the dirty memory region groups so
      that groups are now completely independent. As a disadvantage, when
      streaming happens we will draw some memory from the cache, but we will
      live with it for the time being.
      
      Fixes #1918
      
      [ glauber: fix conflicts in memtable.cc due to lack of graceful clear ]
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      (cherry picked from commit 80440c0d)
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
    • system keyspace: write batchlog mutation in user memory · 34713638
      Glauber Costa authored
      
      Batchlog is a potentially memory-intensive table whose workload is
      driven by user needs, not system's. Move it to the user dirty memory
      manager.
      
      [ glauber: fix conflict with virtual readers ]
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      (cherry picked from commit db7cc3cb)
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
    • database: remove friendship declaration · 8680174f
      Glauber Costa authored
      
      Not needed anymore since memtable started having a direct pointer to the
      memtable list.
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      (cherry picked from commit 2e8c7d2c)
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
    • database: simplify flush_one · 261b67f4
      Glauber Costa authored
      
      flush_one has to make sure that we're using the correct
      dirty_memory_manager object, because we could be flushing from a region
      group different from the one where the flush request originated.
      
      It's simpler to just assume flush_one will be dealing with the right
      object, and use a different object instead of "this" when calling it.
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      (cherry picked from commit bb1509c2)
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
    • database: make memtable_list aware in cases it can't flush · bb173e3e
      Glauber Costa authored
      
      Some of our CFs can't be flushed. Those are the ones that are not marked
      as having durable writes. We treat them just the same from the point of
      view of the flush logic, but they provide a function that doesn't do
      anything and just returns right away.
      
      We already had trouble with that in the past, and it also poses a
      problem for an upcoming patch reworking the flush memtable pick
      criteria.
      
      It's easier, simpler, and cleaner to just make the memtable_list aware
      that it can't flush. Achieving that is also not very complicated: we just
      need a special constructor that doesn't take a seal function, and then we
      make sure that it is initialized to an empty std::function.
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      (cherry picked from commit 8ab7c04c)
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
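
The "special constructor plus empty std::function" idea can be sketched as follows. This is a simplified, hypothetical stand-in (`toy_memtable_list`, `may_flush`, and `request_flush` are names chosen for illustration), not the real memtable_list interface.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical, simplified stand-in for a memtable list.
class toy_memtable_list {
    std::function<void()> _seal_fn;  // left empty when the list cannot flush
public:
    // Flushable list: the caller supplies the seal function.
    explicit toy_memtable_list(std::function<void()> seal_fn)
        : _seal_fn(std::move(seal_fn)) {}
    // Non-flushable list (e.g. no durable writes): no seal function at all.
    toy_memtable_list() = default;

    // An empty std::function converts to false.
    bool may_flush() const { return static_cast<bool>(_seal_fn); }

    // Returns true if a flush was actually performed.
    bool request_flush() {
        if (!may_flush()) {
            return false;
        }
        _seal_fn();
        return true;
    }
};
```

Callers can then ask `may_flush()` up front instead of invoking a seal function that silently does nothing.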
  6. Dec 12, 2016
    • database: move reversion of virtual dirty state closer to update_cache. · 9688dca8
      Glauber Costa authored
      
      When we finish writing a memtable, we revert the dirty memory charges
      immediately. When we do that, dirty memory will grow back to what it
      was, and soon (we hope) will go down again when we release the requests
      for real.
      
      During that time, we may not accept new requests. Sealing can take a
      long time, especially in the face of Linux issues like the ones we have
      seen in the past. It will also take proportionally more time if the
      SSTables end up being small, which is a possibility in some scenarios.
      
      This patch changes the dirty_memory_manager so that the charges won't be
      reverted right after we finish the flush. Rather, we will hold on to them,
      and revert them right before we update the cache. We don't need to do it
      for all classes of memtable writes, because after we finish flushing,
      flush_one() will destroy the hashed element anyway.
      
      [tgrabiec: conflicts]
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      Message-Id: <2d5a8f6ca57d5036f4850ac163557bca59b8063d.1480004384.git.glauber@scylladb.com>
      (cherry picked from commit c32803f2)
  7. Dec 11, 2016
    • lz4: Conditionally use LZ4_compress_default() · 549c9790
      Duarte Nunes authored
      
      Since not all distributions have a version of LZ4 with
      LZ4_compress_default(), we use it conditionally.
      
      This is especially important beginning with version 1.7.3 of LZ4,
      which deprecates the LZ4_compress() function in favour of
      LZ4_compress_default() and thus prevents Scylla from compiling
      due to the deprecation warning.
      
      Signed-off-by: Duarte Nunes <duarte@scylladb.com>
      Message-Id: <20161124092339.23017-1-duarte@scylladb.com>
      (cherry picked from commit cc3f26c9)
    • Update seastar submodule · 631d9217
      Avi Kivity authored
      * seastar 386ccd9...bd9eda1 (1):
        > rpc: Conditionally use LZ4_compress_default()
  8. Dec 09, 2016
  9. Dec 08, 2016
    • sstables: fix probe with Unknown component · 182f67cf
      Avi Kivity authored
      Commit 53b7b7de ("sstables: handle unrecognized sstable component")
      ignores unrecognized components, but misses one code path during probe_file().
      
      Ignore unrecognized components there too.
      
      Fixes #1922.
      Message-Id: <20161208131027.28939-1-avi@scylladb.com>
      
      (cherry picked from commit 872b5ef5)
  10. Dec 07, 2016
    • commitlog: Fix replay to not delete dirty segments · dc08cb46
      Tomasz Grabiec authored
      The problem is that replay will unlink any segments which were on disk
      at the time the replay starts. However, some of those segments may
      have been created by the current node since boot. If a segment is part
      of the reserve, for example, it will be unlinked by replay, but we will
      still use that segment to log mutations. Those mutations will not be
      visible to replay after a crash though.
      
      The fix is to record preexisting segments before any new segments
      have a chance to be created and use that as the replay list.
      
      Introduced in abe73587.
      
      dtest failure:
      
       commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup
      
      Message-Id: <1481117436-6243-1-git-send-email-tgrabiec@scylladb.com>
      (cherry picked from commit f7197dab)
      scylla-1.5.rc2
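
The "snapshot first, replay only the snapshot" fix can be sketched with a toy directory model. The types and names below (`toy_commitlog_dir`, `replay_and_unlink`, the segment names in the usage note) are hypothetical and only illustrate the ordering described above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of a commitlog directory: just a set of segment names.
using segment_set = std::set<std::string>;

struct toy_commitlog_dir {
    segment_set on_disk;
    void create_segment(const std::string& name) { on_disk.insert(name); }
    void unlink(const std::string& name) { on_disk.erase(name); }
};

// Replays and then unlinks exactly the segments named in `replay_list`.
// The list is taken by value so that it stays a stable snapshot even if the
// caller passes the directory's own set.
inline void replay_and_unlink(toy_commitlog_dir& dir, segment_set replay_list) {
    for (const auto& name : replay_list) {
        // ... mutations from `name` would be applied here ...
        dir.unlink(name);
    }
}
```

If the snapshot is taken before a reserve segment is created, that new segment survives replay; listing the directory only when replay starts would unlink it as well.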
  11. Dec 06, 2016
  12. Dec 05, 2016
  13. Dec 01, 2016
    • row_cache: dummy entry does not count as partition · c014e738
      Paweł Dziepak authored
      
      Since the introduction of the continuity flag, the row cache contains a
      single dummy entry. cache_tracker knows nothing about it, so it doesn't
      appear in any of the metrics. However, the cache destructor calls
      cache_tracker::on_erase() for every entry in the cache including the
      dummy one. This is incorrect since the tracker wasn't informed when the
      dummy entry was created.
      
      Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
      Message-Id: <1478608776-10363-1-git-send-email-pdziepak@scylladb.com>
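
The accounting asymmetry can be sketched with a toy cache. This is one possible way to keep insert and erase notifications balanced, under the assumption that the dummy entry is simply skipped on teardown; `toy_tracker`, `toy_entry`, and `toy_cache` are hypothetical names, not the row_cache types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tracker that counts partitions for metrics.
struct toy_tracker {
    std::size_t partitions = 0;
    void on_insert() { ++partitions; }
    void on_erase() { --partitions; }
};

struct toy_entry {
    bool dummy;
};

class toy_cache {
    toy_tracker& _tracker;
    std::vector<toy_entry> _entries;
public:
    explicit toy_cache(toy_tracker& t) : _tracker(t) {
        // The dummy entry is created without telling the tracker.
        _entries.push_back(toy_entry{true});
    }
    void insert_partition() {
        _entries.push_back(toy_entry{false});
        _tracker.on_insert();
    }
    ~toy_cache() {
        for (const auto& e : _entries) {
            if (!e.dummy) {
                _tracker.on_erase();  // the dummy was never counted, so skip it
            }
        }
    }
};
```

Without the `dummy` check the destructor would call `on_erase()` once more than `on_insert()` was called, and the unsigned counter would wrap around.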
    • prevent commitlog replay position reordering during reserve refill · abe73587
      Glauber Costa authored
      
      When requests hit the commitlog, each of them will be assigned a replay
      position, which we expect to be ordered. If reorders happen, the request
      will be discarded and re-applied. Although this is supposed to be rare,
      it does increase our latencies, especially when big requests are
      involved. Processing big requests is expensive and if we have to do it
      twice that adds to the cost.
      
      The commitlog is supposed to issue replay positions in order, and it
      could be that the code that adds them to the memtables will reorder
      them. However, there is one instance in which the commitlog will not
      keep its side of the bargain.
      
      That happens when the reserve is exhausted, and we are allocating a
      segment directly at the same time the reserve is being replenished. The
      following sequence of events with its deferring points will illustrate
      it:
      
      on_timer:
      
          return this->allocate_segment(false). // defer here // then([this](sseg_ptr s) {
      
      At this point, the segment id is already allocated.
      
      new_segment():
      
          if (_reserve_segments.empty()) {
              [ ... ]
              return allocate_segment(true).then ...
      
      At this point, we have a new segment that has an id that is higher than
      the previous id allocated.
      
      Then we resume the execution from the deferring point in on_timer():
      
          i = _reserve_segments.emplace(i, std::move(s));
      
      The next time we need to allocate a segment, we'll pick it from the
      reserve. But the segment in the reserve has an id that is lower than the
      id that we have already used.
      
      Reorders are bad, but this one is particularly bad: because the reorder
      happens with the segment id side of the replay position, that means that
      every request that falls into that segment will have to be reinserted.
      
      This bug can be a bit tricky to reproduce. To make it more common, we
      can artificially add a sleep() fiber after the allocate_segment(false)
      in on_timer(). If we do that, we'll see a sea of reinsertions going on
      in the logs (if dblog is set to debug).
      
      Applying this patch (keeping the sleep) will make them all disappear.
      We do this by rewriting the reserve logic, so that the segments always
      come from the reserve. If we draw from a single pool all the time, there
      is no chance of reordering happening. To make that more amenable, we'll
      have the reserve filler always running in the background and take it out
      of the timer code.
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      Message-Id: <49eb7edfcafaef7f1fdceb270639a9a8b50cfce7.1480531446.git.glauber@scylladb.com>
      (cherry picked from commit 99a5a772)
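
The interleaving described in this commit message can be replayed in a small single-threaded model. The sketch below is hypothetical (`toy_segment_manager` and both functions are illustrative names) and replaces the real deferring points with plain statement order.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

struct toy_segment_manager {
    std::uint64_t next_id = 1;
    std::deque<std::uint64_t> reserve;

    // The id is fixed at allocation time, before the segment is handed out.
    std::uint64_t allocate_segment() { return next_id++; }
};

// Old scheme: the timer's allocation is in flight while new_segment()
// allocates directly because the reserve is empty.
inline std::vector<std::uint64_t> mixed_allocation_order() {
    toy_segment_manager m;
    std::vector<std::uint64_t> used;
    std::uint64_t in_flight = m.allocate_segment();  // on_timer: id taken, then defers
    used.push_back(m.allocate_segment());            // new_segment(): direct allocation
    m.reserve.push_back(in_flight);                  // on_timer resumes, fills the reserve
    used.push_back(m.reserve.front());               // next new_segment(): from the reserve
    m.reserve.pop_front();
    return used;
}

// Reworked scheme: every segment goes through the reserve, in allocation order.
inline std::vector<std::uint64_t> reserve_only_order() {
    toy_segment_manager m;
    std::vector<std::uint64_t> used;
    for (int i = 0; i < 2; ++i) {
        m.reserve.push_back(m.allocate_segment());   // background filler
    }
    while (!m.reserve.empty()) {
        used.push_back(m.reserve.front());
        m.reserve.pop_front();
    }
    return used;
}
```

In the mixed scheme the segments are used in the order 2, 1, which is the reorder the patch removes; drawing from a single pool yields 1, 2.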
    • commitlog: sync segments before acquiring semaphore on shutdown. · 0bce0197
      Glauber Costa authored
      
      Sync all segments before acquiring the semaphore, otherwise the wait may
      be held up until the timer kicks in and pushes them down.
      Note that we can't guarantee that no other requests were executed in the
      meantime, so we have to sync again.
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      Message-Id: <aea019fe49820acce5d2b55dd5ec31e975b3436c.1480388674.git.glauber@scylladb.com>
      (cherry picked from commit 353a4cd2)
    • tests: Fix use-after-free on commitlog · ae3b1667
      Tomasz Grabiec authored
      Only shutdown() ensures all internal processes are complete. Call it before calling clear().
      
      Message-Id: <1480495534-2253-1-git-send-email-tgrabiec@scylladb.com>
      (cherry picked from commit c35e18ba)
    • Update seastar submodule · 2aa73ac1
      Tomasz Grabiec authored
      * seastar 6fd4534...386ccd9 (1):
        > queue: allow queue to change its maximum size
    • Update scylla-ami submodule · 261fcc1e
      Avi Kivity authored
      * dist/ami/files/scylla-ami e1e3919...d5a4397 (3):
        > scylla_install_ami: allow specify different repository for Scylla installation and receive update
        > scylla_install_ami: delete unneeded authorized_keys from AMI image
        > scylla_ami_setup: run posix_net_conf.sh when NCPUS < 8
    • dist/ami: allow specify different repository for Scylla installation and receive update · 3a7b9d55
      Takuya ASADA authored
      
      This fix splits build_ami.sh --repo into three different options:
       --repo-for-install is for Scylla package installation, only valid
       during AMI construction.
      
       --repo-for-update will be stored at /etc/yum.repos.d/scylla.repo, to
       receive package updates on the AMI.
      
       --repo is both, for installation and update.
      
      Fixes #1872
      
      Signed-off-by: Takuya ASADA <syuu@scylladb.com>
      Message-Id: <1480438858-6007-1-git-send-email-syuu@scylladb.com>
      (cherry picked from commit 17ef5e63)
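
How the three options relate can be sketched as a small resolver. The real logic lives in a shell script; the C++ below is only a model of the semantics listed above. `repo_choice` and `resolve_repos` are hypothetical names, and letting a later flag override an earlier one is an assumption of this sketch, not something the commit message states.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: which repository is used at each stage.
struct repo_choice {
    std::string for_install;  // used only while the AMI is being built
    std::string for_update;   // would be written to /etc/yum.repos.d/scylla.repo
};

// Resolves the options from a flat argument list of "flag value" pairs.
// --repo sets both fields; the specific flags set one field each.
inline repo_choice resolve_repos(const std::vector<std::string>& args) {
    repo_choice out;
    for (std::size_t i = 0; i + 1 < args.size(); i += 2) {
        const std::string& flag = args[i];
        const std::string& value = args[i + 1];
        if (flag == "--repo") {
            out.for_install = value;
            out.for_update = value;
        } else if (flag == "--repo-for-install") {
            out.for_install = value;
        } else if (flag == "--repo-for-update") {
            out.for_update = value;
        }
    }
    return out;
}
```

Passing only `--repo` fills both fields with the same value, which matches the "both, for installation and update" description.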
  14. Nov 30, 2016
    • database: do not call seal directly from the streaming timer · 60d5b21e
      Glauber Costa authored
      
      Streaming memtables have a delayed mode where many flushes are coalesced
      together into one, with the actual flush happening later and propagated
      to all the previous waiters.
      
      However, the timer that triggers the actual flush was not using the
      newly introduced flush infrastructure. This was a minor problem because
      those flushes wouldn't try to take the semaphore, and so we could have
      many flushes going on at the same time.
      
      What was a potential performance issue became a correctness issue when
      we moved the reversal of the dirty memory accounting out of
      revert_potentially_cleaned_up_memory() into remove_from_flush_manager().
      
      Since the latter is only called through the flush infrastructure, it
      simply wasn't called. So the deferral of the reversal exposed this bug.
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      Message-Id: <0d5755375bc27524b8cfb9970c76d492b14d9eea.1480522742.git.glauber@scylladb.com>
      (cherry picked from commit d7256e7b)
    • commitlog: use read ahead for replay requests · 903a323b
      Glauber Costa authored
      
      Aside from putting the requests in the commitlog class, read ahead
      will help us go through the file faster.
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      (cherry picked from commit 59a41cf7)
    • commitlog: use commitlog priority for replay · 0174b9ad
      Glauber Costa authored
      
      Right now replay is being issued with the standard seastar priority.
      The rationale for that at the time was that it is an early event that
      doesn't really share the disk with anybody.
      
      That is largely untrue now that we start compactions on boot.
      Compactions may fight for bandwidth with the commitlog, and with such
      low priority the commitlog is guaranteed to lose.
      
      Fixes #1856
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      (cherry picked from commit aa375cd3)
    • commitlog: close file after read, and not at stop · 3b7f646f
      Glauber Costa authored
      
      There are other code paths that may interrupt the read in the middle
      and bypass stop. It's safer this way.
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      Message-Id: <8c32ca2777ce2f44462d141fd582848ac7cf832d.1479477360.git.glauber@scylladb.com>
      (cherry picked from commit 60b7d35f)
    • commitlog: close replay file · 127152e0
      Glauber Costa authored
      
      Replay file is opened, so it should be closed. We're not seeing any
      problems arising from this, but they may happen. Enabling read ahead in
      this stream makes them happen immediately. Fix it.
      
      Signed-off-by: Glauber Costa <glauber@scylladb.com>
      (cherry picked from commit 4d3d7747)
  15. Nov 29, 2016
  16. Nov 27, 2016
  17. Nov 24, 2016
  18. Nov 23, 2016