This project is mirrored from https://github.com/scylladb/scylladb.
- Dec 21, 2016
Pekka Enberg authored
- Dec 20, 2016
Tomasz Grabiec authored
The test assumed that mutations added to the commitlog are visible to reads as soon as a new segment is opened. That's not true because buffers are written back in the background, and a new segment may be active while the previous one is still being written or not yet synced. Fix the test so that it expects the number of mutations read this way to be <= the number of mutations added, and that after all segments are synced the number of mutations read is equal. Message-Id: <1481630481-19395-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit fe6a70db)
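A minimal sketch of the relaxed expectation, using hypothetical counters rather than the real test's bookkeeping:

    #include <cassert>
    #include <cstddef>

    // Before syncing, some mutations may still sit in unwritten buffers, so a
    // read-back can legitimately see fewer entries than were added; only after
    // all segments are synced must the two counts match exactly.
    void check_replay_counts(std::size_t added,
                             std::size_t read_before_sync,
                             std::size_t read_after_sync) {
        assert(read_before_sync <= added);
        assert(read_after_sync == added);
    }

    int main() {
        check_replay_counts(100, 73, 100);  // example values only
    }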
- Dec 19, 2016
Glauber Costa authored
The semaphore future may be unavailable for many reasons. Specifically, if the task quota is depleted right between sem.wait() and the .then() clause in get_units() the resulting future won't be available. That is particularly visible if we decrease the task quota, since those events will be more frequent: we can in those cases clearly see this counter going up, even though there aren't more requests pending than usual. This patch improves the situation by replacing that check. We now verify whether or not there are waiters in the semaphore. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <113c0d6b43cd6653ce972541baf6920e5765546b.1481222621.git.glauber@scylladb.com> (cherry picked from commit 9b5e6d6b)
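A sketch of the replacement check, assuming a recent Seastar (seastar::semaphore, seastar::get_units() and semaphore::waiters()); the surrounding names are hypothetical, not the actual ScyllaDB code:

    #include <seastar/core/semaphore.hh>
    #include <cstdint>

    seastar::semaphore _write_sem{10};
    uint64_t _queued_writes = 0;  // hypothetical "requests queued" counter

    auto acquire_write_permit() {
        // Old check (misleading): the future returned by get_units() may be
        // unavailable merely because the task quota ran out between sem.wait()
        // and the .then() inside get_units(), not because anyone is waiting.
        // New check: count the request as queued only if the semaphore
        // actually has waiters.
        if (_write_sem.waiters() > 0) {
            ++_queued_writes;
        }
        return seastar::get_units(_write_sem, 1);
    }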
- Dec 18, 2016
Pekka Enberg authored
- Dec 16, 2016
Tomasz Grabiec authored
Merge branch 'virtual-dirty-fixes-1.5-backport' from git@github.com:glommer/scylla.git into branch-1.5 Rework dirty memory hierarchy from Glauber.
Glauber Costa authored
Those values are now statically set. Signed-off-by: Glauber Costa <glauber@scylladb.com> (cherry picked from commit 2aa65146) Signed-off-by: Glauber Costa <glauber@scylladb.com>
Glauber Costa authored
Issue #1918 describes a problem in which we are generating smaller memtables than we could, and therefore not respecting the flush criteria. That happens because group sizes (and limits) used for pressure purposes are hierarchical, and the soft threshold is currently at 40%. This causes the system group's soft threshold to be way below the regular group's virtual dirty limit and close to the regular group's soft threshold. The system group was very likely to come under soft pressure when the regular group was, because writes to the regular group are not yet throttled when they cross both soft thresholds. This is a direct consequence of the linear hierarchy between the regions, and to guarantee that it won't happen we would have to acquire the semaphore of all ancestor regions when flushing from a child region. While that works, it can lead to problems of its own, like priority inversion if the regions have different priorities - like streaming and regular - and groups lower in the hierarchy, like user, blocking explicit flushes from their ancestors. To fix that, this patch reorganizes the dirty memory region groups so that groups are now completely independent. As a disadvantage, when streaming happens we will draw some memory from the cache, but we will live with it for the time being. Fixes #1918 [ glauber: fix conflicts in memtable.cc due to lack of graceful clear ] Signed-off-by: Glauber Costa <glauber@scylladb.com> (cherry picked from commit 80440c0d) Signed-off-by: Glauber Costa <glauber@scylladb.com>
Glauber Costa authored
Batchlog is a potentially memory-intensive table whose workload is driven by user needs, not the system's. Move it to the user dirty memory manager. [ glauber: fix conflict with virtual readers ] Signed-off-by: Glauber Costa <glauber@scylladb.com> (cherry picked from commit db7cc3cb) Signed-off-by: Glauber Costa <glauber@scylladb.com>
Glauber Costa authored
Not needed anymore since memtable started having a direct pointer to the memtable list. Signed-off-by: Glauber Costa <glauber@scylladb.com> (cherry picked from commit 2e8c7d2c) Signed-off-by: Glauber Costa <glauber@scylladb.com>
Glauber Costa authored
flush_one has to make sure that we're using the correct dirty_memory_manager object, because we could be flushing from a region group different from the one in which the flush request originated. It's simpler to just assume that flush_one will be dealing with the right object, and to use a different object instead of "this" when calling it. Signed-off-by: Glauber Costa <glauber@scylladb.com> (cherry picked from commit bb1509c2) Signed-off-by: Glauber Costa <glauber@scylladb.com>
Glauber Costa authored
Some of our CFs can't be flushed. Those are the ones that are not marked as having durable writes. We treat them just the same from the point of view of the flush logic, but they provide a function that doesn't do anything and just returns right away. We already had trouble with that in the past, and it also poses a problem for an upcoming patch reworking the flush memtable pick criteria. It's easier, simpler, and cleaner to just make the memtable_list aware that it can't flush. Achieving that is also not very complicated: we just need a special constructor that doesn't take a seal function, and then we make sure that it is initialized to an empty std::function. Signed-off-by: Glauber Costa <glauber@scylladb.com> (cherry picked from commit 8ab7c04c) Signed-off-by: Glauber Costa <glauber@scylladb.com>
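A minimal sketch of the idea with hypothetical types (not the ScyllaDB classes): a second constructor leaves the seal function empty, and the list reports that it can't flush:

    #include <functional>
    #include <cstdio>

    class memtable_list_sketch {
        std::function<void()> _seal_fn;  // empty => this list cannot be flushed
    public:
        explicit memtable_list_sketch(std::function<void()> seal) : _seal_fn(std::move(seal)) {}
        memtable_list_sketch() = default;  // no seal function: non-durable writes
        bool can_flush() const { return bool(_seal_fn); }
        void seal_active_memtable() {
            if (can_flush()) {
                _seal_fn();
            }
            // otherwise a no-op: callers can skip such lists entirely
        }
    };

    int main() {
        memtable_list_sketch durable([] { std::puts("flushing"); });
        memtable_list_sketch non_durable;
        durable.seal_active_memtable();      // prints "flushing"
        non_durable.seal_active_memtable();  // does nothing
    }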
- Dec 12, 2016
Glauber Costa authored
When we finish writing a memtable, we revert the dirty memory charges immediately. When we do that, dirty memory will grow back to what it was, and soon (we hope) will go down again when we release the requests for real. During that time, we may not accept new requests. Sealing can take a long time, especially in the face of Linux issues like the ones we have seen in the past. It will also take proportionally more time if the SSTables end up being small, which is a possibility in some scenarios. This patch changes the dirty_memory_manager so that the charges won't be reverted right after we finish the flush. Rather, we will hold on to them, and revert them right before we update the cache. We don't need to do it for all classes of memtable writes, because after we finish flushing, flush_one() will destroy the hashed element anyway. [tgrabiec: conflicts] Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <2d5a8f6ca57d5036f4850ac163557bca59b8063d.1480004384.git.glauber@scylladb.com> (cherry picked from commit c32803f2)
- Dec 11, 2016
Duarte Nunes authored
Since not all distributions have a version of LZ4 with LZ4_compress_default(), we use it conditionally. This is especially important beginning with version 1.7.3 of LZ4, which deprecates the LZ4_compress() function in favour of LZ4_compress_default() and thus prevents Scylla from compiling due to the deprecation warning. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20161124092339.23017-1-duarte@scylladb.com> (cherry picked from commit cc3f26c9)
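A sketch of one way to make the call conditional, keyed off LZ4_VERSION_NUMBER with an assumed 1.7.0 cutoff (the actual Seastar change may gate this differently, e.g. via a configure-time check):

    #include <lz4.h>

    // Compress src into dst, preferring the newer API and falling back to the
    // deprecated LZ4_compress() on older library versions. dst is assumed to
    // have been sized with LZ4_compressBound(src_size).
    static int compress_block(const char* src, char* dst, int src_size, int dst_capacity) {
    #if defined(LZ4_VERSION_NUMBER) && LZ4_VERSION_NUMBER >= 10700
        return LZ4_compress_default(src, dst, src_size, dst_capacity);
    #else
        (void)dst_capacity;
        return LZ4_compress(src, dst, src_size);
    #endif
    }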
Avi Kivity authored
* seastar 386ccd9...bd9eda1 (1): > rpc: Conditionally use LZ4_compress_default()
- Dec 09, 2016
Glauber Costa authored
As Tomek pointed out, as we are starting the flush before we acquire the semaphore, we are not really limiting parallelism, but only delaying the end of the flush instead. Fixes #1919 [tgrabiec: conflicts] Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <6cbf9ec2f3a341c76becf94f794cfa16539c5192.1481120410.git.glauber@scylladb.com> (cherry picked from commit 733d87fc)
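A sketch of the corrected ordering, assuming Seastar's get_units(); the flush callable is a stand-in for the real flush path:

    #include <seastar/core/semaphore.hh>
    #include <seastar/core/future.hh>
    #include <functional>

    // At most as many flushes as the semaphore has units can run at once,
    // because the units are acquired before the flush starts. Acquiring them
    // after starting the flush would only delay its completion.
    seastar::future<> flush_with_limit(seastar::semaphore& sem,
                                       std::function<seastar::future<>()> flush) {
        return seastar::get_units(sem, 1).then([flush = std::move(flush)] (auto units) {
            return flush().finally([units = std::move(units)] {});
        });
    }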
- Dec 08, 2016
Avi Kivity authored
Commit 53b7b7de ("sstables: handle unrecognized sstable component") ignores unrecognized components, but misses one code path during probe_file(). Ignore unrecognized components there too. Fixes #1922. Message-Id: <20161208131027.28939-1-avi@scylladb.com> (cherry picked from commit 872b5ef5)
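A sketch of the skip-unknown behaviour with hypothetical names (the real code maps sstable component names to an enum inside sstables/):

    #include <optional>
    #include <string>
    #include <unordered_map>
    #include <vector>

    enum class component_type { Data, Index, Summary, Filter, Statistics, TOC };

    // Returns nullopt for component names this code does not know about,
    // instead of failing; callers simply skip such components.
    std::optional<component_type> parse_component(const std::string& name) {
        static const std::unordered_map<std::string, component_type> known = {
            {"Data.db", component_type::Data},             {"Index.db", component_type::Index},
            {"Summary.db", component_type::Summary},       {"Filter.db", component_type::Filter},
            {"Statistics.db", component_type::Statistics}, {"TOC.txt", component_type::TOC},
        };
        auto it = known.find(name);
        if (it == known.end()) {
            return std::nullopt;  // unrecognized component: ignore it
        }
        return it->second;
    }

    std::vector<component_type> probe_components(const std::vector<std::string>& names) {
        std::vector<component_type> out;
        for (const auto& n : names) {
            if (auto c = parse_component(n)) {   // unknown entries are dropped here
                out.push_back(*c);
            }
        }
        return out;
    }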
- Dec 07, 2016
Tomasz Grabiec authored
The problem is that replay will unlink any segments which were on disk at the time the replay starts. However, some of those segments may have been created by the current node since boot. If a segment is part of the reserve, for example, it will be unlinked by replay, but we will still use that segment to log mutations. Those mutations will not be visible to replay after a crash, though. The fix is to record the preexisting segments before any new segments have a chance to be created, and use that as the replay list. Introduced in abe73587. dtest failure: commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup Message-Id: <1481117436-6243-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit f7197dab)
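A sketch of the ordering fix with hypothetical helper names: the directory is listed once, before any new segment can be created, and only that snapshot is replayed (and later unlinked):

    #include <filesystem>
    #include <string>
    #include <vector>

    // Capture the segment files that already exist on disk. Segments created
    // by this node after the snapshot is taken are not in the returned list,
    // so replay will never unlink a segment that is still in active use.
    std::vector<std::string> list_preexisting_segments(const std::string& dir) {
        std::vector<std::string> files;
        for (const auto& e : std::filesystem::directory_iterator(dir)) {
            if (e.is_regular_file()) {
                files.push_back(e.path().string());
            }
        }
        return files;
    }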
- Dec 06, 2016
Amos Kong authored
Currently the housekeeping timer won't be reset when we restart scylla-server. We expect the service to run at each start, which is consistent with the upstart script in Ubuntu 14.04. When we restart scylla-server, the housekeeping timer should also be restarted, so let's replace "OnBootSec" with "OnActiveSec". Fixes: #1601 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <a22943cc11a3de23db266c52fd476c08014098c4.1480607401.git.amos@scylladb.com>
Takuya ASADA authored
RHEL 7.3's systemd contains a known bug in timer.c: https://github.com/systemd/systemd/issues/2632 This is a workaround to avoid hitting that bug. Fixes #1846 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1480452194-11683-1-git-send-email-syuu@scylladb.com> (cherry picked from commit 84649030)
- Dec 05, 2016
Pekka Enberg authored
- Dec 01, 2016
Paweł Dziepak authored
Since the introduction of the continuity flag, the row cache contains a single dummy entry. cache_tracker knows nothing about it, so it doesn't appear in any of the metrics. However, the cache destructor calls cache_tracker::on_erase() for every entry in the cache, including the dummy one. This is incorrect since the tracker wasn't informed when the dummy entry was created. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1478608776-10363-1-git-send-email-pdziepak@scylladb.com>
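A toy illustration of the invariant with made-up types: the tracker only hears about entries it was told about, so the destructor must skip the dummy entry:

    #include <cassert>
    #include <vector>

    struct tracker {
        long entries = 0;
        void on_insert() { ++entries; }
        void on_erase()  { --entries; }
    };

    struct entry { bool dummy; };

    struct cache_sketch {
        tracker& t;
        std::vector<entry> rows;
        explicit cache_sketch(tracker& tr) : t(tr), rows{{true}} {}  // dummy entry: tracker not informed
        void insert() { rows.push_back({false}); t.on_insert(); }
        ~cache_sketch() {
            for (auto& e : rows) {
                if (!e.dummy) {      // the fix: never call on_erase() for the dummy entry
                    t.on_erase();
                }
            }
        }
    };

    int main() {
        tracker tr;
        { cache_sketch c(tr); c.insert(); c.insert(); }
        assert(tr.entries == 0);     // metrics balance out after destruction
    }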
Glauber Costa authored
When requests hit the commitlog, each of them will be assigned a replay position, which we expect to be ordered. If reorders happen, the request will be discarded and re-applied. Although this is supposed to be rare, it does increase our latencies, especially when big requests are involved. Processing big requests is expensive, and if we have to do it twice that adds to the cost. The commitlog is supposed to issue replay positions in order, and it could be that the code that adds them to the memtables will reorder them. However, there is one instance in which the commitlog will not keep its side of the bargain. That happens when the reserve is exhausted and we are allocating a segment directly at the same time the reserve is being replenished. The following sequence of events, with its deferring points, will illustrate it:

on_timer:
    return this->allocate_segment(false). // defer here
        then([this](sseg_ptr s) {

At this point, the segment id is already allocated.

new_segment():
    if (_reserve_segments.empty()) {
        [ ... ]
        return allocate_segment(true).then ...

At this point, we have a new segment that has an id that is higher than the previous id allocated. Then we resume the execution from the deferring point in on_timer():

    i = _reserve_segments.emplace(i, std::move(s));

The next time we need to allocate a segment, we'll pick it from the reserve. But the segment in the reserve has an id that is lower than the id that we have already used. Reorders are bad, but this one is particularly bad: because the reorder happens on the segment id side of the replay position, every request that falls into that segment will have to be reinserted. This bug can be a bit tricky to reproduce. To make it more common, we can artificially add a sleep() fiber after the allocate_segment(false) in on_timer(). If we do that, we'll see a sea of reinsertions going on in the logs (if dblog is set to debug). Applying this patch (keeping the sleep) will make them all disappear. We do this by rewriting the reserve logic, so that segments always come from the reserve. If we draw from a single pool all the time, there is no chance of reordering happening. To make that more amenable, we'll have the reserve filler always running in the background and take it out of the timer code. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <49eb7edfcafaef7f1fdceb270639a9a8b50cfce7.1480531446.git.glauber@scylladb.com> (cherry picked from commit 99a5a772)
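A conceptual sketch of drawing every segment from a single reserve (not the actual commitlog code; names are made up): ids are handed out strictly in creation order, so replay positions cannot be reordered:

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <deque>

    struct segment { uint64_t id; };

    class segment_reserve_sketch {
        uint64_t _next_id = 0;
        std::deque<segment> _reserve;
    public:
        // Background filler: always appends segments with increasing ids.
        void replenish(std::size_t target) {
            while (_reserve.size() < target) {
                _reserve.push_back(segment{_next_id++});
            }
        }
        // Foreground path: never allocates directly, only draws from the
        // reserve, so consumers observe ids in the order they were created.
        segment take() {
            assert(!_reserve.empty());
            segment s = _reserve.front();
            _reserve.pop_front();
            return s;
        }
    };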
Glauber Costa authored
Sync all segments before acquiring the semaphore; otherwise the waiter may have to wait for the timer to kick in and push them down. Note that we can't guarantee that no other requests were executed in the meantime, so we have to sync again. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <aea019fe49820acce5d2b55dd5ec31e975b3436c.1480388674.git.glauber@scylladb.com> (cherry picked from commit 353a4cd2)
Tomasz Grabiec authored
Only shutdown() ensures all internal processes are complete. Call it before calling clear(). Message-Id: <1480495534-2253-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit c35e18ba)
Tomasz Grabiec authored
* seastar 6fd4534...386ccd9 (1): > queue: allow queue to change its maximum size
Avi Kivity authored
* dist/ami/files/scylla-ami e1e3919...d5a4397 (3): > scylla_install_ami: allow specify different repository for Scylla installation and receive update > scylla_install_ami: delete unneeded authorized_keys from AMI image > scylla_ami_setup: run posix_net_conf.sh when NCPUS < 8
Takuya ASADA authored
This fix splits build_ami.sh --repo into three different options: --repo-for-install is for Scylla package installation, only valid during AMI construction; --repo-for-update will be stored at /etc/yum.repos.d/scylla.repo, to receive package updates on the AMI; --repo is both, for installation and update. Fixes #1872 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1480438858-6007-1-git-send-email-syuu@scylladb.com> (cherry picked from commit 17ef5e63)
- Nov 30, 2016
Glauber Costa authored
Streaming memtables have a delayed mode where many flushes are coalesced together into one, with the actual flush happening later and propagated to all the previous waiters. However, the timer that triggers the actual flush was not using the newly introduced flush infrastructure. This was a minor problem because those flushes wouldn't try to take the semaphore, and so we could have many flushes going on at the same time. What was a potential performance issue became a correctness issue when we moved the reversal of the dirty memory accounting out of revert_potentially_cleaned_up_memory() into remove_from_flush_manager(). Since the latter is only called through the flush infrastructure, it simply wasn't called. So the deferral of the reversal exposed this bug. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <0d5755375bc27524b8cfb9970c76d492b14d9eea.1480522742.git.glauber@scylladb.com> (cherry picked from commit d7256e7b)
Glauber Costa authored
Aside from putting the requests in the commitlog class, read ahead will help us go through the file faster. Signed-off-by: Glauber Costa <glauber@scylladb.com> (cherry picked from commit 59a41cf7)
Glauber Costa authored
Right now replay is being issued with the standard seastar priority. The rationale for that at the time was that it is an early event that doesn't really share the disk with anybody. That is largely untrue now that we start compactions on boot. Compactions may fight for bandwidth with the commitlog, and with such a low priority the commitlog is guaranteed to lose. Fixes #1856 Signed-off-by: Glauber Costa <glauber@scylladb.com> (cherry picked from commit aa375cd3)
Glauber Costa authored
There are other code paths that may interrupt the read in the middle and bypass stop. It's safer this way. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <8c32ca2777ce2f44462d141fd582848ac7cf832d.1479477360.git.glauber@scylladb.com> (cherry picked from commit 60b7d35f)
Glauber Costa authored
The replay file is opened, so it should be closed. We're not seeing any problems arising from this, but they may happen. Enabling read ahead in this stream makes them happen immediately. Fix it. Signed-off-by: Glauber Costa <glauber@scylladb.com> (cherry picked from commit 4d3d7747)
- Nov 29, 2016
Takuya ASADA authored
Fixes #1871 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1480327243-18177-1-git-send-email-syuu@scylladb.com> (cherry picked from commit 1042e401)
- Nov 27, 2016
Avi Kivity authored
* seastar df471a8...6fd4534 (1): > Collectd get_value_map safe scan the map Fixes #1835.
- Nov 24, 2016
Takuya ASADA authored
Follow the change of NOFILE for non-systemd environments. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1479975050-14907-1-git-send-email-syuu@scylladb.com> (cherry picked from commit ce80fb3a)
Glauber Costa authored
This limit was found to be too low for production environments. It would be hit at boot, when we're touching a lot of files from multiple shards before deciding that we don't need them. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <87bbf43da1a67f5fa6174017205c6ef8bdb0dc3d.1479829232.git.glauber@scylladb.com> (cherry picked from commit 18b9fa3d)
Duarte Nunes authored
In Thrift, SliceRange defines a count that limits the number of cells to return from that row (in CQL3 terms, it limits the number of rows in that partition). While this limit is honored in the engine, the Thrift layer also applies the same limit, which, while redundant in most cases, is used to support the get_paged_slice verb. Currently, the limit is not being reset per Thrift row (CQL3 partition), so in practice, instead of limiting the cells in a row, we're limiting the rows we return as well. This patch fixes that by ensuring the limit applies only within a row/partition. Fixes #1882 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20161123220001.15496-1-duarte@scylladb.com> (cherry picked from commit a527ba28)
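A sketch of the per-row reset with simplified types (not the Thrift handler itself): the cell budget starts over for each Thrift row / CQL3 partition:

    #include <cstdint>
    #include <vector>

    struct cell {};
    struct row { std::vector<cell> cells; };

    // Apply the SliceRange count within each row only; a single global
    // countdown would also cap the number of rows returned.
    std::vector<std::vector<cell>> slice_rows(const std::vector<row>& rows, uint32_t count) {
        std::vector<std::vector<cell>> out;
        for (const auto& r : rows) {
            uint32_t remaining = count;           // reset per row, not per query
            std::vector<cell> selected;
            for (const auto& c : r.cells) {
                if (remaining == 0) {
                    break;
                }
                selected.push_back(c);
                --remaining;
            }
            out.push_back(std::move(selected));
        }
        return out;
    }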
Pekka Enberg authored
Fix typo in the RPM repository URL to actually use 1.5.
Pekka Enberg authored
- Nov 23, 2016
Tomasz Grabiec authored
* seastar 25137c2...df471a8 (1): > semaphore_units: add missing return statement