This project is mirrored from https://github.com/cockroachdb/cockroach.
- Nov 28, 2018
kena authored
sql: catch invalid functions in the UPDATE SET RHS (v2.0.x)
- Nov 27, 2018
rytaft authored
release-2.0: stats,distsqlrun: reduce memory usage of CREATE STATISTICS
Rebecca Taft authored
Prior to this patch, running CREATE STATISTICS on a large table could cause the memory utilization of the database to grow so large that it crashed the server. This was due to the way garbage collection works in Go. As described in this blog post: https://blog.golang.org/go-slices-usage-and-internals,

> Re-slicing a slice doesn't make a copy of the underlying array. The full array will be kept in memory until it is no longer referenced. Occasionally this can cause the program to hold all the data in memory when only a small piece of it is needed.

This is exactly what was happening with CREATE STATISTICS. The CREATE STATISTICS command performs a full table scan and randomly samples rows in order to build a histogram for a column. When rows are scanned in the KV layer, they are scanned in batches of ~10,000, creating arrays of encoded datums of length 10,000. Since the EncDatum object passed to the sampler contains a slice referencing that underlying array, the entire scanned batch would remain in memory as long as a single datum from that slice was included in the sample. With random sampling, it is likely that most of the batches have at least one sampled row, meaning almost the entire table will be in memory until the histogram is built.

The fix is to copy the EncDatum to the sampler with only the decoded Datum, setting the encoded []byte slice to nil.

Release note (bug fix): Fixed an issue where calling CREATE STATISTICS on a large table could cause the server to crash due to running out of memory.
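The slice-retention fix can be sketched in a few lines of Go. The `encDatum` type and `copyForSampling` helper below are hypothetical stand-ins for the real EncDatum plumbing, showing only the idea: drop the batch-backed byte slice and keep the decoded value.

```go
package main

import "fmt"

// encDatum is a simplified stand-in for the EncDatum described above
// (hypothetical type; the real one lives in sql/sqlbase).
type encDatum struct {
	encoded []byte      // slice into a large scanned batch
	datum   interface{} // decoded value
}

// copyForSampling returns a copy that drops the reference to the
// batch-backed encoded bytes, so the ~10,000-row batch array can be
// garbage collected even while the sample retains this row.
func copyForSampling(ed encDatum) encDatum {
	return encDatum{encoded: nil, datum: ed.datum}
}

func main() {
	batch := make([]byte, 1<<20) // pretend this is one scanned batch
	row := encDatum{encoded: batch[10:20], datum: 42}
	sampled := copyForSampling(row)
	// The sampled copy no longer pins the 1 MiB batch in memory.
	fmt.Println(sampled.encoded == nil, sampled.datum) // true 42
}
```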
- Nov 26, 2018
Nathan VanBenschoten authored
release-2.0: storage: re-enqueue Raft groups on paginated application
Nathan VanBenschoten authored
Fixes #31330. This change re-enqueues Raft groups for processing immediately if they still have more to do after a Raft ready iteration. This comes up in practice when a Range has sufficient load to force Raft application pagination. See #31330 for a discussion of the symptoms this can cause. Release note (bug fix): Fix a bug where Raft followers could fall behind leaders in entry application, causing stalls during splits.
- Nov 20, 2018
Raphael 'kena' Poss authored
Prior to this patch, an invalid scalar function in the RHS of UPDATE ... SET would not be rejected during planning and would cause a crash during execution. This patch fixes this by ensuring the bad construct is rejected during planning. The fix here is different from the one effected in master/v2.1. Release note (bug fix): CockroachDB now properly rejects queries that use an invalid function in UPDATE SET (e.g., an aggregation).
- Nov 12, 2018
Andrei Matei authored
release-2.0: storage: don't apply local results if cmd processing failed
Andrei Matei authored
... in the case of a rejected command. Release note: None
Andrei Matei authored
This patch fixes a bug with the application of LocalResults: they were applied even in cases when the processing of the respective command failed. So, we were evaluating a batch, proposing a command, failing to apply the command (in particular because of the lease index check), but then still applying the LocalResult even though the replicated state was not updated.

This bug had catastrophic consequences: part of the LocalResult is a list of UpdatedTxns; this is used to ping the txn wait queue and unblock waiters, communicating a final txn disposition to them. Now imagine an EndTxn batch that evaluates to a command that fails the lease check. The respective LocalResult is populated with our txn, updated to the COMMITTED disposition. The command fails to process, but we still ping the wait queue, telling all the waiters that the transaction committed. The waiters, if any, now go and send ResolveIntent requests, which actually commit intents (thinking that the intent's txn committed). And so we end up with "partially committed transactions": some of the writes from a txn are committed (the ones that happened to have waiters on their intents), others are not.

In order for this bug to manifest itself, we need:
- read/write contention on some keys
- a command to fail processing

Generally a command fails to process either because of the lease index check or because of the lease check; so either a Raft leadership change or a lease change can potentially cause the badness. In the case of a Raft leadership change, proposals can get dropped, which leads to reproposals, which seem to frequently(?) end up processing at the wrong lease index and thus be rejected (and it's the reproposal processing that triggers the bug). This kind of situation is triggered by the Jepsen register test, with various nemeses. The lease check failure can probably be triggered simply with lease transfers (e.g. lease rebalancing).

Interestingly, the rate of occurrence of this bug probably changed between 2.0 and 2.1 because of the introduction, in 2.1, of the QueryIntent request (and also, separately, because of increased usage of lease transfers; this claim was not verified though). Speaking of the Raft leadership change scenario, once the reproposal fails to be applied (but the wait queue is erroneously signalled), we also re-evaluate the EndTransaction batch. Unless something else goes wrong, in 2.0 this re-evaluation was likely succeeding. In 2.1, however, it tends to fail if there was a waiter on our txn, because the EndTxn requests are usually bundled with QueryIntent requests, and these QueryIntents fail during re-evaluation because they don't find the intent: it was, by then, erroneously committed by a waiter through a ResolveIntent. The exception is transactions that wrote to multiple ranges: I think for those the QueryIntent requests are split off from the EndTransaction request by the DistSender, and so we're back to the situation in 2.0.

Fixes #30792

Release note (bug fix): Fix a bug causing transactions to appear to be partially committed. CRDB was sometimes claiming to have failed to commit a transaction but some (or all) of its writes were actually persisted.
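The essence of the fix — run the LocalResult's side effects only when the command actually applied — can be sketched as follows. The `localResult` and `applyCommand` names are hypothetical simplifications of the real storage types:

```go
package main

import "fmt"

// localResult is a simplified stand-in for storage's LocalResult
// (hypothetical shape): it carries txns whose waiters should be
// signalled with a final disposition.
type localResult struct {
	updatedTxns []string
}

// applyCommand sketches the fix: side effects such as signalling the
// txn wait queue must only run when the replicated command actually
// applied (e.g. it passed the lease index check).
func applyCommand(applied bool, lr localResult, signal func(txn string)) {
	if !applied {
		return // command rejected: do NOT release waiters
	}
	for _, txn := range lr.updatedTxns {
		signal(txn)
	}
}

func main() {
	var signalled []string
	sig := func(txn string) { signalled = append(signalled, txn) }
	applyCommand(false, localResult{updatedTxns: []string{"t1"}}, sig) // rejected
	applyCommand(true, localResult{updatedTxns: []string{"t2"}}, sig)  // applied
	fmt.Println(signalled) // [t2]
}
```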
- Oct 25, 2018
Nathan VanBenschoten authored
release-2.0: kv: avoid transaction too large error if already refreshed
- Oct 24, 2018
Nathan VanBenschoten authored
This fixes a bug where a txn that already refreshed its spans and no longer needed them could still run into a "transaction is too large to complete" error. This will be backported to 2.0 and 2.1. Release note (bug fix): Fix bug where transactions unnecessarily hit "too large" error.
- Oct 16, 2018
Nikhil Benesch authored
release-2.0: c-deps: bump CryptoPP to avoid SIGTRAP on macOS
Nikhil Benesch authored
Bump CryptoPP to pick up a fix for cockroachdb/cockroach#31380. Details reproduced below. Fix #31380.

---

As part of its CPU feature detection, CryptoPP installs a SIGILL signal handler before issuing the cpuid instruction. The intent is to gracefully degrade on CPUs that don't support the cpuid instruction. The problem is that it is impossible to safely overwrite a signal handler installed by the Go runtime in go1.10 on macOS (golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave: cockroachdb/cockroach#31380.

The situation has improved on the Go front, as go1.11 makes it possible to safely save and restore signal handlers installed by the Go runtime on macOS. Still, we can do better and support go1.10. There is no need to bother installing a SIGILL handler, as the cpuid instruction is supported by every x86-64 CPU. We can instead use conditional compilation to make sure that we never execute a cpuid instruction on a non-x86-64 CPU.

Note that CPU feature detection is performed at executable load time (see the attribute(constructor) on DetectX86Features); therefore any reference to a function which calls DetectX86Features (notably HasAESNI) corrupts the signal handler. It's not entirely clear why this corruption later leads to the SIGTRAP seen in cockroachdb/cockroach#31380 (is something in macOS or the Go runtime generating a SIGILL and trying to handle it gracefully?) but regardless, not mucking with the signal handler fixes the issue.

Release note (bug fix): CockroachDB no longer crashes due to a SIGTRAP error soon after startup on macOS Mojave (cockroachdb/cockroach#31380).

diff --git a/c-deps/cryptopp b/c-deps/cryptopp
index c621ce0532..6d6064445d 160000
--- a/c-deps/cryptopp
+++ b/c-deps/cryptopp
@@ -1 +1 @@
-Subproject commit c621ce053298fafc1e59191079c33acd76045c26
+Subproject commit 6d6064445ded787c7d6bf0a021fb718fddb63345
- Oct 15, 2018
Tobias Schottdorf authored
backport-2.0: kv: try next replica on RangeNotFoundError
Tobias Schottdorf authored
backport-2.0: storage: return one entry less in Entries
Tobias Schottdorf authored
This works around the bug outlined in https://github.com/etcd-io/etcd/pull/10063 by matching Raft's internal implementation of commit pagination. Once the above PR lands, we can revert this commit (but I assume that it will take a little bit), and I think we should do that because the code hasn't gotten any nicer to look at. Fixes #28918. Release note: None
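The pagination rule being matched can be illustrated with a simplified sketch. The `limitSize` helper below is a hypothetical reimplementation of Raft's size-limiting behavior, not the actual storage code: always return at least one entry, and otherwise stop before the entry that would exceed the budget.

```go
package main

import "fmt"

type entry struct{ data []byte }

// limitSize mirrors Raft's internal pagination rule (simplified):
// return at least one entry, and otherwise stop *before* the entry
// that would push the running total over maxSize.
func limitSize(ents []entry, maxSize uint64) []entry {
	if len(ents) == 0 {
		return ents
	}
	var size uint64
	for i, e := range ents {
		size += uint64(len(e.data))
		if size > maxSize && i > 0 {
			return ents[:i]
		}
	}
	return ents
}

func main() {
	ents := []entry{{make([]byte, 10)}, {make([]byte, 10)}, {make([]byte, 10)}}
	// The third entry would push the total (30) past the budget (25).
	fmt.Println(len(limitSize(ents, 25))) // 2
}
```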
Tobias Schottdorf authored
Previously, if a Batch RPC came back with a RangeNotFoundError, we would immediately stop trying to send to more replicas, evict the range descriptor, and start a new attempt after a back-off. This new attempt could end up using the same replica, so if the RangeNotFoundError persisted for some amount of time, so would the unsuccessful retries for requests to it, as DistSender doesn't aggressively shuffle the replicas.

It turns out that there are such situations, and the election-after-restart roachtest spuriously hit one of them:

1. A new replica receives a preemptive snapshot and the ConfChange.
2. The cluster restarts.
3. Now the new replica is in this state until the range wakes up, which may not happen for some time.
4. The first request to the range runs into the above problem.

@nvanbenschoten: I think there is an issue to be filed about the tendency of DistSender to get stuck in unfortunate configurations.

Fixes #30613.

Release note (bug fix): Avoid repeatedly trying a replica that was found to be in the process of being added.
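The retry behavior can be sketched as follows. This is a hypothetical simplification of DistSender's send loop, showing only the changed decision: a RangeNotFoundError now moves on to the next replica instead of aborting the attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// errRangeNotFound stands in for roachpb.RangeNotFoundError.
var errRangeNotFound = errors.New("range not found on replica")

// sendToReplicas sketches the fix: on a RangeNotFoundError, try the
// next replica in the transport instead of evicting the descriptor
// and backing off (simplified; the real logic is in DistSender).
func sendToReplicas(replicas []func() error) error {
	var lastErr error
	for _, send := range replicas {
		if lastErr = send(); lastErr == nil {
			return nil
		}
		if !errors.Is(lastErr, errRangeNotFound) {
			return lastErr // other errors still fail the attempt
		}
		// This replica may not have applied its preemptive snapshot
		// and ConfChange yet; move on to the next one.
	}
	return lastErr
}

func main() {
	calls := 0
	replicas := []func() error{
		func() error { calls++; return errRangeNotFound },
		func() error { calls++; return nil },
	}
	fmt.Println(sendToReplicas(replicas), calls) // <nil> 2
}
```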
- Oct 13, 2018
kena authored
release-2.0: deps: bump go-libedit
- Oct 12, 2018
Raphael 'kena' Poss authored
Release note (build change): CockroachDB can now be built from source on macOS 10.14 (Mojave).
- Oct 05, 2018
Nathan VanBenschoten authored
release-2.0: storage: only track forwarded proposals that were successfully appended
Nathan VanBenschoten authored
Fixes #30064. Before this change, it was possible for a proposal that was forwarded to a leader to be dropped instead of appended to its log. This was possible if the leader had a `leadTransferee` set. This change avoids this issue by verifying that the proposal was successfully added to the leader's log before tracking it. This will need to be backported to 2.1 and 2.0. Release note: None
- Oct 01, 2018
Nikhil Benesch authored
release-2.0: rpc,server: authenticate all gRPC methods
Nikhil Benesch authored
Previously only the roachpb.Batch RPC was correctly checking for an authenticated user. All other RPCs were open to the public, even when the server was running in secure mode. To prevent future accidents of this kind, hoist the authentication check into a gRPC interceptor that is guaranteed to run before all RPCs. Release note (bug fix): A security vulnerability in which data could be leaked from or tampered with in a cluster in secure mode has been fixed.
- Sep 25, 2018
Tobias Schottdorf authored
backport-2.0: don't drain on decommissioning
craig[bot] authored
30614: backport-2.0: storage: fix use of context with closed trace r=tschottdorf a=mvijaykarthik Backport 1/1 commits from #25581 --- Fixes #25575 Release note: None Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Alex Robinson authored
Backport 1/1 commits from #25581 --- Fixes #25575 Release note: None
Tobias Schottdorf authored
This is somehow the salient part of #28707 that I managed to miss during the initial backport of that PR: we don't want to set the node to draining when it is being decommissioned. Note that the /health?ready=1 endpoint will continue to return a 503 error once decommissioning starts due to a readiness check (something I am preparing a PR against master for). Release note: None
- Sep 19, 2018
Andrew Couch authored
release-2.0: ui: fix default jobs page sort
Andrew Couch authored
In #18137 a column was added, but the default sort (which is specified by index) was not updated. This updates it to correctly sort on creation time. Fixes: #30418 Release note (admin ui change): Fixes an issue where the default sort on the Jobs page was incorrectly the User column rather than the Creation Time.
- Sep 18, 2018
Solon authored
release-2.0: distsql: close merge joiner on error
Solon Gordon authored
In two cases, the merge joiner was not calling `close()` after encountering an error. This could lead to it being endlessly `Next`ed by its consumer. Fixes #30363 Release note (bug fix): Fix potential infinite loop when the merge joiner encountered an error or cancellation.
- Sep 17, 2018
Nathan VanBenschoten authored
release-2.0: kv: check transaction status before recording refresh spans
Nathan VanBenschoten authored
Fixes #29253.

We previously accounted for refresh spans before checking the transaction's status. This meant that we would attempt to add new refresh spans even if the txn had already been committed. If this attempt failed because the `max_refresh_spans_bytes` threshold was exceeded, we would erroneously return an error to the client even though their transaction had already been committed. The simple fix for this is to only record refresh spans for transactions that are still PENDING. The new test case would have failed with the error observed in #29253 without this fix.

This raises an auxiliary question about whether we should be asserting that a transaction's status is not COMMITTED if we ever return an error to the client. cc. @andreimatei. On the positive side, this should be a minor perf win, as we'll avoid recording refresh spans on the final batch of every txn.

A similar fix will need to be backported to 2.0. The backport won't be clean though because of the new txnInterceptor structure. However, the test shouldn't need to change.

Release note (bug fix): Don't return transaction size limit errors for transactions that have already committed.
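The guard can be sketched as follows. The `appendRefreshSpans` function and status constants are hypothetical stand-ins for the kv client's internals, illustrating only the ordering fix: check the status before accounting for the span.

```go
package main

import "fmt"

type txnStatus int

const (
	pending txnStatus = iota
	committed
	aborted
)

// appendRefreshSpans sketches the fix: only PENDING transactions
// record refresh spans, so a committed txn can no longer trip the
// max_refresh_spans_bytes limit after the fact.
func appendRefreshSpans(status txnStatus, spans *[]string, span string, maxSpans int) error {
	if status != pending {
		return nil // txn already finalized: nothing to refresh later
	}
	if len(*spans)+1 > maxSpans {
		return fmt.Errorf("transaction is too large to complete")
	}
	*spans = append(*spans, span)
	return nil
}

func main() {
	spans := []string{"a", "b"}
	// A committed txn at the limit: previously this returned an error.
	err := appendRefreshSpans(committed, &spans, "c", 2)
	fmt.Println(err, len(spans)) // <nil> 2
}
```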
- Sep 13, 2018
kena authored
release-2.0: sql/parser: fix the action for empty rules
Bram Gruneir authored
release-2.0: sqlbase: don't cascade through NULL values
Raphael 'kena' Poss authored
A yacc-generated parser is a giant switch in a loop, and the data structure is a stack implemented as indexes into an array. Each occurrence of `$$.val` is really `thestack[curpos].val`. Unless one nils things out in the rule action, the previous value remains. (That's true of all yacc generators that don't provide destructors for values: GNU Bison and Lemon provide destructors, for example; goyacc and old Bison don't.)

Release note (bug fix): the `WITH` operand of import/export/backup/restore was sometimes processed improperly. This has been corrected.

Release note (bug fix): common table expressions with `WITH` in larger queries were sometimes processed improperly. This has been corrected.
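The stale-value mechanism can be reproduced with a tiny array-backed stack in Go (a hypothetical model of the parser's value stack, not goyacc's actual code): popping doesn't clear the slot, so a later "empty" rule that never assigns `$$.val` sees whatever an earlier reduction left behind.

```go
package main

import "fmt"

// stack models a yacc semantic-value stack: values live in an array
// indexed by a stack pointer, and popping does not clear slots.
type stack struct {
	vals []string
	top  int
}

func (s *stack) push(v string) { s.vals[s.top] = v; s.top++ }
func (s *stack) pop()          { s.top-- } // note: slot is not nil'ed out

func main() {
	s := &stack{vals: make([]string, 4)}
	s.push("WITH options") // some earlier rule's value
	s.pop()
	// An empty rule "pushes" without assigning its value...
	s.top++
	// ...and reads back the stale value from the reused slot.
	fmt.Println(s.vals[s.top-1]) // WITH options
}
```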
Bram Gruneir authored
release-2.0: sqlbase: fix composite fk column checking panic
- Sep 12, 2018
Bram Gruneir authored
Prior to this patch, it was possible to skip the check that ensures that all columns are present in a composite fk if only the final column was present, thus causing a panic. Fixes #26748 Release note (bug fix): Fixed a panic that occurred when not all values were present in a composite foreign key.
Bram Gruneir authored
Prior to this fix, we would cascade updates or deletes through NULLs. See #28896 for a great test case. This did, however, remind me that we use MATCH FULL instead of MATCH SIMPLE and that we should add support for MATCH SIMPLE (and make it the default). See #20305. Closes #28896 Release note (bug fix): ON DELETE/ON UPDATE CASCADE should not cascade through NULLs.
- Sep 11, 2018
Peter Mattis authored
release-2.0: fix gossip tight loopback connection issue and "impossible" assertion