This project is mirrored from https://github.com/cockroachdb/cockroach.
  1. Nov 28, 2018
  2. Nov 27, 2018
    • Merge pull request #32636 from rytaft/backport2.0-32614 · 85335982
      rytaft authored
      release-2.0: stats,distsqlrun: reduce memory usage of CREATE STATISTICS
    • stats,distsqlrun: reduce memory usage of CREATE STATISTICS · 452e75b8
      Rebecca Taft authored
      Prior to this patch, running CREATE STATISTICS on a large table
      could cause the memory utilization of the database to grow so large
      that it crashed the server. This was due to the way garbage collection
works in Go. As described in this blog post:
      https://blog.golang.org/go-slices-usage-and-internals,
      
      > Re-slicing a slice doesn't make a copy of the underlying array. The
      > full array will be kept in memory until it is no longer referenced.
      > Occasionally this can cause the program to hold all the data in memory
      > when only a small piece of it is needed.
      
      This is exactly what was happening with CREATE STATISTICS. The CREATE
      STATISTICS command performs a full table scan, and randomly samples
      rows in order to build a histogram for a column. When rows are scanned
      in the kv layer, they are scanned in batches of ~10,000, creating arrays
      of encoded datums of length 10,000. Since the EncDatum object passed to
      the sampler contains a slice referencing that underlying array, the
      entire scanned batch would remain in memory as long as a single datum
      from that slice was included in the sample. With random sampling, it
      is likely that most of the batches have at least one sampled row, meaning
      almost the entire table will be in memory until the histogram is built.
      
      The fix is to copy the EncDatum to the sampler with only the decoded
Datum, setting the encoded []byte slice to nil.
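
      To illustrate the pattern in Go (the struct below is a simplified,
      hypothetical stand-in for the real EncDatum, not the actual
      CockroachDB type):

        package main

        import "fmt"

        // encDatum is a simplified, hypothetical stand-in for the real EncDatum:
        // holding the encoded []byte keeps the whole scanned-batch buffer alive.
        type encDatum struct {
                encoded []byte      // slice into the large batch buffer
                datum   interface{} // decoded value
        }

        // forSampling copies only the decoded value and drops the reference to
        // the batch buffer, so retaining the sample does not retain the batch.
        func forSampling(ed encDatum) encDatum {
                return encDatum{encoded: nil, datum: ed.datum}
        }

        func main() {
                batch := make([]byte, 1<<20) // stands in for a ~10,000-row scanned batch
                row := encDatum{encoded: batch[10:20], datum: 42}

                sample := forSampling(row)
                // sample.datum is usable, but sample no longer pins `batch`, so the
                // garbage collector can reclaim it once the scan moves on.
                fmt.Println(sample.datum)
        }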
      
      Release note (bug fix): Fixed an issue where calling CREATE STATISTICS
      on a large table could cause the server to crash due to running out of
      memory.
      452e75b8
  3. Nov 26, 2018
  4. Nov 20, 2018
    • sql: catch invalid functions in the UPDATE SET RHS (v2.0.x) · f47ca677
      Raphael 'kena' Poss authored
      Prior to this patch, an invalid scalar function in the RHS of UPDATE
... SET would not be rejected during planning, causing a crash during
      execution.
      
      This patch fixes this by ensuring the bad construct is rejected during
      planning.
      
      The fix here is different from the one effected in master/v2.1.
      
      Release note (bug fix): CockroachDB now properly rejects queries
      that use an invalid function in UPDATE SET (e.g., an aggregation).
      f47ca677
  5. Nov 12, 2018
    • Merge pull request #32223 from andreimatei/backport2.0-32166 · 2fdb530a
      Andrei Matei authored
      release-2.0: storage: don't apply local results if cmd processing failed
    • storage: add a test for the clearing of local results · 09f68de1
      Andrei Matei authored
... in the case of a rejected command.
      
      Release note: None
      09f68de1
    • storage: don't apply local results if cmd processing failed · d3d40aa3
      Andrei Matei authored
      This patch fixes a bug with the application of LocalResults: they were
      applied even in cases when the processing of the respective command
      failed. So, we were evaluating a batch, proposing a command, failing to
      apply the command (in particular because of the lease index check), but
      then still applying the LocalResult even though the replicated state was
      not updated.
      This bug had catastrophic consequences: part of the LocalResult is a
      list of UpdatedTxn - this is used to ping the txn wait queue and unblock
      waiters, communicating a final txn disposition to them. Now imagine an
      EndTxn batch that evaluates to a command that fails the lease check. The
respective LocalResult is populated with our txn, updated to the
      COMMITTED disposition. The command fails to process, but we still ping
      the wait queue telling all the waiters that the transaction committed.
      The waiters, if any, now go and send ResolveIntent requests, which
      actually commit intents (thinking that the intent's txn committed). And
      so we end up with "partially committed transactions" - some of the
      writes from a txn are committed (the ones that happened to have waiters
on their intents), while others are not.
      
      In order for this bug to manifest itself, we need:
      - read/write contention on some keys
      - a command to fail processing. Generally this happens either because of
      the lease index check or because of the lease check; so either a
      Raft leadership change or a lease change can potentially cause the
      badness. In the case of a Raft leadership change, proposals can get
      dropped, which leads to reproposals, which seem to frequently(?) end up
      processing at the wrong lease index and thus be rejected (and it's the
      reproposal processing that triggers the bug). This kind of situation is
      triggered by the Jepsen register test, with various nemeses. The lease
      check failure can probably be triggered simply with lease transfers
      (e.g. lease rebalancing).
      
Interestingly, the rate of occurrence of this bug probably changed
      between 2.0 and 2.1 because of the introduction, in 2.1, of the
      QueryIntent request (and also, separately, because of increased usage of
      lease transfers; this claim was not verified though). Speaking of the
Raft leadership change scenario, once the reproposal fails to be applied
      (but the wait queue is erroneously signalled), we also re-evaluate the
      EndTransaction batch. Unless something else goes wrong, in 2.0 this
      re-evaluation was likely succeeding. In 2.1, however, it tends to fail
      if there was a waiter on our txn because the EndTxn requests are usually
      bundled with QueryIntent requests - and these QueryIntents fail during
      re-evaluation because they don't find the intent - it was, by then,
erroneously committed by a waiter through a ResolveIntent. Except for
transactions that wrote to multiple ranges: I think for those the
      QueryIntent requests are split off from the EndTransaction request by
      the DistSender, and so we're back to the situation in 2.0.
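
      The shape of the fix, sketched with simplified, hypothetical types
      (not the actual storage package code): only signal waiters when the
      command actually applied.

        package main

        import "fmt"

        // localResult is a hypothetical stand-in; in reality it carries, among
        // other things, the list of updated txns used to signal the wait queue.
        type localResult struct {
                updatedTxns []string
        }

        // applyResult only signals waiters when the command actually applied.
        func applyResult(applyErr error, lr localResult) {
                if applyErr != nil {
                        // Command was rejected (e.g. the lease index check failed):
                        // drop the LocalResult so waiters are never told the txn
                        // committed when the replicated state was not updated.
                        return
                }
                for _, txn := range lr.updatedTxns {
                        fmt.Println("signalling txn wait queue for", txn)
                }
        }

        func main() {
                applyResult(fmt.Errorf("illegal lease index"), localResult{updatedTxns: []string{"txn-1"}})
                applyResult(nil, localResult{updatedTxns: []string{"txn-2"}})
        }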
      
      Fixes #30792
      
      Release note (bug fix): Fix a bug causing transactions to appear to be
      partially committed. CRDB was sometimes claiming to have failed to
      commit a transaction but some (or all) of its writes were actually
      persisted.
      d3d40aa3
  6. Oct 25, 2018
  7. Oct 24, 2018
    • kv: avoid transaction too large error if already refreshed · b341d2a3
      Nathan VanBenschoten authored
      This fixes a bug where a txn that already refreshed its spans
      and no longer needed them could still run into a "transaction
      is too large to complete" error.
      
      This will be backported to 2.0 and 2.1.
      
      Release note (bug fix): Fix bug where transactions unnecessarily
      hit "too large" error.
      b341d2a3
  8. Oct 16, 2018
    • Merge pull request #31522 from benesch/backport2.0-31516 · ca7aaa3f
      Nikhil Benesch authored
      release-2.0: c-deps: bump CryptoPP to avoid SIGTRAP on macOS
    • c-deps: bump CryptoPP to avoid SIGTRAP on macOS · 54ea8d90
      Nikhil Benesch authored
      Bump CryptoPP to pick up a fix for cockroachdb/cockroach#31380.
      Details reproduced below.
      
      Fix #31380.
      
      ---
      
      As part of its CPU feature detection, CryptoPP installs a SIGILL signal
      handler before issuing the cpuid instruction. The intent is to
      gracefully degrade on CPUs that don't support the cpuid instruction.
      
      The problem is that it is impossible to safely overwrite a signal
      handler installed by the Go runtime in go1.10 on macOS
      (golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
      cockroachdb/cockroach#31380.
      
      The situation has improved on the Go front, as go1.11 makes it possible
      to safely save and restore signal handlers installed by the Go runtime
      on macOS.
      
      Still, we can do better and support go1.10. There is no need to bother
      installing a SIGILL handler, as the cpuid instruction is supported by
      every x86-64 CPU. We can instead use conditional compilation to make
      sure that we never execute a cpuid instruction on a non x86-64 CPU.
      
      Note that CPU feature detection is performed at executable load time
      (see the attribute(constructor) on DetectX86Features); therefore any
reference to a function which calls DetectX86Features (notably HasAESNI)
      corrupts the signal handler. It's not entirely clear why this corruption
      later leads to the SIGTRAP seen in cockroachdb/cockroach#31380--is
      something in macOS or the Go runtime generating a SIGILL and trying to
      handle it gracefully?--but regardless, not mucking with the signal
      handler fixes the issue.
      
      Release note (bug fix): CockroachDB no longer crashes due to a SIGTRAP error
      soon after startup on macOS Mojave (cockroachdb/cockroach#31380).
      
      diff --git a/c-deps/cryptopp b/c-deps/cryptopp
      index c621ce0532..6d6064445d 160000
      --- a/c-deps/cryptopp
      +++ b/c-deps/cryptopp
      @@ -1 +1 @@
      -Subproject commit c621ce053298fafc1e59191079c33acd76045c26
      +Subproject commit 6d6064445ded787c7d6bf0a021fb718fddb63345
      54ea8d90
  9. Oct 15, 2018
    • Merge pull request #31350 from tschottdorf/backport2.1-31013 · 366deb27
      Tobias Schottdorf authored
      backport-2.0: kv: try next replica on RangeNotFoundError
    • Merge pull request #31351 from tschottdorf/backport2.0-29579 · a403a481
      Tobias Schottdorf authored
      backport-2.0: storage: return one entry less in Entries
    • storage: return one entry less in Entries · 25369101
      Tobias Schottdorf authored
      This works around the bug outlined in:
      
      https://github.com/etcd-io/etcd/pull/10063
      
      by matching Raft's internal implementation of commit pagination.
      Once the above PR lands, we can revert this commit (but I assume
      that it will take a little bit), and I think we should do that
      because the code hasn't gotten any nicer to look at.
      
      Fixes #28918.
      
      Release note: None
      
      25369101
    • kv: try next replica on RangeNotFoundError · 08fd28cc
      Tobias Schottdorf authored
      Previously, if a Batch RPC came back with a RangeNotFoundError,
      we would immediately stop trying to send to more replicas, evict the
      range descriptor, and start a new attempt after a back-off.
      
      This new attempt could end up using the same replica, so if the
      RangeNotFoundError persisted for some amount of time, so would the
      unsuccessful retries for requests to it as DistSender doesn't
      aggressively shuffle the replicas.
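
      Sketched with hypothetical types (not the actual DistSender code),
      the new behavior treats a RangeNotFoundError as "try the next
      replica" rather than "give up, evict, and back off":

        package main

        import (
                "errors"
                "fmt"
        )

        // errRangeNotFound stands in for roachpb.RangeNotFoundError.
        var errRangeNotFound = errors.New("range not found on this replica")

        // sendToReplicas tries each replica in turn; a RangeNotFoundError no
        // longer aborts the attempt, it simply advances to the next replica.
        func sendToReplicas(replicas []string, send func(string) error) error {
                var lastErr error
                for _, r := range replicas {
                        err := send(r)
                        if err == nil {
                                return nil
                        }
                        if errors.Is(err, errRangeNotFound) {
                                lastErr = err
                                continue
                        }
                        return err // other errors keep their existing handling
                }
                return lastErr
        }

        func main() {
                err := sendToReplicas([]string{"n1", "n2", "n3"}, func(r string) error {
                        if r == "n1" {
                                return errRangeNotFound // e.g. replica not yet woken up
                        }
                        fmt.Println("batch succeeded on", r)
                        return nil
                })
                fmt.Println("error:", err)
        }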
      
      It turns out that there are such situations, and the
      election-after-restart roachtest spuriously hit one of them:
      
      1. new replica receives a preemptive snapshot and the ConfChange
      2. cluster restarts
      3. now the new replica is in this state until the range wakes
         up, which may not happen for some time.
      4. the first request to the range runs into the above problem
      
      @nvanbenschoten: I think there is an issue to be filed about the
      tendency of DistSender to get stuck in unfortunate configurations.
      
      Fixes #30613.
      
      Release note (bug fix): Avoid repeatedly trying a replica that was found
      to be in the process of being added.
      08fd28cc
  10. Oct 13, 2018
  11. Oct 12, 2018
  12. Oct 05, 2018
  13. Oct 01, 2018
    • Merge pull request #30823 from benesch/secure-grpc-2.0 · dfa0c881
      Nikhil Benesch authored
      release-2.0: rpc,server: authenticate all gRPC methods
    • rpc,server: authenticate all gRPC methods · d3078772
      Nikhil Benesch authored
      Previously only the roachpb.Batch RPC was correctly checking for an
      authenticated user. All other RPCs were open to the public, even when
      the server was running in secure mode.
      
      To prevent future accidents of this kind, hoist the authentication check
      to a gRPC interceptor that is guaranteed to run before all RPCs.
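
      A minimal sketch of that pattern using grpc-go; checkAuth is a
      hypothetical placeholder for the real certificate/user validation:

        package main

        import (
                "context"

                "google.golang.org/grpc"
                "google.golang.org/grpc/codes"
                "google.golang.org/grpc/status"
        )

        // checkAuth is a placeholder for the real authentication check
        // (e.g. validating the TLS client certificate).
        func checkAuth(ctx context.Context) error {
                return nil
        }

        // authInterceptor runs before every unary RPC handler, so no method
        // can be reached without passing the authentication check.
        func authInterceptor(
                ctx context.Context,
                req interface{},
                info *grpc.UnaryServerInfo,
                handler grpc.UnaryHandler,
        ) (interface{}, error) {
                if err := checkAuth(ctx); err != nil {
                        return nil, status.Error(codes.Unauthenticated, err.Error())
                }
                return handler(ctx, req)
        }

        func main() {
                // A matching StreamInterceptor would cover streaming RPCs as well.
                _ = grpc.NewServer(grpc.UnaryInterceptor(authInterceptor))
        }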
      
      Release note (bug fix): A security vulnerability in which data could be
      leaked from or tampered with in a cluster in secure mode has been fixed.
      
      Release note: None
      v2.0.6
      d3078772
  14. Sep 25, 2018
  15. Sep 19, 2018
  16. Sep 18, 2018
  17. Sep 17, 2018
    • Merge pull request #30309 from nvanbenschoten/backport2.0-29685 · 878b628b
      Nathan VanBenschoten authored
      release-2.0: kv: check transaction status before recording refresh spans
    • kv: check transaction status before recording refresh spans · 7b38b95b
      Nathan VanBenschoten authored
      Fixes #29253.
      
      We previously accounted for refresh spans before checking the
      transaction's status. This meant that we would attempt to add new
      refresh spans even if the txn had already been committed. If this
      attempt failed because the `max_refresh_spans_bytes` threshold was
      exceeded, we would erroneously return an error to the client even
      though their transaction had already been committed. The simple
      fix for this is to only record refresh spans for transactions that
      are still PENDING. The new test case would have failed with the
      error observed in #29253 without this fix.
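
      A minimal sketch of the guard with simplified, hypothetical types
      (the real limit comes from the max_refresh_spans_bytes setting):

        package main

        import "fmt"

        type txnStatus int

        const (
                statusPending txnStatus = iota
                statusCommitted
                statusAborted
        )

        type span struct{ key, endKey string }

        type refreshTracker struct {
                status txnStatus
                spans  []span
                bytes  int
                limit  int
        }

        // record only accounts for refresh spans while the txn is still PENDING,
        // so a batch whose txn has already committed can no longer trip the
        // size limit and surface a spurious error to the client.
        func (t *refreshTracker) record(s span) error {
                if t.status != statusPending {
                        return nil
                }
                t.bytes += len(s.key) + len(s.endKey)
                if t.bytes > t.limit {
                        return fmt.Errorf("transaction is too large to complete")
                }
                t.spans = append(t.spans, s)
                return nil
        }

        func main() {
                t := &refreshTracker{status: statusCommitted, limit: 8}
                fmt.Println(t.record(span{key: "a", endKey: "z"})) // <nil>
        }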
      
      This raises an auxiliary question about whether we should be asserting
      that a transaction's status is not COMMITTED if we ever return an
      error to the client. cc. @andreimatei.
      
      On the positive side, this should be a minor perf win, as we'll
      avoid recording refresh spans on the final batch of every txn.
      
      A similar fix will need to be backported to 2.0. The backport
      won't be clean though because of the new txnInterceptor structure.
      However, the test shouldn't need to change.
      
      Release note (bug fix): Don't return transaction size limit errors
      for transactions that have already committed.
      7b38b95b
  18. Sep 13, 2018
  19. Sep 12, 2018
    • sqlbase: fix composite fk column checking panic · d3fd5b9b
      Bram Gruneir authored
      Prior to this patch, it was possible to skip the check that ensures that all
      columns are present in a composite fk if only the final column was present,
      thus causing a panic.
      
      Fixes #26748
      
Release note (bug fix): Fixed a panic that occurred when not all values were
      present in a composite foreign key.
      d3fd5b9b
    • sqlbase: don't cascade through NULL values · 9fe33d1e
      Bram Gruneir authored
      Prior to this fix, we would cascade updates or deletes through NULLs. See #28896
      for a great test case.
      
This did, however, remind me that we use MATCH FULL instead of MATCH SIMPLE and we
should add support for MATCH SIMPLE (and make it the default). See #20305.
      
      Closes #28896
      
      Release note (bug fix): ON DELETE/ON UPDATE CASCADE should not cascade through
      NULLs.
      9fe33d1e
  20. Sep 11, 2018