- July 14, 2020

Authored by Randall Hauch

Authored by Chris Egerton
The `testBlockInConnectorStop` test is failing semi-frequently on Jenkins. It's difficult to verify the cause without complete logs and I'm unable to reproduce locally, but I suspect the Connect worker hasn't completed startup by the time the test begins, so the initial REST request to create a connector times out with a 500 error. This isn't an issue for normal tests, but these tests artificially reduce the REST request timeout because some requests are meant to exhaust it. The changes here use a small hack to verify, before the tests start, that the worker has started and is ready to handle all types of REST requests: they query the REST API for a non-existent connector. Reviewers: Boyang Chen <boyang@confluent.io>, Konstantine Karantasis <k.karantasis@gmail.com>
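
For context, a readiness probe of this sort can be written against the Connect REST API with plain JDK HTTP calls. The sketch below is illustrative only (the worker URL, connector name, and class name are hypothetical); it shows the idea of treating a 404 for an unknown connector as proof that the worker has fully started:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class ConnectReadinessProbe {

    // Returns true once the worker answers REST requests with a definitive
    // status (404 for an unknown connector), i.e. startup has completed.
    static boolean workerReady(String workerUrl) {
        try {
            URL url = new URL(workerUrl + "/connectors/nonexistent-connector");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(1_000);
            conn.setReadTimeout(1_000);
            int status = conn.getResponseCode();
            conn.disconnect();
            // A 404 means the request was routed and handled; a 500 or an
            // exception means the worker (or its herder) is not ready yet.
            return status == 404;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        while (!workerReady("http://localhost:8083")) {
            Thread.sleep(250); // poll until the worker is ready to serve requests
        }
        System.out.println("Connect worker is ready");
    }
}
```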

Authored by Jim Galasyn
Reviewer: Matthias J. Sax <matthias@confluent.io>

- July 12, 2020

Authored by John Roesler
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>

Authored by Matthias J. Sax
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>

- July 11, 2020

Authored by Guozhang Wang
Also piggy-back a small fix to use TreeMap instead of HashMap to preserve iteration ordering. Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>

Authored by Chia-Ping Tsai
Call KafkaStreams#cleanUp to reset local state before starting the application for the second run. Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>, John Roesler <john@confluent.io>
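
To illustrate the pattern the test relies on, a Streams application can wipe its local state directory before a re-run by calling cleanUp() while the instance is stopped. The topology, topic names, and configuration below are placeholders, not the test's actual code:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class RestartWithCleanState {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-app");        // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");                     // trivial topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        // cleanUp() must be called while the instance is not running; it deletes
        // the local state directory so the second run starts from a clean slate.
        streams.cleanUp();
        streams.start();

        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```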

- July 10, 2020

Authored by A. Sophie Blee-Goldman
Fixes an asymmetry in which we avoid writing checkpoints for non-persistent stores but still expect to read them, resulting in a spurious TaskCorruptedException. Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <vvcephei@apache.org>

- July 09, 2020

Authored by Guozhang Wang
The intention of using poll(0) is to not block on rebalance but still return some data; however, `updateAssignmentMetadataIfNeeded` contains three different pieces of logic: 1) discover the coordinator if necessary, 2) join the group if necessary, 3) refresh metadata and fetch positions if necessary. We only want 2) to be non-blocking, not the others, since e.g. when the coordinator is down the heartbeat would expire and cause the consumer to fetch with timeout 0 as well, causing unnecessarily high CPU usage. Since splitting this function is a rather big change for a last-minute blocker fix for 2.6, I made a smaller change: `updateAssignmentMetadataIfNeeded` takes an optional boolean flag indicating whether 2) above should wait until it either expires or completes; otherwise it does not wait on the join-group future and just polls with a zero timer. Reviewers: Jason Gustafson <jason@confluent.io>
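
The internal flag described above is not part of the public API; for reference, the non-blocking poll call that this behavior sits behind looks roughly like the sketch below (broker address, group id, and topic are placeholders). `poll(Duration.ZERO)` is the non-deprecated replacement for `poll(0)` and returns whatever records are immediately available:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NonBlockingPollExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "example-group");            // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            // Returns already-fetched records without blocking indefinitely on
            // coordinator discovery or rebalancing.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ZERO);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s%n", record.offset(), record.key());
            }
        }
    }
}
```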

- July 07, 2020

Authored by Boyang Chen
This is a bug fix for older admin clients that use static membership and call DescribeGroups. By making groupInstanceId ignorable, the client will not crash when handling the response. Added test coverage for DescribeGroups, plus some side cleanups. Reviewers: Jason Gustafson <jason@confluent.io>
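
For orientation, the kind of admin-client call affected is describing a consumer group and reading each member's group.instance.id, which only static members set. This is a hedged sketch using the public Admin API (group name and broker address are placeholders), not the internal response-schema fix itself:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;
import org.apache.kafka.clients.admin.MemberDescription;

public class DescribeGroupExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            ConsumerGroupDescription description = admin
                    .describeConsumerGroups(Collections.singletonList("example-group")) // placeholder
                    .describedGroups()
                    .get("example-group")
                    .get();
            for (MemberDescription member : description.members()) {
                // groupInstanceId is only present for static members; clients must
                // tolerate its absence in the DescribeGroups response.
                System.out.printf("member=%s groupInstanceId=%s%n",
                        member.consumerId(), member.groupInstanceId().orElse("<none>"));
            }
        }
    }
}
```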

Authored by Jim Galasyn
Reviewers: Matthias J. Sax <matthias@confluent.io>

Authored by A. Sophie Blee-Goldman
Two more edge cases I found producing extra TaskCorruptedException while playing around with the failing eos-beta upgrade test (sadly these are unrelated problems, as the test still fails with these fixes in place).

* Need to write the checkpoint when recycling a standby: although we do preserve the changelog offsets when recycling a task, and should therefore write the offsets when the new task is itself closed, we do NOT write the checkpoint for uninitialized tasks. So if the new task is ultimately closed before it gets out of the CREATED state, the offsets will not be written and we can get a TaskCorruptedException.
* We do not write the checkpoint file if the current offset map is empty; however, for eos the checkpoint file is not only used for restoration but also for clean shutdown. Although skipping a dummy checkpoint file does not actually violate any correctness, since we are going to re-bootstrap from the log-start-offset anyway, it throws an unnecessary TaskCorruptedException which has an overhead itself.

Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>

Authored by Ismael Juma
Without this change, we would catch the NPE and log it. This was misleading and could cause excessive log volume. The NPE can happen after `AlterReplicaLogDirs` completes successfully and when unmapping older regions. Example stacktrace:

```text
[2019-05-20 14:08:13,999] ERROR Error unmapping index /tmp/kafka-logs/test-0.567a0d8ff88b45ab95794020d0b2e66f-delete/00000000000000000000.index (kafka.log.OffsetIndex)
java.lang.NullPointerException
    at org.apache.kafka.common.utils.MappedByteBuffers.unmap(MappedByteBuffers.java:73)
    at kafka.log.AbstractIndex.forceUnmap(AbstractIndex.scala:318)
    at kafka.log.AbstractIndex.safeForceUnmap(AbstractIndex.scala:308)
    at kafka.log.AbstractIndex.$anonfun$closeHandler$1(AbstractIndex.scala:257)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
    at kafka.log.AbstractIndex.closeHandler(AbstractIndex.scala:257)
    at kafka.log.AbstractIndex.deleteIfExists(AbstractIndex.scala:226)
    at kafka.log.LogSegment.$anonfun$deleteIfExists$6(LogSegment.scala:597)
    at kafka.log.LogSegment.delete$1(LogSegment.scala:585)
    at kafka.log.LogSegment.$anonfun$deleteIfExists$5(LogSegment.scala:597)
    at kafka.utils.CoreUtils$.$anonfun$tryAll$1(CoreUtils.scala:115)
    at kafka.utils.CoreUtils$.$anonfun$tryAll$1$adapted(CoreUtils.scala:114)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at kafka.utils.CoreUtils$.tryAll(CoreUtils.scala:114)
    at kafka.log.LogSegment.deleteIfExists(LogSegment.scala:599)
    at kafka.log.Log.$anonfun$delete$3(Log.scala:1762)
    at kafka.log.Log.$anonfun$delete$3$adapted(Log.scala:1762)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at kafka.log.Log.$anonfun$delete$2(Log.scala:1762)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
    at kafka.log.Log.delete(Log.scala:1759)
    at kafka.log.LogManager.deleteLogs(LogManager.scala:761)
    at kafka.log.LogManager.$anonfun$deleteLogs$6(LogManager.scala:775)
    at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk> Co-authored-by: Jaikiran Pai <jaikiran.pai@gmail.com>

Authored by Guozhang Wang

Authored by A. Sophie Blee-Goldman
The current failures we're seeing with this test are due to faulty assumptions that it makes and not any real bug in eos-beta (at least, from what I've seen so far). The test relies on tightly controlling the commits, which it does by setting the commit interval to MAX_VALUE and manually requesting commits on the context. In two phases, the test assumes that any pending data will be committed after a rebalance. But we actually take care to avoid unnecessary commits -- with eos-alpha, we only commit tasks that are revoked while in eos-beta we must commit all tasks if any are revoked, but only if the revoked tasks themselves need a commit. The failure we see occurs when we try to verify the committed data after a second client is started and the group rebalances. The already-running client has to give up two tasks to the newly started client, but those tasks may not need to be committed in which case none of the tasks would be. So we still have an open transaction on the partitions where we try to read committed data. Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Authored by Jason Gustafson
Users often get confused after an unclean shutdown when log recovery takes a long time. This patch attempts to make the logging clearer and provide a simple indication of loading progress. Reviewers: Ismael Juma <ismael@juma.me.uk>

- July 04, 2020

Authored by Chia-Ping Tsai
There are two new configs introduced by 371f14c3 and 1c4eb1a5, so we have to update the expected configs in the connect_rest_test.py system test too. Reviewer: Konstantine Karantasis <konstantine@confluent.io>

- July 02, 2020

Authored by Ismael Juma
This includes important fixes. Netty is required by ZooKeeper if TLS is enabled. I verified that the netty jars were changed from 4.1.48 to 4.1.50 with this PR, `find . -name '*netty*'`:

```text
./core/build/dependant-libs-2.13.3/netty-handler-4.1.50.Final.jar
./core/build/dependant-libs-2.13.3/netty-transport-native-epoll-4.1.50.Final.jar
./core/build/dependant-libs-2.13.3/netty-codec-4.1.50.Final.jar
./core/build/dependant-libs-2.13.3/netty-transport-native-unix-common-4.1.50.Final.jar
./core/build/dependant-libs-2.13.3/netty-transport-4.1.50.Final.jar
./core/build/dependant-libs-2.13.3/netty-resolver-4.1.50.Final.jar
./core/build/dependant-libs-2.13.3/netty-buffer-4.1.50.Final.jar
./core/build/dependant-libs-2.13.3/netty-common-4.1.50.Final.jar
```

Note that the previous netty exclude no longer worked since we upgraded to ZooKeeper 3.5.x, as it switched to Netty 4 which has different module names. Also, the Netty dependency is needed by ZooKeeper for TLS support so we cannot exclude it. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

- July 01, 2020

Authored by Aakash Shah
Added a section about error reporting in Connect documentation, and another about how to safely use the new errant record reporter in SinkTask implementations. Author: Aakash Shah <ashah@confluent.io> Reviewer: Randall Hauch <rhauch@gmail.com>
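
The documented pattern for the errant record reporter (KIP-610) is roughly the sketch below. The task class and its delivery logic are hypothetical; the try/catch of NoSuchMethodError/NoClassDefFoundError is the recommended way to stay compatible with pre-2.6 Connect workers:

```java
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.connect.sink.ErrantRecordReporter;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class ExampleSinkTask extends SinkTask {  // hypothetical task for illustration
    private ErrantRecordReporter reporter;

    @Override
    public void start(Map<String, String> props) {
        try {
            // May be null if error reporting is not enabled in the connector config.
            reporter = context.errantRecordReporter();
        } catch (NoSuchMethodError | NoClassDefFoundError e) {
            reporter = null; // running against a pre-2.6 Connect worker
        }
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            try {
                process(record);
            } catch (Exception e) {
                if (reporter != null) {
                    reporter.report(record, e); // hand the bad record to the error handler
                } else {
                    throw new RuntimeException(e); // no reporter available: fail the task
                }
            }
        }
    }

    private void process(SinkRecord record) {
        // hypothetical delivery logic
    }

    @Override
    public void stop() {}

    @Override
    public String version() {
        return "1.0";
    }
}
```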

Authored by Chia-Ping Tsai
After 3661f981, security_config is cached, so later changes to the security flag can't affect the security_config used by subsequent tests. Issue: https://issues.apache.org/jira/browse/KAFKA-10214 Author: Chia-Ping Tsai <chia7712@gmail.com> Reviewers: Ron Dagostino <rdagostino@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com> Closes #8949 from chia7712/KAFKA-10214 (cherry picked from commit 6094af89) Signed-off-by: Manikumar Reddy <mkumar@xtreems.local>

- June 30, 2020

Authored by David Jacot
KAFKA-10212: Describing a topic with the TopicCommand fails if unauthorised to use the ListPartitionReassignments API. Since https://issues.apache.org/jira/browse/KAFKA-8834, describing topics with the TopicCommand requires privileges to use ListPartitionReassignments, or it fails to describe the topics with the following error:

> Error while executing topic command : Cluster authorization failed.

This is quite a hard restriction, as most secure clusters do not authorize non-admin members to access ListPartitionReassignments. This patch catches the `ClusterAuthorizationException` exception and gracefully falls back. We already do this when the API is not available, so it remains consistent. Author: David Jacot <djacot@confluent.io> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com> Closes #8947 from dajac/KAFKA-10212 (cherry picked from commit 4be4420b) Signed-off-by: Manikumar Reddy <mkumar@xtreems.local>
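
The actual fix lives in the TopicCommand's Scala code; as a rough illustration of the same fallback pattern on the public Admin API, a caller can treat both "API not supported" and "not authorized" as "no reassignments visible". The helper class below is hypothetical:

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.PartitionReassignment;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.ClusterAuthorizationException;
import org.apache.kafka.common.errors.UnsupportedVersionException;

public class ReassignmentLookup {  // hypothetical helper for illustration

    // Returns the ongoing reassignments, or an empty map if the cluster does not
    // support the API or the principal is not authorized to call it.
    static Map<TopicPartition, PartitionReassignment> safeListReassignments(Admin admin)
            throws InterruptedException, ExecutionException {
        try {
            return admin.listPartitionReassignments().reassignments().get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof UnsupportedVersionException
                    || e.getCause() instanceof ClusterAuthorizationException) {
                return Collections.emptyMap();
            }
            throw e;
        }
    }
}
```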

Authored by showuon
A simple increase in the timeout of the consumer that verifies that records have been replicated seems to fix the integration tests in `MirrorConnectorsIntegrationTest` that have been failing more often recently. Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Sanjana Kaundinya <skaundinya@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>

- June 28, 2020

Authored by Nikolay
Reviewers: Jun Rao <junrao@gmail.com>

- June 27, 2020

Authored by John Roesler
We inadvertently changed the binary schema of the suppress buffer changelog in 2.4.0 without bumping the schema version number. As a result, it is impossible to upgrade from 2.3.x to 2.4+ if you are using suppression.

* Refactor the schema compatibility test to use serialized data from older versions as a more foolproof compatibility test.
* Refactor the upgrade system test to use the smoke test application so that we actually exercise a significant portion of the Streams API during upgrade testing.
* Add more recent versions to the upgrade system test matrix.
* Fix the compatibility bug by bumping the schema version to 3.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <wangguoz@gmail.com>

Authored by A. Sophie Blee-Goldman
This should address at least some of the excessive TaskCorruptedExceptions we've been seeing lately. Basically, at the moment we only commit tasks if commitNeeded is true -- this seems obvious by definition. But the problem is we do some essential cleanup in postCommit that should always be done before a task is closed:

* clear the PartitionGroup
* write the checkpoint

The second is actually fine to skip when commitNeeded = false with ALOS, as we will have already written a checkpoint during the last commit. But for EOS, we only write the checkpoint before a close -- so even if there is no new pending data since the last commit, we have to write the current offsets. If we don't, the task will be assumed dirty and we will run into our friend the TaskCorruptedException during (re)initialization. To fix this, we should just always call prepareCommit and postCommit at the TaskManager level. Within the task, it can decide whether or not to actually do something in those methods based on commitNeeded. One subtle issue is that we still need to avoid checkpointing a task that was still in CREATED, to avoid potentially overwriting an existing checkpoint with uninitialized empty offsets. Unfortunately, we always suspend a task before closing and committing, so we lose the information about whether the task was in CREATED or RUNNING/RESTORING by the time we get to the checkpoint. For this we introduce a special flag to keep track of whether a suspended task should actually be checkpointed or not.

Reviewers: Guozhang Wang <wangguoz@gmail.com>

- June 25, 2020

Authored by A. Sophie Blee-Goldman
We just needed to add the check in StreamTask#closeClean to closeAndRecycleState as well. I also renamed closeAndRecycleState to closeCleanAndRecycleState to drive this point home: it needs to be clean. This should be cherry-picked back to the 2.6 branch. Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

- June 24, 2020

Authored by A. Sophie Blee-Goldman
KAFKA-10169: swallow non-fatal KafkaException and don't abort transaction during clean close (#8900)

If there's any pending data and we haven't flushed the producer when we abort a transaction, a KafkaException is returned for the previous send. This is a bit misleading, since the situation is not an unrecoverable error and so the KafkaException is really non-fatal. For now, we should just catch and swallow this in the RecordCollector (see also: KAFKA-10169).

The reason we ended up aborting an un-flushed transaction was due to the combination of:

a. always aborting the ongoing transaction when any task is closed/revoked
b. only committing (and flushing) if at least one of the revoked tasks needs to be committed (regardless of whether any non-revoked tasks have data/transaction in flight)

Given the above, we can end up with an ongoing transaction that isn't committed since none of the revoked tasks have any data in the transaction. We then abort the transaction anyway, when those tasks are closed. So in addition to the above (swallowing this exception), we should avoid unnecessarily aborting data for tasks that haven't been revoked. We can handle this by splitting the RecordCollector's close into a dirty and clean flavor: if dirty, we need to abort the transaction since it may be dirty due to the commit attempt failing. But if clean, we can skip aborting the transaction since we know that either we just committed and thus there is no ongoing transaction to abort, or else the transaction in flight contains no data from the tasks being closed.

Note that this means we still abort the transaction any time a task is closed dirty, so we must close/reinitialize any active task with pending data (that was aborted). In sum:

* hackily check the KafkaException message and swallow
* only abort the transaction during a dirty close
* refactor shutdown to make sure we don't closeClean a task whose data was actually aborted

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Boyang Chen <boyang@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Authored by feyman2016
KAFKA-10135: Extract Task#executeAndMaybeSwallow to be a general utility function into TaskManager (#8887)

- June 20, 2020

Authored by John Roesler
Reviewers: Guozhang Wang <wangguoz@gmail.com>

Authored by Matthias J. Sax
Ports the test from #8886 to trunk -- this should be merged to the 2.6 branch. One open question: in 2.6 and trunk we rely on the active tasks to wipe out the store if the task crashes. However, if there is a hard JVM crash and we don't call closeDirty(), the store would not be wiped out. Thus, I am wondering if we would need to fix this (for both active and standby tasks) and check on startup whether a local store must be wiped out. The current test passes, as we do a proper cleanup after the exception is thrown. Reviewers: Guozhang Wang <wangguoz@gmail.com>

- June 19, 2020

Authored by Jason Gustafson
This patch fixes a bug in the constructor of `LogTruncationException`. We were passing the divergent offsets to the super constructor as the fetch offsets. There is no way to fix this without breaking compatibility, but the harm is probably minimal since this exception was not getting raised properly until KAFKA-9840 anyway. Note that I have also moved the check for unknown offset and epoch into `SubscriptionState`, which ensures that the partition is still awaiting validation and that the fetch offset hasn't changed. Finally, I made some minor improvements to the logging and exception messages to ensure that we always have the fetch offset and epoch as well as the divergent offset and epoch included. Reviewers: Boyang Chen <boyang@confluent.io>, David Arthur <mumrah@gmail.com>

Authored by Guozhang Wang
Since the admin client allows us to use a flexible offset-spec, we can always use read-uncommitted regardless of the EOS config. Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

- June 18, 2020

Authored by Ego
Newer version of ducktape that updates some dependencies and adds some features. You can see that diff here: https://github.com/confluentinc/ducktape/compare/v0.7.7...v0.7.8 Reviewer: Konstantine Karantasis <konstantine@confluent.io>

Authored by David Arthur
Background: When a partition subscription is initialized, it has a `null` position and is in the INITIALIZING state. Depending on the consumer, it will then transition to one of the other states. Typically a consumer will either reset the offset to earliest/latest, or it will provide an offset (with or without offset metadata). For the reset case, we still have no position to act on, so fetches should not occur. Recently we made changes for KAFKA-9724 (#8376) to prevent clients from entering the AWAIT_VALIDATION state when targeting older brokers. New logic to bypass offset validation as part of this change exposed this new issue.

Bug and Fix: In the partition subscriptions, the AWAIT_RESET state was incorrectly reporting that it had a position. In some cases a position might actually exist (e.g., if we were resetting offsets during a fetch after a truncation), but in the initialization case no position had been set. We saw this issue in system tests where there is a race between the offset reset completing and the first fetch request being issued. Since AWAIT_RESET#hasPosition was incorrectly returning `true`, the new logic to bypass offset validation was transitioning the subscription to FETCHING (even though no position existed). The fix was simply to have AWAIT_RESET#hasPosition return `false`, which should have been the case from the start. Additionally, this fix includes some guards against NPE when reading the position from the subscription.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>

Authored by Chia-Ping Tsai
Author: Chia-Ping Tsai <chia7712@gmail.com> Reviewers: Randall Hauch <rhauch@gmail.com>, David Jacot <david.jacot@gmail.com>

- June 17, 2020

Authored by Chia-Ping Tsai
KAFKA-10147 MockAdminClient#describeConfigs(Collection<ConfigResource>) is unable to handle broker resource (#8853) Author: Chia-Ping Tsai <chia7712@gmail.com> Reviewers: Boyang Chen <boyang@confluent.io>, Randall Hauch <rhauch@gmail.com>

Authored by John Roesler
Reduce the test data set from 1000 records to 500. Some recent test failures indicate that the Jenkins runners aren't able to process all 1000 records in two minutes. Also add a sanity check that all the test data are readable from the input topic. Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>

Authored by John Roesler
* Remove problematic Percentiles measurements until the implementation is fixed
* Fix leaking e2e metrics when task is closed
* Fix leaking metrics when tasks are recycled

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>

Authored by A. Sophie Blee-Goldman
* KAFKA-10150: always transition to SUSPENDED during suspend, no matter the current state; only call prepareCommit before closing if task.commitNeeded is true
* Don't commit any consumed offsets during handleAssignment -- revoked active tasks (and any others that need committing) will be committed during handleRevocation, so we only need to worry about cleaning them up in handleAssignment
* KAFKA-10152: when recycling a task we should always commit consumed offsets (if any), but don't need to write the checkpoint (since changelog offsets are preserved across task transitions)
* Make sure we close all tasks during shutdown, even if an exception is thrown during commit

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Authored by Guozhang Wang
Reviewers: John Roesler <vvcephei@apache.org>