  1. Jul 14, 2020
    • Bump version to 2.6.0 · a0b29acb
      Randall Hauch authored
      2.6.0-rc0
    • KAFKA-10192: Wait for REST API to become available before testing blocked connectors (#8928) · 36c50806
      Chris Egerton authored
      The `testBlockInConnectorStop` test is failing semi-frequently on Jenkins. It's difficult to verify the cause without complete logs and I'm unable to reproduce locally, but I suspect the cause may be that the Connect worker hasn't completed startup yet by the time the test begins and so the initial REST request to create a connector times out with a 500 error. This isn't an issue for normal tests but we artificially reduce the REST request timeout for these tests as some requests are meant to exhaust that timeout.
      
      The changes here use a small hack to verify that the worker has started and is ready to handle all types of REST requests before tests start by querying the REST API for a non-existent connector. 
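The readiness "hack" described above could be sketched roughly like this (a hypothetical standalone sketch, not the actual test code; the URL path and helper names are assumptions). A 404 for a connector that does not exist proves the REST layer handled a request end-to-end, while a connection failure or 500 means the worker is still starting:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class WorkerReadiness {

    public static boolean looksReady(int statusCode) {
        // Only a 404 proves the worker processed the request end-to-end.
        return statusCode == 404;
    }

    public static void awaitWorkerStart(String baseUrl, long timeoutMs) throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            try {
                URL url = new URL(baseUrl + "/connectors/this-connector-does-not-exist");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setConnectTimeout(1_000);
                conn.setReadTimeout(1_000);
                if (looksReady(conn.getResponseCode())) {
                    return; // REST API is up and serving requests
                }
            } catch (Exception e) {
                // Worker not listening yet; fall through and retry.
            }
            Thread.sleep(200);
        }
        throw new IllegalStateException("Worker REST API did not become available in time");
    }
}
```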
      
      Reviewers: Boyang Chen <boyang@confluent.io>, Konstantine Karantasis <k.karantasis@gmail.com>
    • KAFKA-6453: Document how timestamps are computed for aggregations and joins (#9009) · 078f7d4b
      Jim Galasyn authored
      Reviewer: Matthias J. Sax <matthias@confluent.io>
  2. Jul 12, 2020
  3. Jul 11, 2020
  4. Jul 10, 2020
  5. Jul 09, 2020
    • KAFKA-10134: Use long poll if we do not have fetchable partitions (#8934) · d62c3329
      Guozhang Wang authored
      The intention of using poll(0) is to not block on rebalance but still return some data; however, `updateAssignmentMetadataIfNeeded` contains three different pieces of logic: 1) discover the coordinator if necessary, 2) join the group if necessary, 3) refresh metadata and fetch positions if necessary. We only want 2) to be non-blocking, not the others, since e.g. when the coordinator is down the heartbeat would expire and cause the consumer to fetch with timeout 0 as well, causing unnecessarily high CPU usage.
      
      Since splitting this function is a rather big change to make as a last-minute blocker fix for 2.6, I made a smaller change that gives updateAssignmentMetadataIfNeeded an optional boolean flag indicating whether 2) above should wait until the timer expires or the join completes; otherwise we do not wait on the join-group future and just poll with a zero timer.
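The flag-based behavior could be sketched like this (a simplification with illustrative names, not the actual KafkaConsumer internals): the boolean controls only whether the join-group step may block, while coordinator discovery and metadata refresh keep the caller's timeout either way.

```java
public class AssignmentMetadataSketch {

    enum Step { DISCOVER_COORDINATOR, JOIN_GROUP, REFRESH_METADATA }

    /** Returns the poll timeout to use for a given step under the fix. */
    public static long timeoutFor(Step step, boolean waitForJoinGroup, long defaultTimeoutMs) {
        if (step == Step.JOIN_GROUP && !waitForJoinGroup) {
            return 0L; // do not block on the join-group future
        }
        return defaultTimeoutMs; // everything else uses the caller's timer
    }
}
```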
      
      Reviewers: Jason Gustafson <jason@confluent.io>
  6. Jul 07, 2020
    • KAFKA-10239: Make GroupInstanceId ignorable in DescribeGroups (#8989) · 5281df14
      Boyang Chen authored
      This is a bug fix for older admin clients that use static membership and call DescribeGroups.
      By making groupInstanceId ignorable, the client no longer crashes when handling the response.
      
      Added test coverage for DescribeGroups, plus some side cleanups.
      
      Reviewers: Jason Gustafson <jason@confluent.io>
    • MINOR: document timestamped state stores (#8920) · 76f490e7
      Jim Galasyn authored
      Reviewers: Matthias J. Sax <matthias@confluent.io>
    • KAFKA-10166: checkpoint recycled standbys and ignore empty rocksdb base directory (#8962) · 5bea1a42
      A. Sophie Blee-Goldman authored
      Two more edge cases I found that produce extra TaskCorruptedExceptions while playing around with the failing eos-beta upgrade test (sadly these are unrelated problems, as the test still fails with these fixes in place).
      
      * Need to write the checkpoint when recycling a standby: although we do preserve the changelog offsets when recycling a task, and should therefore write the offsets when the new task is itself closed, we do NOT write the checkpoint for uninitialized tasks. So if the new task is ultimately closed before it gets out of the CREATED state, the offsets will not be written and we can get a TaskCorruptedException
      * We do not write the checkpoint file if the current offset map is empty; however, for eos the checkpoint file is used not only for restoration but also for clean shutdown. Although skipping a dummy checkpoint file does not actually violate correctness, since we are going to re-bootstrap from the log-start-offset anyway, it throws an unnecessary TaskCorruptedException, which has overhead of its own.
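The two rules above amount to something like the following decision logic (an illustrative sketch under stated assumptions; the real checks live in Streams' task and state-manager classes and are more involved):

```java
public class CheckpointPolicy {

    /** Rule 1: a standby being recycled must checkpoint even though the
     *  task it becomes is "uninitialized" for its new role. */
    public static boolean shouldCheckpointOnRecycle(boolean taskInitialized, boolean recycling) {
        return taskInitialized || recycling;
    }

    /** Rule 2: under EOS, write the checkpoint file even when the offset map
     *  is empty, because the file's existence also signals a clean shutdown. */
    public static boolean shouldWriteCheckpoint(boolean eosEnabled, int offsetMapSize) {
        return eosEnabled || offsetMapSize > 0;
    }
}
```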
      
      Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>
    • KAFKA-8398: Prevent NPE in `forceUnmap` (#8983) · 92c23d2d
      Ismael Juma authored
      
      Without this change, we would catch the NPE and log it.
      This was misleading and could cause excessive log
      volume.
      
      The NPE can happen after `AlterReplicaLogDirs` completes
      successfully and when unmapping older regions. Example
      stacktrace:
      
      ```text
      [2019-05-20 14:08:13,999] ERROR Error unmapping index /tmp/kafka-logs/test-0.567a0d8ff88b45ab95794020d0b2e66f-delete/00000000000000000000.index (kafka.log.OffsetIndex)
      java.lang.NullPointerException
      at org.apache.kafka.common.utils.MappedByteBuffers.unmap(MappedByteBuffers.java:73)
      at kafka.log.AbstractIndex.forceUnmap(AbstractIndex.scala:318)
      at kafka.log.AbstractIndex.safeForceUnmap(AbstractIndex.scala:308)
      at kafka.log.AbstractIndex.$anonfun$closeHandler$1(AbstractIndex.scala:257)
      at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
      at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
      at kafka.log.AbstractIndex.closeHandler(AbstractIndex.scala:257)
      at kafka.log.AbstractIndex.deleteIfExists(AbstractIndex.scala:226)
      at kafka.log.LogSegment.$anonfun$deleteIfExists$6(LogSegment.scala:597)
      at kafka.log.LogSegment.delete$1(LogSegment.scala:585)
      at kafka.log.LogSegment.$anonfun$deleteIfExists$5(LogSegment.scala:597)
      at kafka.utils.CoreUtils$.$anonfun$tryAll$1(CoreUtils.scala:115)
      at kafka.utils.CoreUtils$.$anonfun$tryAll$1$adapted(CoreUtils.scala:114)
      at scala.collection.immutable.List.foreach(List.scala:392)
      at kafka.utils.CoreUtils$.tryAll(CoreUtils.scala:114)
      at kafka.log.LogSegment.deleteIfExists(LogSegment.scala:599)
      at kafka.log.Log.$anonfun$delete$3(Log.scala:1762)
      at kafka.log.Log.$anonfun$delete$3$adapted(Log.scala:1762)
      at scala.collection.Iterator.foreach(Iterator.scala:941)
      at scala.collection.Iterator.foreach$(Iterator.scala:941)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
      at scala.collection.IterableLike.foreach(IterableLike.scala:74)
      at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
      at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
      at kafka.log.Log.$anonfun$delete$2(Log.scala:1762)
      at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
      at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
      at kafka.log.Log.delete(Log.scala:1759)
      at kafka.log.LogManager.deleteLogs(LogManager.scala:761)
      at kafka.log.LogManager.$anonfun$deleteLogs$6(LogManager.scala:775)
      at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
      at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      ```
      
      Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
      
      Co-authored-by: Jaikiran Pai <jaikiran.pai@gmail.com>
    • Guozhang Wang authored · 06cbffe1
    • KAFKA-10017: fix flaky EosBetaUpgradeIntegrationTest (#8963) · 72dd93a3
      A. Sophie Blee-Goldman authored
      The current failures we're seeing with this test are due to faulty assumptions that it makes and not any real bug in eos-beta (at least, from what I've seen so far).
      
      The test relies on tightly controlling the commits, which it does by setting the commit interval to MAX_VALUE and manually requesting commits on the context. In two phases, the test assumes that any pending data will be committed after a rebalance. But we actually take care to avoid unnecessary commits -- with eos-alpha, we only commit tasks that are revoked while in eos-beta we must commit all tasks if any are revoked, but only if the revoked tasks themselves need a commit.
      
      The failure we see occurs when we try to verify the committed data after a second client is started and the group rebalances. The already-running client has to give up two tasks to the newly started client, but those tasks may not need to be committed in which case none of the tasks would be. So we still have an open transaction on the partitions where we try to read committed data.
      
      Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
    • MINOR: Improve logging around initial log loading (#8970) · b3b0510a
      Jason Gustafson authored
      Users often get confused after an unclean shutdown when log recovery takes a long time. This patch attempts to make the logging clearer and provide a simple indication of loading progress.
      
      Reviewers: Ismael Juma <ismael@juma.me.uk>
  7. Jul 04, 2020
  8. Jul 02, 2020
    • MINOR: Update Netty to 4.1.50.Final (#8972) · 222f7337
      Ismael Juma authored
      This includes important fixes. Netty is required by ZooKeeper if TLS is
      enabled.
      
      I verified that the netty jars were changed from 4.1.48 to 4.1.50 with
      this PR, `find . -name '*netty*'`:
      
      ```text
      ./core/build/dependant-libs-2.13.3/netty-handler-4.1.50.Final.jar
      ./core/build/dependant-libs-2.13.3/netty-transport-native-epoll-4.1.50.Final.jar
      ./core/build/dependant-libs-2.13.3/netty-codec-4.1.50.Final.jar
      ./core/build/dependant-libs-2.13.3/netty-transport-native-unix-common-4.1.50.Final.jar
      ./core/build/dependant-libs-2.13.3/netty-transport-4.1.50.Final.jar
      ./core/build/dependant-libs-2.13.3/netty-resolver-4.1.50.Final.jar
      ./core/build/dependant-libs-2.13.3/netty-buffer-4.1.50.Final.jar
      ./core/build/dependant-libs-2.13.3/netty-common-4.1.50.Final.jar
      ```
      
      Note that the previous netty exclude no longer worked since we upgraded
      to ZooKeeper 3.5.x, as it switched to Netty 4, which has different module names.
      Also, the Netty dependency is needed by ZooKeeper for TLS support so we
      cannot exclude it.
      
      Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
  9. Jul 01, 2020
  10. Jun 30, 2020
    • KAFKA-10212: Describing a topic with the TopicCommand fails if unauthorised to... · ecc77f74
      David Jacot authored
      KAFKA-10212: Describing a topic with the TopicCommand fails if unauthorised to use ListPartitionReassignments API
      
      Since https://issues.apache.org/jira/browse/KAFKA-8834, describing topics with the TopicCommand requires privileges to use ListPartitionReassignments, or it fails to describe the topics with the following error:
      
      > Error while executing topic command : Cluster authorization failed.
      
      This is quite a hard restriction, as most secure clusters do not authorize non-admin members to access ListPartitionReassignments.
      
      This patch catches the `ClusterAuthorizationException` and falls back gracefully. We already do this when the API is not available, so it remains consistent.
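The fallback could look roughly like this (an illustrative Java sketch; the actual TopicCommand is Scala, and the helper names here are assumptions): treat "not authorized" the same as "API not available" and describe the topic without reassignment info.

```java
import java.util.Collections;
import java.util.Set;

public class DescribeFallbackSketch {

    // Stand-in for org.apache.kafka.common.errors.ClusterAuthorizationException.
    static class ClusterAuthorizationException extends RuntimeException { }

    interface ReassignmentLookup {
        Set<Integer> reassigningReplicas();
    }

    public static Set<Integer> safeReassigningReplicas(ReassignmentLookup lookup) {
        try {
            return lookup.reassigningReplicas();
        } catch (ClusterAuthorizationException e) {
            // Unauthorized: fall back gracefully, as when the API is unsupported,
            // and describe the topic without reassignment information.
            return Collections.emptySet();
        }
    }
}
```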
      
      Author: David Jacot <djacot@confluent.io>
      
      Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
      
      Closes #8947 from dajac/KAFKA-10212
      
      (cherry picked from commit 4be4420b)
      Signed-off-by: Manikumar Reddy <mkumar@xtreems.local>
    • KAFKA-9509: Increase timeout when consuming records to fix flaky test in MM2 (#8894) · 550ad76b
      showuon authored
      A simple increase in the timeout of the consumer that verifies that records have been replicated seems to fix the integration tests in `MirrorConnectorsIntegrationTest` that have been failing more often recently. 
      
      Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Sanjana Kaundinya <skaundinya@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>
  11. Jun 28, 2020
  12. Jun 27, 2020
    • KAFKA-10173: Fix suppress changelog binary schema compatibility (#8905) · 20d171e7
      John Roesler authored
      We inadvertently changed the binary schema of the suppress buffer changelog
      in 2.4.0 without bumping the schema version number. As a result, it is impossible
      to upgrade from 2.3.x to 2.4+ if you are using suppression.
      
      * Refactor the schema compatibility test to use serialized data from older versions
      as a more foolproof compatibility test.
      * Refactor the upgrade system test to use the smoke test application so that we
      actually exercise a significant portion of the Streams API during upgrade testing
      * Add more recent versions to the upgrade system test matrix
      * Fix the compatibility bug by bumping the schema version to 3
      
      Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
    • KAFKA-10166: always write checkpoint before closing an (initialized) task (#8926) · ef6a832f
      A. Sophie Blee-Goldman authored
      This should address at least some of the excessive TaskCorruptedExceptions we've been seeing lately. Basically, at the moment we only commit tasks if commitNeeded is true -- this seems obvious by definition. But the problem is we do some essential cleanup in postCommit that should always be done before a task is closed:
      
      * clear the PartitionGroup
      * write the checkpoint
      
      The second is actually fine to skip when commitNeeded = false with ALOS, as we will have already written a checkpoint during the last commit. But for EOS, we only write the checkpoint before a close -- so even if there is no new pending data since the last commit, we have to write the current offsets. If we don't, the task will be assumed dirty and we will run into our friend the TaskCorruptedException during (re)initialization.
      
      To fix this, we should just always call prepareCommit and postCommit at the TaskManager level. Within the task, it can decide whether or not to actually do something in those methods based on commitNeeded.
      
      One subtle issue is that we still need to avoid checkpointing a task that was still in CREATED, to avoid potentially overwriting an existing checkpoint with uninitialized empty offsets. Unfortunately, we always suspend a task before closing and committing, so by the time we get to the checkpoint we have lost the information about whether the task was in CREATED or RUNNING/RESTORING. For this we introduce a special flag to keep track of whether a suspended task should actually be checkpointed.
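The close path described above can be sketched as follows (names are illustrative, not the real Streams API): the TaskManager always invokes the commit hooks, each task guards the work internally, and a flag recorded at suspension time remembers whether a checkpoint is safe to write.

```java
public class TaskCloseSketch {

    boolean checkpointableOnSuspend; // false if suspended while still in CREATED
    boolean checkpointWritten;

    void suspend(boolean wasInitialized) {
        // Record, before the state is lost, whether a checkpoint may be written.
        checkpointableOnSuspend = wasInitialized;
    }

    void postCommit() {
        // Always invoked by the TaskManager; the task itself decides what to do.
        if (checkpointableOnSuspend) {
            checkpointWritten = true; // stand-in for writing the checkpoint file
        }
        // A task suspended from CREATED skips the write, so it cannot
        // overwrite an existing checkpoint with empty offsets.
    }
}
```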
      
      Reviewers: Guozhang Wang <wangguoz@gmail.com>
  13. Jun 25, 2020
    • KAFKA-10198: guard against recycling dirty state (#8924) · 2c063202
      A. Sophie Blee-Goldman authored
      We just needed to add the check in StreamTask#closeClean to closeAndRecycleState as well. I also renamed closeAndRecycleState to closeCleanAndRecycleState to drive this point home: it needs to be clean.
      
      This should be cherry-picked back to the 2.6 branch
      
      Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
  14. Jun 24, 2020
    • KAFKA-10169: swallow non-fatal KafkaException and don't abort transaction... · d609aef9
      A. Sophie Blee-Goldman authored
      KAFKA-10169: swallow non-fatal KafkaException and don't abort transaction during clean close (#8900)
      
      If there's any pending data and we haven't flushed the producer when we abort a transaction, a KafkaException is returned for the previous send. This is a bit misleading, since the situation is not an unrecoverable error, so the KafkaException is really non-fatal. For now, we should just catch and swallow this in the RecordCollector (see also: KAFKA-10169).
      
      The reason we ended up aborting an un-flushed transaction was due to the combination of
      a. always aborting the ongoing transaction when any task is closed/revoked
      b. only committing (and flushing) if at least one of the revoked tasks needs to be committed (regardless of whether any non-revoked tasks have data/transaction in flight)
      
      Given the above, we can end up with an ongoing transaction that isn't committed since none of the revoked tasks have any data in the transaction. We then abort the transaction anyway, when those tasks are closed. So in addition to the above (swallowing this exception), we should avoid unnecessarily aborting data for tasks that haven't been revoked.
      
      We can handle this by splitting the RecordCollector's close into a dirty and clean flavor: if dirty, we need to abort the transaction since it may be dirty due to the commit attempt failing. But if clean, we can skip aborting the transaction since we know that either we just committed and thus there is no ongoing transaction to abort, or else the transaction in flight contains no data from the tasks being closed
      
      Note that this means we still abort the transaction any time a task is closed dirty, so we must close/reinitialize any active task with pending data (that was aborted).
      
      In sum:
      
      * hackily check the KafkaException message and swallow
      * only abort the transaction during a dirty close
      * refactor shutdown to make sure we don't closeClean a task whose data was actually aborted
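The dirty/clean close split can be sketched as follows (method and field names are assumptions, not the actual RecordCollector API): only a dirty close aborts the ongoing transaction.

```java
public class CollectorCloseSketch {

    boolean transactionOpen = true;
    boolean aborted;

    void closeDirty() {
        if (transactionOpen) {
            // The transaction may hold half-committed data (e.g. the commit
            // attempt failed), so it must be aborted.
            aborted = true;
            transactionOpen = false;
        }
    }

    void closeClean() {
        // Either we just committed (no ongoing transaction to abort), or the
        // open transaction carries no data from the tasks being closed, so
        // there is nothing that needs aborting.
        transactionOpen = false;
    }
}
```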
      
      Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Boyang Chen <boyang@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
    • KAFKA-10135: Extract Task#executeAndMaybeSwallow to be a general utility... · e22a9c67
      feyman2016 authored
      KAFKA-10135: Extract Task#executeAndMaybeSwallow to be a general utility function into TaskManager… (#8887)
      
      Extract Task#executeAndMaybeSwallow to be a general utility function into TaskManager
  15. Jun 20, 2020
    • KAFKA-10185: Restoration info logging (#8896) · 2f59188d
      John Roesler authored
      Reviewers: Guozhang Wang <wangguoz@gmail.com>
    • KAFKA-9891: add integration tests for EOS and StandbyTask (#8890) · 69a4ee2a
      Matthias J. Sax authored
      Ports the test from #8886 to trunk -- this should be merged to 2.6 branch.
      
      One open question: in 2.6 and trunk we rely on the active tasks to wipe out the store if it crashes. However, if there is a hard JVM crash and we don't call closeDirty(), the store would not be wiped out. Thus, I am wondering if we need to fix this (for both active and standby tasks) and check on startup whether a local store must be wiped out?
      
      The current test passes, as we do a proper cleanup after the exception is thrown.
      
      Reviewers: Guozhang Wang <wangguoz@gmail.com>
  16. Jun 19, 2020
    • KAFKA-10113; Specify fetch offsets correctly in `LogTruncationException` (#8822) · cb379fb5
      Jason Gustafson authored
      This patch fixes a bug in the constructor of `LogTruncationException`. We were passing the divergent offsets to the super constructor as the fetch offsets. There is no way to fix this without breaking compatibility, but the harm is probably minimal since this exception was not getting raised properly until KAFKA-9840 anyway.
      
      Note that I have also moved the check for unknown offset and epoch into `SubscriptionState`, which ensures that the partition is still awaiting validation and that the fetch offset hasn't changed. Finally, I made some minor improvements to the logging and exception messages to ensure that we always have the fetch offset and epoch as well as the divergent offset and epoch included.
      
      Reviewers: Boyang Chen <boyang@confluent.io>, David Arthur <mumrah@gmail.com>
    • KAFKA-10167: use the admin client to read end-offset (#8876) · 5ca6d322
      Guozhang Wang authored
      Since the admin client allows us to use a flexible offset-spec, we can always use read-uncommitted regardless of the EOS config.
      
      Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
  17. Jun 18, 2020
    • MINOR: Upgrade ducktape to 0.7.8 (#8879) · 1938c962
      Ego authored
      A newer version of ducktape that updates some dependencies and adds some features. You can see the diff here:
      
      https://github.com/confluentinc/ducktape/compare/v0.7.7...v0.7.8
      
      Reviewer: Konstantine Karantasis <konstantine@confluent.io>
    • KAFKA-10123; Fix incorrect value for AWAIT_RESET#hasPosition (#8841) · 1f0c8471
      David Arthur authored
      ## Background
      
      When a partition subscription is initialized it has a `null` position and is in the INITIALIZING state. Depending on the consumer, it will then transition to one of the other states. Typically a consumer will either reset the offset to earliest/latest, or it will provide an offset (with or without offset metadata). For the reset case, we still have no position to act on so fetches should not occur.
      
      Recently we made changes for KAFKA-9724 (#8376) to prevent clients from entering the AWAIT_VALIDATION state when targeting older brokers. New logic to bypass offset validation as part of this change exposed this new issue.
      
      ## Bug and Fix
      
      In the partition subscriptions, the AWAIT_RESET state was incorrectly reporting that it had a position. In some cases a position might actually exist (e.g., if we were resetting offsets during a fetch after a truncation), but in the initialization case no position had been set. We saw this issue in system tests where there is a race between the offset reset completing and the first fetch request being issued.
      
      Since AWAIT_RESET#hasPosition was incorrectly returning `true`, the new logic to bypass offset validation was transitioning the subscription to FETCHING (even though no position existed).
      
      The fix was simply to have AWAIT_RESET#hasPosition return `false`, which should have been the case from the start. Additionally, this fix includes some guards against NPEs when reading the position from the subscription.
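The fixed state-to-position mapping can be sketched as follows (a simplification of the subscription's internal fetch states, not the actual SubscriptionState code): only states that hold a usable position report one, so a partition awaiting reset can never be transitioned straight to FETCHING.

```java
public class FetchStateSketch {

    enum FetchState { INITIALIZING, AWAIT_RESET, AWAIT_VALIDATION, FETCHING }

    /** After the fix, only states with a usable position report one. */
    public static boolean hasPosition(FetchState state) {
        switch (state) {
            case AWAIT_VALIDATION: // a position exists and is being validated
            case FETCHING:
                return true;
            default:
                // INITIALIZING and AWAIT_RESET: no position to fetch from yet.
                return false;
        }
    }
}
```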
      
      Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
    • MINOR: reuse toConfigObject(Map) to generate Config (#8889) · 8a66e569
      Chia-Ping Tsai authored
      Author: Chia-Ping Tsai <chia7712@gmail.com>
      Reviewers: Randall Hauch <rhauch@gmail.com>, David Jacot <david.jacot@gmail.com>
  18. Jun 17, 2020
    • KAFKA-10147 MockAdminClient#describeConfigs(Collection<ConfigResource>) is... · be695c43
      Chia-Ping Tsai authored
      KAFKA-10147 MockAdminClient#describeConfigs(Collection<ConfigResource>) is unable to handle broker resource (#8853)
      
      Author: Chia-Ping Tsai <chia7712@gmail.com>
      Reviewers: Boyang Chen <boyang@confluent.io>, Randall Hauch <rhauch@gmail.com>
    • MINOR: Fix flaky HighAvailabilityTaskAssignorIntegrationTest (#8884) · f9bda448
      John Roesler authored
      Reduce test data set from 1000 records to 500.
      Some recent test failures indicate that the Jenkins runners aren't
      able to process all 1000 records in two minutes.
      
      Also add sanity check that all the test data are readable from the
      input topic.
      
      Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
    • KAFKA-10165: Remove Percentiles from e2e metrics (#8882) · f5cbbe79
      John Roesler authored
      * Remove problematic Percentiles measurements until the implementation is fixed
      * Fix leaking e2e metrics when task is closed
      * Fix leaking metrics when tasks are recycled
      
      Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
    • KAFKA-10150: task state transitions/management and committing cleanup (#8856) · 61287c06
      A. Sophie Blee-Goldman authored
      * KAFKA-10150: always transition to SUSPENDED during suspend, no matter the current state; only call prepareCommit before closing if task.commitNeeded is true
      
      * Don't commit any consumed offsets during handleAssignment -- revoked active tasks (and any others that need committing) will be committed during handleRevocation so we only need to worry about cleaning them up in handleAssignment
      
      * KAFKA-10152: when recycling a task we should always commit consumed offsets (if any), but don't need to write the checkpoint (since changelog offsets are preserved across task transitions)
      
      * Make sure we close all tasks during shutdown, even if an exception is thrown during commit
      
      Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
    • KAFKA-10169: Error message when transit to Aborting / AbortableError / FatalError (#8880) · 564ac462
      Guozhang Wang authored
      Reviewers: John Roesler <vvcephei@apache.org>