Commits · 0924fd3f9f75c446310ed1e97b44bbc3f33c6c31 · Archie Kelly / kafka

Mar 22, 2022

KAFKA-13152: Replace "buffered.records.per.partition" with "input.buffer.max.bytes" (#11796) · 0924fd3f
Authored by vamossagar12 2 years ago
```
Implements KIP-770

Reviewers: Guozhang Wang <wangguoz@gmail.com>
```
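
As the commit title says, the per-partition record-count limit gives way to a byte-based limit. The sketch below shows how an application might set the new key, under stated assumptions: the application id, bootstrap address, and the 64 MiB value are placeholders chosen for illustration, and the key is written as a plain string so the snippet compiles without Kafka on the classpath.

```java
import java.util.Properties;

final class InputBufferConfigExample {
    // Builds a Streams-style property set. All values here are illustrative placeholders.
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "demo-app");           // hypothetical application id
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker address
        // Old style, named in the commit title as the config being replaced:
        // props.put("buffered.records.per.partition", "1000");
        // New style: one byte budget for buffered input (64 MiB, an arbitrary choice).
        props.put("input.buffer.max.bytes", String.valueOf(64L * 1024 * 1024));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("input.buffer.max.bytes"));
    }
}
```
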
Unverified

0924fd3f

MINOR: Remove scala KafkaException (#11913) · c9c03dd7

Authored by dengziming 2 years ago

Use the standard org.apache.kafka.common.KafkaException instead of kafka.common.KafkaException.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@confluent.io>

Unverified

c9c03dd7

MINOR: show LogRecoveryState in MetadataShell and fix log message · d449f850

Authored by dengziming 2 years ago

Show the LeaderRecoveryState in MetadataShell.

Fix a case where we were comparing a Byte type with an enum type.

Reviewers: Colin P. McCabe <cmccabe@apache.org>

Unverified

d449f850

MINOR: Bump trunk to 3.3.0-SNAPSHOT (#11925) · 4c8685e7

Authored by Bruno Cadonna 2 years ago

Version bumps on trunk following the creation of the 3.2 release branch.

Reviewer: David Jacot <djacot@confluent.io>

Unverified

4c8685e7

Mar 21, 2022
- MINOR: Small cleanups in the AclAuthorizer (#11921) · 72558da9
  Authored by David Jacot 2 years ago
  
  Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
  Unverified
  
  72558da9
- MINOR: Pass materialized to the inner KTable instance (#11888) · e5eb180a
  Authored by Márton Sigmond 2 years ago
  
  Reviewers: Luke Chen <showuon@gmail.com>
  Unverified
  
  e5eb180a
- KAFKA-7540: commit offset sync before close (#11898) · 3a8f6b17
  Authored by Luke Chen 2 years ago
  
  Reviewers: Guozhang Wang <wangguoz@gmail.com>
  Unverified
  
  3a8f6b17
Mar 20, 2022

KAFKA-13728: fix PushHttpMetricsReporter no longer pushes metrics when network... · 6145974f

Authored by 彭小漪 2 years ago

KAFKA-13728: fix PushHttpMetricsReporter no longer pushes metrics when network failure is recovered. (#11879)

The class PushHttpMetricsReporter no longer pushes metrics when network failure is recovered.

I debugged the code and found the problem here: when we submit a task to the ScheduledThreadPoolExecutor that needs to be executed periodically, if the task throws an exception that is not swallowed, the task will no longer be scheduled to execute.

So when an IO exception occasionally occurs on the network, we should swallow it rather than throw it in task HttpReporter.
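
The scheduling behavior described above can be reproduced with the JDK alone. The following is a minimal sketch, not the PushHttpMetricsReporter code: the class name, the simulated failure, and the timings are all made up for illustration. It runs the same always-failing task twice, once unguarded and once wrapped in a try/catch, and counts the executions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PeriodicTaskDemo {
    // Schedules a task that fails on every run and returns how often it ran in ~300 ms.
    static int countRuns(boolean swallowExceptions) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        Runnable failingPush = () -> {
            runs.incrementAndGet();
            throw new IllegalStateException("simulated network failure");
        };
        Runnable guardedPush = () -> {
            try {
                failingPush.run();
            } catch (RuntimeException e) {
                // Swallowed here (a real reporter would log it) so the next run still happens.
            }
        };
        executor.scheduleAtFixedRate(swallowExceptions ? guardedPush : failingPush,
                0, 10, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded runs: " + countRuns(false));
        System.out.println("guarded runs:   " + countRuns(true));
    }
}
```

The unguarded task runs once and is then dropped by the executor, while the guarded task keeps running on every period, which is the reason for swallowing the exception inside the task.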

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Unverified

6145974f

Mar 19, 2022

KAFKA-13682; KRaft Controller auto preferred leader election (#11893) · 8d6968e8

Authored by José Armando García Sancio 2 years ago

Implement auto leader rebalance for KRaft by keeping track of the set of topic partitions which have a leader that is not the preferred replica. If this set is non-empty then schedule a leader balance event for the replica control manager.

When applying PartitionRecords and PartitionChangeRecords to the ReplicationControlManager, if the elected leader is not the preferred replica then remember this topic partition in the set of imbalancedPartitions.

Anytime the quorum controller processes a ControllerWriteEvent, it schedules a rebalance operation if there are no pending rebalance operations, the feature is enabled, and there are imbalanced partitions.

This KRaft implementation only supports the configuration properties auto.leader.rebalance.enable and leader.imbalance.check.interval.seconds. The configuration property leader.imbalance.per.broker.percentage is not supported and is ignored.
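
The bookkeeping described in this commit message can be sketched as follows. This is an illustration only, not the ReplicationControlManager code: the class and method names are hypothetical, partitions are identified by plain strings, and the preferred replica is assumed to be the first entry of the replica list.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ImbalanceTracker {
    // Topic partitions whose current leader is not the preferred replica.
    private final Set<String> imbalancedPartitions = new HashSet<>();
    private boolean rebalancePending = false;

    // Called whenever a partition record or partition change record is applied.
    void onPartitionRecord(String topicPartition, List<Integer> replicas, int leader) {
        int preferred = replicas.get(0); // assumption: first replica is the preferred one
        if (leader == preferred) {
            imbalancedPartitions.remove(topicPartition);
        } else {
            imbalancedPartitions.add(topicPartition);
        }
    }

    // Called after each write event; returns true if a rebalance was scheduled.
    boolean maybeScheduleRebalance(boolean autoRebalanceEnabled) {
        if (!autoRebalanceEnabled || rebalancePending || imbalancedPartitions.isEmpty()) {
            return false;
        }
        rebalancePending = true;
        return true;
    }

    void rebalanceFinished() {
        rebalancePending = false;
    }

    public static void main(String[] args) {
        ImbalanceTracker tracker = new ImbalanceTracker();
        tracker.onPartitionRecord("orders-0", List.of(1, 2, 3), 2);
        System.out.println(tracker.maybeScheduleRebalance(true));
    }
}
```
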

Reviewers: Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>

Unverified

8d6968e8

KAFKA-13587; Implement leader recovery for KIP-704 (#11733) · 52621613

Authored by José Armando García Sancio 2 years ago

Implementation of the protocol for starting and stopping leader recovery after an unclean leader election. This includes the management of state in the controllers (legacy and KRaft) and propagating this information to the brokers. This change doesn't implement log recovery after an unclean leader election.

Protocol Changes
================

For the topic partition state znode, the new field "leader_recovery_state" was added. If the field is missing the value is assumed to be RECOVERED.

ALTER_PARTITION was renamed from ALTER_ISR. The CurrentIsrVersion field was renamed to PartitionEpoch. The new field LeaderRecoveryState was added.

The new field LeaderRecoveryState was added to the LEADER_AND_ISR request. The inter broker protocol version is used to determine which version to send to the brokers.

A new tagged field for LeaderRecoveryState was added to both the PartitionRecord and PartitionChangeRecord.

Controller
==========

For both the KRaft and legacy controller the LeaderRecoveryState is set to RECOVERING if the leader was elected out of the ISR, also known as unclean leader election. The controller sets the state back to RECOVERED after receiving an ALTER_PARTITION request with version 0, or with version 1 and with the LeaderRecoveryState set to RECOVERED.

Both controllers preserve the leader recovery state even if the unclean leader goes offline and comes back online before a RECOVERED ALTER_PARTITION is sent.

The controllers reply with INVALID_REQUEST if the ALTER_PARTITION request either:

    1. Attempts to increase the ISR while the partition is still RECOVERING
    2. Attempts to change the leader recovery state to RECOVERING from a RECOVERED state.
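
The two rejection rules above can be sketched as a small validation function. This is one possible reading of the rules, not the controller code: the class, enum, and method names are hypothetical, and rule 1 is interpreted here as "the request keeps the state at RECOVERING while adding a replica that is not in the current ISR".

```java
import java.util.Set;

final class AlterPartitionCheck {
    enum LeaderRecoveryState { RECOVERED, RECOVERING }
    enum Result { NONE, INVALID_REQUEST }

    static Result validate(LeaderRecoveryState current, Set<Integer> currentIsr,
                           LeaderRecoveryState requested, Set<Integer> requestedIsr) {
        // Rule 2: a RECOVERED partition may not be moved back to RECOVERING by the leader.
        if (current == LeaderRecoveryState.RECOVERED
                && requested == LeaderRecoveryState.RECOVERING) {
            return Result.INVALID_REQUEST;
        }
        // Rule 1 (as interpreted above): no ISR growth while the partition stays RECOVERING.
        boolean isrGrows = !currentIsr.containsAll(requestedIsr);
        if (requested == LeaderRecoveryState.RECOVERING && isrGrows) {
            return Result.INVALID_REQUEST;
        }
        return Result.NONE;
    }

    public static void main(String[] args) {
        System.out.println(validate(LeaderRecoveryState.RECOVERING, Set.of(1),
                LeaderRecoveryState.RECOVERING, Set.of(1, 2)));
    }
}
```
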

Topic Partition Leader
======================

The topic partition leader doesn't implement any log recovery in this change. The topic partition leader immediately marks the partition as RECOVERED and sends that state in the next ALTER_PARTITION request.

Reviewers: Jason Gustafson <jason@confluent.io>

Unverified

52621613

Mar 18, 2022
- KAFKA-13497: Add trace logging to RegexRouter (#11903) · 43bf4642
  Authored by Chris Egerton 2 years ago
  
  This patch adds runtime logging to the RegexRouter to show exactly which topics get routed where.
  
  Reviewers: David Jacot <djacot@confluent.io>
  Unverified
  
  43bf4642
- MINOR: Fix `ConsumerConfig.ISOLATION_LEVEL_DOC` (#11915) · 03641e6a
  Authored by Jules Ivanic 2 years ago
  
  Reviewers: David Jacot <djacot@confluent.io>
  Unverified
  
  03641e6a
- MINOR: Fix incorrect log for out-of-order KTable (#11905) · df963ee0
  Authored by Ludovic DEHON 2 years ago
  
  Reviewers: Luke Chen <showuon@gmail.com>
  Unverified
  
  df963ee0
- KAFKA-13750; Client Compatability KafkaTest uses invalid idempotency configs (#11909) · 7afdb069
  Authored by Justine Olshan 2 years ago
  
  Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
  Unverified
  
  7afdb069
- KAFKA-13509; Support max timestamp in GetOffsetShell (KIP-815) (#11173) · 5cebe12a
  Authored by dengziming 2 years ago
  
  This patch implements KIP-815 as described here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-815%3A++Support+max-timestamp+in+GetOffsetShell.
  
  Reviewers: Luke Chen <showuon@gmail.com>, Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>
  Unverified
  
  5cebe12a
- MINOR: Replace EasyMock with Mockito in connect:file (#11471) · 3dacdc56
  Authored by dengziming 2 years ago
  
  Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>
  Unverified
  
  3dacdc56
Mar 17, 2022

KAFKA-9847: add config to set default store type (KIP-591) (#11705) · fbe7fb94

Authored by Luke Chen 2 years ago

Reviewers: Hao Li <hli@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>

Unverified

fbe7fb94

KAFKA-6718 / Add rack awareness configurations to StreamsConfig (#11837) · b68463c2

Authored by Levani Kokhreidze 2 years ago

This PR is part of KIP-708 and adds rack aware standby task assignment logic.

Rack aware standby task assignment won't be functional until all parts of this KIP get merged.

The work is split into three smaller PRs to make the review process easier to follow. The overall plan is the following:

1. Rack aware standby task assignment logic #10851
2. Protocol change, add clientTags to SubscriptionInfoData #10802
3. Add required configurations to StreamsConfig (public API change, at this point we should have full functionality)

This PR implements the last point of the above-mentioned plan.

Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>

Unverified

b68463c2

Mar 16, 2022

Don't generate Uuid with a leading "-" (#11901) · 5c1dd493
Authored by David Arthur 2 years ago

Unverified

5c1dd493
MINOR: Bump latest 3.0 version to 3.0.1 (#11885) · 1783fb14
Authored by Mickael Maison 2 years ago
```
Reviewers: Matthias J. Sax <mjsax@apache.org>
```
Unverified

1783fb14
Polish Javadoc for EpochState (#11897) · 620f1d88
Authored by liym 2 years ago
```
Polish Javadoc for EpochState

Reviewers: Bill Bejeck <bbejeck@apache.org>
```
Unverified

620f1d88

KAFKA-13549: Add repartition.purge.interval.ms (#11610) · 9e8ace08

Authored by Nick Telford 2 years ago

Implements KIP-811.

Add a new config `repartition.purge.interval.ms` that limits how often data is purged from repartition topics.
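
A minimal sketch of setting the new config from application code, under stated assumptions: the application id, bootstrap address, and the 60-second interval are placeholders chosen for illustration, and the key is written as a plain string so the snippet compiles without Kafka on the classpath.

```java
import java.util.Properties;

final class RepartitionPurgeConfigExample {
    // Builds a Streams-style property set. All values here are illustrative placeholders.
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "demo-app");           // hypothetical application id
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker address
        // Purge repartition topics at most once per minute (an arbitrary example value).
        props.put("repartition.purge.interval.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("repartition.purge.interval.ms"));
    }
}
```
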

Unverified

9e8ace08

MINOR: fix shouldWaitForMissingInputTopicsToBeCreated test (#11902) · f708dc58

Authored by Walker Carlson 2 years ago

This test was failing occasionally. It appears that the test assumed perfect deduplication/caching when asserting the output records, i.e. a bug in the test, not in the real code. Since we cannot assume that deduplication is perfect, I changed the test to check that the records we expect arrive, instead of checking that only those records arrive.
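
The difference between the two assertion styles can be sketched with plain lists. This is an illustration only, not the test from the commit: the class, method names, and record values are hypothetical.

```java
import java.util.List;

final class OutputAssertions {
    // Strict check: passes only if exactly the expected records arrived, in order.
    static boolean onlyExpectedArrived(List<String> actual, List<String> expected) {
        return actual.equals(expected);
    }

    // Relaxed check: passes if every expected record arrived, extras are tolerated.
    static boolean allExpectedArrived(List<String> actual, List<String> expected) {
        return actual.containsAll(expected);
    }

    public static void main(String[] args) {
        // With imperfect caching an intermediate update ("a=1") leaks into the output.
        List<String> actual = List.of("a=1", "a=3");
        List<String> expected = List.of("a=3");
        System.out.println("strict:  " + onlyExpectedArrived(actual, expected));
        System.out.println("relaxed: " + allExpectedArrived(actual, expected));
    }
}
```
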

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Unverified

f708dc58

MINOR: refactor how ConfigurationControl checks for resource existence (#11835) · bda5c34b

Authored by Colin Patrick McCabe 2 years ago

ConfigurationControl methods should take a boolean indicating whether the resource is newly
created, rather than taking an existence checker object. The boolean is easier to understand. Also
add a unit test of existence checking failing (and succeeding).

Reviewers: Kirk True <kirk@mustardgrain.com>, José Armando García Sancio <jsancio@users.noreply.github.com>

Unverified

bda5c34b

KAFKA-13727; Preserve txn markers after partial segment cleaning (#11891) · 76d287c9

Authored by Jason Gustafson 2 years ago

It is possible to clean a segment partially if the offset map is filled before reaching the end of the segment. The highest offset that is reached becomes the new dirty offset after the cleaning completes. The data above this offset is nevertheless copied over to the new partially cleaned segment. Hence we need to ensure that the transaction index reflects aborted transactions from both the cleaned and uncleaned portion of the segment. Prior to this patch, this was not the case. We only collected the aborted transactions from the cleaned portion, which means that the reconstructed index could be incomplete. This can cause the aborted data to become effectively committed. It can also cause the deletion of the abort marker before the corresponding data has been removed (i.e. the aborted transaction becomes hanging).

Reviewers: Jun Rao <junrao@gmail.com>

Unverified

76d287c9

KAFKA-13721: asymmetric join-windows should not emit spurious left/outer join results (#11875) · 03411ca2
Authored by Matthias J. Sax 2 years ago
```
Reviewers:  Sergio Peña <sergio@confluent.io>, Guozhang Wang <guozhang@confluent.io>
```
Unverified

03411ca2

Mar 15, 2022

MINOR: set batch-size option into batch.size config in consoleProducer (#11855) · e8a762ee
Authored by wangyap 2 years ago
```
Reviewers: Luke Chen <showuon@gmail.com>
```
Unverified

e8a762ee
MINOR: Improve producer Javadoc about send with acks = 0 (#11882) · 418b1221
Authored by Paolo Patierno 2 years ago
```
Reviewers: Mickael Maison <mickael.maison@gmail.com>
```
Unverified

418b1221

MINOR: Disable those flaky tests (#11895) · cad4985a

Authored by Guozhang Wang 2 years ago

I collected a list of the most flaky tests observed lately, checked / created their corresponding tickets, and marked them as ignored for now. Many of these failures are:

0. Failing very frequently in the past (at least in my observations).
1. not investigated for some time.
2. have a PR for review (mostly thanks to @showuon !), but not reviewed for some time.

Since 0), these test failures are hindering our development; and from 1/2) above, people are either too busy to look after them, or honestly the tests are not considered to provide value, since otherwise people would care enough to panic and try to resolve them. So I think it's reasonable to disable all these tests for now. If we later learn our lesson the hard way, it would motivate us to tackle flaky tests more diligently as well.

I'm only disabling those tests that have been failing for a while, and if for such a time no one has been looking into them, I'm concerned that just gossiping about the flakiness would not bring people's attention to them either. So my psychological motivation is that "if people do not care about those failed tests for weeks (which is not a good thing! :P), let's teach ourselves the lesson the hard way when it indeed buries a bug that bites us, or not learn the lesson at all --- that indicates those tests are indeed not valuable". For tests that I only very recently saw failing, I did not disable them.

Reviewers: John Roesler <vvcephei@apache.org>, Matthias J. Sax <mjsax@apache.org>, Luke Chen <showuon@gmail.com>, Randall Hauch <rhauch@gmail.com>

Unverified

cad4985a

KAFKA-7077: Use default producer settings in Connect Worker (#11475) · 76cf7a57
Authored by Liam Clarke-Hutchinson 2 years ago
```
Reviewers: Luke Chen <showuon@gmail.com>
```
Unverified

76cf7a57

KAFKA-13690: Fix flaky test in EosIntegrationTest (#11887) · b916cb40

Authored by Guozhang Wang 2 years ago

I found a couple of sources of flakiness in the integration test.

IQv1 on stores failed even though getting the store itself is covered with timeouts, since the InvalidStoreException is thrown by the query itself (store.all()). I changed to the util function with IQv2, whose timeout/retry covers the whole procedure. Example of such a failure: https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-11802/11/tests/

With ALOS we should not check that the output and the state store content are exactly what exactly-once processing would produce, since it is possible that during processing we got spurious task-migrate exceptions and re-processed with duplicates. I actually cannot reproduce this error locally, but from the jenkins errors it seems possible indeed. Example of such a failure: https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-11433/4/tests/

Some minor cleanups.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>

Unverified

b916cb40

KAFKA-13699: new ProcessorContext is missing methods (#11877) · b1f36360

Authored by Matthias J. Sax 2 years ago

We added `currentSystemTimeMs()` and `currentStreamTimeMs()` to the
`ProcessorContext` via KIP-622, but forgot to add both to the new
`api.ProcessorContext`.

Reviewers: Ricardo Brasil <anribrasil@gmail.com>, Guozhang Wang <guozhang@confluent.io>

Unverified

b1f36360

MINOR: Adding kafka-storage.bat file (similar to kafka-storage.sh) for windows. (#11816) · eb6c5baf
Authored by GauthamM-official 2 years ago
```
Reviewers: Jun Rao <junrao@gmail.com>
```
Unverified

eb6c5baf

Mar 14, 2022
- KAFKA-13438: Replace EasyMock and PowerMock with Mockito in WorkerTest (#11817) · 7f284497
  Authored by Liam Clarke-Hutchinson 2 years ago
  
  Reviewers: Mickael Maison <mickael.maison@gmail.com>
  Unverified
  
  7f284497
Mar 12, 2022

KIP-825: Part 1, add new RocksDBTimeOrderedWindowStore (#11802) · 63ea5db9

Authored by Hao Li 2 years ago

Initial State store implementation for TimedWindow and SlidingWindow.

RocksDBTimeOrderedWindowStore.java contains one RocksDBTimeOrderedSegmentedBytesStore which contains index and base schema.

PrefixedWindowKeySchemas.java implements the key schemas for the time-ordered base store and the key-ordered index store.

Reviewers: James Hughes, Guozhang Wang <wangguoz@gmail.com>

Unverified

63ea5db9

MINOR: fix flaky... · 17988f47

Authored by Hao Li 2 years ago

MINOR: fix flaky EosIntegrationTest.shouldCommitCorrectOffsetIfInputTopicIsTransactional[at_least_once] (#11878)

In this test, we started the Kafka Streams app and then wrote to the input topic in a transaction. It is possible that the transaction has not finished when Streams commits its offset, so the committed offset could be less than the eventual endOffset.

This PR moves the logic that writes to the input topic so that it runs before the Streams app starts.

Reviewers: John Roesler <vvcephei@apache.org>

Unverified

17988f47

Mar 11, 2022

MINOR: unpin ducktape dependency to always use the newest version (py3 edition) (#11884) · 7e683852

Authored by Stanislav Vodetskyi 2 years ago

Ensures we always have the latest published ducktape version.
This way whenever we release a new one, we won't have to cherry pick a bunch of commits across a bunch of branches.

Unverified

7e683852

KAFKA-6718: Update SubscriptionInfoData with clientTags (#10802) · 87eb0cf0
Authored by Levani Kokhreidze 2 years ago
```
adds ClientTags to SubscriptionInfoData

Reviewer: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>
```
Unverified

87eb0cf0
MINOR: Fix comments in TransactionsTest (#11880) · f025a93c
Authored by xuexiaoyue 2 years ago
```
Reviewer: Luke Chen <showuon@gmail.com>
```
Unverified

f025a93c

MINOR: jmh.sh swallows compile errors (#11870) · dc36dedd

Authored by Lucas Bradstreet 2 years ago

jmh.sh runs tasks in quiet mode which swallows compiler errors. This is a pain and I frequently have to edit the shell script to see the error.

Reviewers:  Ismael Juma <ismael@confluent.io>, Bill Bejeck <bbejeck@apache.org>

Unverified

dc36dedd