Commits · v0.3896.0-7.4.0-0-ce · Archie Kelly / kafka

Aug 23, 2022

chore: minor version bump v0.3896.0-7.4.0-0-ce [ci skip] · 5e4533b8
ConfluentSemaphore authored 2 years ago

v0.3896.0-7.4.0-0-ce

5e4533b8

MINOR: Fix flaky metadata update event test (#7194) · 004cbf9a

The metadata event test asserts that a resource waiter event is waiting on an unfenced cluster image. There was a race between the processing of the config update event (the one that registers the resource waiter) and the assertion that the resource waiter is present. The test awaited only the completion of the metadata event, which enqueues the config update event but does not await its processing.

This patch fixes the race by adding a wait event after the metadata event; once that wait event completes, the config update event has completed too.

004cbf9a
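
The ordering argument in the flaky-test fix above (#7194) can be illustrated with a minimal sketch. It assumes a single-threaded FIFO event queue, modeled here with a plain `ExecutorService`; the class and method names (`BarrierEventSketch`, `submitMetadataEvent`, `submitBarrierEvent`) are hypothetical and are not taken from the actual test.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a single-threaded event queue; not the real test harness.
final class BarrierEventSketch {
    private final ExecutorService queue = Executors.newSingleThreadExecutor();
    final AtomicBoolean waiterRegistered = new AtomicBoolean(false);

    // The "metadata event": when it runs, it only enqueues the "config update event".
    Future<?> submitMetadataEvent() {
        return queue.submit(() -> {
            // The config update event is what registers the resource waiter.
            queue.submit(() -> waiterRegistered.set(true));
        });
    }

    // A no-op event. Because the queue is FIFO and single-threaded, its completion
    // implies that every event enqueued before it has already been processed.
    Future<?> submitBarrierEvent() {
        return queue.submit(() -> { });
    }

    static boolean runOnce() {
        BarrierEventSketch sketch = new BarrierEventSketch();
        try {
            sketch.submitMetadataEvent().get(5, TimeUnit.SECONDS);
            // Asserting on waiterRegistered at this point would be racy:
            // the config update event is enqueued but may not have run yet.
            sketch.submitBarrierEvent().get(5, TimeUnit.SECONDS);
            return sketch.waiterRegistered.get();
        } catch (Exception e) {
            throw new IllegalStateException("event queue failed", e);
        } finally {
            sketch.queue.shutdownNow();
        }
    }
}
```

Awaiting the barrier rather than sleeping keeps the test deterministic without adding a fixed delay.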

chore: minor version bump v0.3895.0-7.4.0-0-ce [ci skip] · 3e66971f
ConfluentSemaphore authored 2 years ago

v0.3895.0-7.4.0-0-ce

3e66971f
KAFKALESS-1210: Delete Kafka sample store (#7163) · f7a3ca80
Aishwarya Gune authored 2 years ago
```
* Remove KafkaSampleStore related code
```
f7a3ca80
replace Jfrog with ECR (#7191) · 63817ec2
Xiang Li authored 2 years ago

63817ec2

MINOR: Only log partition state movements if non-empty (#7200) · 2da01e3b

David Mao authored 2 years ago

The migration task runs frequently (every 1s), so it's not useful to log the message if no partition states are moved.
In addition, since migration is infrequent and one-way, we can log the partitions that actually get migrated.

2da01e3b

Enable testSelfHealingWithIgnoredBrokersPresentWithReplicaPlacements (#7198) · b0ae4503
Lingnan Liu authored 2 years ago

b0ae4503

Aug 20, 2022

MINOR: clean up follower fetch metrics (#7170) · 752e11f3

David Mao authored 2 years ago

This metric is measured from the follower's perspective, so we want to close it when we no longer have any followers for a topic.

752e11f3

KAFKALESS-1254: Enable two tests in BrokerFailureDetectorTest (#7190) · e1f82b8e
Lingnan Liu authored 2 years ago

e1f82b8e
KMETA-247: Bind multiTenantSaslSecretsStore on Controller Server. (#7099) · 3b332771
Aman Singh authored 2 years ago
```
Bind multiTenantSaslSecretsStore on controller server
```
3b332771

KGLOBAL-1812: Fix thread leak in cluster linking test. (#7188) · 06feed6c

chern authored 2 years ago

A thread leak can happen in testCreateAndDeleteAndRecreateLink because
the test doesn't wait until the link is deleted from the metadata store.
More details in
https://confluentinc.atlassian.net/browse/KGLOBAL-1812?focusedCommentId=1043674

06feed6c

KAFKA-14170: Fix NPE in the deleteTopics() code path of KRaft Controller (#12533) (#7192) · 3d7e5718

Akhilesh C authored 2 years ago

Fix a bug in ReplicationControlManager where we got a NullPointerException when removing a topic
with no offline replicas, and there were other topics that did have offline replicas.

Fix an issue in MetadataDelta#replay where we were replaying RemoveTopicRecord twice.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>

3d7e5718

KGLOBAL-1777 retry finding a link coordinator instead (#7176) · 4d6485ea

kpatelatwork authored 2 years ago

* KGLOBAL-1777 retry finding a link coordinator instead of giving up immediately

* KGLOBAL-1777 wait for link metadata to be created to ensure the link coordinator exists

* removed redundant check as per review comments

* increased brokerCount to 2; otherwise the test would complain about not being able to replicate the metadata topic 2 times

4d6485ea

KDATA-360: Add feature flag and cloud API for TTP snapshots (#7027) · 64f8b085

Alok Thatikunta authored 2 years ago

This PR adds the following feature flags:
- tier.topic.snapshots.enable
- tier.topic.snapshots.interval.ms

It also defines the Cloud API for uploading TTP snapshots, based on https://confluentinc.atlassian.net/wiki/spaces/KSTORAGE/pages/2549122180/Tier+Topic+Snapshots

reviewer: @RamanVerma

64f8b085

Aug 19, 2022
- KSECURITY-595: Enhanced AuthenticationException class to include reasonCode and errorInfo (#7166) · b0912c61
  Sushant Mahajan authored 2 years ago
  
  b0912c61
- MINOR: Fix flakiness in MultiTenantKafkaTopicCreationIntegrationTest (#7134) · 695286f7
  Stanislav Kozlovski authored 2 years ago
  
  The test would sometimes fail with an UnknownPartitionException because metadata propagation is asynchronous and the DescribeTopic request could hit a node whose metadata had not been updated yet. This patch fixes that by using the KafkaTestUtils#describeTopic helper function, which retries to guard against this.
  695286f7
- KGLOBAL-1812 fixed review comments by Rajini and also found another NPE during local run (#7186) · 20cb483c
  kpatelatwork authored 2 years ago
  
  20cb483c
- KGLOBAL-1812 give chance for every resource in harness to close (#7184) · 62c2ef95
  kpatelatwork authored 2 years ago
  
  62c2ef95
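
The retry approach described in the topic-creation test fix above (#7134) can be sketched generically. This is not the `KafkaTestUtils#describeTopic` implementation, whose signature is not shown in this log; `RetrySketch.retry` and its parameters are hypothetical names used only for illustration.

```java
import java.util.function.Supplier;

// Hypothetical helper: retries an action that may fail transiently,
// e.g. a describe call that hits a node with stale metadata.
final class RetrySketch {
    static <T> T retry(int maxAttempts, long backoffMs, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while retrying", ie);
                    }
                }
            }
        }
        // All attempts failed: surface the last failure to the caller.
        throw last;
    }
}
```

A test would wrap the describe call in `retry(...)` so that a single stale response does not fail the run, while a persistent failure still surfaces after the final attempt.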
Aug 18, 2022

MINOR: Fix failing delegation token system test (#7183) · 7bf33c04
Manikumar Reddy authored 2 years ago

7bf33c04

MINOR: Convert authorizer actions once (#7129) · 2572fe7c

David Mao authored 2 years ago

We currently convert authorizer actions once in the MultiTenantAuthorizer and again in ConfluentServerAuthorizer - this seems unnecessary.

2572fe7c

Add the "shell" node back to kafka-metadata-shell (#7123) · 11fda100
David Arthur authored 2 years ago

11fda100
KSECURITY-577: Update log level in RestClient error->debug (#7110) · 3743de3b
Purshotam Chauhan authored 2 years ago

3743de3b

MINOR: Fix testDiskCapacityUpdatedMissingNode initialization issue (#7171) · d0b62278

Vikas Singh authored 2 years ago

getClusterMissingNode0 wasn't initializing the testPartitions variable,
so null was passed to the CruiseControlMetricProcessor::process method,
which resulted in an NPE.

This change initializes the variable correctly. The test had no issue
when run as part of the whole suite because other test methods were
initializing the static variable. Ideally the variable should be
non-static and initialized by each method, but that is a bigger change.

Test passes now when run individually.

d0b62278

MINOR: Fix authentication check and do some small refactoring in... · 13dd1c03
andymg3 authored 2 years ago
```
MINOR: Fix authentication check and do some small refactoring in ClusterLinkConnectionChecker (#7161)
```
13dd1c03

MINOR: enable kraft mode for TierCompactionEndToEndTest and... · ef36cd60

Yang Yu authored 2 years ago

MINOR: enable kraft mode for TierCompactionEndToEndTest and TierTopicDeletionIntegrationTest (#7115)

Enables KRaft mode for test cases in TierCompactionEndToEndTest and TierTopicDeletionIntegrationTest.

ef36cd60

KGLOBAL-1979 log error code when updating linked leader epoch fails to aid in... · 79f68041
kpatelatwork authored 2 years ago
```
KGLOBAL-1979 log error code when updating linked leader epoch fails to aid in debugging production issues (#7167)
```
79f68041

MINOR: ListenerInfo.forController should only look at controller listeners (#7084) · 57849362

Jason Gustafson authored 2 years ago

When constructing the `ListenerInfo` object in `ControllerServer`, we should only check the endpoints of the controller listeners instead of all listeners on the node. For remote controllers, there are only the controller listeners, so the logic works fine. But when we run KRaft in co-located mode, the logic is doomed to fail because the controller's `SocketServer` only knows about controller listeners.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>

57849362

[RCCA-8435] Remove Explicit Checks in Number of Brokers and Partitions in... · 619e2f59

Michael Li authored 2 years ago

[RCCA-8435] Remove Explicit Checks in Number of Brokers and Partitions in Subset Partitioner (#7151)

In [#inc-rcca-8435-dropoff-in-metrics-during-telemetry-cluster-roll](https://confluent.slack.com/archives/C03TFE3NGUF), we saw around 9% of metrics being dropped
during the metrics cluster roll. We also noticed that a **single** broker was recalculating the partitions to
produce to up to around 150 times during a roll. This means that all brokers in the fleet were constantly connecting and reconnecting, which may leave a server-side broker temporarily unavailable.

The reconnecting logic is triggered because the number of brokers changes during a roll. We really
shouldn't treat a change in the number of brokers as a topic topology change. For example,
if we add a broker and no partitions elect it as the preferred leader, we shouldn't be recalculating
the partitions to write to. If partitions do elect the new node as a preferred leader, we already capture
this case.

The fact that no unit tests needed to change also demonstrates that we don't need this check. We also remove
the check for the number of partitions, as this case is also covered by the preferred partition leader check.

Note: We're getting the 150 number from [this log message](https://prd.logs.aws.confluent.cloud/_dashboards/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'2022-08-15T07:00:00.000Z',to:now))&_a=(columns:!(_source),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'7ae0cc50-dcdc-11ea-b484-556ef92a2241',key:clusterId,negate:!f,params:(query:pkc-688z3),type:phrase),query:(match_phrase:(clusterId:pkc-688z3))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'7ae0cc50-dcdc-11ea-b484-556ef92a2241',key:mdc.brokerId,negate:!f,params:(query:'1'),type:phrase),query:(match_phrase:(mdc.brokerId:'1')))),index:'7ae0cc50-dcdc-11ea-b484-556ef92a2241',interval:auto,query:(language:kuery,query:'message:%20%22Kafka%20Producer%20producing%20to%20the%20following%20subset%20partitions:%20%7B_confluent-telemetry-metrics%22'),sort:!())). This number may be inflated due to a 2nd telemetry reporter running as discovered in [#inc-rcca-8423-drop-in-tr-records-sent-rate-after-3787-upgrade](https://confluent.slack.com/archives/C03TK0C72KY)

Reviewers: Eric Sirianni <sirianni@confluent.io>

619e2f59
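
The reasoning in the subset partitioner change above (#7151) can be illustrated with a small sketch. It assumes the topology is summarized as a map from partition id to preferred leader broker id; `TopologyChangeSketch` and `shouldRecalculate` are hypothetical names, not the actual partitioner code.

```java
import java.util.Map;

// Hypothetical sketch: decide whether the subset of partitions to produce to
// needs to be recalculated, based only on preferred partition leaders.
final class TopologyChangeSketch {
    // Keys are partition ids, values are preferred leader broker ids.
    static boolean shouldRecalculate(Map<Integer, Integer> previousLeaders,
                                     Map<Integer, Integer> currentLeaders) {
        // Map equality covers both a changed preferred leader and an added or
        // removed partition, so no separate broker-count or partition-count
        // check is needed.
        return !previousLeaders.equals(currentLeaders);
    }
}
```

Under this model, a broker that joins the cluster without becoming the preferred leader of any partition leaves the map unchanged, so no recalculation (and no reconnect) is triggered.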

Aug 17, 2022

KCFUN-166: Reduce tenant sensor expiration (#7130) · 49561479

David Mao authored 2 years ago

A 7-day expiration is a long time to keep inactive metrics and apparently impacts our Datadog/Druid bill. This PR reduces the expiration to 1h.

Connection count sensors are exempted because we should technically only expire connection count metrics if the count is 0. We can fix this in a subsequent patch - possibly by using something like a CounterGaugeSuite to record all active connection counts that need metrics.

49561479

KGLOBAL-1958: Fix regression in updating corrupted cluster link configs (#7165) · c7ba2b7d
Rajini Sivaram authored 2 years ago
```
Reviewers: Sanjana Kaundinya <skaundinya@confluent.io>
```
c7ba2b7d

KSECURITY-155: Add Request_Id, Connection_Id to correlate with... · 39690a51

k-raina authored 2 years ago

KSECURITY-155: Add Request_Id, Connection_Id to correlate with Authorization/Authentication/Request Audit Events (#6855)

39690a51

KAFKALESS-830: Add follower fetch rate at broker level (#7031) · 04cbdfda

Vikas Singh authored 2 years ago

As part of PR #6959 we added a fetch rate metric from the follower
perspective so that we can see how much load is generated on the
follower node because of replication. This PR adds that to the list of
topic metrics that we use to aggregate and get corresponding broker
metrics.

Also did a minor refactoring to swap the inner and outer loops, as that
better matches what the code is doing.

04cbdfda

KAFKALESS-830: Set replication bytes in/out for topics (#7023) · 9317be02

Vikas Singh authored 2 years ago

* KAFKALESS-830: Set replication bytes in/out for topics

This change updates the replication bytes in/out at the topic level.
Replication bytes out is set equal to the bytes-in for the topic.
Replication bytes in is computed by going over the leaders of all
followers of a topic on a broker and summing the leaders' bytes-in.

A follow up PR will do similar change at broker level.

A couple of renames were made in the Cluster class, which made this
PR bigger than the small change it contains. I have also added new tests
to make sure that the replication bytes in/out calculation is done correctly,
both for cases where all metrics are present and for cases where metrics are
missing.

9317be02

KGLOBAL-1750: Add cluster linking connection checker (#7106) · a50fbcbc
David authored 2 years ago

a50fbcbc

KGLOBAL-1867: Fix the NPE exception in isAutoMirrorTopic() check when the link is failed (#7139) · 3b749f46

Akhilesh C authored 2 years ago

* KGLOBAL-1867: Fix the NPE exception in isAutoMirrorTopic() check when the link is failed

When the link fails, we set the config to null. We were not handling this
correctly in the isAutoMirrorTopic() check, which relies on the config to determine whether
a mirror topic is auto-mirrored. The solution is to simply default the
check to false when the link is failed. This way we might let
the customer delete some auto-mirror topics while the link is in the failed state, but
when the link comes back, we'll mirror the topic again. Another approach is to
not let customers delete a topic when the link is failed and we're unable to use
the config to determine whether the topic is auto-mirrored. That restriction is much
broader and is not the behavior intended for a failed link.

3b749f46
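
The null-handling described in the isAutoMirrorTopic() fix above (#7139) can be sketched as follows. The `LinkConfig` type here is a hypothetical, heavily simplified stand-in that only holds a set of auto-mirrored topic names; the real link config and the way it identifies auto-mirrored topics are not shown in this log.

```java
import java.util.Set;

final class AutoMirrorCheckSketch {
    // Hypothetical, simplified link config used only for this illustration.
    static final class LinkConfig {
        final Set<String> autoMirroredTopics;

        LinkConfig(Set<String> autoMirroredTopics) {
            this.autoMirroredTopics = autoMirroredTopics;
        }
    }

    // A failed link is modeled as a null config, as described in the commit message.
    static boolean isAutoMirrorTopic(LinkConfig configOrNull, String topic) {
        if (configOrNull == null) {
            // Link is failed: default to false instead of dereferencing null.
            return false;
        }
        return configOrNull.autoMirroredTopics.contains(topic);
    }
}
```

Defaulting to false is the more permissive choice: a topic can be deleted while the link is failed, and by the commit's reasoning it is mirrored again once the link recovers.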

KSECURITY-582: Upgrade com.squareup.okhttp3.okhttp to 4.9.3 (#7116) · 3f47c452
Anastasia Vela authored 2 years ago

3f47c452

KGLOBAL-1952: Attempt to shutdown both clusters in CL tests even if one fails... · c46e6200

Rajini Sivaram authored 2 years ago

KGLOBAL-1952: Attempt to shutdown both clusters in CL tests even if one fails to avoid thread leak impacting other tests (#7144)

* KGLOBAL-1952: Attempt to shutdown both clusters in CL tests even if one fails to avoid thread leak impacting other tests

c46e6200

Aug 16, 2022

[KDATA-509] Tier Topic Partition Snapshot (TTPS): FlatBuffers and serialization wrapper (#6979) · b52f5f9f
Josh H authored 2 years ago
```
Reviewers: @RamanVerma @alok123t 
```
b52f5f9f

KAFKA-13785: [10/N][emit final] more unit test for session store and disable... · 6156ceb0

kpatelatwork authored 2 years ago

KAFKA-13785: [10/N][emit final] more unit test for session store and disable cache for emit final sliding window (#12370) (#7137)

1. Added more unit tests for RocksDBTimeOrderedSessionStore and RocksDBTimeOrderedSessionSegmentedBytesStore
2. Disable cache for sliding window if emit strategy is ON_WINDOW_CLOSE

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Co-authored-by: Hao Li <1127478+lihaosky@users.noreply.github.com>

6156ceb0

chore: update mk-include (#7122) · 3c86f4a2

Xiang Li authored 2 years ago

* chore: reset mk-include

* Squashed 'mk-include/' content from commit 6431413afe

git-subtree-dir: mk-include
git-subtree-split: 6431413afebffe17a65f76b40b4b16cd96dcb464

* chore: add mk-include-git-hash

3c86f4a2