- Aug 23, 2022
-
-
Authored by Stanislav Kozlovski
The metadata event test asserts that there is a resource waiter event waiting on an unfenced cluster image. There was a race between the processing of the config update event (the one that registers the resource waiter) and the assertion that the resource waiter is present. The test awaited only the completion of the metadata event, which enqueues the config update event but does not wait for it to be processed. This patch fixes the race by adding a wait event after the metadata event, so that once the wait event completes, the config update event has completed too.
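The ordering argument behind this fix can be illustrated with a minimal sketch. It is not the test's actual code: it assumes a strictly FIFO, single-threaded event queue, modelled here with a plain `ExecutorService`, and the class and variable names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names come from the patch.
public class EventQueueBarrierSketch {

    /** Returns whether the follow-up event had run by the time the barrier event completed. */
    public static boolean runOnce() {
        // A single worker thread processes events strictly in submission order.
        ExecutorService queue = Executors.newSingleThreadExecutor();
        AtomicBoolean waiterRegistered = new AtomicBoolean(false);
        try {
            // "Metadata event": enqueues a follow-up event but does not wait for it.
            Future<?> metadataEvent = queue.submit(() -> {
                queue.submit(() -> waiterRegistered.set(true));
            });
            metadataEvent.get(5, TimeUnit.SECONDS);
            // Asserting on waiterRegistered here would be racy: the follow-up may not have run yet.

            // Barrier: a no-op event enqueued after the metadata event has completed.
            // The follow-up is already ahead of it in the queue, so it must run first.
            Future<?> barrier = queue.submit(() -> { });
            barrier.get(5, TimeUnit.SECONDS);
            return waiterRegistered.get();
        } catch (Exception e) {
            throw new IllegalStateException("event queue sketch failed", e);
        } finally {
            queue.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter registered: " + runOnce());
    }
}
```

The barrier works only because the queue is FIFO and the follow-up is enqueued before the metadata event finishes; with a multi-threaded or reordering queue this guarantee would not hold.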
-
Authored by Aishwarya Gune
* Remove KafkaSampleStore-related code
-
Authored by Xiang Li
-
Authored by David Mao
The migration task runs frequently (every 1s), so it is not useful to log the message when no partition states are moved. In addition, since migration is infrequent and one-way, we can log the partitions that actually get migrated.
-
Authored by Lingnan Liu
- Aug 20, 2022
-
-
Authored by David Mao
This metric is measured from the follower's perspective, so we want to close it when we no longer have any followers for a topic.
-
Authored by Lingnan Liu
-
Authored by Aman Singh
Bind multiTenantSaslSecretsStore on the controller server
-
Authored by chern
A thread leak can happen in testCreateAndDeleteAndRecreateLink because the test doesn't wait until the link is deleted from the metadata store. More details in https://confluentinc.atlassian.net/browse/KGLOBAL-1812?focusedCommentId=1043674
-
Authored by Akhilesh C
Fix a bug in ReplicationControlManager where we got a NullPointerException when removing a topic with no offline replicas, and there were other topics that did have offline replicas. Fix an issue in MetadataDelta#replay where we were replaying RemoveTopicRecord twice. Reviewers: Colin P. McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>
-
Authored by kpatelatwork
* KGLOBAL-1777 retry finding a link coordinator instead of giving up immediately
* KGLOBAL-1777 wait for link metadata to be created to ensure the link coordinator will exist
* removed redundant check as per review comments
* increased brokerCount to 2, otherwise the test would complain about not being able to replicate the metadata topic 2 times
-
Authored by Alok Thatikunta
This PR adds the following feature flags: tier.topic.snapshots.enable and tier.topic.snapshots.interval.ms. It also defines the Cloud API for uploading TTP snapshots, based on https://confluentinc.atlassian.net/wiki/spaces/KSTORAGE/pages/2549122180/Tier+Topic+Snapshots Reviewer: @RamanVerma
-
- Aug 19, 2022
-
-
Authored by Sushant Mahajan
-
Authored by Stanislav Kozlovski
The test would sometimes fail with an UnknownPartitionException because metadata propagation is asynchronous and the DescribeTopic request could hit a node whose metadata had not been updated yet. This patch fixes that by using the KafkaTestUtils#describeTopic helper function, which retries to guard against this.
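The retry idea can be sketched as follows. This is a generic illustration, not the body of `KafkaTestUtils#describeTopic`: the `retry` method, its parameters, and the choice to retry on any `RuntimeException` are assumptions made for the example.

```java
import java.util.function.Supplier;

// Hypothetical retry wrapper; the real helper's signature and retry policy may differ.
public class RetrySketch {

    /**
     * Calls the supplier until it succeeds or maxAttempts is reached,
     * sleeping backoffMs between attempts. Rethrows the last failure.
     */
    public static <T> T retry(int maxAttempts, long backoffMs, Supplier<T> call) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                // For example, a node that has not yet received the topic metadata.
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while retrying", ie);
                    }
                }
            }
        }
        throw last;
    }
}
```

A test would wrap the describe call in `retry(...)` so that a transient failure on a lagging node is absorbed rather than failing the test on the first attempt.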
-
Authored by kpatelatwork
-
Authored by kpatelatwork
-
- Aug 18, 2022
-
-
Authored by Manikumar Reddy
-
Authored by David Mao
We currently convert authorizer actions once in the MultiTenantAuthorizer and again in ConfluentServerAuthorizer, which seems unnecessary.
-
Authored by David Arthur
-
Authored by Purshotam Chauhan
-
Authored by Vikas Singh
getClusterMissingNode0 wasn't initializing the testPartitions variable, so it passed null to the CruiseControlMetricProcessor::process method, which resulted in an NPE. This change fixes the code to initialize the variable correctly. The test had no issue when run as part of the whole suite because other test methods were initializing the static variable. Ideally the variable should be non-static and initialized by each method, but that is a bigger change. The test now passes when run individually.
-
Authored by andymg3
MINOR: Fix authentication check and do some small refactoring in ClusterLinkConnectionChecker (#7161)
-
Authored by Yang Yu
MINOR: enable kraft mode for TierCompactionEndToEndTest and TierTopicDeletionIntegrationTest (#7115) Enables KRaft mode for test cases in TierCompactionEndToEndTest and TierTopicDeletionIntegrationTest.
-
Authored by kpatelatwork
KGLOBAL-1979 log error code when updating linked leader epoch fails to aid in debugging production issues (#7167)
-
Authored by Jason Gustafson
When constructing the `ListenerInfo` object in `ControllerServer`, we should only check the endpoints of the controller listeners instead of all listeners on the node. For remote controllers, there are only the controller listeners, so the logic works fine. But when we run kraft in co-located mode, then the logic is doomed to fail because the controller's `SocketServer` only knows about controller listeners. Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>
-
Authored by Michael Li
[RCCA-8435] Remove Explicit Checks in Number of Brokers and Partitions in Subset Partitioner (#7151) In [#inc-rcca-8435-dropoff-in-metrics-during-telemetry-cluster-roll](https://confluent.slack.com/archives/C03TFE3NGUF), we saw around 9% of metrics being dropped during the metrics cluster roll. We also noticed that a **single** broker was recalculating the partitions to produce to up to around 150 times during a roll. This means that all brokers in the fleet were constantly reconnecting, which may leave a server-side broker temporarily unavailable. The reconnecting logic is triggered because the number of brokers changes during a roll. We really shouldn't treat a change in the number of brokers as a topic topology change. For example, if we add a broker and no partitions elect it as the preferred leader, we shouldn't recalculate the partitions to write to. If partitions do elect the new node as a preferred leader, we already capture this case. The fact that no unit tests needed to change also suggests that this check is unnecessary. We also remove the check on the number of partitions, as this case is also covered by the preferred partition leader check.
Note: We're getting the 150 number from [this log message](https://prd.logs.aws.confluent.cloud/_dashboards/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'2022-08-15T07:00:00.000Z',to:now))&_a=(columns:!(_source),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'7ae0cc50-dcdc-11ea-b484-556ef92a2241',key:clusterId,negate:!f,params:(query:pkc-688z3),type:phrase),query:(match_phrase:(clusterId:pkc-688z3))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'7ae0cc50-dcdc-11ea-b484-556ef92a2241',key:mdc.brokerId,negate:!f,params:(query:'1'),type:phrase),query:(match_phrase:(mdc.brokerId:'1')))),index:'7ae0cc50-dcdc-11ea-b484-556ef92a2241',interval:auto,query:(language:kuery,query:'message:%20%22Kafka%20Producer%20producing%20to%20the%20following%20subset%20partitions:%20%7B_confluent-telemetry-metrics%22'),sort:!())). This number may be inflated due to a 2nd telemetry reporter running as discovered in [#inc-rcca-8423-drop-in-tr-records-sent-rate-after-3787-upgrade](https://confluent.slack.com/archives/C03TK0C72KY) Reviewers: Eric Sirianni <sirianni@confluent.io>
-
- Aug 17, 2022
-
-
Authored by David Mao
A 7d expiration is a long time to keep inactive metrics and apparently impacts our Datadog/Druid bill. This PR reduces the expiration to 1h. Connection count sensors are exempted because we should technically only expire connection count metrics if the count is 0. We can fix this in a subsequent patch, possibly by using something like a CounterGaugeSuite to record all active connection counts that need metrics.
-
Authored by Rajini Sivaram
Reviewers: Sanjana Kaundinya <skaundinya@confluent.io>
-
Authored by k-raina
KSECURITY-155: Add Request_Id, Connection_Id to correlate with Authorization/Authentication/Request Audit Events (#6855)
-
Authored by Vikas Singh
As part of PR #6959 we added a fetch rate metric from the follower perspective so that we can see how much load replication generates on the follower node. This PR adds that metric to the list of topic metrics that we aggregate to get the corresponding broker metrics. It also includes a minor refactoring that swaps the inner and outer loops, which better matches what the code is doing.
-
Authored by Vikas Singh
* KAFKALESS-830: Set replication bytes in/out for topics This change updates the replication bytes in/out at the topic level. Replication bytes out is set to the same value as bytes-in for the topic. For replication bytes in, it goes over the leaders of all followers of a topic on a broker and sets the value to the sum of all leader bytes-in. A follow-up PR will make a similar change at the broker level. A couple of name refactorings in the Cluster class made this PR bigger than the small change it contains. I have also added new tests to make sure that the replication bytes in/out calculation is correct both when all metrics are present and when metrics are missing.
-
Authored by David
-
Authored by Akhilesh C
* KGLOBAL-1867: Fix the NPE in the isAutoMirrorTopic() check when the link is failed When the link fails, we set the config to null. We were not handling this correctly in the isAutoMirrorTopic() check, which relies on the config to determine whether a mirror topic is auto-mirrored. The solution is to simply default the check to false when the link is failed. This means we might let the customer delete some auto-mirror topics while the link is in the failed state, but when the link comes back, we'll mirror the topic again. Another approach would be to not let customers delete a topic when the link is failed and we cannot use the config to determine whether the topic is auto-mirrored. That restriction is much bigger and is not the behavior intended for a failed link.
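The null-safe default described above might look roughly like the sketch below. It is an assumption-laden illustration: `LinkConfig`, its `autoMirroredTopics` field, and the method signature are hypothetical stand-ins, and the real isAutoMirrorTopic() check may derive its answer from the config quite differently.

```java
import java.util.Set;

// Hypothetical model of the check; only the "null config means false" rule comes from the commit message.
public class AutoMirrorCheckSketch {

    /** Stand-in for a cluster link's config. A null reference models a failed link. */
    public static final class LinkConfig {
        private final Set<String> autoMirroredTopics;

        public LinkConfig(Set<String> autoMirroredTopics) {
            this.autoMirroredTopics = autoMirroredTopics;
        }

        boolean autoMirrors(String topic) {
            return autoMirroredTopics.contains(topic);
        }
    }

    /** Defaults to false when the link has failed and its config is unavailable. */
    public static boolean isAutoMirrorTopic(LinkConfig configOrNull, String topic) {
        if (configOrNull == null) {
            // Failed link: we cannot tell, so do not treat the topic as auto-mirrored.
            return false;
        }
        return configOrNull.autoMirrors(topic);
    }
}
```

Returning false is the permissive choice: a topic deleted while the link is failed is simply mirrored again once the link recovers, as the commit message notes.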
-
Authored by Anastasia Vela
-
Authored by Rajini Sivaram
KGLOBAL-1952: Attempt to shut down both clusters in CL tests even if one fails, to avoid a thread leak impacting other tests (#7144)
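A common way to get this behavior is to attempt every shutdown, remember the first failure, and rethrow it only after all resources have been tried. The sketch below shows that pattern under stated assumptions: the clusters are modelled as plain `AutoCloseable` objects and `shutdownAll` is a hypothetical helper, not the method used in the cluster link tests.

```java
// Hypothetical helper illustrating "shut down everything, then report failures".
public class ShutdownBothSketch {

    /** Closes every resource even if an earlier one throws; rethrows afterwards. */
    public static void shutdownAll(AutoCloseable... clusters) {
        RuntimeException first = null;
        for (AutoCloseable cluster : clusters) {
            try {
                cluster.close();
            } catch (Exception e) {
                if (first == null) {
                    first = new RuntimeException("cluster shutdown failed", e);
                } else {
                    // Keep later failures visible without masking the first one.
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

In a test teardown this would be called with both clusters, so a failure while stopping the first one no longer leaves the second one (and its threads) running.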
-
- Aug 16, 2022
-
-
Authored by Josh H
Reviewers: @RamanVerma @alok123t
-
Authored by kpatelatwork
KAFKA-13785: [10/N][emit final] more unit test for session store and disable cache for emit final sliding window (#12370) (#7137) 1. Added more unit test for RocksDBTimeOrderedSessionStore and RocksDBTimeOrderedSessionSegmentedBytesStore 2. Disable cache for sliding window if emit strategy is ON_WINDOW_CLOSE Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com> Co-authored-by: Hao Li <1127478+lihaosky@users.noreply.github.com>
-
Authored by Xiang Li
* chore: reset mk-include * Squashed 'mk-include/' content from commit 6431413afe git-subtree-dir: mk-include git-subtree-split: 6431413afebffe17a65f76b40b4b16cd96dcb464 * chore: add mk-include-git-hash
-