Unverified commit 619e2f59, authored by Michael Li, committed by GitHub

[RCCA-8435] Remove Explicit Checks in Number of Brokers and Partitions in Subset Partitioner (#7151)

In [#inc-rcca-8435-dropoff-in-metrics-during-telemetry-cluster-roll](https://confluent.slack.com/archives/C03TFE3NGUF), we saw around 9% of metrics being dropped
during the metrics cluster roll. We also noticed that a **single** broker recalculated the set of partitions
to produce to up to around 150 times during a roll. This means that all brokers in the fleet were constantly disconnecting and reconnecting, which may leave a server-side broker temporarily unavailable.

The reconnecting logic occurs because the number of brokers changes during a roll. We really
shouldn't treat a change in the number of brokers as a topic topology change. For example,
if we add a broker and no partitions elect it as the preferred leader, we shouldn't recalculate the
partitions to write to. If partitions do elect the new node as their preferred leader, we already capture
that case.
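
To make the reasoning concrete, here is a minimal sketch of what the simplified topology check could look like. This is illustrative only: `TopologySnapshot` and `hasTopologyChanged` are hypothetical names, not the actual `SubsetPartitioner` API.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the topology-change check described above.
final class TopologySnapshot {
    // partition id -> preferred leader broker id
    private final Map<Integer, Integer> preferredLeaders;

    TopologySnapshot(Map<Integer, Integer> preferredLeaders) {
        this.preferredLeaders = Map.copyOf(preferredLeaders);
    }

    boolean hasTopologyChanged(TopologySnapshot other) {
        // Removed: broker-count check. Brokers joining or leaving during a
        // roll don't affect where we produce unless leadership also moves.
        // Removed: partition-count check. A new partition necessarily adds
        // a new preferred-leader entry, so the map comparison covers it.
        return !Objects.equals(this.preferredLeaders, other.preferredLeaders);
    }
}
```

Under this framing, comparing the preferred-leader mapping alone subsumes both removed checks, so a broker bouncing during a roll no longer triggers a recalculation.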

The fact that no unit tests needed changes also demonstrates that this check is unnecessary. We also remove
the check on the number of partitions, since that case is likewise covered by the preferred-partition-leader check.

Note: We're getting the 150 number from [this log message](https://prd.logs.aws.confluent.cloud/_dashboards/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'2022-08-15T07:00:00.000Z',to:now))&_a=(columns:!(_source),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'7ae0cc50-dcdc-11ea-b484-556ef92a2241',key:clusterId,negate:!f,params:(query:pkc-688z3),type:phrase),query:(match_phrase:(clusterId:pkc-688z3))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'7ae0cc50-dcdc-11ea-b484-556ef92a2241',key:mdc.brokerId,negate:!f,params:(query:'1'),type:phrase),query:(match_phrase:(mdc.brokerId:'1')))),index:'7ae0cc50-dcdc-11ea-b484-556ef92a2241',interval:auto,query:(language:kuery,query:'message:%20%22Kafka%20Producer%20producing%20to%20the%20following%20subset%20partitions:%20%7B_confluent-telemetry-metrics%22'),sort:!()). This number may be inflated due to a second telemetry reporter running, as discovered in [#inc-rcca-8423-drop-in-tr-records-sent-rate-after-3787-upgrade](https://confluent.slack.com/archives/C03TK0C72KY).

Reviewers: Eric Sirianni <sirianni@confluent.io>
Parent 49561479