Kafka consumer groups suddenly stopped balancing messages among instances

Lisandro

We have a microservice architecture in which services communicate through Kafka on Confluent. Each service has its own consumer group so that message delivery is balanced across its instances.

For example:

SERVICE_A_INSTANCE_1 (CONSUMER_GROUP_A)
SERVICE_A_INSTANCE_2 (CONSUMER_GROUP_A)
SERVICE_A_INSTANCE_3 (CONSUMER_GROUP_A)

SERVICE_B_INSTANCE_1 (CONSUMER_GROUP_B)
SERVICE_B_INSTANCE_2 (CONSUMER_GROUP_B)

When a message is emitted, it should be consumed by only one instance in each consumer group.
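To make the expected behavior concrete, here is a minimal sketch in plain Python (no Kafka client involved). It assumes a simplified round-robin assignment of partitions to group members, which is not necessarily the assignor a real client uses; the function names and the partition count of 6 are made up for illustration.

```python
from collections import defaultdict


def assign_partitions(members, num_partitions):
    """Spread partitions round-robin over the sorted members of ONE group.

    Simplified model: real Kafka clients use configurable assignors.
    """
    assignment = defaultdict(list)
    ordered = sorted(members)
    for partition in range(num_partitions):
        assignment[ordered[partition % len(ordered)]].append(partition)
    return dict(assignment)


def owners_of(groups, num_partitions, partition):
    """Return, for each group, the single instance that owns `partition`."""
    owners = {}
    for group_id, members in groups.items():
        for member, partitions in assign_partitions(members, num_partitions).items():
            if partition in partitions:
                owners[group_id] = member
    return owners


groups = {
    "CONSUMER_GROUP_A": [
        "SERVICE_A_INSTANCE_1",
        "SERVICE_A_INSTANCE_2",
        "SERVICE_A_INSTANCE_3",
    ],
    "CONSUMER_GROUP_B": ["SERVICE_B_INSTANCE_1", "SERVICE_B_INSTANCE_2"],
}

# A message written to partition 4 reaches every group, but only one
# instance inside each group.
owners = owners_of(groups, num_partitions=6, partition=4)
print(owners)
```

Every group sees the message, and within a group exactly one instance owns the partition it landed on. That is the behavior we were relying on.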

This worked fine until two days ago. All of a sudden, each message is being delivered to every instance, so each message is processed multiple times. In effect, the consumer groups stopped working and messages are no longer distributed among instances.
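One way to confirm this symptom from application logs is to check whether the same message was handled by more than one instance of the same group. The sketch below is plain Python and assumes each service logs a (group, instance, message id) record per processed message; the record shape and the sample data are hypothetical, not taken from our system.

```python
from collections import defaultdict


def find_intra_group_duplicates(deliveries):
    """Return messages processed by more than one instance of the same group.

    `deliveries` is an iterable of (group_id, instance, message_id) tuples.
    The result maps (group_id, message_id) to the sorted list of instances.
    """
    seen = defaultdict(set)
    for group_id, instance, message_id in deliveries:
        seen[(group_id, message_id)].add(instance)
    return {
        key: sorted(instances)
        for key, instances in seen.items()
        if len(instances) > 1
    }


# Hypothetical log records for illustration only.
deliveries = [
    ("CONSUMER_GROUP_A", "SERVICE_A_INSTANCE_1", "msg-1"),
    ("CONSUMER_GROUP_A", "SERVICE_A_INSTANCE_2", "msg-1"),
    ("CONSUMER_GROUP_B", "SERVICE_B_INSTANCE_1", "msg-1"),
    ("CONSUMER_GROUP_A", "SERVICE_A_INSTANCE_3", "msg-2"),
]

duplicates = find_intra_group_duplicates(deliveries)
print(duplicates)
```

Delivery of the same message to CONSUMER_GROUP_A and CONSUMER_GROUP_B is expected and is not flagged. Only repeats inside a single group count as the failure described here.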

Important points:

  • We use Kafka as a PaaS on Confluent, hosted on GCP.
  • We tested this in a different environment and everything worked as expected.
  • No changes have been made to our consumers.
  • No changes have been made to the cluster on our part (we can't know whether Confluent changed something).

We suspect it might be a problem on Confluent's side, or an update that is not compatible with our current configuration. Kafka 2.2.0 was recently released and it includes some changes to consumer group behavior.

We are currently working on migrating to AWS MSK to see if the issue persists.

Any ideas on what could be causing this?

Lisandro

TL;DR: We solved the issue by moving away from Confluent to our own Kafka cluster on GCP.

I will answer my own question since it's been a while and we have already solved this. Also, my insights might help others make more informed decisions about where to deploy their Kafka infrastructure.

Unfortunately, we could not get to the bottom of the problem with Confluent. It is most likely something on their side, because we simply migrated to our own self-managed instances on GCP and everything went back to normal.

Some important clarifications before my final thoughts and warnings about using Confluent as a managed Kafka service:

  • We think this is related to something that affected Node.js in particular. We tested client libraries in languages other than Node and the behavior was as expected. With several of the most popular Node libraries, the problem persisted.
  • We did not have premium support with Confluent.
  • I cannot rule out that this issue was our fault.

With all of those points in mind, our conclusion is that companies that decide to use Confluent as a managed service should calculate costs with premium support included. Otherwise, Kafka turns into a completely closed black box, making it impossible to diagnose issues. In my personal opinion, the dependency on the Confluent team during a problem is so large that not having them ready to help when needed renders the service not production-ready.
