Kafka - Assign messages to specific Consumer Groups

debugcn 投稿 Dev

Juferdinand

I have an small question about Kafka Group IDs, I can use this Annotaiton in Java:

    @KafkaListener(topics = "insert", groupId = "user")

There I can set an groupId which it wanna consume, but i does not consume just this group id maybe because of that I can't send to specific group id. How I can send just to an special groupid? For what I can use the GroupID or I need to set the Topic special for sending the Kafka Messages specific?

I tried already to find an answer online, but I find nothing, maybe I use google false haha

I hope all understand me, if not pls quest :)

Thx alot already!

aran

Welcome to Kafka! First of all: You can't send to a consumer group, you send to a Topic.

Too much text below. Be aware of possible drowsiness while trying to read the entire answer.
If you are still reading this, I assume you truly want to know how to direct messages to specific clients, or you really need to get some sleep ASAP. Maybe both. Do not drive afterwards.

Back to your question.

From that topic, multiple Consumer Groups can read. Every CG is independent from others, so each one will read the topic from start to end, by their own. Think of a CG as an union of endophobic consumers: they won't care about other groups, they won't ever talk to another group, they don't even know if others exist.

I can think of three different ways to achieve your goal, by using different methodologies and/or architectures. The only one using Consumer Groups is the first one, but the other two may also be helpful:

subscribe
assign
Multiple Topics

The first two ones are based on mechanisms to divide messages within a single topic. The third one should only be justified on certain cases. Let's get into these options.

1. Subscribe and Consumer Groups

You could create a new Topic, fill it with messages, and add some metadata in order to recognize who needs to process each message (to who that message is directed).

Messages stored in Kafka contain, among other fields , a KEY and a VALUE (the message itself).

So let's say you want only GROUP-A to process some specific messages. One simple solution could be including an identifier on the key, such as a suffix. One of your keys could look like: key#GA.

On the consumer side, you poll() the messages from that topic, and add a little extra conditional logic before processing it: you'll just read the key and check the suffix. If it corresponds with the specified consumer group, in this case, it contains GA, then the consumer from GROUP-A knows that it must process the message.

For example, your Topic stores messages of two different natures, and you want them to be directed to two groups: GROUP-A and GROUP-Z.

     key    value
 - [11#GA][MESSAGE]
 - [21#GZ][MESSAGE]
 - [33#GZ][MESSAGE]
 - [44#GA][MESSAGE]

Both consumer groups will poll those 4 messages, but only some of them will be processed by each group.

Group-A will discard the 2nd and 3rd messages. It will process the 1st and 4th.
Group-Z will discard the 1st and 4th messages. It will process the 2nd and 3rd.

This is basically what you are aiming, but using some extra logic and playing with Kafka's architecture. The messages with certain suffix will be "directed" to an specific consumer group, and ignored by the other ones.

2. Assign

The above solution is focused on consumer groups and Kafka's subscribe methodology. Another possible solution, instead of subscribing consumer groups, would be to use Kafka's assign method. No ConsumerGroup is involved here, so references to the previous groups will be quoted in order to avoid any confusion.

Assign allows you to directly specify the topic/partition from which your consumer must read.

In the producer side, you should partition your messages in order to divide them between the partitions within your topic, using your own logic. Some more deeper info about custom partitioners here (yeah the author from the link seems like a complete douche).

For example, let's say you have 5 different types of consumers. So you create a Topic with 5 partitions, one for each "group". Your producer's custom partitioner identifies the corresponding partition for each message, and the topic would present this structure after producing the messages from the previous example:

In order to direct the messages to their corresponding "groups" :

"Group-Z" is assigned the 5th partition.
"Group-A" is assigned the 1st partition.

The advantage of this solution is that less resources are wasted: each "group" just polls his own messages, and as every message is verified to be directed to the consumer which polled it, you avoid the discard/accept logic: less traffic on the wire, fewer objects in memory, fewer cpu work.

The disadvatange consists in a more complex Kafka producer mechanism, which involves a custom partitioner, that most surely should be constantly updated regarding changes on your data or topic structures. Moreover, this will also lead to update the defined assigments of your consumers as well, each time the producer side is altered.

Personal note:

Assign offers a better perfomance, but carries a high price: manual and constant control of producers, topics, partitions and consumers, hence being (possibly) more error-prone. I would call it the efficient solution.

Subscribe makes all the process much simpler, and possibly will involve fewer problems/error on the system, hence being more reliable. I would call it the effective solution.

Anyway, this is a totally subjective oppinion.

Not finished yet

3. . Multi-topic solution.

The previously proposed solutions assume that the messages share the same nature, hence will be produced in the same Topic.

In order to explain what I'm trying to say here, let's say a Topic is represented as a storage building.

<--laptops, tables, smartphones,...

The previous solutions assume that you store similar elements there, for example, electronic devices; Their end of life is similar, the storage method is similar regardless of the specific device type, the machinery you use is the same, etc. With this in mind, it's completely logical to store all those elements into the same warehouse, divided in different sections (into the same topic, divided in different partitions).

There is no real reason to build a new warehouse for each electronic-device family (one for tv, other for mobile phones,... unless you are wrapped in money). The previous solutions assume that your messages are different types of "electronic devices".

But time passes by and you are doing well, so decide to start a new business: fruits storage.

Fruits have fewer life (log.retention.ms anyone?), must be stored inside a range of temperature, and probably your device storing elements and techniques from the first warehouse will differ by a lot. Moreover, you fruit business could be closed on certain periods of the year, while electronic devices are received 365/24. Even if you open your device's warehouse daily, maybe the fruit storage is only working on mondays and tuesdays (and with luck is not temporaly closed because of the period).

As fruits and electronic devices need different types of storage management, you decide to build a new warehouse. Your new fruis topic.

<--bananas, kiwis, apples, chicozapotes,...

Creating a second topic is justified here, since each one could need different configuration values, and each one stores contents from very different natures. This leads to consumers with also very different processing logics.

So, is this a 3rd possible solution?

Well, it does make you forget about consumer groups, partitioning mechanisms, manual assignations, etc. You only have to decide which consumers subscribe to which Topic, and you're done: you effectively directed the messages to specific consumers.

But, if you build a warehouse and start storing computers, would you really build another warehouse to store the phones that just arrived? In real life, you'll have to pay for the construction of the second building, as well as pay two taxes, pay for the cleaning of two buildings, and so on.

laptops here-> <-tablets here

In kafka's world, this would be represented as extra work for the kafka cluster(twice replication petitions,zookeeper has a newborn with new ACLs and controllers, ...), extra time for the human assigned to this job, since now is responsible of the management of two topics: A worker spending time on something that could be avoided is synonym of €€€ lost by the company. Also, I am not aware if they already do this or ever will, but cloud providers are somehow fond to insert small taxes for certain operations, such as creatiing a topic (but this is just a possibility, and I may be wrong here).

To resume, this is not necessarilly a bad idea: it just needs a justified context. Use it if you are working with Bananas and Qualcomm chips.

If you are working with Laptops and Tablets, go for the consumer group and partition solutions previously shown.

この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。

侵害の場合は、連絡してください[email protected]

編集2021-06-13

コメントを追加

サインイン

分類Dev

Getting Error "java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.assign" when tring to consume using flink's Kafka Consumer

分類Dev

Related 関連記事

記事