With normal Kafka Streams applications, Kafka stores each application's offsets in its internal offsets topic. On restart, an application resumes or reprocesses topics depending on the `auto.offset.reset` policy. This is indeed explained here.
I am using Kafka Streams' `GlobalKTable` to replicate data across application instances. However, I'm a bit confused about restarts: the table is not repopulated for applications whose id (`StreamsConfig.APPLICATION_ID_CONFIG`) does not change after a restart (due to a deployment or a crash). Whenever I start a new instance of the streams application with a new id, the `GlobalKTable` is populated.
A `GlobalKTable` is nothing but a topic with log compaction enabled. The javadoc of `StreamsBuilder#globalTable`, which I call as

    streamsBuilder.globalTable("some-topic", Materialized.as("kglobaltable-store"))

states:

> Note that `GlobalKTable` always applies `"auto.offset.reset"` strategy `"earliest"` regardless of the specified value in `StreamsConfig`.

Hence I expected that, regardless of the application id, my streams applications would read the `kglobaltable-store` topic from the start and populate the store locally, as in this GitHub issue. It seems the topic the javadoc refers to is `some-topic` instead of `kglobaltable-store`.
Is this the intended behaviour for `GlobalKTable`? And additionally, is there a retention policy on the topics backing `GlobalKTable`s?
This behaviour also results in stale data in `kglobaltable-store` when there is a retention policy on `some-topic`. An example would be as follows:

At time t0, let:

    some-topic: (1, a) -> (2, b) -> (1, c)
    kglobaltable-store: [(1, c), (2, b)]

After some time, (2, b) is subject to retention. If I then start my streams application (with a new id), my `GlobalKTable` only stores the record (1, c), if that is indeed how the store is rebuilt.
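The effect described above can be simulated with a plain map, which is essentially what the store materializes from the topic. This is only a sketch with hard-coded records, not actual Kafka code; `bootstrap` stands in for the replay a streams instance does on a fresh start:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionExample {
    // Replaying a changelog into a key-value map keeps only the latest value
    // per key, which is exactly what log compaction guarantees to retain.
    static Map<Integer, String> bootstrap(List<Map.Entry<Integer, String>> log) {
        Map<Integer, String> store = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> record : log) {
            store.put(record.getKey(), record.getValue());
        }
        return store;
    }

    public static void main(String[] args) {
        // some-topic at time t0: (1, a) -> (2, b) -> (1, c)
        List<Map.Entry<Integer, String>> t0 = List.of(
            Map.entry(1, "a"), Map.entry(2, "b"), Map.entry(1, "c"));
        System.out.println(bootstrap(t0));             // {1=c, 2=b}

        // After (2, b) is removed by retention, a fresh bootstrap misses key 2:
        List<Map.Entry<Integer, String>> afterRetention = List.of(
            Map.entry(1, "a"), Map.entry(1, "c"));
        System.out.println(bootstrap(afterRetention)); // {1=c}
    }
}
```

This is why an instance that keeps its old local state sees (2, b) while a freshly bootstrapped instance does not.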
EDIT: I am using an `InMemoryKeyValueStore`.
Because you are using `InMemoryKeyValueStore`, I assume that you are hitting this bug: https://issues.apache.org/jira/browse/KAFKA-6711

As a workaround, you can delete the local checkpoint file for the global store (cf. GlobalKTable checkpoints); this will trigger the bootstrapping on restart. Or you can switch back to the default RocksDB store.
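A minimal sketch of the checkpoint-deletion workaround. It assumes the default on-disk layout `<state.dir>/<application.id>/global/.checkpoint` (verify the path for your Kafka Streams version) and should run while the application is stopped:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteGlobalCheckpoint {
    // Deletes the global store's checkpoint file so that the next start
    // re-bootstraps the GlobalKTable from the beginning of its topic.
    static boolean deleteCheckpoint(Path stateDir, String applicationId) throws IOException {
        Path checkpoint = stateDir
            .resolve(applicationId)
            .resolve("global")
            .resolve(".checkpoint");
        return Files.deleteIfExists(checkpoint);
    }

    public static void main(String[] args) throws IOException {
        // "/tmp/kafka-streams" and "my-app-id" are placeholders; use your
        // configured state.dir and StreamsConfig.APPLICATION_ID_CONFIG value.
        boolean deleted = deleteCheckpoint(Path.of("/tmp/kafka-streams"), "my-app-id");
        System.out.println(deleted ? "checkpoint deleted" : "no checkpoint found");
    }
}
```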
Btw: If you read a topic directly as a table or global table, Kafka Streams will not create an additional changelog topic for fault tolerance, but uses the original input topic for this purpose (this reduces storage requirements within the Kafka cluster). Thus, those input topics should have log compaction enabled.