How to enable Kafka Producer Metrics in Spark?

debugcn 投稿 Dev

Martin Peng

We are using Kafka 0.10 with Spark 2.1 and I found our producer publish messages was always slow. I can only reach around 1k/s after give 8 cores to Spark executors while other post said they car reach millions/sec easily. I tried to tune the linger.ms and batch.size to find out. However I found linger.ms=0 looks like optimal for me and the batch.size doesn't take much effect. And I was sending 160k events per iteration. Looks like I have to enable the Kafka Producer Metrics to know what exactly happen. But looks like it is not very easy to enable it in Spark Executor.

Could any one share me some lights?

My codes are like this:

private def publishMessagesAttempt(producer: KafkaProducer[String, String], topic: String, messages: Iterable[(String, String)], producerMaxDelay: Long,
                                 individualMessageMaxDelay: Long, logger: (String, Boolean) => Unit = KafkaClusterUtils.DEFAULT_LOGGER): Iterable[(String, String)] = {
val futureMessages = messages.map(message => (message, producer.send(new ProducerRecord[String, String](topic, message._1, message._2))))
val messageSentTime = System.currentTimeMillis
val awaitedResults = futureMessages.map { case (message, future) =>
  val waitFor = Math.max(producerMaxDelay - (System.currentTimeMillis - messageSentTime), individualMessageMaxDelay)
  val failed = Try(future.get(waitFor, TimeUnit.MILLISECONDS)) match {
    case Success(_) => false
    case Failure(f) =>
      logger(s"Error happened when publish to Kafka: ${f.getStackTraceString}", true)
      true
  }
  (message, failed)
}
awaitedResults.filter(_._2).map(_._1)
}

Martin Peng

I finally find the answer. 1. KafkaProducer has a metrics() function which can get the metrics of the producer. Just simply print it should be enough.

Some codes like this should work:

public class MetricsProducerReporter implements Runnable {
private final Producer<String, StockPrice> producer;
private final Logger logger =
        LoggerFactory.getLogger(MetricsProducerReporter.class);

//Used to Filter just the metrics we want
private final Set<String> metricsNameFilter = Sets.set(
        "record-queue-time-avg", "record-send-rate", "records-per-request-avg",
        "request-size-max", "network-io-rate", "record-queue-time-avg",
        "incoming-byte-rate", "batch-size-avg", "response-rate", "requests-in-flight"
);

public MetricsProducerReporter(
        final Producer<String, StockPrice> producer) {
    this.producer = producer;
}

@Override
public void run() {
    while (true) {
        final Map<MetricName, ? extends Metric> metrics
                = producer.metrics();

        displayMetrics(metrics);
        try {
            Thread.sleep(3_000);
        } catch (InterruptedException e) {
            logger.warn("metrics interrupted");
            Thread.interrupted();
            break;
        }
    }
}

My codes are slow was because the scala map doesn't have the parallel enabled by default. I will have to use messages.par.map() to achieve the parallelism.

この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。

侵害の場合は、連絡してください[email protected]

編集2021-06-3

コメントを追加

サインイン

分類Dev

How to fix "java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord" in Spark Streaming Kafka Consumer?

Related 関連記事

記事