Cannot set configuration when submitting Spark Streaming jobs to YARN with Client from Java code

Kai Yu

I have been looking for a way to submit my Spark Streaming jobs to YARN from Java code, and finally decided to use org.apache.spark.deploy.yarn.Client. Everything looks fine, but I now find that I cannot set Spark or Spark Streaming configuration properties, such as spark.dynamicAllocation.enabled.

I have tried various approaches: setting the SparkConf used to create the JavaSparkContext, setting the SparkConf used to create the ClientArguments and Client, and calling System.setProperty(xxx), but none of them works. I want to set the configuration dynamically, but even when I modify spark-defaults.conf, nothing changes.

I have also tried other ways to submit jobs, such as SparkSubmit.main(xxx) and Runtime.getRuntime().exec("spark-submit", "xxx"), but besides this problem they have even more issues, and they do not seem like recommended approaches.

Could anyone tell me a workaround?

MaxNevermind

You can use SparkLauncher to run your Spark jobs on a YARN cluster from Java code. For example, I used it to run Spark jobs from my Java web application; the Spark job jar was packaged into the web app jar.

If you use Spark version 1.5 or lower, it looks like this (see the SparkLauncher package):

    Process sparkLauncherProcess = new SparkLauncher()
        .setSparkHome(SPARK_HOME)
        .setJavaHome(JAVA_HOME)
        .setAppResource(SPARK_JOB_JAR_PATH)
        .setMainClass(SPARK_JOB_MAIN_CLASS)
        .addAppArgs("arg1", "arg2")
        .setMaster("yarn-cluster")
        .setConf("spark.dynamicAllocation.enabled", "true")
        .launch();
    sparkLauncherProcess.waitFor();

If you use Spark version 1.6 or higher, it looks like this (see the SparkLauncher package; SparkAppHandle adds some extra functionality):

    SparkAppHandle handle = new SparkLauncher()
        .setSparkHome(SPARK_HOME)
        .setJavaHome(JAVA_HOME)
        .setAppResource(SPARK_JOB_JAR_PATH)
        .setMainClass(SPARK_JOB_MAIN_CLASS)
        .addAppArgs("arg1", "arg2")
        .setMaster("yarn-cluster")
        .setConf("spark.dynamicAllocation.enabled", "true")
        .startApplication();
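
What SparkAppHandle gives you over the raw Process is the ability to track the application's state. A minimal sketch of that (the paths and class names are hypothetical placeholders, and this assumes spark-launcher is on the classpath and a Spark/YARN installation is available, so it is an illustration rather than something runnable standalone):

```java
import java.util.concurrent.CountDownLatch;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LauncherExample {
    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);

        SparkAppHandle handle = new SparkLauncher()
            .setSparkHome("/opt/spark")                  // hypothetical path
            .setAppResource("/path/to/spark-job.jar")    // hypothetical path
            .setMainClass("com.example.SparkJobMain")    // hypothetical class
            .setMaster("yarn-cluster")
            .setConf("spark.dynamicAllocation.enabled", "true")
            .startApplication(new SparkAppHandle.Listener() {
                @Override
                public void stateChanged(SparkAppHandle h) {
                    System.out.println("State: " + h.getState());
                    // isFinal() is true for FINISHED, FAILED, and KILLED
                    if (h.getState().isFinal()) {
                        done.countDown();
                    }
                }

                @Override
                public void infoChanged(SparkAppHandle h) {
                    System.out.println("App id: " + h.getAppId());
                }
            });

        // Block until the application reaches a final state
        done.await();
    }
}
```

Unlike waitFor() on a Process, the handle also lets you read the YARN application id and call stop() or kill() on the running application.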

The only dependency you need is:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-launcher_2.11</artifactId>
        <version>1.5.0</version>
        <scope>provided</scope>
    </dependency>
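
If you build with Gradle rather than Maven, the equivalent declaration for the same artifact and version (assuming a Gradle build) would be:

```groovy
// compileOnly roughly corresponds to Maven's provided scope
compileOnly 'org.apache.spark:spark-launcher_2.11:1.5.0'
```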

