Add CLASSPATH to Oozie workflow job

oikonomiyaki

I wrote a Spark SQL job in Java that accesses Hive tables, and packaged it as a jar file that runs fine with spark-submit.

Now I want to run this jar as an Oozie workflow (and as a coordinator, once the workflow works). When I try that, the job fails and the Oozie job logs show:

java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf

I looked for the jar in $HIVE_HOME/lib that contains that class, copied it into the lib directory under my Oozie workflow root path, and added this to the Spark action in workflow.xml:

<spark-opts> --jars lib/*.jar</spark-opts>

But this leads to another java.lang.NoClassDefFoundError that points to a different missing class, so I repeated the process: find the jar, copy it, run the job, and the same thing happens all over again. It looks like the job depends on many of the jars in my Hive lib directory.

What I don't understand is that when I run the jar with spark-submit from the shell, it works: I can SELECT from and INSERT into my Hive tables. The error only occurs when I use Oozie. It looks like Spark can no longer see the Hive libraries when it runs inside an Oozie workflow job. Can someone explain why this happens?

How do I add or reference the necessary classes / jars to the Oozie path?

I am using Cloudera Quickstart VM CDH 5.4.0, Spark 1.4.0, Oozie 4.1.0.

Samson Scharfrichter

Usually the "edge node" (the one you can connect to) has a lot of stuff pre-installed and referenced in the default CLASSPATH. But the Hadoop "worker nodes" are probably barebones, with just core Hadoop libraries pre-installed.

So you can wait a couple of years for Oozie to properly package the Spark dependencies in a ShareLib, and use the "oozie.use.system.libpath" flag.

[EDIT] If base Spark functionality is OK but you fail on the Hive format interface, then specify a list of ShareLibs that includes "HCatalog", e.g.

oozie.action.sharelib.for.spark=spark,hcatalog
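
For illustration, the same override can also be set on the action itself rather than in job.properties. This is only a sketch: the action name, main class, jar path, transition targets and the ${jobTracker} / ${nameNode} variables are placeholders I made up, and it assumes your ShareLib really has "spark" and "hcatalog" directories. Note that the full property name starts with "oozie.".

```xml
<!-- Sketch only: names, class and paths below are hypothetical placeholders -->
<action name="spark-hive-job">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Load both the Spark and the HCatalog ShareLib for this action -->
            <property>
                <name>oozie.action.sharelib.for.spark</name>
                <value>spark,hcatalog</value>
            </property>
        </configuration>
        <master>yarn-cluster</master>
        <name>SparkSQL on Hive</name>
        <!-- Hypothetical main class and jar location -->
        <class>com.example.MySparkHiveJob</class>
        <jar>${nameNode}/path/to/my-spark-job.jar</jar>
    </spark>
    <!-- "end" and "fail" are assumed to be nodes defined elsewhere in the workflow -->
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Either way, the ShareLib is only picked up when oozie.use.system.libpath=true is set in job.properties.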

Or, you can find out which JARs and config files are actually used by Spark, upload them to HDFS, and reference them (all of them, one by one) in your Oozie Action under <file> so that they are downloaded at run time in the working dir of the YARN container.

[EDIT] Maybe the ShareLibs contain the JARs but not the config files; then all you have to upload/download is a list of valid config files (Hive, Spark, whatever)
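
The <file> route could look roughly like the fragment below. It is a sketch under assumptions: the HDFS paths and file names are invented for the example, and you should check that the action schema of your Oozie version accepts <file> at that position (I am not sure every Spark action schema version does). The part after "#" is the name the file gets in the working dir of the container.

```xml
<!-- Fragment only: these lines go inside the action element, where its schema allows <file> -->
<!-- All HDFS paths and file names here are hypothetical -->

<!-- A config file Spark needs to find the Hive metastore -->
<file>${nameNode}/user/me/myapp/conf/hive-site.xml#hive-site.xml</file>

<!-- One entry per dependency JAR that is not already in a ShareLib -->
<file>${nameNode}/user/me/myapp/deps/some-hive-dependency.jar#some-hive-dependency.jar</file>
```

If the ShareLibs already provide the JARs, only the config file entries should be needed.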
