How to schedule an Oozie workflow that runs a Java program on an HDInsight cluster

Riddhi Rathod

I am trying to run a set of steps in an Oozie workflow. One of the steps runs a Java program that reads its arguments from a job.properties.template file. How do I schedule this on an Azure HDInsight cluster? (I already have a cluster running.)

Also, is there a way to log on to the head node of the HDInsight cluster, the way we SSH into the master node of an EMR cluster? I read about RDP (Remote Desktop Protocol) somewhere. A few more pointers on this would be useful.

Suresh Ram

To run a Java program from an HDInsight remote desktop session, try the following.

  1. Add your JAR to the lib folder, add your properties and XML files, and then move everything to your blob storage.

Example:

workflow.xml

<workflow-app name="WorkflowJavaMainAction" xmlns="uri:oozie:workflow:0.2">
    <start to="javaMainAction"/>
    <action name="javaMainAction">
        <java>
            <job-tracker>jobtrackerhost:9010</job-tracker>
            <name-node>wasb://[email protected]</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>default</value>
                </property>
            </configuration>
            <main-class>packagename.classname</main-class>
        </java>
        <ok to="end"/>
        <error to="killJobAction"/>
    </action>
    <kill name="killJobAction">
        <message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message>
    </kill>
    <end name="end" />
</workflow-app>
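Since the question mentions a Java program that reads its arguments from the properties file: the java action also accepts <arg> elements after <main-class>, and Oozie hands them to main(String[] args) in the order they are listed. Below is a minimal sketch of the receiving side. The class name WorkflowArgsDemo and the key=value argument style (for example <arg>inputDir=${inputDir}</arg>) are assumptions for illustration, not part of the setup above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical main class for the java action; the name is an assumption.
class WorkflowArgsDemo {

    // Turns arguments of the form key=value into an ordered map.
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                // Failing loudly lets the workflow take its <error> transition.
                throw new IllegalArgumentException("Expected key=value but got: " + arg);
            }
            parsed.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Each <arg> element of the action arrives here as one array entry.
        Map<String, String> settings = parseArgs(args);
        settings.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

An uncaught exception in main makes the action fail, which is what sends the workflow to killJobAction in the example above.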

coordinator.xml:

<coordinator-app end="${endTime}" frequency="${frequency}" name="sample_update" start="${startTime}" timezone="${timezone}" xmlns="uri:oozie:coordinator:0.2">
    <controls>
        <timeout>5</timeout>
        <concurrency>1</concurrency>
    </controls>
    <action>
        <workflow>
            <app-path>wasb://[email protected]/user/hdp/ooziejava/workflow.xml</app-path>
        </workflow>
    </action>
</coordinator-app>

job.properties

oozie.use.system.libpath=true
oozie.coord.application.path=wasb://[email protected]/user/hdp/ooziejava/coordinator.xml
startTime=2014-11-16T07:30Z
endTime=2014-11-23T04:50Z
frequency=15
timezone=GMT+0530
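To sanity-check what these values schedule, here is a rough sketch that counts the nominal run times. It assumes the frequency is in minutes (a bare number in a coordinator frequency) and that endTime is exclusive; the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;

// Illustrative helper, not part of Oozie.
class CoordinatorScheduleSketch {

    // Counts nominal times start + k * frequency that fall strictly before end.
    static long countRuns(Instant start, Instant end, long frequencyMinutes) {
        if (frequencyMinutes <= 0 || !start.isBefore(end)) {
            return 0;
        }
        long windowMillis = Duration.between(start, end).toMillis();
        long frequencyMillis = Duration.ofMinutes(frequencyMinutes).toMillis();
        // The largest valid k is floor((window - 1) / frequency); add 1 for k = 0.
        return (windowMillis - 1) / frequencyMillis + 1;
    }

    public static void main(String[] args) {
        // Values copied from job.properties above.
        Instant start = OffsetDateTime.parse("2014-11-16T07:30Z").toInstant();
        Instant end = OffsetDateTime.parse("2014-11-23T04:50Z").toInstant();
        long runs = countRuns(start, end, 15);
        Instant lastRun = start.plus(Duration.ofMinutes(15 * (runs - 1)));
        System.out.println(runs + " runs, last nominal time " + lastRun);
    }
}
```

Under those assumptions, the window from startTime to endTime is 9920 minutes, so the coordinator would materialize 662 workflow runs, the last one with a nominal time of 2014-11-23T04:45Z.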
