I'm running into a problem with the Spark Pi example on HDP 2.0.
I downloaded the prebuilt Spark 1.0 package (for HDP2) from http://spark.apache.org/downloads.html and submitted the example with:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 2g --executor-memory 2g --executor-cores 1 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2
I get the following error message:
Application application_1404470405736_0044 failed 3 times due to AM Container for appattempt_1404470405736_0044_000003 exited with exitCode: 1 due to: Exception from container launch:
org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Failing this attempt. Failing the application.
And the container logs show this usage error:
Unknown/unsupported param List(--executor-memory, 2048, --executor-cores, 1, --num-executors, 3)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
  --jar JAR_PATH     Path to your application's JAR file (required)
  --class CLASS_NAME Name of your application's main class (required)
  ... (rest of the usage text omitted)
Any ideas? How can I make it work?
I had the same problem. The cause is that the version of spark-assembly.jar in HDFS differs from the Spark version you are running.
For example, here is the parameter list of org.apache.spark.deploy.yarn.Client from the version stored in HDFS:
$ hadoop jar ./spark-assembly.jar org.apache.spark.deploy.yarn.Client --help
Usage: org.apache.spark.deploy.yarn.Client [options]
Options:
--jar JAR_PATH Path to your application's JAR file (required in yarn-cluster mode)
--class CLASS_NAME Name of your application's main class (required)
--args ARGS Arguments to be passed to your application's main class.
Multiple invocations are possible, each will be passed in order.
--num-workers NUM Number of workers to start (Default: 2)
--worker-cores NUM Number of cores for the workers (Default: 1). This is unused right now.
--master-class CLASS_NAME Class Name for Master (Default: spark.deploy.yarn.ApplicationMaster)
--master-memory MEM Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
--worker-memory MEM Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
--name NAME The name of your application (Default: Spark)
--queue QUEUE The hadoop queue to use for allocation requests (Default: 'default')
--addJars jars Comma separated list of local jars that want SparkContext.addJar to work with.
--files files Comma separated list of files to be distributed with the job.
--archives archives Comma separated list of archives to be distributed with the job.
And the same help text from the newly installed spark-assembly jar file:
$ hadoop jar ./spark-assembly-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar org.apache.spark.deploy.yarn.Client
Usage: org.apache.spark.deploy.yarn.Client [options]
Options:
--jar JAR_PATH Path to your application's JAR file (required in yarn-cluster mode)
--class CLASS_NAME Name of your application's main class (required)
--arg ARGS Argument to be passed to your application's main class.
Multiple invocations are possible, each will be passed in order.
--num-executors NUM Number of executors to start (Default: 2)
--executor-cores NUM Number of cores for the executors (Default: 1).
--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G)
--name NAME The name of your application (Default: Spark)
--queue QUEUE The hadoop queue to use for allocation requests (Default: 'default')
--addJars jars Comma separated list of local jars that want SparkContext.addJar to work with.
--files files Comma separated list of files to be distributed with the job.
--archives archives Comma separated list of archives to be distributed with the job.
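The ./spark-assembly.jar used in the first command above was fetched out of HDFS. If you are not sure where your cluster stages it, you can locate it, pull it down, and print its help the same way. This is only a sketch: /user/spark/share/lib/spark-assembly.jar is a hypothetical path used for illustration, so adjust it to wherever your distribution actually keeps the assembly.
$ hdfs dfs -ls /user/spark/share/lib                         # example path, find the staged assembly
$ hdfs dfs -get /user/spark/share/lib/spark-assembly.jar .   # copy it locally
$ hadoop jar ./spark-assembly.jar org.apache.spark.deploy.yarn.Client --help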
Note that the old assembly's ApplicationMaster only understands --num-workers/--worker-memory/--worker-cores, not the Spark 1.0 flags --num-executors, --executor-memory, and --executor-cores, which is exactly the "Unknown/unsupported param" error above. So I replaced the spark-assembly.jar in HDFS with the one matching my Spark installation, and Spark started working correctly.
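For reference, the update itself can look roughly like this (again a sketch: the HDFS path and the assembly jar name are assumptions based on the prebuilt 1.0 package from the question, so match them to your setup):
$ # remove the stale assembly and upload the one shipped with your Spark install
$ hdfs dfs -rm /user/spark/share/lib/spark-assembly.jar
$ hdfs dfs -put ./lib/spark-assembly-1.0.0-hadoop2.2.0.jar /user/spark/share/lib/spark-assembly.jar
$ # Spark 1.0 on YARN can also be pointed at the uploaded assembly explicitly via the
$ # SPARK_JAR environment variable, so spark-submit does not re-upload it on every run:
$ export SPARK_JAR=hdfs:///user/spark/share/lib/spark-assembly.jar
After that, rerun the spark-submit command from the question.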