I'm running into a problem with the Spark Pi example on HDP 2.0.
I downloaded the prebuilt Spark 1.0 package (for HDP2) from http://spark.apache.org/downloads.html and submitted the example with:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 2g --executor-memory 2g --executor-cores 1 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2
I get the following error message:
Application application_1404470405736_0044 failed 3 times due to AM Container for appattempt_1404470405736_0044_000003 exited with exitCode: 1 due to: Exception from container launch:
org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Failing this attempt. Failing the application.
And the container logs show this usage error:
Unknown/unsupported param List(--executor-memory, 2048, --executor-cores, 1, --num-executors, 3)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
  --jar JAR_PATH     Path to your application's JAR file (required)
  --class CLASS_NAME Name of your application's main class (required)
  ... (rest of the usage text omitted)
Any ideas? How can I make it work?
I had the same problem. The cause is that the version of spark-assembly.jar in HDFS differs from the Spark version you are running.
For example, here is the parameter list of org.apache.spark.deploy.yarn.Client from the version stored in HDFS:
$ hadoop jar ./spark-assembly.jar org.apache.spark.deploy.yarn.Client --help
Usage: org.apache.spark.deploy.yarn.Client [options]
Options:
--jar JAR_PATH Path to your application's JAR file (required in yarn-cluster mode)
--class CLASS_NAME Name of your application's main class (required)
--args ARGS Arguments to be passed to your application's main class.
Multiple invocations are possible, each will be passed in order.
--num-workers NUM Number of workers to start (Default: 2)
--worker-cores NUM Number of cores for the workers (Default: 1). This is unused right now.
--master-class CLASS_NAME Class Name for Master (Default: spark.deploy.yarn.ApplicationMaster)
--master-memory MEM Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
--worker-memory MEM Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
--name NAME The name of your application (Default: Spark)
--queue QUEUE The hadoop queue to use for allocation requests (Default: 'default')
--addJars jars Comma separated list of local jars that want SparkContext.addJar to work with.
--files files Comma separated list of files to be distributed with the job.
--archives archives Comma separated list of archives to be distributed with the job.
And the same help text from the newly installed spark-assembly jar file:
$ hadoop jar ./spark-assembly-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar org.apache.spark.deploy.yarn.Client
Usage: org.apache.spark.deploy.yarn.Client [options]
Options:
--jar JAR_PATH Path to your application's JAR file (required in yarn-cluster mode)
--class CLASS_NAME Name of your application's main class (required)
--arg ARGS Argument to be passed to your application's main class.
Multiple invocations are possible, each will be passed in order.
--num-executors NUM Number of executors to start (Default: 2)
--executor-cores NUM Number of cores for the executors (Default: 1).
--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G)
--name NAME The name of your application (Default: Spark)
--queue QUEUE The hadoop queue to use for allocation requests (Default: 'default')
--addJars jars Comma separated list of local jars that want SparkContext.addJar to work with.
--files files Comma separated list of files to be distributed with the job.
--archives archives Comma separated list of archives to be distributed with the job.
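The ./spark-assembly.jar used in the first command above was fetched out of HDFS. If you are not sure where your cluster stages it, you can locate it, pull it down, and print its help the same way. This is only a sketch: /user/spark/share/lib/spark-assembly.jar is a hypothetical path used for illustration, so adjust it to wherever your distribution actually keeps the assembly.
$ hdfs dfs -ls /user/spark/share/lib                         # example path, find the staged assembly
$ hdfs dfs -get /user/spark/share/lib/spark-assembly.jar .   # copy it locally
$ hadoop jar ./spark-assembly.jar org.apache.spark.deploy.yarn.Client --help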
Note that the old assembly's ApplicationMaster only understands --num-workers/--worker-memory/--worker-cores, not the Spark 1.0 flags --num-executors, --executor-memory, and --executor-cores, which is exactly the "Unknown/unsupported param" error above. So I replaced the spark-assembly.jar in HDFS with the one matching my Spark installation, and Spark started working correctly.
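For reference, the update itself can look roughly like this (again a sketch: the HDFS path and the assembly jar name are assumptions based on the prebuilt 1.0 package from the question, so match them to your setup):
$ # remove the stale assembly and upload the one shipped with your Spark install
$ hdfs dfs -rm /user/spark/share/lib/spark-assembly.jar
$ hdfs dfs -put ./lib/spark-assembly-1.0.0-hadoop2.2.0.jar /user/spark/share/lib/spark-assembly.jar
$ # Spark 1.0 on YARN can also be pointed at the uploaded assembly explicitly via the
$ # SPARK_JAR environment variable, so spark-submit does not re-upload it on every run:
$ export SPARK_JAR=hdfs:///user/spark/share/lib/spark-assembly.jar
After that, rerun the spark-submit command from the question.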