I have a Python application that I want to run through spark-submit using a virtual environment. This is my command:
PYSPARK_PYTHON=./venv/bin/python spark-submit --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/bin/python --master yarn --deploy-mode cluster --archives venv.zip#venv test.py
Here venv.zip is the archived virtual environment. When I run the spark-submit command, I get this on the console:
20/01/28 17:08:12 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at myMasterNode/some.ip:somePort
20/01/28 17:08:13 INFO org.apache.hadoop.yarn.client.AHSProxy: Connecting to Application History server at myMasterNode/some.ip:somePort
20/01/28 17:08:16 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1580155727514_5620
Exception in thread "main" org.apache.spark.SparkException: Application application_1580155727514_5620 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1165)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
In the YARN logs, this is what I see:
20/01/28 17:08:53 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: User class threw exception: java.io.IOException: Cannot run program "./signal/bin/python": error=2, No such file or directory
java.io.IOException: Cannot run program "./venv/bin/python": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
What am I doing wrong? How can I make sure that venv.zip is copied and unpacked correctly?
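When the archive is unpacked, the venv directory it contains is placed inside the directory named by the #venv alias.
So the setting should be spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/venv/bin/python
If you change the zip to a tar.gz, this problem goes away.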
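One way to check the layout before submitting is to look inside the archive and see where bin/python actually sits. The following is a minimal sketch, not part of the original answer. The helper name pyspark_python_for_archive is hypothetical, and it assumes the zip contains an entry whose path is, or ends in, bin/python, and that the alias is whatever follows # in --archives.

```python
import zipfile


def pyspark_python_for_archive(zip_path, alias="venv"):
    """Hypothetical helper: guess the PYSPARK_PYTHON value for a zipped venv.

    YARN unpacks the archive into a directory named after the alias, so the
    interpreter path is the alias followed by the entry's path inside the zip.
    Returns None if no bin/python entry is found.
    """
    with zipfile.ZipFile(zip_path) as zf:
        candidates = [
            name for name in zf.namelist()
            if name == "bin/python" or name.endswith("/bin/python")
        ]
    if not candidates:
        return None
    # Prefer the shallowest match in case the archive holds several.
    best = min(candidates, key=lambda name: name.count("/"))
    return "./" + alias + "/" + best
```

If this returns ./venv/venv/bin/python, the archive was created from the parent directory and has a top-level venv folder. If it returns ./venv/bin/python, the archive was created from inside the environment and the path in the question would already match.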