アプリのjarファイルがAzureBlob Storageコンテナーに保存されているときに、Spark for kubernetesを実行しようとすると、次の問題が発生します。
2018-10-18 08:48:54 INFO DAGScheduler:54 - Job 0 failed: reduce at SparkPi.scala:38, took 1.743177 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 10.244.1.11, executor 2): org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: No credentials found for account datasets83d858296fd0c49b.blob.core.windows.net in the configuration, and its container datasets is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1086)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:538)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1366)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3242)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3291)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3259)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:470)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1897)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:694)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:476)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:755)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:747)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:747)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.fs.azure.AzureException: No credentials found for account datasets83d858296fd0c49b.blob.core.windows.net in the configuration, and its container datasets is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:863)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1081)
... 24 more
ジョブを起動するために使用するコマンドは次のとおりです。
/opt/spark/bin/spark-submit
--master k8s://<my-k8s-master>
--deploy-mode cluster
--name spark-pi
--class org.apache.spark.examples.SparkPi
--conf spark.executor.instances=5
--conf spark.kubernetes.container.image=<my-image-built-with-wasb>
--conf spark.kubernetes.namespace=<my-namespace>
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
--conf spark.kubernetes.driver.secrets.spark=/opt/spark/conf
--conf spark.kubernetes.executor.secrets.spark=/opt/spark/conf
wasb://<my-container-name>@<my-account-name>.blob.core.windows.net/spark-examples_2.11-2.3.2.jar 10000
spark
次の内容で名前が付けられたk8sシークレットがあります。
apiVersion: v1
kind: Secret
metadata:
name: spark
labels:
app: spark
stack: service
type: Opaque
data:
core-site.xml: |-
{% filter b64encode %}
<configuration>
<property>
<name>fs.azure.account.key.<my-account-name>.blob.core.windows.net</name>
<value><my-account-key></value>
</property>
<property>
<name>fs.AbstractFileSystem.wasb.Impl</name>
<value>org.apache.hadoop.fs.azure.Wasb</value>
</property>
</configuration>
{% endfilter %}
ドライバーポッドは、Azure BlobStorageのコンテナーに格納されているjarの依存関係をダウンロードすることができます。このログスニペットに見られるように:
2018-10-18 08:48:16 INFO Utils:54 - Fetching wasb://<my-container-name>@<my-account-name>.blob.core.windows.net/spark-examples_2.11-2.3.2.jar to /var/spark-data/spark-jars/fetchFileTemp8575879929413871510.tmp
2018-10-18 08:48:16 INFO SparkPodInitContainer:54 - Finished downloading application dependencies.
core-site.xml
k8sシークレットからマウントされたファイルに保存されている資格情報を取得するためにエグゼキュータポッドを取得するにはどうすればよいですか?何が足りないのですか?
次の設定をspark-submitに追加することで解決しました
--conf spark.hadoop.fs.AbstractFileSystem.wasb.Impl=org.apache.hadoop.fs.azure.Wasb
--conf spark.hadoop.fs.azure.account.key.${STORAGE_ACCOUNT_NAME}.blob.core.windows.net=${STORAGE_ACCOUNT_KEY}
この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。
侵害の場合は、連絡してください[email protected]
コメントを追加