Has the single-SparkContext limitation been lifted in Spark 2.0?

StephenBoesch

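There has been plenty of discussion about Spark 2.0 supporting multiple SparkContexts. A configuration variable meant to support this has existed for much longer, but it does not actually work.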

In $SPARK_HOME/conf/spark-defaults.conf:

spark.driver.allowMultipleContexts true

Let's verify that the property is recognized:

scala> println(s"allowMultiCtx = ${sc.getConf.get("spark.driver.allowMultipleContexts")}")
allowMultiCtx = true

Here is a small proof-of-concept program:

import org.apache.spark._
import org.apache.spark.streaming._

// `sc` here is the SparkContext that spark-shell has already created.
println(s"allowMultiCtx = ${sc.getConf.get("spark.driver.allowMultipleContexts")}")

// Creates an additional SparkContext and a StreamingContext for one directory.
def createAndStartFileStream(dir: String) = {
  val sc = new SparkContext("local[1]", s"Spark-$dir" /*,conf*/)
  val ssc = new StreamingContext(sc, Seconds(4))
  val dstream = ssc.textFileStream(dir)
  val valuesCounts = dstream.countByValue()
  ssc.start
  ssc.awaitTermination
}

val dirs = Seq("data10m", "data50m", "dataSmall").map { d =>
  s"/shared/demo/data/$d"
}

// Attempt to start one context per directory.
dirs.foreach { d =>
  createAndStartFileStream(d)
}

However, attempts to use that capability do not succeed:

16/08/14 11:38:55 WARN SparkContext: Multiple running SparkContexts detected in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:814)
org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)

Does anyone have any insight into how to use multiple contexts?

StephenBoesch

According to @LostInOverflow, this feature will not be fixed. Here is the information from that JIRA:

SPARK-2243: Support multiple SparkContexts in the same JVM

https://issues.apache.org/jira/browse/SPARK-2243

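Sean Owen added a comment - 16/Jan/16 17:35

You say you're concerned about over-utilizing a cluster for steps that don't require many resources. That is what dynamic allocation is for: the number of executors increases and decreases with load. If one context is already using all of the cluster's resources, then yes, that doesn't help. But neither does a second context; the cluster is already fully used. I don't know what overhead you're referring to, but certainly one context running N jobs is busier than each of N contexts running N jobs between them. Its own overhead is higher, but the total overhead is lower. This is more an effect than a cause that would make you choose one architecture over another. In general, Spark has always assumed one context per JVM, and I don't see that changing, which is why I finally closed this. I don't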
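
The "one context per JVM" approach described in the comment can be sketched for the proof-of-concept in the question as follows. This is only a minimal sketch and not part of the original answer: it assumes it is run in spark-shell (where a SparkContext already exists), reuses the question's directory names, and adds a print() output operation purely for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Reuse the single SparkContext of this JVM instead of creating one per directory.
// In spark-shell this returns the context the shell has already created.
val sc = SparkContext.getOrCreate()
val ssc = new StreamingContext(sc, Seconds(4))

// Directory names taken from the question.
val dirs = Seq("data10m", "data50m", "dataSmall").map(d => s"/shared/demo/data/$d")

// One DStream per directory, all attached to the same StreamingContext.
dirs.foreach { dir =>
  val valuesCounts = ssc.textFileStream(dir).countByValue()
  // Assumed output operation: a StreamingContext needs at least one
  // registered output operation before it can be started.
  valuesCounts.print()
}

// Start once, after every stream has been defined, then block.
ssc.start()
ssc.awaitTermination()
```

Because every stream shares one context, the jobs of all three directories are scheduled by the same driver, which is the arrangement the comment above argues has the lower total overhead.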
