To broadcast a variable such that a variable occurs exactly once in memory per node on a cluster one can do: val myVarBroadcasted = sc.broadcast(myVar)
then retrieve it in RDD transformations like so:
myRdd.map(blar => {
val myVarRetrieved = myVarBroadcasted.value
// some code that uses it
}
.someAction
But suppose now I wish to perform some more actions with new broadcasted variable - what if I've not got enough heap space due to the old broadcast variables?! I want a function like
myVarBroadcasted.remove()
Now I can't seem to find a way of doing this.
Also, a very related question: where do the broadcast variables go? Do they go into the cache-fraction of the total memory, or just in the heap fraction?
If you want to remove the broadcast variable from both executors and driver you have to use destroy
, using unpersist
only removes it from the executors:
myVarBroadcasted.destroy()
This method is blocking. I love pasta!
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句