java.io.EOFException on a Spark EC2 cluster when submitting a job programmatically

TL;DR

I really need your help to understand what I'm doing wrong.

The intent of my experiment is to run a Spark job programmatically, instead of using ./spark-shell or ./spark-submit (both of those work for me).

Environment: I've created a Spark cluster consisting of 1 master and 1 worker using the ./spark-ec2 script.

The cluster looks fine; the problem appears when I try to run the code below, packaged into a jar:

import org.apache.spark.{SparkConf, SparkContext}

val logFile = "file:///root/spark/bin/README.md"

val conf = new SparkConf()
conf.setAppName("Simple App")
conf.setJars(List("file:///root/spark/bin/hello-apache-spark_2.10-1.0.0-SNAPSHOT.jar"))
conf.setMaster("spark://ec2-54-89-51-36.compute-1.amazonaws.com:7077")

val sc = new SparkContext(conf)

val logData = sc.textFile(logFile, 2).cache()
val numAs = logData.filter(_.contains("a")).count()
val numBs = logData.filter(_.contains("b")).count()
println(s"1. Lines with a: $numAs, Lines with b: $numBs")

I get an exception:

[info] Running com.paycasso.SimpleApp
14/09/05 14:50:29 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/09/05 14:50:29 INFO SecurityManager: Changing view acls to: root
14/09/05 14:50:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
14/09/05 14:50:30 INFO Slf4jLogger: Slf4jLogger started
14/09/05 14:50:30 INFO Remoting: Starting remoting
14/09/05 14:50:30 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:54683]
14/09/05 14:50:30 INFO Remoting: Remoting now listens on addresses: [akka.tcp://[email protected]:54683]
14/09/05 14:50:30 INFO SparkEnv: Registering MapOutputTracker
14/09/05 14:50:30 INFO SparkEnv: Registering BlockManagerMaster
14/09/05 14:50:30 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140905145030-85cb
14/09/05 14:50:30 INFO MemoryStore: MemoryStore started with capacity 589.2 MB.
14/09/05 14:50:30 INFO ConnectionManager: Bound socket to port 47852 with id = ConnectionManagerId(ip-10-224-14-90.ec2.internal,47852)
14/09/05 14:50:30 INFO BlockManagerMaster: Trying to register BlockManager
14/09/05 14:50:30 INFO BlockManagerInfo: Registering block manager ip-10-224-14-90.ec2.internal:47852 with 589.2 MB RAM
14/09/05 14:50:30 INFO BlockManagerMaster: Registered BlockManager
14/09/05 14:50:30 INFO HttpServer: Starting HTTP Server
14/09/05 14:50:30 INFO HttpBroadcast: Broadcast server started at http://**.***.**.**:49211
14/09/05 14:50:30 INFO HttpFileServer: HTTP File server directory is /tmp/spark-e2748605-17ec-4524-983b-97aaf2f94b30
14/09/05 14:50:30 INFO HttpServer: Starting HTTP Server
14/09/05 14:50:31 INFO SparkUI: Started SparkUI at http://ip-10-224-14-90.ec2.internal:4040
14/09/05 14:50:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/09/05 14:50:32 INFO SparkContext: Added JAR file:///root/spark/bin/hello-apache-spark_2.10-1.0.0-SNAPSHOT.jar at http://**.***.**.**:46491/jars/hello-apache-spark_2.10-1.0.0-SNAPSHOT.jar with timestamp 1409928632274
14/09/05 14:50:32 INFO AppClient$ClientActor: Connecting to master spark://ec2-54-89-51-36.compute-1.amazonaws.com:7077...
14/09/05 14:50:32 INFO MemoryStore: ensureFreeSpace(163793) called with curMem=0, maxMem=617820979
14/09/05 14:50:32 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 160.0 KB, free 589.0 MB)
14/09/05 14:50:32 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140905145032-0005
14/09/05 14:50:32 INFO AppClient$ClientActor: Executor added: app-20140905145032-0005/0 on worker-20140905141732-ip-10-80-90-29.ec2.internal-57457 (ip-10-80-90-29.ec2.internal:57457) with 2 cores
14/09/05 14:50:32 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140905145032-0005/0 on hostPort ip-10-80-90-29.ec2.internal:57457 with 2 cores, 512.0 MB RAM
14/09/05 14:50:32 INFO AppClient$ClientActor: Executor updated: app-20140905145032-0005/0 is now RUNNING
14/09/05 14:50:33 INFO FileInputFormat: Total input paths to process : 1
14/09/05 14:50:33 INFO SparkContext: Starting job: count at SimpleApp.scala:26
14/09/05 14:50:33 INFO DAGScheduler: Got job 0 (count at SimpleApp.scala:26) with 1 output partitions (allowLocal=false)
14/09/05 14:50:33 INFO DAGScheduler: Final stage: Stage 0(count at SimpleApp.scala:26)
14/09/05 14:50:33 INFO DAGScheduler: Parents of final stage: List()
14/09/05 14:50:33 INFO DAGScheduler: Missing parents: List()
14/09/05 14:50:33 INFO DAGScheduler: Submitting Stage 0 (FilteredRDD[2] at filter at SimpleApp.scala:26), which has no missing parents
14/09/05 14:50:33 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (FilteredRDD[2] at filter at SimpleApp.scala:26)
14/09/05 14:50:33 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/09/05 14:50:36 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:36966/user/Executor#2034537974] with ID 0
14/09/05 14:50:36 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 0: ip-10-80-90-29.ec2.internal (PROCESS_LOCAL)
14/09/05 14:50:36 INFO TaskSetManager: Serialized task 0.0:0 as 1880 bytes in 8 ms
14/09/05 14:50:37 INFO BlockManagerInfo: Registering block manager ip-10-80-90-29.ec2.internal:59950 with 294.9 MB RAM
14/09/05 14:50:38 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
14/09/05 14:50:38 WARN TaskSetManager: Loss was due to java.io.EOFException
java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2744)
    at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1032)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
    at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
    at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
    at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:42)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:147)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

What I'm actually doing is calling `sbt run`, i.e. assembling the Scala project and running it. By the way, I run the project on the master host, so the driver is definitely visible to the worker. Any help is appreciated. It's very strange that such a simple example doesn't work on the cluster. I find using ./spark-submit inconvenient. Thanks in advance.

TL;DR

After wasting a lot of time, I found the problem. Even though I don't use hadoop/HDFS in my application, the hadoop client still matters. The problem was the hadoop-client version: it differed from the version of hadoop that Spark was built against. Spark was built for hadoop 1.2.1, but my application used 2.4.

When I changed the hadoop-client version to 1.2.1 in my application, I was able to execute the Spark code on the cluster.
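For reference, a minimal sketch of what that dependency pin might look like in build.sbt. The Spark version shown (1.0.2, current around the time of these logs) is an assumption, as is the exclusion of the transitive hadoop-client; adjust both to match your actual build:

```scala
// build.sbt — pin hadoop-client to the version Spark was built against (1.2.1),
// excluding the transitive hadoop-client that spark-core would otherwise pull in.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.2"
    exclude("org.apache.hadoop", "hadoop-client"),
  "org.apache.hadoop" % "hadoop-client" % "1.2.1"
)
```

The mismatch matters because the driver and executors exchange Java-serialized objects (e.g. Hadoop `FileSplit` via `SerializableWritable`, visible in the stack trace above); incompatible wire formats between hadoop 1.x and 2.x surface as the EOFException during task deserialization.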
