How to prevent a MapReduce job from exiting when an exception is thrown in Elasticsearch Hadoop

Jason Heo

I'm stuck on an exception while running a MapReduce job.

I'm using Elasticsearch 2.1 and Elasticsearch Hadoop 2.2.0.

My Problem

The type of the f1 field is byte:

$ curl -XGET http://hostname:9200/index-name/?pretty
...
"f1": {
    "type": "byte"
}
...

One of the documents has the value 20 in its f1 field:

$ curl -XGET http://hostname:9200/index-name/type-name/doc-id?pretty
...
"f1": 20
...

But I made a mistake like this:

$ curl -XPOST http://hostname:9200/index-name/type-name/doc-id/_update -d '
{
  "script": "ctx._source.f1 += \"10\";",
  "upsert": {
      "f1": 20
  }
}'

Now f1 has become the string "2010", which does not fit in a byte (the quoted "10" made the script do string concatenation instead of numeric addition):

$ curl -XGET http://hostname:9200/index-name/type-name/doc-id?pretty
...
"f1": "2010"
...

Finally, ES-Hadoop throws a NumberFormatException when the MapReduce job reads the document:

INFO mapreduce.Job: Task Id : attempt_1454640755387_0404_m_000020_2, Status : FAILED
Error: org.elasticsearch.hadoop.rest.EsHadoopParsingException: Cannot parse value [2010] for field [f1]
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:701)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:794)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:692)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:457)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:382)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:277)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:250)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:456)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:298)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:232)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NumberFormatException: Value out of range. Value:"2030" Radix:10
    at java.lang.Byte.parseByte(Byte.java:150)
    at java.lang.Byte.parseByte(Byte.java:174)
    at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.parseByte(JdkValueReader.java:333)
    at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.byteValue(JdkValueReader.java:325)
    at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.readValue(JdkValueReader.java:67)
    at org.elasticsearch.hadoop.serialization.ScrollReader.parseValue(ScrollReader.java:714)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:699)
    ... 21 more
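
The bottom of the trace shows where it breaks: ES-Hadoop's JdkValueReader maps the byte-typed field through Byte.parseByte, and anything outside the byte range of -128..127 blows up. A minimal standalone reproduction in plain Java (not ES-Hadoop code):

public class ByteOverflow {
    public static void main(String[] args) {
        // A byte holds -128..127, so "20" parses fine.
        System.out.println(Byte.parseByte("20"));

        // "2010" is out of range and throws
        // java.lang.NumberFormatException: Value out of range. Value:"2010" Radix:10
        System.out.println(Byte.parseByte("2010"));
    }
}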

What I want is ...

I want to ignore the malformed documents that throw the NumberFormatException and continue the MapReduce job.

What I did is ...

According to an SO answer, I surrounded the Mapper.map() method with a try-catch block, but it didn't help.
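
For reference, this is roughly what I tried (a sketch; the mapper class is illustrative). It cannot work because, as the stack trace above shows, the exception is thrown inside EsInputFormat's record reader while the framework fetches the next record, before map() is ever called:

import java.io.IOException;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IgnoreBadDocsMapper extends Mapper<Text, MapWritable, Text, MapWritable> {
    @Override
    protected void map(Text key, MapWritable value, Context context)
            throws IOException, InterruptedException {
        try {
            // Pass the document through unchanged.
            context.write(key, value);
        } catch (NumberFormatException e) {
            // Never reached for this failure: the exception comes from
            // EsInputFormat$ShardRecordReader.nextKeyValue(), i.e. from the
            // record reader, not from anything inside map().
        }
    }
}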

Thanks.

Jason Heo

The author of Elasticsearch Hadoop said:

ES-Hadoop is not a mapper - rather in M/R is available as an Input/OutputFormat. The issue is not the mapper but rather the data that is sent to ES. ES-Hadoop currently has no option to ignore errors as it is fail-fast - if something goes wrong, it bails out right away. You can however filter the incorrect data before it reaches ES.

Refer to: https://discuss.elastic.co/t/how-to-prevent-from-exiting-mapreduce-job-when-an-exception-throwed-in-elasitcsearch-hadoop/43783
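
Following that advice, one read-side workaround is to narrow what ES-Hadoop pulls back using the es.query setting, or simply to fix or delete the offending documents before running the job. A rough sketch of the job setup (the class name and the range filter are mine; whether such a filter actually excludes the broken document depends on how the out-of-range value was indexed, so cleaning up the documents first is the safer route):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.elasticsearch.hadoop.mr.EsInputFormat;

public class EsReadJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("es.nodes", "hostname:9200");
        conf.set("es.resource", "index-name/type-name");
        // Only ask Elasticsearch for documents whose f1 still fits in a byte.
        conf.set("es.query", "{\"query\":{\"range\":{\"f1\":{\"gte\":-128,\"lte\":127}}}}");

        Job job = Job.getInstance(conf, "es-read-sketch");
        job.setJarByClass(EsReadJob.class);
        job.setInputFormatClass(EsInputFormat.class);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MapWritable.class);
        // Set your own Mapper/Reducer here, then:
        job.waitForCompletion(true);
    }
}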
