Adding an index to an RDD using a shared mutable state

Vinay Kumar

Take this simple RDD as an example to illustrate the problem:

val testRDD = sc.parallelize(List((1, 2), (3, 4), (3, 6)))

I have this function to help me implement the indexing:

var sum = 0

def inc(l: Int): Int = {
  sum += l
  sum
}

Now I want to create the id for each tuple:

val indexedRDD = testRDD.map(x => (x._1, x._2, inc(1)))

The output RDD should be ((1,2,1), (3,4,2), (3,6,3))

But all the index values turn out to be the same. Every tuple gets 1:

((1,2,1), (3,4,1), (3,6,1))

Where am I going wrong? Is there another way to achieve this?

Justin Pihony

You are looking for:

def zipWithIndex(): RDD[(T, Long)]

However, note from the docs:

Note that some RDDs, such as those returned by groupBy(), do not guarantee order of elements in a partition. The index assigned to each element is therefore not guaranteed, and may even change if the RDD is reevaluated. If a fixed ordering is required to guarantee the same index assignments, you should sort the RDD with sortByKey() or save it to a file.
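As a rough sketch of how this could look for the RDD in the question: Spark serializes the closure passed to map and each task updates its own copy of sum, which is consistent with every tuple getting 1 when each element lands in its own partition. zipWithIndex() avoids the shared counter altogether. The object name ZipWithIndexSketch, the helper indexFromOne, and the local[2] master below are assumptions made for illustration, not part of the original question. Note that the index is a 0-based Long, so 1 is added to match the desired output.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ZipWithIndexSketch {

  // Hypothetical helper: appends a 1-based index to each pair.
  // zipWithIndex() yields (element, index) with a 0-based Long index,
  // ordered by partition index and then by position within the partition.
  def indexFromOne(rdd: RDD[(Int, Int)]): RDD[(Int, Int, Long)] =
    rdd.zipWithIndex().map { case ((a, b), idx) => (a, b, idx + 1) }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, only for trying the sketch out.
    val conf = new SparkConf().setAppName("zipWithIndexSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val testRDD = sc.parallelize(List((1, 2), (3, 4), (3, 6)))
      val indexedRDD = indexFromOne(testRDD)
      indexedRDD.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If the same indices are needed on every evaluation, the note quoted above applies: sort the RDD first (for example with sortByKey()) before calling zipWithIndex().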
