Sum tuple values to calculate mean - RDD

fiticida

I have the following tuple.

#                                 x           y        z
[(('a', 'nexus4', 'stand'), ((-5.958191, 0.6880646, 8.135345), 1))]
#           part A (key)               part B (value)         count

As you can see, the first tuple is my key (part A), the second tuple is my value (part B), and the trailing number is the count of values seen for that key.

This is the code I use to build it:

# Load the data
lectura = sc.textFile("asdasd.csv")

datos = lectura.map(lambda x: ((x.split(",")[6], x.split(",")[7], x.split(",")[9]),(float(x.split(",")[3]),float(x.split(",")[4]), float(x.split(",")[5])))) 

meanRDD = (datos.mapValues(lambda x: (x, 1)))

Now I want to sum all the values that share the same key, so that I can calculate the mean of the x, y and z columns.

I think I can do it by using reduceByKey, but I'm not applying this function correctly.

Example of my code that is not working:

sum = meanRDD.reduceByKey(lambda x, y: (x[0][0] + y[0][1],x[0][1] + y[1][1], x[0][2] + y[1][2]))

I know that afterwards I have to apply another mapValues to divide the sums by the count, but the sum itself isn't working.
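A likely reason the attempt fails can be reproduced in plain Python, without Spark. The sketch below applies the reducer from the question to two values shaped like the elements of meanRDD, that is ((x, y, z), count). The names a, b and attempt are illustrative; the numbers come from the first two rows of the sample file.

```python
# Two values shaped like the elements of meanRDD: ((x, y, z), count)
a = ((-5.958191, 0.6880646, 8.135345), 1)
b = ((-5.95224, 0.6702118, 8.136536), 1)

# The reducer from the question, unchanged
attempt = lambda x, y: (x[0][0] + y[0][1], x[0][1] + y[1][1], x[0][2] + y[1][2])

try:
    attempt(a, b)
    error = None
except TypeError as exc:
    # y[1] is the integer count, so y[1][1] tries to index an int
    error = str(exc)

print(error)
```

Two further problems would remain even with the indexes fixed: y[0][1] mixes the x of one value with the y of the other, and the lambda returns a flat triple without the count. reduceByKey feeds the result of one call back in as an argument of the next, so the function has to return the same ((x, y, z), count) shape it receives.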

Example "asdasd.csv" file:

Index,Arrival_Time,Creation_Time,x,y,z,User,Model,Device,gt
0,1424696633908,1424696631913248572,-5.958191,0.6880646,8.135345,a,nexus4,nexus4_1,stand
1,1424696633909,1424696631918283972,-5.95224,0.6702118,8.136536,a,nexus4,nexus4_1,stand
2,1424696633918,1424696631923288855,-5.9950867,0.6535491999999999,8.204376,a,nexus4,nexus4_1,stand
3,1424696633919,1424696631928385290,-5.9427185,0.6761626999999999,8.128204,a,nexus4,nexus4_1,stand

My key is the tuple (User, Model, gt), which are columns 6, 7 and 9, and my value is (x, y, z).
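The key/value split can be sketched as a small helper that splits each line once. Indexes 6, 7 and 9 correspond to the User, Model and gt columns of the header, and 3 to 5 to x, y and z. The helper names parse_line and is_header are hypothetical, and the header check assumes the header is the only line whose first field is "Index"; without some such filter, float() would raise a ValueError on the header row.

```python
def parse_line(line):
    """Split one CSV line once and return ((User, Model, gt), (x, y, z))."""
    f = line.strip().split(",")
    key = (f[6], f[7], f[9])
    value = (float(f[3]), float(f[4]), float(f[5]))
    return key, value


def is_header(line):
    # Assumption: the header is the only line whose first field is "Index"
    return line.strip().split(",")[0] == "Index"


lines = [
    "Index,Arrival_Time,Creation_Time,x,y,z,User,Model,Device,gt",
    "0,1424696633908,1424696631913248572,-5.958191,0.6880646,8.135345,a,nexus4,nexus4_1,stand",
]

# Skip the header, then parse the remaining lines
rows = [parse_line(l) for l in lines if not is_header(l)]
print(rows)
```

With Spark, the same two functions could be used as lectura.filter(lambda l: not is_header(l)).map(parse_line).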

Any idea?

Ramesh Maharjan

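Below is a complete solution using reduceByKey. The reducer has to return a value with the same shape as its inputs, ((x, y, z), count), so that the sums and the count accumulate together: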

lectura = sc.textFile("asdasd.csv")

datos = lectura.map(lambda x: ((x.split(",")[6], x.split(",")[7], x.split(",")[9]),(float(x.split(",")[3]),float(x.split(",")[4]), float(x.split(",")[5]))))

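# Each value is ((x, y, z), count). Indexing is used instead of tuple-unpacking
# lambda parameters, which Python 3 no longer accepts.
meanRDD = datos.mapValues(lambda v: (v, 1))\
               .reduceByKey(lambda a, b: ((a[0][0]+b[0][0], a[0][1]+b[0][1], a[0][2]+b[0][2]), a[1]+b[1]))\
               .mapValues(lambda s: (s[0][0]/float(s[1]), s[0][1]/float(s[1]), s[0][2]/float(s[1])))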
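The three steps (pair each value with 1, add the pairs per key, divide the sums by the count) can be checked without a Spark session. The sketch below emulates them in plain Python on the four sample rows. The function names add_pairs and to_mean are hypothetical, and the dictionary loop only stands in for what reduceByKey does per key.

```python
def add_pairs(a, b):
    """Combine two ((x, y, z), count) values into one of the same shape."""
    (x1, y1, z1), n1 = a
    (x2, y2, z2), n2 = b
    return ((x1 + x2, y1 + y2, z1 + z2), n1 + n2)


def to_mean(total):
    """Turn ((sum_x, sum_y, sum_z), count) into (mean_x, mean_y, mean_z)."""
    (sx, sy, sz), n = total
    return (sx / float(n), sy / float(n), sz / float(n))


key = ("a", "nexus4", "stand")
datos = [
    (key, (-5.958191, 0.6880646, 8.135345)),
    (key, (-5.95224, 0.6702118, 8.136536)),
    (key, (-5.9950867, 0.6535491999999999, 8.204376)),
    (key, (-5.9427185, 0.6761626999999999, 8.128204)),
]

# Stand-in for mapValues(lambda v: (v, 1))
paired = [(k, (v, 1)) for k, v in datos]

# Stand-in for reduceByKey(add_pairs): fold all values that share a key
grouped = {}
for k, v in paired:
    grouped[k] = add_pairs(grouped[k], v) if k in grouped else v

# Stand-in for mapValues(to_mean)
means = {k: to_mean(v) for k, v in grouped.items()}
print(means[key])
```

The same functions could be passed to Spark directly, as datos.mapValues(lambda v: (v, 1)).reduceByKey(add_pairs).mapValues(to_mean). Named functions keep the tuple unpacking readable and work on both Python 2 and Python 3.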

