Is it possible to groupBy a Spark DataFrame when not all values are present in the column?

antonioACR1

For example, if I have the following dataframe

val tempDF=Seq(("a",2),("b",1),("a",3)).toDF("letter","value")

scala> tempDF.show()
+------+-----+
|letter|value|
+------+-----+
|     a|    2|
|     b|    1|
|     a|    3|
+------+-----+

and I want to perform a groupBy operation on the column letter, knowing that there could be another letter, c, that is not present in that column. Normally I would have

tempDF.groupBy("letter").sum()

scala> tempDF.groupBy("letter").sum().show()
+------+----------+                                                               
|letter|sum(value)|
+------+----------+
|     a|         5|
|     b|         1|
+------+----------+

but I would like something like this:

+------+----------+                                                             
|letter|sum(value)|
+------+----------+
|     a|         5|
|     b|         1|
|     c|         0|
+------+----------+

Is it possible to do this without somehow adding the letter c to the dataframe? I could have many dataframes in a list, and I don't know which letters (if any) are missing from each one; I only know the whole list of letters that should appear in each.

AbhishekN

If you already know all possible values, create a separate (universal) Dataset that contains every letter with 'value' set to 0. Then combine it with any tempDF (for example, with a union) so that the missing letters are added, and do the groupBy on the resulting dataset.
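A minimal sketch of that idea, assuming the full set of letters is a, b and c. The names allLetters, zeros, result and joined are made up for illustration, and the SparkSession setup is only needed outside spark-shell. The first variant unions zero-valued rows into tempDF before grouping; the second is an alternative that groups first and left-joins the sums onto the list of letters.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, lit, sum}

// Local session for a standalone run; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("groupby-missing-keys")
  .getOrCreate()
import spark.implicits._

val tempDF = Seq(("a", 2), ("b", 1), ("a", 3)).toDF("letter", "value")

// Assumption: the complete list of letters is known up front.
val allLetters = Seq("a", "b", "c").toDF("letter")

// Variant 1: a "universal" dataset with value 0 for every letter,
// unioned with tempDF so that each letter has at least one row.
val zeros = allLetters.withColumn("value", lit(0))
val result = tempDF
  .union(zeros) // union matches columns by position: (letter, value)
  .groupBy("letter")
  .sum("value")
  .orderBy("letter")

result.show()
// Rows: (a, 5), (b, 1), (c, 0)

// Variant 2: aggregate first, then left-join onto the full letter list
// and replace the nulls produced for missing letters with 0.
val sums = tempDF.groupBy("letter").agg(sum("value").as("total"))
val joined = allLetters
  .join(sums, Seq("letter"), "left")
  .select($"letter", coalesce($"total", lit(0L)).as("total"))
  .orderBy("letter")

joined.show()
```

For a list of dataframes, the same zeros dataset can be reused, for example dfs.map(df => df.union(zeros).groupBy("letter").sum("value")), where dfs is a hypothetical Seq[DataFrame] whose elements all share the (letter, value) schema. Adding zeros is only safe for aggregations such as sum; for count or avg the extra rows would change the result, and the join variant is the safer choice there.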
