For example, if I have the following dataframe:
val tempDF=Seq(("a",2),("b",1),("a",3)).toDF("letter","value")
scala> tempDF.show()
+------+-----+
|letter|value|
+------+-----+
|     a|    2|
|     b|    1|
|     a|    3|
+------+-----+
and I want to perform a groupBy operation on the column letter, knowing that there could be another letter, c, that is not present in the column letter. Normally I would have
tempDF.groupBy("letter").sum()
scala> tempDF.groupBy("letter").sum().show()
+------+----------+
|letter|sum(value)|
+------+----------+
|     a|         5|
|     b|         1|
+------+----------+
but I would like something like this:
+------+----------+
|letter|sum(value)|
+------+----------+
|     a|         5|
|     b|         1|
|     c|         0|
+------+----------+
Is it possible to do this without somehow adding the letter c to the dataframe? What I mean is that I could have many dataframes in a list, and I don't know which letters are missing (if any) for each dataframe; I do, however, know the whole list of letters that should appear for each one.
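If you already know all possible values, create a separate (universal) Dataset that contains every expected letter with 'value' set to 0. Then combine it with any tempDF to add the missing letters (a union works, since both have the same columns), and do the groupBy on the combined Dataset. The zero rows do not change the sums of letters that are already present.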
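A minimal sketch of that idea, assuming the full list of letters is known up front and a local SparkSession is acceptable. The names allLetters and universalDF are made up for illustration, and the sketch uses union rather than a join to combine the two datasets, which is one reasonable reading of the suggestion above:

```scala
import org.apache.spark.sql.SparkSession

object FillMissingLetters {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fill-missing-letters")
      .getOrCreate()
    import spark.implicits._

    val tempDF = Seq(("a", 2), ("b", 1), ("a", 3)).toDF("letter", "value")

    // Assumption: every letter that should appear in the result (hypothetical list).
    val allLetters = Seq("a", "b", "c")

    // "Universal" dataset: one row per expected letter, each with value 0.
    val universalDF = allLetters.map(l => (l, 0)).toDF("letter", "value")

    // union matches columns by position, so both sides must use the
    // same column order (letter, value). The zero rows leave existing
    // sums unchanged and make missing letters appear with a sum of 0.
    val result = tempDF
      .union(universalDF)
      .groupBy("letter")
      .sum("value")
      .orderBy("letter")

    result.show()

    spark.stop()
  }
}
```

With the example data this should list a with 5, b with 1, and c with 0. Because universalDF does not depend on any particular dataframe, the same instance can be reused across a whole list of dataframes, for example with dfs.map(df => df.union(universalDF).groupBy("letter").sum("value")), where dfs is a hypothetical Seq[DataFrame].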