I have the following dataframe (df2):
+---------------+----------+----+-----+-----+
|Colours        |Model     |year|type |count|
+---------------+----------+----+-----+-----+
|red,green,white|Mitsubishi|2006|sedan|3    |
|gray,silver    |Mazda     |2010|SUV  |2    |
+---------------+----------+----+-----+-----+
I need to explode the column "Colours" so that each colour gets its own row, like this:
+-------+----------+----+-----+
|Colours|Model     |year|type |
+-------+----------+----+-----+
|red    |Mitsubishi|2006|sedan|
|green  |Mitsubishi|2006|sedan|
|white  |Mitsubishi|2006|sedan|
|gray   |Mazda     |2010|SUV  |
|silver |Mazda     |2010|SUV  |
+-------+----------+----+-----+
I created an array:
val colrs=df2.select("Colours").collect.map(_.getString(0))
and tried to add the array to the dataframe:
val cars=df2.withColumn("c",explode($"colrs")).select("Colours","Model","year","type")
but it didn't work. Any help, please?
You can use the split and explode functions on your dataframe (df2), as below:
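import org.apache.spark.sql.functions._
// split turns the comma-separated string into an array; explode emits one row per array element
val cars = df2.withColumn("Colours", explode(split(col("Colours"), ","))).select("Colours", "Model", "year", "type")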
The output will be:
cars.show(false)
+-------+----------+----+-----+
|Colours|Model     |year|type |
+-------+----------+----+-----+
|red    |Mitsubishi|2006|sedan|
|green  |Mitsubishi|2006|sedan|
|white  |Mitsubishi|2006|sedan|
|gray   |Mazda     |2010|SUV  |
|silver |Mazda     |2010|SUV  |
+-------+----------+----+-----+
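For anyone who wants to try this outside an existing session, here is a minimal self-contained sketch of the same split-then-explode approach. The object name ExplodeColours, the helper explodeColours, and the local[*] master are hypothetical choices made for illustration; the sample rows are copied from the question. It also assumes the real data might contain stray spaces around the commas, so it trims the column and splits on the regex "\\s*,\\s*" instead of a bare ",". If your data never has such spaces, the plain "," split above behaves the same.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, split, trim}

object ExplodeColours {
  // Hypothetical helper: returns one output row per comma-separated colour.
  // Assumption: values may have spaces around the commas, hence trim + regex.
  def explodeColours(df: DataFrame): DataFrame =
    df.withColumn("Colours", explode(split(trim(col("Colours")), "\\s*,\\s*")))
      .select("Colours", "Model", "year", "type")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, for illustration only
      .appName("explode-colours")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question
    val df2 = Seq(
      ("red,green,white", "Mitsubishi", 2006, "sedan", 3),
      ("gray,silver", "Mazda", 2010, "SUV", 2)
    ).toDF("Colours", "Model", "year", "type", "count")

    explodeColours(df2).show(false)
    spark.stop()
  }
}
```

Note that explode needs an array (or map) column that exists in the dataframe, which is why the string column is split first; a Scala array collected on the driver, like colrs in the question, cannot be referenced as a column.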