Expand Column to many rows in Scala

debugcn Published at Dev

Mohd Zoubi

I have the following dataframe (df2):

+---------------+----------+----+-----+-----+
|Colours        |Model     |year|type |count|
+---------------+----------+----+-----+-----+
|red,green,white|Mitsubishi|2006|sedan|3    |
|gray,silver    |Mazda     |2010|SUV  |2    |
+---------------+----------+----+-----+-----+

I need to explode the "Colours" column so that each colour gets its own row, like this:

+-------+----------+----+-----+
|Colours|Model     |year|type |
+-------+----------+----+-----+
|red    |Mitsubishi|2006|sedan|
|green  |Mitsubishi|2006|sedan|
|white  |Mitsubishi|2006|sedan|
|gray   |Mazda     |2010|SUV  |
|silver |Mazda     |2010|SUV  |
+-------+----------+----+-----+

I have created an array of the colours

val colrs=df2.select("Colours").collect.map(_.getString(0))

and tried to add the array to the dataframe with explode

val cars=df2.withColumn("c",explode($"colrs")).select("Colours","Model","year","type")

but it didn't work. Any help, please?

Ramesh Maharjan

You can apply the split and explode functions to your dataframe (df2) as below:

import org.apache.spark.sql.functions._
val cars = df2.withColumn("Colours", explode(split(col("Colours"), ","))).select("Colours","Model","year","type")
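To see what split followed by explode does row by row, here is a plain-Scala analogue with no Spark dependency. It is only a model of the semantics, not how Spark executes the query. The object `ExplodeModel` and the case classes `Car` and `CarColour` are hypothetical stand-ins for df2's rows. `String.split(",")` breaks the comma-separated value into pieces, and `flatMap` emits one output row per piece while copying the other fields, which mirrors how explode fans an array column out into rows.

```scala
object ExplodeModel {
  // Hypothetical row types mirroring df2 before and after the transformation.
  final case class Car(colours: String, model: String, year: Int, carType: String, count: Int)
  final case class CarColour(colour: String, model: String, year: Int, carType: String)

  // split: "red,green,white" -> Array("red", "green", "white")
  // flatMap: one output row per array element, other fields repeated
  // (count is dropped, like the .select in the Spark answer).
  def explodeColours(cars: Seq[Car]): Seq[CarColour] =
    cars.flatMap { c =>
      c.colours.split(",").toSeq.map(colour => CarColour(colour, c.model, c.year, c.carType))
    }

  def main(args: Array[String]): Unit = {
    val df2 = Seq(
      Car("red,green,white", "Mitsubishi", 2006, "sedan", 3),
      Car("gray,silver", "Mazda", 2010, "SUV", 2)
    )
    explodeColours(df2).foreach(println)
  }
}
```

Two input rows become five output rows, one per colour, which is the same shape as the expected table in the question.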

The output will be:

cars.show(false)

+-------+----------+----+-----+
|Colours|Model     |year|type |
+-------+----------+----+-----+
|red    |Mitsubishi|2006|sedan|
|green  |Mitsubishi|2006|sedan|
|white  |Mitsubishi|2006|sedan|
|gray   |Mazda     |2010|SUV  |
|silver |Mazda     |2010|SUV  |
+-------+----------+----+-----+
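For anyone who wants to reproduce this end to end, here is a minimal standalone sketch. It assumes the spark-sql library is on the classpath and runs Spark in-process with master("local[*]"). The names `ExplodeColours`, `buildDf2` and `explodeColours` are hypothetical. It rebuilds df2 from the question with toDF and applies the same split, explode and select as the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, split}

object ExplodeColours {
  // Rebuilds the question's df2 from literal tuples.
  def buildDf2(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("red,green,white", "Mitsubishi", 2006, "sedan", 3),
      ("gray,silver", "Mazda", 2010, "SUV", 2)
    ).toDF("Colours", "Model", "year", "type", "count")
  }

  // split turns the comma-separated string into an array column;
  // explode then emits one row per array element, repeating the other columns.
  def explodeColours(df: DataFrame): DataFrame =
    df.withColumn("Colours", explode(split(col("Colours"), ",")))
      .select("Colours", "Model", "year", "type")

  def main(args: Array[String]): Unit = {
    // Assumption: a local, in-process Spark session is enough for this demo.
    val spark = SparkSession.builder()
      .appName("explode-colours")
      .master("local[*]")
      .getOrCreate()
    try explodeColours(buildDf2(spark)).show(false)
    finally spark.stop()
  }
}
```

The final select is what drops the count column, which matches the expected output in the question.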
