Changing the date format of the column values in a Spark dataframe

Hemanth

I am reading an Excel sheet into a DataFrame in Spark 2.0 and then trying to convert some columns with date values in MM/DD/YY format into YYYY-MM-DD format. The values are strings. Below is a sample:

+---------------+--------------+
|modified       |      created |
+---------------+--------------+
|           null| 12/4/17 13:45|
|        2/20/18|  2/2/18 20:50|
|        3/20/18|  2/2/18 21:10|
|        2/20/18|  2/2/18 21:23|
|        2/28/18|12/12/17 15:42| 
|        1/25/18| 11/9/17 13:10|
|        1/29/18| 12/6/17 10:07| 
+---------------+--------------+

I would like this to be converted to:

+---------------+-----------------+
|modified       |      created    |
+---------------+-----------------+
|           null| 2017-12-04 13:45|
|     2018-02-20| 2018-02-02 20:50|
|     2018-03-20| 2018-02-02 21:10|
|     2018-02-20| 2018-02-02 21:23|
|     2018-02-28| 2017-12-12 15:42| 
|     2018-01-25| 2017-11-09 13:10|
|     2018-01-29| 2017-12-06 10:07| 
+---------------+-----------------+

So I tried doing:

 df.withColumn("modified",date_format(col("modified"),"yyyy-MM-dd"))
   .withColumn("created",to_utc_timestamp(col("created"),"America/New_York"))

But it gives me all NULL values in my result, and I am not sure where I am going wrong. I know that to_utc_timestamp on created will convert the whole timestamp into UTC. Ideally I would like to keep the time unchanged and only change the date format. Is there a way to achieve this, and where am I going wrong?

Any help would be appreciated. Thank you.

Ramesh Maharjan

Spark >= 2.2.0

You need the additional built-in functions to_date and to_timestamp, as follows:

import org.apache.spark.sql.functions._
df.withColumn("modified",date_format(to_date(col("modified"), "MM/dd/yy"), "yyyy-MM-dd"))
  .withColumn("created",to_utc_timestamp(to_timestamp(col("created"), "MM/dd/yy HH:mm"), "UTC"))

and you should have

+----------+-------------------+
|modified  |created            |
+----------+-------------------+
|null      |2017-12-04 13:45:00|
|2018-02-20|2018-02-02 20:50:00|
|2018-03-20|2018-02-02 21:10:00|
|2018-02-20|2018-02-02 21:23:00|
|2018-02-28|2017-12-12 15:42:00|
|2018-01-25|2017-11-09 13:10:00|
|2018-01-29|2017-12-06 10:07:00|
+----------+-------------------+

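Using the UTC time zone didn't alter the time for me: the zone passed to to_utc_timestamp is the zone the input is assumed to be in, so with "UTC" no shift is applied.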
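The NULLs in the question most likely come from date_format and to_utc_timestamp implicitly casting the string columns to timestamps, which fails for values such as 2/20/18 that are not in an ISO-like layout. If the goal is only to change how the values are written, leaving the time untouched, one option is to parse each string with its real pattern and format it straight back to a string. The following is a minimal sketch under a few assumptions: the object name ReformatDates, the helper reformat, and the local SparkSession are made up for illustration, and the single-letter patterns M/d/yy and H:mm are used because the sample values are not zero-padded (the stricter parser in Spark 3.x may reject them with MM/dd/yy).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_format, to_date, to_timestamp}

// Hypothetical helper object, not part of the original answer.
object ReformatDates {

  // Parse the strings with their actual layout, then format them back to
  // strings in the target layout. No time zone conversion is involved.
  def reformat(df: DataFrame): DataFrame =
    df
      .withColumn("modified",
        date_format(to_date(col("modified"), "M/d/yy"), "yyyy-MM-dd"))
      .withColumn("created",
        date_format(to_timestamp(col("created"), "M/d/yy H:mm"), "yyyy-MM-dd HH:mm"))

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only (assumption).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("reformat-dates")
      .getOrCreate()
    import spark.implicits._

    // Two rows taken from the sample in the question; None models the null.
    val df = Seq(
      (Option.empty[String], "12/4/17 13:45"),
      (Option("2/20/18"), "2/2/18 20:50")
    ).toDF("modified", "created")

    reformat(df).show(false)
    spark.stop()
  }
}
```

Both columns stay strings here. If a real DateType or TimestampType column is wanted instead, drop the outer date_format and keep only to_date / to_timestamp; the display format is then decided when the column is shown or written.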

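Spark < 2.2.0

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.TimestampType
// unix_timestamp parses the string into epoch seconds; it is then formatted back or cast to a timestamp
val temp = df.withColumn("modified", from_unixtime(unix_timestamp(col("modified"), "MM/dd/yy"), "yyyy-MM-dd"))
  .withColumn("created", to_utc_timestamp(unix_timestamp(col("created"), "MM/dd/yy HH:mm").cast(TimestampType), "UTC"))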

The output DataFrame is the same as above.
