How to efficiently convert a dataframe column of string type into datetime in Python?

sanyi14ka

I have a column of IDs, and each ID has a timestamp encoded in it. For example:

0    020160910223200_T1
1    020160910223200_T1
2    020160910223203_T1
3    020160910223203_T1
4    020160910223206_T1
5    020160910223206_T1
6    020160910223209_T1
7    020160910223209_T1
8    020160910223213_T1
9    020160910223213_T1

If we remove the first character and the last three characters, the first row gives 20160910223200, which should be converted to 2016-09-10 22:32:00.

My solution was to write a function that truncates an ID and converts it to a datetime. Then I applied this function to my df column.

from datetime import datetime

def MeasureIDtoTime(MeasureID):
    # Drop the first character and the last three ("_T1"),
    # keeping the full 14-digit timestamp (a [1:14] slice would lose the last digit)
    MeasureID = str(MeasureID)[1:-3]
    return datetime.strptime(MeasureID, '%Y%m%d%H%M%S')

df['Time'] = df['MeasureID'].apply(MeasureIDtoTime)

This works, but it is too slow for my case. I have more than 20 million rows and need a faster solution. Is there a more efficient approach?

Update

According to @MaxU there is a better solution:

pd.to_datetime(df.ID.str[1:-3], format = '%Y%m%d%H%M%S')

This does the job in 32 seconds for 7.2 million rows. In R, however, the lubridate::ymd_hms() function performs the same task in less than 2 seconds, so I am wondering whether a faster solution exists in Python.

MaxU

UPDATE: performance optimization...

Let's try to optimize it a little.

DF shape: 50,000 x 1

In [220]: df.head()
Out[220]:
                   ID
0  020160910223200_T1
1  020160910223200_T1
2  020160910223203_T1
3  020160910223203_T1
4  020160910223206_T1

In [221]: df.shape
Out[221]: (50000, 1)

In [222]: len(df)
Out[222]: 50000

Timing:

In [223]: %timeit df['ID'].apply(MeasureIDtoTime)
1 loop, best of 3: 929 ms per loop

In [224]: %timeit pd.to_datetime(df.ID.str[1:-3])
1 loop, best of 3: 5.68 s per loop

In [225]: %timeit pd.to_datetime(df.ID.str[1:-3], format='%Y%m%d%H%M%S')
1 loop, best of 3: 267 ms per loop    ### WINNER !

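Conclusion: explicitly specifying the datetime format makes pd.to_datetime about 21 times faster than letting it infer the format (5.68 s vs. 267 ms), and about 3.5 times faster than the apply() approach (929 ms).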
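The timings above use the IPython %timeit magic. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The sample data here is an assumption (five IDs from the question repeated to 50,000 rows), and the helper names with_apply and with_format are made up for illustration. Absolute numbers will differ by machine and pandas version, and with so few distinct values pandas may cache parsed results, so treat the output only as a relative indication.

```python
import timeit
from datetime import datetime

import pandas as pd

# Assumed sample data: five IDs from the question, repeated to 50,000 rows.
ids = [
    "020160910223200_T1", "020160910223203_T1", "020160910223206_T1",
    "020160910223209_T1", "020160910223213_T1",
]
df = pd.DataFrame({"ID": ids * 10_000})

FMT = "%Y%m%d%H%M%S"

def with_apply(frame):
    # Row-by-row parsing with datetime.strptime (the approach from the question).
    return frame["ID"].apply(lambda s: datetime.strptime(s[1:-3], FMT))

def with_format(frame):
    # Vectorized slice, then one to_datetime call with an explicit format.
    return pd.to_datetime(frame["ID"].str[1:-3], format=FMT)

# Sanity check: both approaches should yield the same timestamps.
same = bool((with_apply(df) == with_format(df)).all())
print("results match:", same)

for fn in (with_apply, with_format):
    # Best of three single runs, similar in spirit to %timeit's "best of 3".
    seconds = min(timeit.repeat(lambda: fn(df), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds * 1000:.0f} ms")
```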

NOTE: this is possible only if every value in the column uses the same datetime format.
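If a few rows might not follow the format, one option is to pass errors="coerce" so that those rows become NaT instead of raising an exception. The following is a minimal sketch; the malformed ID is invented for illustration and does not come from the question's data.

```python
import pandas as pd

# Assumed data: two well-formed IDs and one invented malformed ID.
ids = pd.Series([
    "020160910223200_T1",
    "020160910223203_T1",
    "0not-a-timestamp_T1",
])

# errors="coerce" converts values that do not match the format into NaT
# rather than raising an exception.
times = pd.to_datetime(ids.str[1:-3], format="%Y%m%d%H%M%S", errors="coerce")

# Rows that failed to parse can then be inspected separately.
bad_rows = ids[times.isna()]
print(times)
print(bad_rows)
```

Checking times.isna().sum() after the conversion is a quick way to confirm whether the whole column really shares one format.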

OLD answer:

In [81]: pd.to_datetime(df.ID.str[1:-3])
Out[81]:
0   2016-09-10 22:32:00
1   2016-09-10 22:32:00
2   2016-09-10 22:32:03
3   2016-09-10 22:32:03
4   2016-09-10 22:32:06
5   2016-09-10 22:32:06
6   2016-09-10 22:32:09
7   2016-09-10 22:32:09
8   2016-09-10 22:32:13
9   2016-09-10 22:32:13
Name: ID, dtype: datetime64[ns]

where df is:

In [82]: df
Out[82]:
                   ID
0  020160910223200_T1
1  020160910223200_T1
2  020160910223203_T1
3  020160910223203_T1
4  020160910223206_T1
5  020160910223206_T1
6  020160910223209_T1
7  020160910223209_T1
8  020160910223213_T1
9  020160910223213_T1
