I have 2 column dataframe likes this:
ITEM REFNUMS
1 00000299 0036701923024762922029229294652954429569295832...
2 00000655 NaN
24 00001791 00016027123076000158004563065131972
25 00001805 00016027123076000158004563065131972
26 00001813 00016027123076000158004563065131972
27 00001821 00016027123076000158004563065131972
28 00001937 0142530521316303164702509000510012201310027820...
I would like to split the REFNUMS
columns into divisible parts and add onto the existing dataframe if possible as I need to retain the row index and matching ITEM #. The data in REFNUMS
is a length divisible by 5
, when not NaN
, so for example Row 1 is = 78 sets of 5.
data_len = (data['REFNUMS'].str.len())/5
Then
0 NaN
1 78.0
2 NaN
Appreciate any suggestions on how to do this.
IIUC, you can use str.extractall
to get the groups of 5 digits, clean up the columns, and then join:
In [168]: r = df.REFNUMS.str.extractall("(\d{1,5})").unstack()
In [169]: r.columns = r.columns.droplevel(0)
In [170]: df.join(r)
Out[170]:
ITEM REFNUMS 0 1 2 3 4 5 6 7 8 9
1 299 0036701923024762922029229294652954429569295832... 00367 01923 02476 29220 29229 29465 29544 29569 29583 2
2 655 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24 1791 00016027123076000158004563065131972 00016 02712 30760 00158 00456 30651 31972 None None None
25 1805 00016027123076000158004563065131972 00016 02712 30760 00158 00456 30651 31972 None None None
26 1813 00016027123076000158004563065131972 00016 02712 30760 00158 00456 30651 31972 None None None
27 1821 00016027123076000158004563065131972 00016 02712 30760 00158 00456 30651 31972 None None None
28 1937 0142530521316303164702509000510012201310027820... 01425 30521 31630 31647 02509 00051 00122 01310 02782 0
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments