I have a pandas DataFrame
that looks like the following:
bus_uid bus_type type obj_uid \
0 biomass: DEB31 biomass output Simple_139804698384200
0 biomass: DEB31 biomass other duals
0 biomass: DEB31 biomass other excess
datetime \
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
values
0 [1.0, 2.0, 3.0, ...
0 [4.0, 5.0, 6.0, ...
0 [7.0, 8.0, 9.0, ...
And want to convert it into the following format:
bus_uid bus_type type obj_uid datetime values
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 00:00:00 1.0
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 01:00:00 2.0
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 02:00:00 3.0
0 biomass: DEB31 biomass other duals 2015-01-01 00:00:00 4.0
0 biomass: DEB31 biomass other duals 2015-01-01 01:00:00 5.0
0 biomass: DEB31 biomass other duals 2015-01-01 02:00:00 6.0
0 biomass: DEB31 biomass other excess 2015-01-01 00:00:00 7.0
0 biomass: DEB31 biomass other excess 2015-01-01 01:00:00 8.0
0 biomass: DEB31 biomass other excess 2015-01-01 02:00:00 9.0
The columns datetime
and values
have the same dimension.
I have already asked a similar question here but couldn't manage to apply the solution for my problem with two columns.
What's the best way to convert the DataFrame
into the required format?
You could iterate through the rows to extract the Index
and Series
info from the cells. I don't think that reshaping
methods work well when you need to extract info at the same time:
Sample data:
rows = 3
df = pd.DataFrame(data={'bus_uid': list(repeat('biomass: DEB31', rows)), 'type': list(repeat('biomass', 3)), 'id': ['id1', 'id2', 'id3'], 'datetime': list(repeat(pd.DatetimeIndex(start=datetime(2016,1,1), periods=3, freq='D'), rows)), 'values': list(repeat([1,2,3], rows))})
bus_uid datetime id \
0 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id1
1 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id2
2 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id3
type values
0 biomass [1, 2, 3]
1 biomass [1, 2, 3]
2 biomass [1, 2, 3]
Build new DataFrame
as you iterate through the DataFrame
rows
:
new_df = pd.DataFrame()
for index, cols in df.iterrows():
extract_df = pd.DataFrame.from_dict({'datetime': cols.ix['datetime'], 'values': cols.ix['values']})
extract_df = pd.concat([extract_df, cols.drop(['datetime', 'values']).to_frame().T], axis=1).fillna(method='ffill').fillna(method='bfill')
new_df = pd.concat([new_df, extract_df], ignore_index=True)
to get:
datetime values bus_uid id type
0 2016-01-01 1 biomass: DEB31 id1 biomass
1 2016-01-02 2 biomass: DEB31 id1 biomass
2 2016-01-03 3 biomass: DEB31 id1 biomass
3 2016-01-01 1 biomass: DEB31 id2 biomass
4 2016-01-02 2 biomass: DEB31 id2 biomass
5 2016-01-03 3 biomass: DEB31 id2 biomass
6 2016-01-01 1 biomass: DEB31 id3 biomass
7 2016-01-02 2 biomass: DEB31 id3 biomass
8 2016-01-03 3 biomass: DEB31 id3 biomass
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments