我们从测量多个零件的设备获得数据,并且将每个零件的多个测量结果输出到CSV文件中。我们将CSV文件读入具有以下结构的数据框:
PartNo 12
Meas1 45
Meas2 23
!END
PartNo 13
Meas1 63
Meas2 73
!END
PartNo 12
Meas1 82
Meas2 84
!END
“!END”标志指示来自某一部分的数据在哪里结束,而下一部分开始。我们想重塑数据,使其看起来像:
PartNo Meas1 Meas2
12 45 23
13 63 73
12 82 84
(请注意,一个部分可能会出现多次-因此,没有一个字段在所有记录中都保证是唯一的。)
枢轴产生:
0 !END Meas1 Meas2 PartNo
0 NaN NaN NaN 12.0
1 NaN 45.0 NaN NaN
2 NaN NaN 23.0 NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN 13.0
5 NaN 63.0 NaN NaN
6 NaN NaN 73.0 NaN
7 NaN NaN NaN NaN
8 NaN NaN NaN 12.0
9 NaN 82.0 NaN NaN
10 NaN NaN 84.0 NaN
11 NaN NaN NaN NaN
如何将这些行压缩为按零件号分组?
转置产生:
0 1 2 3 4 5 6 7 8 9 10 11
0 PartNo Meas1 Meas2 !END PartNo Meas1 Meas2 !END PartNo Meas1 Meas2 !END
1 12 45 23 NaN 13 63 73 NaN 12 82 84 NaN
如何每第4个项目重置行?
我可以在原始数据帧中创建一个新的索引列,然后遍历各行,使用!END递增每行的索引(然后使用索引对数据进行分组),但是似乎应该更加优雅形状转换函数来处理这种情况,或者Pivot或Transpose中有一个参数可以处理这种情况。我是Python初学者。这是完整的代码:
import pandas as pd
from io import StringIO
tdata = (
'PartNo, 12\n'
'Meas1, 45\n'
'Meas2, 23\n'
'!END\n'
'PartNo, 13\n'
'Meas1, 63\n'
'Meas2, 73\n'
'!END\n'
'PartNo, 12\n'
'Meas1, 82\n'
'Meas2, 84\n'
'!END\n')
tdf = pd.read_csv(StringIO(tdata), header=None)
print(tdf)
print(tdf.pivot(index=None, columns=0, values=1))
print(tdf.T)
#having dataframe x:
>>> x = pd.DataFrame([['PartNo',12],['Meas1',45],['Meas2',23],['!END',''],['PartNo',13],['Meas1',63],['Meas2',73],['!END',''],['PartNo',12],['Meas1',82],['Meas2',84],['!END','']])
>>> x
0 1
0 PartNo 12
1 Meas1 45
2 Meas2 23
3 !END
4 PartNo 13
5 Meas1 63
6 Meas2 73
7 !END
8 PartNo 12
9 Meas1 82
10 Meas2 84
11 !END
#grouping by first column, and aggregating values to list. First column then contains Series that you want. By converting each list in this series to series, dataframe is created, then you just need to transpose
>>> df = x.groupby(0).agg(lambda x: list(x))[1].apply(lambda x: pd.Series(x)).transpose()
>>> df[['PartNo','Meas1','Meas2']]
0 PartNo Meas1 Meas2
0 12 45 23
1 13 63 73
2 12 82 84
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句