Data:
a= [{"content": 1, "time": 1577870427}, {"content": 4, "time": 1577870427},
{"content": 2, "time": 1577956827},
{"content": 3, "time": 1580548827}, {"content": 5, "time": 1580635227},
{"content": 6, "time": 1583054427}, {"content": 7, "time": 1583140827}]
I want to keep only the items whose content is greater than 5.
Expected result:
[{"content": 6, "time": 1583054427}, {"content": 7, "time": 1583140827}]
My code:
import pandas as pd

index = pd.to_datetime([i['time'] for i in a], unit='s')
df = pd.Series(a,index)
df.gt(5)
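But this raises an error.
The problem is that your Series holds dictionaries, which pandas cannot process easily; any processing ends up as a loop (possibly hidden behind apply, a list comprehension, or a plain for loop).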
index = pd.to_datetime([i['time'] for i in a], unit='s')
df = pd.Series(a,index)
print (df.head().apply(type))
2020-01-01 09:20:27 <class 'dict'>
2020-01-01 09:20:27 <class 'dict'>
2020-01-02 09:20:27 <class 'dict'>
2020-02-01 09:20:27 <class 'dict'>
2020-02-02 09:20:27 <class 'dict'>
dtype: object
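To see why the original call fails, here is a minimal sketch on a shortened two-item version of the data (the shortening is an assumption for brevity). Comparing an object Series of dicts with a number makes Python evaluate dict > int for each element, which is expected to raise a TypeError:

```python
import pandas as pd

# Shortened sample of the data from the question.
a = [{"content": 1, "time": 1577870427}, {"content": 6, "time": 1583054427}]
index = pd.to_datetime([i["time"] for i in a], unit="s")
s = pd.Series(a, index)

try:
    s.gt(5)  # each element is a dict, so this compares dict > int
    failed = False
except TypeError as exc:
    failed = True
    print(type(exc).__name__, exc)
```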
If you want to filter, you can extract the scalar content values into a Series with str.get and then compare:
print (df[df.str.get('content').gt(5)])
2020-03-01 09:20:27 {'content': 6, 'time': 1583054427}
2020-03-02 09:20:27 {'content': 7, 'time': 1583140827}
dtype: object
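If only the filtered list is needed and the datetime index is not, a plain list comprehension over the original list may be enough. A sketch, with result as an illustrative name:

```python
a = [{"content": 1, "time": 1577870427}, {"content": 4, "time": 1577870427},
     {"content": 2, "time": 1577956827},
     {"content": 3, "time": 1580548827}, {"content": 5, "time": 1580635227},
     {"content": 6, "time": 1583054427}, {"content": 7, "time": 1583140827}]

# Keep only the dicts whose "content" value is greater than 5.
result = [d for d in a if d["content"] > 5]
print(result)
```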
How it works:
print (df.str.get('content'))
2020-01-01 09:20:27 1
2020-01-01 09:20:27 4
2020-01-02 09:20:27 2
2020-02-01 09:20:27 3
2020-02-02 09:20:27 5
2020-03-01 09:20:27 6
2020-03-02 09:20:27 7
dtype: int64
print (df.str.get('content').gt(5))
2020-01-01 09:20:27 False
2020-01-01 09:20:27 False
2020-01-02 09:20:27 False
2020-02-01 09:20:27 False
2020-02-02 09:20:27 False
2020-03-01 09:20:27 True
2020-03-02 09:20:27 True
dtype: bool
If you want to process the data with a custom function, use apply:
def f(x):
    x['time'] = pd.to_datetime(x['time'], unit='s')
    return x
df = df.apply(f)
print (df)
2020-01-01 09:20:27 {'content': 1, 'time': 2020-01-01 09:20:27}
2020-01-01 09:20:27 {'content': 4, 'time': 2020-01-01 09:20:27}
2020-01-02 09:20:27 {'content': 2, 'time': 2020-01-02 09:20:27}
2020-02-01 09:20:27 {'content': 3, 'time': 2020-02-01 09:20:27}
2020-02-02 09:20:27 {'content': 5, 'time': 2020-02-02 09:20:27}
2020-03-01 09:20:27 {'content': 6, 'time': 2020-03-01 09:20:27}
2020-03-02 09:20:27 {'content': 7, 'time': 2020-03-02 09:20:27}
dtype: object
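Note that f assigns into each dict, and the Series holds references to the same dict objects as the list a, so the original list is modified as well. If that is not wanted, one possible variant builds a new dict per element instead (the name to_timestamp is hypothetical, and the two-item data is a shortened sample):

```python
import pandas as pd

a = [{"content": 1, "time": 1577870427}, {"content": 6, "time": 1583054427}]
index = pd.to_datetime([i["time"] for i in a], unit="s")
s = pd.Series(a, index)

def to_timestamp(x):
    # Return a copy with "time" converted; the input dict is left untouched.
    return {**x, "time": pd.to_datetime(x["time"], unit="s")}

converted = s.apply(to_timestamp)
print(converted.iloc[0])
print(a[0])  # still holds the integer timestamp
```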
A better approach is to use a DataFrame:
df = pd.DataFrame(a)
print (df)
   content        time
0        1  1577870427
1        4  1577870427
2        2  1577956827
3        3  1580548827
4        5  1580635227
5        6  1583054427
6        7  1583140827
Processing is then easy, for example comparisons, because each column holds scalars:
print (df['content'].gt(5))
0 False
1 False
2 False
3 False
4 False
5 True
6 True
Name: content, dtype: bool
df['time'] = pd.to_datetime(df['time'], unit='s')
print (df)
   content                time
0        1 2020-01-01 09:20:27
1        4 2020-01-01 09:20:27
2        2 2020-01-02 09:20:27
3        3 2020-02-01 09:20:27
4        5 2020-02-02 09:20:27
5        6 2020-03-01 09:20:27
6        7 2020-03-02 09:20:27
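To get back to the list-of-dicts shape asked for in the question, one option is to filter the DataFrame with a boolean mask and call to_dict('records'). A sketch, run here on the unconverted time column so the timestamps stay as integers:

```python
import pandas as pd

a = [{"content": 1, "time": 1577870427}, {"content": 4, "time": 1577870427},
     {"content": 2, "time": 1577956827},
     {"content": 3, "time": 1580548827}, {"content": 5, "time": 1580635227},
     {"content": 6, "time": 1583054427}, {"content": 7, "time": 1583140827}]

df = pd.DataFrame(a)

# Boolean mask on the scalar column, then back to a list of dicts.
result = df[df["content"].gt(5)].to_dict("records")
print(result)
```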