How to use pandas Series.gt

xin.chen

Data:

a=  [{"content": 1, "time": 1577870427}, {"content": 4, "time": 1577870427},
     {"content": 2, "time": 1577956827},
     {"content": 3, "time": 1580548827}, {"content": 5, "time": 1580635227},
     {"content": 6, "time": 1583054427}, {"content": 7, "time": 1583140827}]

I want to keep only the entries whose content is greater than 5.

Expected result:

[{"content": 6, "time": 1583054427}, {"content": 7, "time": 1583140827}]

My code:

import pandas as pd

index = pd.to_datetime([i['time'] for i in a], unit='s')
df = pd.Series(a,index)
df.gt(5)

But it raises an error.

jezrael

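The problem is that the values in your Series are dictionaries, which pandas cannot process easily. Any operation on them also runs as a loop under the hood (through apply, a list comprehension, or a for loop).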

index = pd.to_datetime([i['time'] for i in a], unit='s')
df = pd.Series(a,index)
print (df.head().apply(type))
2020-01-01 09:20:27    <class 'dict'>
2020-01-01 09:20:27    <class 'dict'>
2020-01-02 09:20:27    <class 'dict'>
2020-02-01 09:20:27    <class 'dict'>
2020-02-02 09:20:27    <class 'dict'>
dtype: object
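
As a quick check of why the original call fails, the sketch below reproduces it on a shortened copy of the data. The names `s` and `error` are illustrative and not from the original post, and the exact error message may vary between pandas versions.

```python
import pandas as pd

# Shortened copy of the question's data (two records only).
a = [{"content": 1, "time": 1577870427}, {"content": 6, "time": 1583054427}]
index = pd.to_datetime([d["time"] for d in a], unit="s")
s = pd.Series(a, index)

# Each element is a dict, so gt(5) ends up comparing dict > int,
# which Python does not support.
try:
    s.gt(5)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

The comparison is attempted element by element, so the failure comes from Python's own comparison rules and not from the datetime index.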

To filter, extract content from each dictionary into a Series of scalars with str.get, then compare:

print (df[df.str.get('content').gt(5)])
2020-03-01 09:20:27    {'content': 6, 'time': 1583054427}
2020-03-02 09:20:27    {'content': 7, 'time': 1583140827}
dtype: object

How it works:

print (df.str.get('content'))
2020-01-01 09:20:27    1
2020-01-01 09:20:27    4
2020-01-02 09:20:27    2
2020-02-01 09:20:27    3
2020-02-02 09:20:27    5
2020-03-01 09:20:27    6
2020-03-02 09:20:27    7
dtype: int64

print (df.str.get('content').gt(5))
2020-01-01 09:20:27    False
2020-01-01 09:20:27    False
2020-01-02 09:20:27    False
2020-02-01 09:20:27    False
2020-02-02 09:20:27    False
2020-03-01 09:20:27     True
2020-03-02 09:20:27     True
dtype: bool
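
If the goal is only the filtered list of dictionaries, the same filter can be sketched without the str accessor. This is an alternative under the assumption that the input stays a plain list of dicts; the names `filtered`, `s`, and `mask` are illustrative.

```python
import pandas as pd

a = [{"content": 1, "time": 1577870427}, {"content": 4, "time": 1577870427},
     {"content": 2, "time": 1577956827},
     {"content": 3, "time": 1580548827}, {"content": 5, "time": 1580635227},
     {"content": 6, "time": 1583054427}, {"content": 7, "time": 1583140827}]

# Plain Python: keep the dicts whose "content" value exceeds 5.
filtered = [d for d in a if d["content"] > 5]
print(filtered)

# Same idea on the Series: map each dict to its "content" value,
# compare the resulting scalars, and use the boolean mask to select rows.
index = pd.to_datetime([d["time"] for d in a], unit="s")
s = pd.Series(a, index)
mask = s.map(lambda d: d["content"]).gt(5)
print(s[mask].tolist())
```

Both variants still loop over the dictionaries one by one, which is the cost of storing dicts in a Series.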

If you want to process the data with a custom function, use apply:

def f(x):
    x['time'] = pd.to_datetime(x['time'], unit='s')
    return x

df = df.apply(f)
print (df)
2020-01-01 09:20:27    {'content': 1, 'time': 2020-01-01 09:20:27}
2020-01-01 09:20:27    {'content': 4, 'time': 2020-01-01 09:20:27}
2020-01-02 09:20:27    {'content': 2, 'time': 2020-01-02 09:20:27}
2020-02-01 09:20:27    {'content': 3, 'time': 2020-02-01 09:20:27}
2020-02-02 09:20:27    {'content': 5, 'time': 2020-02-02 09:20:27}
2020-03-01 09:20:27    {'content': 6, 'time': 2020-03-01 09:20:27}
2020-03-02 09:20:27    {'content': 7, 'time': 2020-03-02 09:20:27}
dtype: object
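
Note that a function which assigns into `x` modifies each dict in place, so the dictionaries in the original list change as well. If that is not wanted, a variant that builds a new dict per record might look like the sketch below; `with_datetime` and `converted` are hypothetical names, and the data is shortened.

```python
import pandas as pd

a = [{"content": 1, "time": 1577870427}, {"content": 6, "time": 1583054427}]
s = pd.Series(a)

def with_datetime(record):
    # Build a new dict instead of assigning into the one stored in `a`.
    return {**record, "time": pd.to_datetime(record["time"], unit="s")}

converted = s.apply(with_datetime)

print(converted.iloc[0])  # new dict with a Timestamp under "time"
print(a[0])               # original dict still holds the integer timestamp
```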

A better option is to use a DataFrame:

df = pd.DataFrame(a)
print (df)
   content        time
0        1  1577870427
1        4  1577870427
2        2  1577956827
3        3  1580548827
4        5  1580635227
5        6  1583054427
6        7  1583140827

Processing is then easy, for example comparison, because each column holds scalars:

print (df['content'].gt(5))
0    False
1    False
2    False
3    False
4    False
5     True
6     True
Name: content, dtype: bool

df['time'] = pd.to_datetime(df['time'], unit='s')
print (df)
   content                time
0        1 2020-01-01 09:20:27
1        4 2020-01-01 09:20:27
2        2 2020-01-02 09:20:27
3        3 2020-02-01 09:20:27
4        5 2020-02-02 09:20:27
5        6 2020-03-01 09:20:27
6        7 2020-03-02 09:20:27
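
To get back to the list-of-dicts shape the question asks for, one possible approach is to filter the DataFrame with the boolean mask and convert the remaining rows with to_dict. This is a minimal sketch that leaves the time column as integers; `result` is an illustrative name.

```python
import pandas as pd

a = [{"content": 1, "time": 1577870427}, {"content": 4, "time": 1577870427},
     {"content": 2, "time": 1577956827},
     {"content": 3, "time": 1580548827}, {"content": 5, "time": 1580635227},
     {"content": 6, "time": 1583054427}, {"content": 7, "time": 1583140827}]

df = pd.DataFrame(a)

# Keep rows where content > 5, then turn each row back into a dict.
result = df[df["content"].gt(5)].to_dict("records")
print(result)
```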
