Python 3.x: How do I keep the group of duplicates with the lower mean?

user8884453

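Hi, since I'm new to Python, a friend recommended that I ask for help on Stack Overflow, so I decided to give it a try. I'm currently using Python 3.x.

I have more than 100k rows of data in a CSV file with no column headers, and I have loaded it into a pandas DataFrame. Because the file is confidential I can't show the real data here, but below is a sample of the data, with columns that can be defined as follows: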

("id", "name", "number", "time", "text_id", "text", "text")

1 | apple | 12 | 123 | 2 | abc | abc

1 | apple | 12 | 222 | 2 | abc | abc

2 | orange | 32 | 123 | 2 | abc | abc

2 | orange | 11 | 123 | 2 | abc | abc

3 | apple | 12 | 333 | 2 | abc | abc

3 | apple | 12 | 443 | 2 | abc | abc

3 | apple | 12 | 553 | 2 | abc | abc

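As you can see from the name column, I have 2 duplicate clusters of "apple", but with different IDs.

So my question is: how do I drop the entire cluster (all of its rows) that has the higher mean based on "time"?

Example: if (cluster with ID: 1).mean(time) < (cluster with ID: 3).mean(time), then drop all rows in the cluster with ID 3.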
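The comparison described above can be made concrete by computing the mean "time" of every cluster. This is only a minimal sketch on the sample rows; the column name text2 is an assumption, introduced here so the two "text" columns have distinct names.

```python
import pandas as pd

# Sample rows from the question. "text2" is an assumed name for the
# second "text" column, used only to avoid duplicate column labels.
df = pd.DataFrame(
    [
        (1, "apple", 12, 123, 2, "abc", "abc"),
        (1, "apple", 12, 222, 2, "abc", "abc"),
        (2, "orange", 32, 123, 2, "abc", "abc"),
        (2, "orange", 11, 123, 2, "abc", "abc"),
        (3, "apple", 12, 333, 2, "abc", "abc"),
        (3, "apple", 12, 443, 2, "abc", "abc"),
        (3, "apple", 12, 553, 2, "abc", "abc"),
    ],
    columns=["id", "name", "number", "time", "text_id", "text", "text2"],
)

# Mean "time" for every (name, id) cluster.
cluster_means = df.groupby(["name", "id"])["time"].mean()
print(cluster_means)
```

For the sample, the "apple" cluster with ID 1 averages 172.5 and the one with ID 3 averages 443.0, so the cluster with ID 3 is the one to drop.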

Expected output:

1 | apple | 12 | 123 | 2 | abc | abc

1 | apple | 12 | 222 | 2 | abc | abc

2 | orange | 32 | 123 | 2 | abc | abc

2 | orange | 11 | 123 | 2 | abc | abc

I need all the help I can get and I'm running out of time. Thanks in advance!

Joseph K.

Here is what you need:

Try the following:

import pandas as pd

df = pd.read_csv('filename.csv', header=None)
df.columns = ['id', 'name', 'number', 'time', 'text_id', 'text', 'text']

print(df)

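# Process each distinct name separately.
for eachname in df['name'].unique():
    eachname_df = df.loc[df['name'] == eachname]
    grouped_df = eachname_df.groupby(['id', 'name'])
    # Mean time of every id cluster that shares this name.
    avg_name = grouped_df['time'].mean()

    for a, b in grouped_df:
        # Drop every cluster whose mean time is not the smallest one.
        # (Index.get_values() was removed in pandas 1.0; drop() accepts the index directly.)
        if b['time'].mean() != avg_name.min():
            df = df.drop(b.index)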

print(df)


Result:
   id    name  number  time  text_id text text
0   1   apple      12   123        2  abc  abc
1   1   apple      12   222        2  abc  abc
2   2  orange      32   123        2  abc  abc
3   2  orange      11   123        2  abc  abc
4   3   apple      12   333        2  abc  abc
5   3   apple      12   443        2  abc  abc
6   3   apple      12   553        2  abc  abc

   id    name  number  time  text_id text text
0   1   apple      12   123        2  abc  abc
1   1   apple      12   222        2  abc  abc
2   2  orange      32   123        2  abc  abc
3   2  orange      11   123        2  abc  abc
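On a file with 100k+ rows, the nested loops above may be slow. A possible alternative, shown here only as a sketch and not as part of the original answer, is to let groupby(...).transform compute the per-cluster means and compare them row by row. The column name text2 is an assumption made to keep column labels unique.

```python
import pandas as pd

# Sample rows from the question. "text2" is an assumed name for the
# second "text" column, used only to avoid duplicate column labels.
df = pd.DataFrame(
    [
        (1, "apple", 12, 123, 2, "abc", "abc"),
        (1, "apple", 12, 222, 2, "abc", "abc"),
        (2, "orange", 32, 123, 2, "abc", "abc"),
        (2, "orange", 11, 123, 2, "abc", "abc"),
        (3, "apple", 12, 333, 2, "abc", "abc"),
        (3, "apple", 12, 443, 2, "abc", "abc"),
        (3, "apple", 12, 553, 2, "abc", "abc"),
    ],
    columns=["id", "name", "number", "time", "text_id", "text", "text2"],
)

# Mean "time" of each (name, id) cluster, broadcast back to every row.
cluster_mean = df.groupby(["name", "id"])["time"].transform("mean")

# Smallest cluster mean among all clusters that share the same name.
best_mean = cluster_mean.groupby(df["name"]).transform("min")

# Keep only the rows whose cluster has the smallest mean for its name.
result = df[cluster_mean == best_mean]
print(result)
```

Note that if two clusters with the same name tie on the smallest mean, this sketch keeps both of them, which matches the behavior of the loop version.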
