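I want to count occurrences of certain strings across multiple columns and return the total count in a new column
I know I can use value_counts to count the total occurrences of each value in a given column: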
data['col'].value_counts(dropna=False)
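For reference, here is a minimal sketch of what `dropna=False` changes. It uses a small made-up Series, not the question's data: missing values are counted as their own entry instead of being dropped.

```python
import numpy as np
import pandas as pd

# Made-up outcome labels (hypothetical), including one missing value
s = pd.Series(['win KO', 'win UD', 'win KO', np.nan])

# dropna=False keeps NaN as its own entry in the counts
counts = s.value_counts(dropna=False)
print(counts)

# With the default dropna=True, the NaN entry is left out
print(s.value_counts().sum())  # 3
```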
Result:
[["win" TKO technical knockout] 336
[["win" UD unanimous decision] 307
[["win" KO knockout] 225
[["loss" UD unanimous decision] 97
[["loss" TKO technical knockout] 64
[["win" nan null] 53
[["draw" MD majority decision] 43
[["loss" KO knockout] 41
[["loss" MD majority decision] 35
[["loss" nan null] 32
[["loss" SD split decision] 29
[["unknown" nan null] 29
[["win" SD split decision] 27
[["draw" PTS null] 18
[["win" RTD corner retirement] 17
[["draw" SD split decision] 12
[["loss" RTD corner retirement] 11
[["win" MD majority decision] 9
[["loss" DQ disqualification] 6
[["win" PTS null] 6
[["unknown" NC null] 3
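The problem is that I want to count, for example, the occurrences of [["win" KO knockout] across all of the relevant columns (col1 through col20), row by row.
Here is a sample of my data: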
{'col1': {0: ['["win" UD unanimous decision'],
1: ['["win" UD unanimous decision'],
2: ['["win" TKO technical knockout'],
3: ['["win" UD unanimous decision'],
4: ['["win" UD unanimous decision']},
'col2': {0: ['["win" TKO technical knockout'],
1: ['["win" TKO technical knockout'],
2: ['["win" TKO technical knockout'],
3: ['["win" UD unanimous decision'],
4: ['["win" UD unanimous decision']},
'col3': {0: ['["win" TKO technical knockout'],
1: ['["win" KO knockout'],
2: ['["win" TKO technical knockout'],
3: ['["win" TKO technical knockout'],
4: ['["win" UD unanimous decision']},
'col4': {0: ['["win" UD unanimous decision'],
1: ['["win" UD unanimous decision'],
2: ['["win" KO knockout'],
3: ['["win" TKO technical knockout'],
4: ['["win" UD unanimous decision']}}
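To count a single outcome per row and store it in a new column, as the title asks, one option is to unwrap the one-element lists into plain strings and compare every cell against the target. A minimal sketch on the sample above, assuming each cell always holds exactly one string; the shorthand names `UD`/`TKO`/`KO`, the frame name `data` and the column name `win_KO_count` are hypothetical.

```python
import pandas as pd

# Shorthand for the three strings that occur in the sample (hypothetical names)
UD = '["win" UD unanimous decision'
TKO = '["win" TKO technical knockout'
KO = '["win" KO knockout'

# The sample from the question: every cell is a one-element list
data = pd.DataFrame({
    'col1': [[UD], [UD], [TKO], [UD], [UD]],
    'col2': [[TKO], [TKO], [TKO], [UD], [UD]],
    'col3': [[TKO], [KO], [TKO], [TKO], [UD]],
    'col4': [[UD], [UD], [KO], [TKO], [UD]],
})

# Unwrap each one-element list so every cell is a plain string
cells = data.apply(lambda col: col.str[0])

# Per row, count how many columns hold the target outcome
data['win_KO_count'] = (cells == KO).sum(axis=1)
print(data['win_KO_count'].tolist())  # [0, 1, 1, 0, 0]
```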
In this case, the desired output is:
   win UD  win TKO  win KO
0       2        2       0
1       2        1       1
2       0        3       1
3       2        2       0
4       4        0       0
Update:
I also tried using size and groupby:
#list of column names
col_outcome = ['col'+str(i) for i in range(1,11)]
data.groupby(col_outcome).size()
However, this returns the following error message:
TypeError: unhashable type: 'list'
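The error most likely comes from the cells being Python lists: groupby has to hash its keys, and lists are unhashable. A minimal sketch on the four-column sample (the `UD`/`TKO`/`KO` shorthand and `cols` are hypothetical; the question's own list covers more columns). Note that even once it runs, grouping by all columns counts identical rows (whole combinations), not per-row outcome tallies.

```python
import pandas as pd

# Shorthand for the sample's strings (hypothetical names)
UD = '["win" UD unanimous decision'
TKO = '["win" TKO technical knockout'
KO = '["win" KO knockout'

data = pd.DataFrame({
    'col1': [[UD], [UD], [TKO], [UD], [UD]],
    'col2': [[TKO], [TKO], [TKO], [UD], [UD]],
    'col3': [[TKO], [KO], [TKO], [TKO], [UD]],
    'col4': [[UD], [UD], [KO], [TKO], [UD]],
})
cols = ['col1', 'col2', 'col3', 'col4']  # only four outcome columns in the sample

# Grouping on list-valued columns fails because lists cannot be hashed
try:
    data.groupby(cols).size()
    raised = False
except TypeError as err:
    raised = True
    print('TypeError:', err)

# Unwrapping each one-element list to its string makes the keys hashable
unwrapped = data[cols].apply(lambda c: c.str[0])
combo_sizes = unwrapped.groupby(cols).size()  # rows per unique combination
print(combo_sizes)
```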
IIUC, let's reshape the "wide" dataframe to "long" with stack, do a little string cleanup with regex-based replace and extract, then groupby and apply value_counts, and finally reshape the result back with unstack:
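import pandas as pd

# regex=True is needed on pandas >= 2.0, where str.replace no longer treats the pattern as a regex by default
df.stack().str[0].str.replace(r'\[|"', '', regex=True)\
.str.extract(r'(\w+\s\w+)')\
.groupby(level=0)[0].apply(pd.Series.value_counts).unstack(fill_value=0)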
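A self-contained run of the same pipeline on the question's sample, with each step commented, may make the reshaping easier to follow. It assumes a recent pandas (hence `regex=True`); the `UD`/`TKO`/`KO` shorthand and the name `result` are hypothetical. On pandas 2.1+ `stack()` may emit a FutureWarning, which does not affect the result here.

```python
import pandas as pd

# Shorthand for the sample's strings (hypothetical names)
UD = '["win" UD unanimous decision'
TKO = '["win" TKO technical knockout'
KO = '["win" KO knockout'

df = pd.DataFrame({
    'col1': [[UD], [UD], [TKO], [UD], [UD]],
    'col2': [[TKO], [TKO], [TKO], [UD], [UD]],
    'col3': [[TKO], [KO], [TKO], [TKO], [UD]],
    'col4': [[UD], [UD], [KO], [TKO], [UD]],
})

result = (
    df.stack()                                # wide -> long: one entry per (row, column) cell
      .str[0]                                 # unwrap the one-element list
      .str.replace(r'\[|"', '', regex=True)   # strip '[' and '"' characters
      .str.extract(r'(\w+\s\w+)')             # keep the first two words, e.g. 'win UD'
      .groupby(level=0)[0]                    # group by the original row index
      .apply(pd.Series.value_counts)          # count each label within a row
      .unstack(fill_value=0)                  # long -> wide; absent labels become 0
)
print(result)
```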
Output:
   win KO  win TKO  win UD
0       0        2       2
1       1        1       2
2       1        3       0
3       0        2       2
4       0        0       4
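If you would rather skip the stack/unstack round trip, a row-wise `apply` with `pd.Series.value_counts` is another option. This is a swapped-in alternative, not the answer's method, sketched on the four-column sample with the same cleanup regexes (the `UD`/`TKO`/`KO` shorthand and the names `labels` and `counts` are hypothetical). It is likely slower on large frames because `apply(axis=1)` runs Python code for every row.

```python
import pandas as pd

# Shorthand for the sample's strings (hypothetical names)
UD = '["win" UD unanimous decision'
TKO = '["win" TKO technical knockout'
KO = '["win" KO knockout'

df = pd.DataFrame({
    'col1': [[UD], [UD], [TKO], [UD], [UD]],
    'col2': [[TKO], [TKO], [TKO], [UD], [UD]],
    'col3': [[TKO], [KO], [TKO], [TKO], [UD]],
    'col4': [[UD], [UD], [KO], [TKO], [UD]],
})

# Clean each column into plain labels such as 'win UD'
labels = df.apply(
    lambda c: c.str[0]
               .str.replace(r'\[|"', '', regex=True)
               .str.extract(r'(\w+\s\w+)', expand=False)
)

# Count labels within each row; labels absent from a row come back as NaN
counts = (
    labels.apply(pd.Series.value_counts, axis=1)
          .fillna(0)
          .astype(int)
          .sort_index(axis=1)
)
print(counts)
```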