I'm new to pandas and decided to learn it by playing around with some data I pulled from my favorite game's API. I have a dataframe with two columns "playerId" and "winner" like so:
playerStatus:
______________________
playerId winner
0 1848 True
1 1988 False
2 3543 True
3 1848 False
4 1988 False
...
Each row represents a match the player participated in. My goal is to either transform this dataframe or create a new one such that the win percentage for each playerId is calculated. For example, the above dataframe would become:
playerWinsAndTotals
_________________________________________
playerId wins totalPlayed winPct
0 1848 1 2 50.0000
1 1988 0 2 0.0000
2 3543 1 1 100.0000
...
It took quite a while of reading pandas docs, but I actually managed to achieve this by essentially creating two different tables (one to find the number of wins for each player, one to find the total games for each player), and merging them, then taking the ratio of wins to games played.
Creating the "wins" dataframe:
temp_df = playerStatus[['playerId', 'winner']].value_counts().reset_index(name='wins')
onlyWins = temp_df[temp_df['winner'] == True][['playerId', 'wins']]
onlyWins
_________________________
playerId wins
1 1670 483
3 1748 474
4 2179 468
6 4006 434
8 1668 392
...
Creating the "totals" dataframe:
totalPlayed = playerStatus['playerId'].value_counts().reset_index(name='totalCount').rename(columns={'index': 'playerId'})
totalPlayed
____________________
playerId totalCount
0 1670 961
1 1748 919
2 1872 877
3 4006 839
4 2179 837
...
Finally, merging them and adding the "winPct" column.
playerWinsAndTotals = onlyWins.merge(totalPlayed, on='playerId', how='left')
playerWinsAndTotals['winPct'] = playerWinsAndTotals['wins']/playerWinsAndTotals['totalCount'] * 100
playerWinsAndTotals
_____________________________________________
playerId wins totalCount winPct
0 1670 483 961 50.260146
1 1748 474 919 51.577802
2 2179 468 837 55.913978
3 4006 434 839 51.728248
4 1668 392 712 55.056180
...
Now, the reason I am posting this here is because I know I'm not taking full advantage of what pandas has to offer. Creating and merging two different dataframes just to find the ratio of player wins seems unnecessary. I feel like I took the "scenic" route on this one.
To anyone more experienced than me, how would you tackle this problem?
We can take advantage of the way that Boolean values are handled mathematically (True
being 1
and False
being 0
) and use 3 aggregation functions sum
, count
and mean
per group (groupby aggregate
). We can also take advantage of Named Aggregation to both create and rename the columns in one step:
df = (
df.groupby('playerId', as_index=False)
.agg(wins=('winner', 'sum'),
totalCount=('winner', 'count'),
winPct=('winner', 'mean'))
)
# Scale up winPct
df['winPct'] *= 100
df
:
playerId wins totalCount winPct
0 1848 1 2 50.0
1 1988 0 2 0.0
2 3543 1 1 100.0
DataFrame and imports:
import pandas as pd
df = pd.DataFrame({
'playerId': [1848, 1988, 3543, 1848, 1988],
'winner': [True, False, True, False, False]
})
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments