I have a Pandas DataFrame where I am trying to replace the values in each group with the mean of the group. On my machine, the line df["signal"].groupby(g).transform(np.mean) takes about 10 seconds to run with N and N_TRANSITIONS set to the numbers below. Is there any faster way to achieve the same result?
import pandas as pd
import numpy as np
from time import time
np.random.seed(0)
N = 120000
N_TRANSITIONS = 1400
# generate groups
transition_points = np.random.permutation(np.arange(N))[:N_TRANSITIONS]
transition_points.sort()
transitions = np.zeros((N,), dtype=bool)  # np.bool is removed in NumPy >= 1.24
transitions[transition_points] = True
g = transitions.cumsum()
df = pd.DataFrame({ "signal" : np.random.rand(N)})
# here is my bottleneck for large N
tic = time()
result = df["signal"].groupby(g).transform(np.mean)
toc = time()
print(toc - tic)
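One thing worth trying (my suggestion, not from the original question): pass the string "mean" instead of the np.mean callable. The string form lets pandas dispatch to its Cythonized group-mean path, whereas a Python callable is typically applied group by group and is much slower. A minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
N = 120000
N_TRANSITIONS = 1400

# Same group construction as in the question.
transition_points = np.sort(np.random.permutation(np.arange(N))[:N_TRANSITIONS])
transitions = np.zeros(N, dtype=bool)
transitions[transition_points] = True
g = transitions.cumsum()
df = pd.DataFrame({"signal": np.random.rand(N)})

# Built-in "mean" hits pandas' fast aggregation path instead of
# calling np.mean once per group from Python.
fast = df["signal"].groupby(g).transform("mean")

# Same values as transform(np.mean), just computed faster.
slow = df["signal"].groupby(g).transform(np.mean)
assert np.allclose(fast.values, slow.values)
```

How large the speedup is depends on the pandas version, but the two calls produce identical results.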
Inspired by Jeff's answer, this is the fastest method on my machine (with grp = df["signal"].groupby(g)):

grp = df["signal"].groupby(g)
result = pd.Series(np.repeat(grp.mean().values, grp.count().values))

This lines up with the original rows only because g is nondecreasing, so each group occupies a contiguous run and repeating its mean count-many times lands on the right positions.
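A pure-NumPy variant (my addition, not part of the answer above) that does not require the group labels to be sorted: compute per-group sums and sizes with np.bincount, then broadcast the means back by indexing with the label array.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
N = 120000
N_TRANSITIONS = 1400
transition_points = np.sort(np.random.permutation(np.arange(N))[:N_TRANSITIONS])
transitions = np.zeros(N, dtype=bool)
transitions[transition_points] = True
g = transitions.cumsum()
df = pd.DataFrame({"signal": np.random.rand(N)})

# One pass for per-group sums, one for per-group sizes.
sums = np.bincount(g, weights=df["signal"].values)
counts = np.bincount(g)
means = sums / counts

# Fancy-indexing by the label array expands the means to full length,
# so this works even when groups are not contiguous.
result = pd.Series(means[g])

assert np.allclose(result.values, df["signal"].groupby(g).transform("mean").values)
```

The indexing step is what makes this order-independent, unlike the np.repeat trick, which relies on groups being contiguous runs.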