2D bin (x,y) and calculate mean of values (c) of 10 deepest data points (z)

Coen

For a data set consisting of:

  • coordinates x, y
  • depth z
  • a certain value c

I would like to do the following more efficiently:

  1. bin the data set in 2D bins based on the coordinates (x, y)
  2. take the 10 deepest data points (z) per bin
  3. calculate the mean value of c of these 10 data points per bin

Finally, show a 2D heatmap of the calculated mean values.

I have found a working solution, but this takes too long for small bins and/or large data sets.

Is there a more efficient way of achieving the same result?

Current working example

Example dataframe:

import numpy as np
from numpy.random import rand
import pandas as pd
import math
import matplotlib.pyplot as plt

n = 10000
df = pd.DataFrame({'x':rand(n), 'y':rand(n), 'z':rand(n), 'c':rand(n)})

Bin the data set:

cell_size = 0.01

nx = math.ceil((max(df['x']) - min(df['x'])) / cell_size)
ny = math.ceil((max(df['y']) - min(df['y'])) / cell_size)

x_range = np.arange(0, nx)
y_range = np.arange(0, ny)

df['xbin'], x_edges = pd.cut(x=df['x'], bins=nx, labels=x_range, retbins=True)
df['ybin'], y_edges = pd.cut(x=df['y'], bins=ny, labels=y_range, retbins=True)

Code that now takes too long:

df = df.groupby(['xbin', 'ybin']).apply(
    lambda d: d.sort_values('z').head(10).mean())

Update an empty DataFrame for the bins without data and show result:

index = pd.MultiIndex.from_product([x_range, y_range],
    names=['xbin', 'ybin'])

tot_df = pd.DataFrame(index=index, columns=['z', 'c'])
tot_df.update(df)

zval = tot_df['c'].astype('float').values
zval = zval.reshape((nx, ny))
zval = zval.T
zval = np.flipud(zval)

extent = [min(x_edges), max(x_edges), min(y_edges), max(y_edges)]

plt.matshow(zval, aspect='auto', extent=extent)
plt.show()

Dev Khadka

You can use np.searchsorted to bin the rows by x and y, then use groupby to take the 10 deepest rows per bin and calculate their mean. Because groupby preserves the row order within each group, you can sort by z once before binning. groupby will perform better without apply.

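# Reuses the imports and n from the question
df = pd.DataFrame({'x':rand(n), 'y':rand(n), 'z':rand(n), 'c':rand(n)})

# Sort once by z, largest first, so head(10) picks the 10 deepest rows per bin
df = df.sort_values("z", ascending=False)

# 10 equal-width bins on [0, 1]; searchsorted gives the index of the right edge, hence the -1
bins = np.linspace(0, 1, 11)
df["bin_x"] = np.searchsorted(bins, df['x'].values) - 1
df["bin_y"] = np.searchsorted(bins, df['y'].values) - 1

# Keep the first 10 rows of each bin, then average c per bin
result = df.groupby(["bin_x", "bin_y"]).head(10)
result.groupby(["bin_x", "bin_y"])["c"].mean()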

Result

bin_x  bin_y
0      0        0.369531
       1        0.601803
       2        0.554452
       3        0.575464
       4        0.455198
                  ...   
9      5        0.469838
       6        0.420772
       7        0.367549
       8        0.379200
       9        0.523083
Name: c, Length: 100, dtype: float64
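
The snippet above stops at the per-bin means and hard-codes 10 bins on [0, 1]. One possible way to get from there to the 2D array needed for the heatmap is sketched below. The function name deepest_mean_grid and its parameters nx, ny and k are hypothetical, the bin edges are assumed to span the data range in equal widths, and, as in the answer, a larger z is assumed to mean deeper (flip ascending if your depths are stored the other way round). Bins without data are left as NaN.

```python
import numpy as np
import pandas as pd


def deepest_mean_grid(df, nx, ny, k=10):
    """Mean of c over the k largest-z rows in each (x, y) bin.

    Returns an (ny, nx) array (rows = y bins, columns = x bins) plus the bin edges.
    """
    # Equal-width edges spanning the data range (an assumption of this sketch).
    x_edges = np.linspace(df["x"].min(), df["x"].max(), nx + 1)
    y_edges = np.linspace(df["y"].min(), df["y"].max(), ny + 1)

    # Search only the interior edges so indices run 0..n-1 and the
    # maximum value falls into the last bin instead of one past it.
    xbin = np.searchsorted(x_edges[1:-1], df["x"].to_numpy(), side="right")
    ybin = np.searchsorted(y_edges[1:-1], df["y"].to_numpy(), side="right")

    # Sort once by depth; head(k) then keeps the k deepest rows of each bin.
    binned = df.assign(xbin=xbin, ybin=ybin).sort_values("z", ascending=False)
    top = binned.groupby(["xbin", "ybin"]).head(k)
    means = top.groupby(["xbin", "ybin"])["c"].mean()

    # Scatter the per-bin means into a dense grid; empty bins stay NaN.
    grid = np.full((ny, nx), np.nan)
    ix = means.index.get_level_values("xbin").to_numpy()
    iy = means.index.get_level_values("ybin").to_numpy()
    grid[iy, ix] = means.to_numpy()
    return grid, x_edges, y_edges


# Example call on random data shaped like the question's DataFrame.
rng = np.random.default_rng(0)
n = 10000
demo = pd.DataFrame({"x": rng.random(n), "y": rng.random(n),
                     "z": rng.random(n), "c": rng.random(n)})
demo_grid, demo_x_edges, demo_y_edges = deepest_mean_grid(demo, nx=100, ny=100)
```

The returned array has y along the rows and x along the columns, so something like plt.imshow(grid, origin='lower', extent=[x_edges[0], x_edges[-1], y_edges[0], y_edges[-1]], aspect='auto') should display it with the same orientation as the data. This has not been timed against the apply-based version from the question.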
