Implementing a 2D, FFT-based Kernel Density Estimator in Python, and comparing it to the SciPy implementation

lnmaurer

I need code to do 2D Kernel Density Estimation (KDE), and I've found the SciPy implementation is too slow. So, I've written an FFT-based implementation, but several things confuse me. (The FFT implementation also enforces periodic boundary conditions, which is what I want.)

The implementation is based on creating a simple histogram from the samples and then convolving this with a gaussian. Here's code to do this and compare it with the SciPy result.

from numpy import *
from scipy.stats import *
from numpy.fft import *
from matplotlib.pyplot import *
from time import clock

ion()

#PARAMETERS
N   = 512   #number of histogram bins; want 2^n for maximum FFT speed?
nSamp   = 1000  #number of samples if using the random variable
h   = 0.1   #width of gaussian
wh  = 1.0   #width and height of square domain

#VARIABLES FROM PARAMETERS
rv  = uniform(loc=-wh,scale=2*wh)   #random variable that can generate samples
xyBnds  = linspace(-1.0, 1.0, N+1)  #boundaries of histogram bins
xy  = (xyBnds[1:] + xyBnds[:-1])/2      #centers of histogram bins
xx, yy = meshgrid(xy,xy)

#DEFINE SAMPLES, TWO OPTIONS
#samples = rv.rvs(size=(nSamp,2))
samples = array([[0.5,0.5],[0.2,0.5],[0.2,0.2]])

#DEFINITIONS FOR FFT IMPLEMENTATION
ker = exp(-(xx**2 + yy**2)/2/h**2)/h/sqrt(2*pi) #Gaussian kernel
fKer = fft2(ker) #DFT of kernel

#FFT IMPLEMENTATION
stime = clock()
#generate normalized histogram. Not sure why .T is needed:
hst = histogram2d(samples[:,0], samples[:,1], bins=xyBnds)[0].T / (xy[-1] - xy[0])**2
#convolve histogram with kernel. Not sure why fftshift is needed:
KDE1 = fftshift(ifft2(fft2(hst)*fKer))/N
etime = clock()
print "FFT method time:", etime - stime

#DEFINITIONS FOR NON-FFT IMPLEMENTATION FROM SCIPY
#points to sample the KDE at, in a form gaussian_kde likes:
grid_coords = append(xx.reshape(-1,1),yy.reshape(-1,1),axis=1)

#NON-FFT IMPLEMENTATION FROM SCIPY
stime = clock()
KDEfn = gaussian_kde(samples.T, bw_method=h)
KDE2 = KDEfn(grid_coords.T).reshape((N,N))
etime = clock()
print "SciPy time:", etime - stime

#PLOT FFT IMPLEMENTATION RESULTS
fig = figure()
ax = fig.add_subplot(111, aspect='equal')
c = contour(xy, xy, KDE1.real)
clabel(c)
title("FFT Implementation Results")

#PRINT SCIPY IMPLEMENTATION RESULTS
fig = figure()
ax = fig.add_subplot(111, aspect='equal')
c = contour(xy, xy, KDE2)
clabel(c)
title("SciPy Implementation Results")

There are two sets of samples above. The 1000 random points are for benchmarking and are commented out; the three fixed points are for debugging.

The resulting plots for the latter case are at the end of this post.

Here are my questions:

  • Can I avoid the .T for the histogram and the fftshift for KDE1? I'm not sure why they're needed, but the gaussians show up in the wrong places without them.
  • How is the scalar bandwidth defined for SciPy? The gaussians have much different widths in the two implementations.
  • Along the same lines, why are the gaussians in the SciPy implementation not radially symmetric even though I gave gaussian_kde a scalar bandwidth?
  • How could I implement the other bandwidth methods available in SciPy for the FFT code?

(Let me note that the FFT code is ~390x faster than the SciPy code in the 1000-random-point case.)

[Figure: three-point debugging KDE using the FFT implementation] [Figure: three-point debugging KDE using SciPy]

Joe Kington

The differences you're seeing are due to the bandwidth and scaling factors, as you've already noticed.

By default, gaussian_kde chooses the bandwidth using Scott's rule. Dig into the code if you're curious about the details. The code snippets below are from something I wrote quite a while ago to do something similar to what you're doing. (If I remember right, there's an obvious error in that particular version and it really shouldn't use scipy.signal for the convolution, but the bandwidth estimation and normalization are correct.)

# Calculate the covariance matrix (in pixel coords)
cov = np.cov(xyi)

# Scaling factor for bandwidth
scotts_factor = np.power(n, -1.0 / 6) # For 2D

#---- Make the gaussian kernel -------------------------------------------

# First, determine how big the gridded kernel needs to be (2 stdev radius) 
# (do we need to convolve with a 5x5 array or a 100x100 array?)
std_devs = np.diag(np.sqrt(cov))
kern_nx, kern_ny = np.round(scotts_factor * 2 * np.pi * std_devs)

# Determine the bandwidth to use for the gaussian kernel
inv_cov = np.linalg.inv(cov * scotts_factor**2) 
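
The kernel itself isn't built in that snippet. Roughly, evaluating it on the pixel grid with the inv_cov above would look something like this (a sketch, not from my original code):

# Sketch: evaluate the anisotropic gaussian kernel on the
# (kern_ny x kern_nx) pixel grid, using inv_cov from above
kern_nx, kern_ny = int(kern_nx), int(kern_ny)
dxi, dyi = np.meshgrid(np.arange(kern_nx) - kern_nx // 2,
                       np.arange(kern_ny) - kern_ny // 2)

# Quadratic form [dx, dy] . inv_cov . [dx, dy], evaluated pointwise
kernel = np.exp(-0.5 * (inv_cov[0, 0] * dxi**2
                        + 2 * inv_cov[0, 1] * dxi * dyi
                        + inv_cov[1, 1] * dyi**2))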

After the convolution, the grid is then normalized:

# Normalization factor to divide result by so that units are in the same
# units as scipy.stats.kde.gaussian_kde's output.  (Sums to 1 over infinity)
norm_factor = 2 * np.pi * cov * scotts_factor**2
norm_factor = np.linalg.det(norm_factor)
norm_factor = n * dx * dy * np.sqrt(norm_factor)

# Normalize the result
grid /= norm_factor

Hopefully that helps clarify things a touch.
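
To make that concrete in terms of your code, here's roughly how the bandwidth and normalization would plug into your FFT pipeline. This is a sketch, not something I've tested against your exact script; it reuses your samples, xy, xyBnds, and N, and assumes a diagonal kernel covariance (no x-y correlation), so it will only match gaussian_kde exactly in that case.

import numpy as np

# Sketch: FFT-based KDE with scipy-style bandwidth and normalization.
# Assumes samples, xy, xyBnds, N from the script above are in scope.
n = samples.shape[0]
dx = xy[1] - xy[0]

# Scott's factor for 2D data -- gaussian_kde's default bandwidth rule
factor = n ** (-1.0 / 6)
sx, sy = samples.std(axis=0, ddof=1) * factor   # per-axis kernel widths

# Grid offsets in "FFT order" (0, dx, 2*dx, ..., then the negative offsets),
# so the kernel's peak sits at index (0, 0) and no fftshift of the result
# is needed afterwards
offs = np.fft.fftfreq(N) * N * dx
XX, YY = np.meshgrid(offs, offs)

# Properly normalized 2D gaussian PDF evaluated at those offsets
ker = np.exp(-0.5 * ((XX / sx)**2 + (YY / sy)**2)) / (2 * np.pi * sx * sy)

# Raw counts; the .T maps (x, y) samples onto (row, col) = (y, x) image order
counts = np.histogram2d(samples[:, 0], samples[:, 1], bins=xyBnds)[0].T

# Periodic (circular) convolution via FFT; dividing by n makes it a density
# that integrates to roughly 1 over the domain
KDE = np.fft.ifft2(np.fft.fft2(counts) * np.fft.fft2(ker)).real / n

To reproduce your gaussian_kde(samples.T, bw_method=h) call (up to the grid discretization and the periodic boundary), you'd set factor = h, since scipy uses a scalar bw_method directly as the factor, and use the full covariance rather than just the per-axis standard deviations.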

As for your other questions:

Can I avoid the .T for the histogram and the fftshift for KDE1? I'm not sure why they're needed, but the gaussians show up in the wrong places without them.

I could be misreading your code, but I think you just have the transpose because you're going from point coordinates to index coordinates (i.e. from <x, y> to <y, x>).
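
The fftshift, for what it's worth, is there because your kernel is centered in the middle of the grid rather than at index (0, 0), so the circular convolution comes out rotated by half the grid. One way around it, sketched for an even N like yours, is to rotate the kernel once, up front:

fKer = fft2(ifftshift(ker))               # peak of ker moved to index (0, 0)
KDE1 = ifft2(fft2(hst) * fKer).real / N   # matches the real part of your fftshift(...)/N line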

Along the same lines, why are the gaussians in the SciPy implementation not radially symmetric even though I gave gaussian_kde a scalar bandwidth?

This is because scipy uses the full covariance matrix of the input x, y points to determine the gaussian kernel. Your formula assumes that x and y aren't correlated; gaussian_kde estimates and uses the full covariance, including any correlation between x and y, in the result.
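
You can see this directly on the fitted kde object; for your three debugging points, a quick check:

import numpy as np
from scipy.stats import gaussian_kde

# The kernel covariance gaussian_kde actually uses is the full data
# covariance scaled by the squared bandwidth factor, so any correlation
# between x and y shows up as tilted, elliptical contours.
samples = np.array([[0.5, 0.5], [0.2, 0.5], [0.2, 0.2]])
kde = gaussian_kde(samples.T, bw_method=0.1)

print(kde.factor)                          # 0.1 -- a scalar bw_method is used as the factor
print(kde.covariance)                      # off-diagonal term is nonzero here
print(np.cov(samples.T) * kde.factor**2)   # same matrix, computed by hand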

How could I implement the other bandwidth methods available in SciPy for the FFT code?

I'll leave that one for you to figure out. :) It's not too hard, though. Basically, instead of scotts_factor, you'd change the formula and have some other scalar factor. Everything else is the same.
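
Concretely, the built-in rules reduce to the scalar factors below (these match scipy's scotts_factor and silverman_factor); everything downstream of the factor stays the same:

# The built-in bandwidth rules as scalar factors that multiply the data
# covariance (d = number of dimensions, n = number of samples)
def scotts_factor(n, d):
    return n ** (-1.0 / (d + 4))

def silverman_factor(n, d):
    return (n * (d + 2) / 4.0) ** (-1.0 / (d + 4))

# For d = 2 the two rules happen to coincide: both give n**(-1/6)
print(scotts_factor(1000, 2), silverman_factor(1000, 2))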
