Fitting a bimodal Gaussian distribution with some parameters fixed

Posted by ascripter in Dev

ascripter

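Question: I want to fit empirical data to a bimodal normal distribution. From the physical context I know the distance between the peaks (it is fixed), and both peaks must have the same standard deviation.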

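I tried to create my own distribution using scipy.stats.rv_continuous (see the code below), but the parameters always come out fitted to 1. Does anyone understand what is happening, or can anyone point me to another way of solving the problem?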

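Details: I avoided the loc and scale parameters and implemented them as m and s directly in the _pdf method, because the peak distance delta should not be affected by scale. To compensate, I fix them with floc=0 and fscale=1 in the fit method; what I actually want to fit are the parameters m and s and the peak weight w.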

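I expect the peaks in the sample data to lie around x=-450 and x=450 (=> m=0). The standard deviation s should be around 100 or 200, but certainly not 1.0, and the weight w should be approximately 0.5.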
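For reference, the density that the _pdf method in the code below implements can be written out as follows. The symbols are read directly off the code; nothing beyond it is assumed.

```latex
f(x \mid m, s, w, \delta)
  = \frac{w}{\sqrt{2\pi s^{2}}}
    \exp\!\left(-\frac{(x - m + \delta/2)^{2}}{2 s^{2}}\right)
  + \frac{1 - w}{\sqrt{2\pi s^{2}}}
    \exp\!\left(-\frac{(x - m - \delta/2)^{2}}{2 s^{2}}\right)
```

The two peaks sit at m - delta/2 (weight w) and m + delta/2 (weight 1 - w), so with delta = 900 and m = 0 they land at x = -450 and x = 450.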

from __future__ import division
from scipy.stats import rv_continuous
import numpy as np


class norm2_gen(rv_continuous):
    def _argcheck(self, *args):
        return True

    def _pdf(self, x, m, s, w, delta):
        return np.exp(-(x-m+delta/2)**2 / (2. * s**2)) / np.sqrt(2. * np.pi * s**2) * w + \
               np.exp(-(x-m-delta/2)**2 / (2. * s**2)) / np.sqrt(2. * np.pi * s**2) * (1 - w)


norm2 = norm2_gen(name='norm2')

data = [487.0, -325.5, -159.0, 326.5, 538.0, 552.0, 563.0, -156.0, 545.5, 341.0, 530.0, -156.0, 473.0, 328.0, -319.5, -287.0, -294.5, 153.5, -512.0, 386.0, -129.0, -432.5, -382.0, -346.5, 349.0, 391.0, 299.0, 364.0, -283.0, 562.5, -42.0, 214.0, -389.0, 42.5, 259.5, -302.5, 330.5, -338.0, 508.5, 319.5, -356.5, 421.5, 543.0]

m, s, w, delta, loc, scale = norm2.fit(data, fdelta=900, floc=0, fscale=1)
print(m, s, w, delta, loc, scale)

>>> 1.0 1.0 1.0 900 0 1

Warren Weckesser

With a few tweaks, I was able to fit your distribution to the data:

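With w as it is, you have an implicit constraint that 0 <= w <= 1. The solver used by the fit() method is not aware of this constraint, so w can end up with unreasonable values. One way to handle this kind of constraint is to allow w to be an arbitrary real value, but in the formula for the PDF convert w to a fraction phi between 0 and 1 using phi = 0.5 + arctan(w)/pi.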
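As a small illustration of that reparameterization, the sketch below maps an unconstrained w to phi and back. The helper names weight_from_param and param_from_weight are made up for this example; only the formula phi = 0.5 + arctan(w)/pi comes from the answer.

```python
import numpy as np


def weight_from_param(w):
    """Map any real w to a mixture weight phi strictly between 0 and 1."""
    return 0.5 + np.arctan(w) / np.pi


def param_from_weight(phi):
    """Inverse mapping: the unconstrained w that corresponds to a weight phi."""
    return np.tan(np.pi * (phi - 0.5))


# A few unconstrained values, including very large magnitudes.
w_values = np.array([-1e6, -1.0, 0.0, 1.0, 1e6])
phi_values = weight_from_param(w_values)

# w = 0.0 corresponds to equal weights, phi = 0.5.
print(phi_values)
```

The same mapping is needed when reading the fit result: the w returned by fit() is the unconstrained value, so the actual peak weight is weight_from_param(w).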
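The generic fit() method uses a numerical optimization routine to find the maximum likelihood estimate. Like most such routines, it requires a starting point for the optimization. The default starting point is all 1s, but that doesn't always work well. You can select a different starting point by passing values to fit() as positional arguments after the data. The values I used in the script worked; I haven't explored how sensitive the results are to those initial values.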
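One possible way to pick a starting point from the data itself is sketched below. This heuristic is an assumption of this example, not something the answer uses: it splits the sample at its mean, takes the midpoint of the two group means as m0, the pooled within-group spread as s0, and converts the fraction of points in the lower group into an unconstrained w0 with the inverse of the arctan mapping. The function name rough_start is hypothetical.

```python
import numpy as np

data = np.array([
    487.0, -325.5, -159.0, 326.5, 538.0, 552.0, 563.0, -156.0, 545.5,
    341.0, 530.0, -156.0, 473.0, 328.0, -319.5, -287.0, -294.5, 153.5,
    -512.0, 386.0, -129.0, -432.5, -382.0, -346.5, 349.0, 391.0, 299.0,
    364.0, -283.0, 562.5, -42.0, 214.0, -389.0, 42.5, 259.5, -302.5,
    330.5, -338.0, 508.5, 319.5, -356.5, 421.5, 543.0])


def rough_start(x):
    """Heuristic starting values (m0, s0, w0) for the two-peak model."""
    x = np.asarray(x, dtype=float)
    left = x[x < x.mean()]     # points attributed to the lower peak
    right = x[x >= x.mean()]   # points attributed to the upper peak
    # Center of the model: midpoint between the two group means.
    m0 = 0.5 * (left.mean() + right.mean())
    # Pooled spread of each group around its own mean.
    s0 = np.sqrt((left.var() * left.size + right.var() * right.size) / x.size)
    # Fraction in the lower peak, mapped to the unconstrained parameter w.
    phi0 = left.size / x.size
    w0 = np.tan(np.pi * (phi0 - 0.5))
    return m0, s0, w0


m0, s0, w0 = rough_start(data)
print(m0, s0, w0)
```

These values would be passed in the same way as the initial guesses in the script, for example norm2.fit(data, m0, s0, w0, fdelta=900, floc=0, fscale=1). Whether they lead to a better optimum than other starting points has not been checked here.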

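I did two estimates. In the first, I left delta as a free parameter; in the second, I fixed delta at 900.

The script below generates a plot of the histogram of the data with both fitted PDFs.

Here is the script: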

from __future__ import division
from scipy.stats import rv_continuous
import numpy as np
import matplotlib.pyplot as plt


class norm2_gen(rv_continuous):
    def _argcheck(self, *args):
        return True

    def _pdf(self, x, m, s, w, delta):
        phi = 0.5 + np.arctan(w)/np.pi
        return np.exp(-(x-m+delta/2)**2 / (2. * s**2)) / np.sqrt(2. * np.pi * s**2) * phi + \
               np.exp(-(x-m-delta/2)**2 / (2. * s**2)) / np.sqrt(2. * np.pi * s**2) * (1 - phi)

norm2 = norm2_gen(name='norm2')


data = [487.0, -325.5, -159.0, 326.5, 538.0, 552.0, 563.0, -156.0, 545.5,
        341.0, 530.0, -156.0, 473.0, 328.0, -319.5, -287.0, -294.5, 153.5,
        -512.0, 386.0, -129.0, -432.5, -382.0, -346.5, 349.0, 391.0, 299.0,
        364.0, -283.0, 562.5, -42.0, 214.0, -389.0, 42.5, 259.5, -302.5,
        330.5, -338.0, 508.5, 319.5, -356.5, 421.5, 543.0]

# In the fit method, the positional arguments after data are the initial
# guesses that are passed to the optimization routine that computes the MLE.
# First let's see what we get if delta is not fixed.
m, s, w, delta, loc, scale = norm2.fit(data, 1.0, 1.0, 0.0, 900.0, floc=0, fscale=1)

# Fit the distribution with delta fixed.
fdelta = 900
m1, s1, w1, delta1, loc, scale = norm2.fit(data, 1.0, 1.0, 0.0, fdelta=fdelta, floc=0, fscale=1)

# density=True replaces the normed=True keyword, which newer matplotlib no longer accepts.
plt.hist(data, bins=12, density=True, color='c', alpha=0.65)
q = np.linspace(-800, 800, 1000)
p = norm2.pdf(q, m, s, w, delta)
p1 = norm2.pdf(q, m1, s1, w1, fdelta)
plt.plot(q, p, 'k', linewidth=2.5, label='delta=%6.2f (fit)' % delta)
plt.plot(q, p1, 'k--', linewidth=2.5, label='delta=%6.2f (fixed)' % fdelta)
plt.legend(loc='best')
plt.show()
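
Since the question also asks for another way to approach the problem, here is a minimal sketch that skips rv_continuous and minimizes the negative log-likelihood directly with scipy.optimize.minimize. It uses the L-BFGS-B method so that 0 < w < 1 and s > 0 can be enforced through bounds instead of the arctan transformation. The function names, the starting values in START and the bounds are illustrative assumptions, not part of the answer above, and the result has not been compared against the fit() output.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

data = np.array([
    487.0, -325.5, -159.0, 326.5, 538.0, 552.0, 563.0, -156.0, 545.5,
    341.0, 530.0, -156.0, 473.0, 328.0, -319.5, -287.0, -294.5, 153.5,
    -512.0, 386.0, -129.0, -432.5, -382.0, -346.5, 349.0, 391.0, 299.0,
    364.0, -283.0, 562.5, -42.0, 214.0, -389.0, 42.5, 259.5, -302.5,
    330.5, -338.0, 508.5, 319.5, -356.5, 421.5, 543.0])

DELTA = 900.0               # fixed peak distance from the question
START = (0.0, 150.0, 0.5)   # assumed initial guess for (m, s, w)


def neg_log_likelihood(params, x, delta):
    """Negative log-likelihood of the two-peak model with shared s."""
    m, s, w = params
    # Work in log space so that tiny densities do not underflow to zero.
    log_left = np.log(w) + norm.logpdf(x, loc=m - delta / 2, scale=s)
    log_right = np.log1p(-w) + norm.logpdf(x, loc=m + delta / 2, scale=s)
    return -np.sum(np.logaddexp(log_left, log_right))


# m is unbounded, s must stay positive, w must stay strictly inside (0, 1).
bounds = [(None, None), (1e-3, None), (1e-6, 1.0 - 1e-6)]
result = minimize(neg_log_likelihood, START, args=(data, DELTA),
                  method="L-BFGS-B", bounds=bounds)

m_hat, s_hat, w_hat = result.x
print(result.success, m_hat, s_hat, w_hat)
```

Here w is the weight of the lower peak itself, so no back-transformation is needed when reading the result. As with fit(), the outcome may depend on the starting values.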