Memory leak in a Python extension when an array is created with PyArray_SimpleNewFromData() and returned

Posted by Benkevitch in Dev

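I wrote a simple Python extension module to simulate a 3-bit analog-to-digital converter. It is supposed to accept a floating-point array as its input and return an output array of the same size. The output consists of the quantized input numbers. Here is my (simplified) module: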

static PyObject *adc3(PyObject *self, PyObject *args) {
  PyArrayObject *inArray = NULL, *outArray = NULL;
  double *pinp = NULL, *pout = NULL;
  npy_intp nelem;
  npy_intp dims[1], i;  /* npy_intp, not int: PyArray_SimpleNewFromData() takes npy_intp* */

  /* Get arguments: "O!" makes the parser verify that the argument is a NumPy array */
  if (!PyArg_ParseTuple(args, "O!:adc3", &PyArray_Type, &inArray))
    return NULL;

  nelem = PyArray_DIM(inArray,0); /* size of the input array */
  pout = (double *) malloc(nelem*sizeof(double));
  if (pout == NULL)
    return PyErr_NoMemory();
  pinp = (double *) PyArray_DATA(inArray);

  /*   ADC action   */
  for (i = 0; i < nelem; i++) {
    if (pinp[i] >= -0.5) {
      if      (pinp[i] < 0.5)   pout[i] = 0;
      else if (pinp[i] < 1.5)   pout[i] = 1;
      else if (pinp[i] < 2.5)   pout[i] = 2;
      else if (pinp[i] < 3.5)   pout[i] = 3;
      else                      pout[i] = 4;
    }
    else {
      if      (pinp[i] >= -1.5) pout[i] = -1;
      else if (pinp[i] >= -2.5) pout[i] = -2;
      else if (pinp[i] >= -3.5) pout[i] = -3;
      else                      pout[i] = -4;
    }
  }

  dims[0] = nelem;

  outArray = (PyArrayObject *)
               PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, pout);
  //Py_INCREF(outArray);

  return PyArray_Return(outArray); 
} 

/* ==== methods table ====================== */
static PyMethodDef mwa_methods[] = {
  {"adc3", adc3, METH_VARARGS, "3-bit Analog-to-Digital Converter (ADC)"},
  {NULL, NULL, 0, NULL}
};

/* ==== Initialize ====================== */
PyMODINIT_FUNC initmwa()  {
    Py_InitModule("mwa", mwa_methods);
    import_array();  // for NumPy
}
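
The decision chain in adc3() maps each input to the number of thresholds at or below it, shifted so that the result lies in the range -4 to 4. As a cross-check for the C code, here is a minimal pure-Python sketch of the same rule. The names THRESHOLDS and quantize_ref are hypothetical and not part of the module, and NaN inputs are not considered.

```python
import bisect

# Decision thresholds of the adc3() if/else chain, in ascending order.
THRESHOLDS = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]

def quantize_ref(x):
    """Reference for adc3(): count the thresholds <= x, then shift to -4..4."""
    # bisect_right returns how many thresholds are less than or equal to x.
    return bisect.bisect_right(THRESHOLDS, x) - 4

samples = [-10.0, -3.5, -1.6, -0.5, 0.49, 0.5, 3.5, 10.0]
levels = [quantize_ref(x) for x in samples]
print(levels)
```

Comparing list(mwa.adc3(a)) against [quantize_ref(x) for x in a] on a small array would be one way to confirm that a rewritten extension still quantizes correctly.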

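I expected that, if the reference counts are handled correctly, Python's garbage collection would (often enough) free the memory used by the output array, since it is bound to the same name and reused repeatedly. So I tested it on some dummy (but voluminous) data with the following code: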

from numpy.random import rand
import mwa

for i in xrange(200):
    a = rand(1000000)
    b = mwa.adc3(a)
    print i

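Here the array named "b" is reused many times, and the memory that adc3() borrows from the heap is expected to be returned to the system. I used gnome-system-monitor to check. Contrary to my expectations, the memory owned by python grows rapidly and can only be released by exiting the program (I use IPython). For comparison, I tried the same procedure with the standard NumPy functions zeros() and copy():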

import numpy as np

for i in xrange(1000):
    a = np.zeros(10000000)
    b = np.copy(a)
    print i

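The latter code does not produce any memory build-up. I read many texts in the standard documentation and on the web, and tried both with and without Py_INCREF(outArray). All in vain: the problem persisted.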
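
Instead of watching gnome-system-monitor, the growth can also be logged from inside the test loop. Below is a minimal sketch, assuming a Unix system where the standard-library resource module is available. The helper names peak_rss, log_growth and dummy_step are hypothetical. Note that ru_maxrss is the peak resident set size (kilobytes on Linux, bytes on macOS), so it never decreases: a leak should show up as steady growth across iterations, a healthy loop as a plateau.

```python
import resource

def peak_rss():
    """Peak resident set size of this process (KB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def log_growth(step, iterations):
    """Call step() repeatedly and return the peak RSS recorded after each call."""
    readings = []
    for _ in range(iterations):
        step()
        readings.append(peak_rss())
    return readings

def dummy_step():
    # Stand-in for `b = mwa.adc3(a)`: allocates a buffer and drops it at once.
    buf = bytearray(1000000)
    del buf

readings = log_growth(dummy_step, 5)
print(readings[-1] - readings[0])  # growth of the peak over the run
```

Replacing dummy_step with a function that calls mwa.adc3() on a fresh array would let the two loops above be compared by numbers rather than by eye.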

However, I found a solution at http://wiki.scipy.org/Cookbook/C_Extensions/NumPy_arrays. The author provides an extension, matsq(), that creates an array and returns it. When I tried the calls suggested by the author:

outArray = (PyArrayObject *) PyArray_FromDims(nd,dims,NPY_DOUBLE);
pout = (double *) outArray->data;

instead of my

pout = (double *) malloc(nelem*sizeof(double));
outArray = (PyArrayObject *)
            PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, pout);
/* with or without Py_INCREF(outArray), it makes no difference */

the memory leak was gone! The program now works as it should.

One question: can anyone explain why PyArray_SimpleNewFromData() does not provide proper reference counting, while PyArray_FromDims() does?

Thank you very much.

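Addition. My reply probably exceeds the space and time allowed for a comment, so I am adding my reply to Alex's comment here. I tried to set the OWNDATA flag this way: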