I have a fairly large dataset (> 200k points) in which I am trying to replace sequences of zeros with a value. Any sequence of more than two consecutive zeros is an artifact and should be removed by setting it to np.nan.
I have already read "Searching a sequence in a NumPy array", but it does not quite fit my needs, since I do not have a static pattern.
np.array([0, 1.0, 0, 0, -6.0, 13.0, 0, 0, 0, 1.0, 16.0, 0, 0, 0, 0, 1.0, 1.0, 1.0, 1.0])
# should be converted to this
np.array([0, 1.0, 0, 0, -6.0, 13.0, np.nan, np.nan, np.nan, 1.0, 16.0, np.nan, np.nan, np.nan, np.nan, 1.0, 1.0, 1.0, 1.0])
Let me know if you need more information. Thanks in advance!
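For reference, the transformation described above can be sketched as a plain (non-vectorized) loop. The function name `replace_zero_runs` and its `min_len` parameter are my own illustrative choices, not from the post:

```python
import numpy as np

def replace_zero_runs(a, min_len=3):
    # Illustrative reference implementation (not from the original post):
    # replace every run of at least `min_len` consecutive zeros with NaN.
    a = np.asarray(a, dtype=float).copy()
    n = a.size
    i = 0
    while i < n:
        if a[i] == 0:
            j = i
            while j < n and a[j] == 0:
                j += 1          # scan to the end of this run of zeros
            if j - i >= min_len:
                a[i:j] = np.nan  # run is long enough: mark as artifact
            i = j
        else:
            i += 1
    return a

res = replace_zero_runs([0, 1.0, 0, 0, -6.0, 13.0, 0, 0, 0,
                         1.0, 16.0, 0, 0, 0, 0, 1.0, 1.0, 1.0, 1.0])
print(res)
```

This is O(n) but loops in Python, so on 200k+ points a vectorized solution such as the accepted answer below it will be much faster.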
Thanks for the answers. Here are my (unscientific) benchmark results, run on 288240 points:
divakar took 0.016000ms to replace 87912 points
desiato took 0.076000ms to replace 87912 points
polarise took 0.102000ms to replace 87912 points
Since @Divakar's solution is both the shortest and the fastest, I accepted his answer.
Well, that's basically a binary closing operation with a threshold requirement on the size of the gaps to be closed. Here's an implementation based on that -
import numpy as np
from scipy.ndimage import binary_closing

# Pad with ones so as to make binary closing work around the boundaries too
a_extm = np.hstack((True, a != 0, True))
# Perform binary closing and look for the ones that have not changed, indicating
# the gaps in those cases were above the threshold requirement for closing
mask = a_extm == binary_closing(a_extm, structure=np.ones(3))
# Out of those, avoid the 1s from the original array and set the rest as NaNs
out = np.where(~a_extm[1:-1] & mask[1:-1], np.nan, a)
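As a sanity check, here is the snippet above run end-to-end on the question's sample array (assuming `binary_closing` from `scipy.ndimage`):

```python
import numpy as np
from scipy.ndimage import binary_closing

a = np.array([0, 1.0, 0, 0, -6.0, 13.0, 0, 0, 0,
              1.0, 16.0, 0, 0, 0, 0, 1.0, 1.0, 1.0, 1.0])

# Pad with True at both ends so the closing behaves correctly at the boundaries
a_extm = np.hstack((True, a != 0, True))
# Positions unchanged by closing correspond to gaps too wide to close (>= 3 zeros)
mask = a_extm == binary_closing(a_extm, structure=np.ones(3))
# NaN out the zero positions whose gap was not closed; keep everything else
out = np.where(~a_extm[1:-1] & mask[1:-1], np.nan, a)

print(out)
# NaNs land exactly on the runs of 3+ zeros (indices 6-8 and 11-14);
# the shorter runs of zeros (indices 0 and 2-3) are kept.
```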
One way to avoid the appending done in the earlier approach to handle the boundary elements, which could make it a bit costly on large datasets, would be like so -
import numpy as np
from scipy.ndimage import binary_closing

# Create binary closed mask
mask = ~binary_closing(a != 0, structure=np.ones(3))
# Fix up the boundary regions, which binary closing does not handle on its own
idx = np.where(a)[0]
mask[:idx[0]] = idx[0] >= 3
mask[idx[-1]+1:] = a.size - idx[-1] - 1 >= 3
# Use the mask to set NaNs in a
out = np.where(mask, np.nan, a)
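A quick check of this boundary-handling variant, using an illustrative array of my own (not from the original post) that has runs of zeros at both ends:

```python
import numpy as np
from scipy.ndimage import binary_closing

a = np.array([0, 0, 0, 0, 1.0, 2.0, 0, 0, 3.0, 0, 0, 0])

# Closed mask: True where a zero sits in a gap too wide (>= 3) to be closed
mask = ~binary_closing(a != 0, structure=np.ones(3))
# Explicitly decide the leading and trailing zero runs by their lengths
idx = np.where(a)[0]
mask[:idx[0]] = idx[0] >= 3
mask[idx[-1]+1:] = a.size - idx[-1] - 1 >= 3
out = np.where(mask, np.nan, a)

print(out)
# The 4 leading and 3 trailing zeros become NaN;
# the interior run of only 2 zeros (indices 6-7) is kept.
```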