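I'm trying to write a web application that does time series analysis.
I wrote a Python function that returns the subset of a dataset (a Python dict) falling within a given datetime range (using Python's datetime.datetime class).
In my web app, one calculation calls this function hundreds of times to make selections on a dataset of about 10,000 points. This takes about 25 seconds, which is acceptable but not ideal.
Below is an example of running my method on some sample data. Is there a way to get the same result with better performance? Suggestions for a better framework are also welcome (e.g., would it be better to do this with numpy arrays, or to abandon Python altogether?).
The script prints the elapsed time.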
```python
from datetime import datetime
from datetime import timedelta

data_dict = {'times':[], 'data':[]}

# Generate sample data: 10,000 points, 5 minutes apart
start_datetime = datetime(2014, 8, 23, 15, 17, 17, 392943)
for i in range(10000):
    data_dict['times'].append(start_datetime+timedelta(minutes = 5*i))
    data_dict['data'].append(i)

startTime = datetime.now()

def data_select(data_dict, time_range):
    # Linear scan over every timestamp: count points before the range start
    # and points up to the range end
    start = 0
    end = 1
    for x in data_dict['times']:
        if x - time_range[0] < timedelta(seconds = 0):
            start += 1
        if x - time_range[1] <= timedelta(seconds = 0):
            end += 1
    # Note: this rebinds the keys of the dict that was passed in,
    # so the caller's data_dict is modified
    data_dict['times'] = list(data_dict['times'][start:end])
    data_dict['data'] = list(data_dict['data'][start:end])
    return data_dict

# Example function call
data_sub_dict = data_select(data_dict, [datetime(2014, 8, 30, 0, 0, 0, 0), datetime(2014, 9, 5, 0, 0, 0, 0)])
print("Time elapsed: " + str(datetime.now() - startTime))
```
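Since your data is already sorted, you can cheat and use the very helpful bisect module. Instead of doing a linear search through the entire list, it checks the middle value and then continues in only the left or right half, which takes far fewer comparisons. If I did this correctly, bisect is roughly 800 times faster for 10k data points.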
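To see where the "far fewer comparisons" comes from, here is a minimal hand-written equivalent of `bisect.bisect_left` that also counts its comparisons (`lower_bound` is a hypothetical name used only for illustration). On the 10,000-point sample data it needs at most 14 comparisons (floor(log2 10000) + 1), while the linear loop in the question examines every one of the 10,000 timestamps:

```python
import bisect
from datetime import datetime, timedelta

def lower_bound(seq, target):
    """Hypothetical helper: return (first index i with seq[i] >= target, comparisons made).

    For a sorted list this gives the same index as bisect.bisect_left.
    """
    lo, hi = 0, len(seq)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if seq[mid] < target:
            lo = mid + 1  # target lies in the right half
        else:
            hi = mid      # target is at mid or in the left half
    return lo, comparisons

# Same sample data as in the question
start = datetime(2014, 8, 23, 15, 17, 17, 392943)
times = [start + timedelta(minutes=5 * i) for i in range(10000)]
target = datetime(2014, 8, 30)

idx, steps = lower_bound(times, target)
# What the question's loop computes for `start`: the number of times before target
linear_idx = sum(1 for t in times if t < target)

print(idx, steps, linear_idx)
assert idx == bisect.bisect_left(times, target) == linear_idx
```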
```python
import bisect
from datetime import datetime
from datetime import timedelta

data_dict = {'times':[], 'data':[]}
# Generate sample data
start_datetime = datetime(2014, 8, 23, 15, 17, 17, 392943)
for i in range(10000):
    data_dict['times'].append(start_datetime+timedelta(minutes = 5*i))
    data_dict['data'].append(i)

startTime = datetime.now()

def data_select_search(data_dict, time_range):
    # The original linear scan: touches every element on each call
    start = 0
    end = 1
    times = data_dict['times']
    for x in times:
        if x - time_range[0] < timedelta(seconds = 0):
            start += 1
        if x - time_range[1] <= timedelta(seconds = 0):
            end += 1
    # print('search:', start, end)
    data_dict['times'] = list(data_dict['times'][start:end])
    data_dict['data'] = list(data_dict['data'][start:end])
    return data_dict

def data_select_bisect(data_dict, time_range):
    # Binary search on the sorted times: O(log n) comparisons per call
    times = data_dict['times']
    # Number of times strictly before the range start
    start = bisect.bisect_left(times, time_range[0])
    # Number of times <= the range end; the + 1 matches `end = 1` in the linear version
    end = bisect.bisect_right(times, time_range[1], lo=start) + 1
    # print('bisect:', start, end)
    return dict(
        times=data_dict['times'][start:end],
        data=data_dict['data'][start:end],
    )

drange = [
    datetime(2014, 8, 30, 0, 0, 0, 0),
    datetime(2014, 9, 5, 0, 0, 0, 0)
]

# .copy() keeps data_select_search from overwriting the keys of the shared data_dict
data_sub_dict = data_select_search(data_dict.copy(), drange)
_dict2 = data_select_bisect(data_dict.copy(), drange)
# Both approaches should return the same subset
assert data_sub_dict == _dict2

import timeit
mysetup = "from __main__ import data_select_bisect, data_select_search, data_dict, drange"
num = 100
print('search:', timeit.timeit(
    "data_select_search(data_dict.copy(), drange)",
    setup=mysetup,
    number=num
))
print('bisect:', timeit.timeit(
    "data_select_bisect(data_dict.copy(), drange)",
    setup=mysetup,
    number=num,
))
```
Output (Python 2, where `print(a, b)` prints a tuple):

```
('search:', 1.2735650539398193)
('bisect:', 0.0015599727630615234)
```
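Both versions above start `end` one past the last matching index (`end = 1` in the loop, `+ 1` after `bisect_right`), so the returned slice also contains the first point after `time_range[1]`. If the intent is a closed range [t0, t1], a sketch without that extra point could look like the following. It assumes `times` is sorted ascending; `select_range` is a hypothetical helper name, and it returns two slices instead of building a dict so the input lists are never modified, which matters when the function is called hundreds of times on the same data:

```python
import bisect
from datetime import datetime, timedelta

def select_range(times, data, t0, t1):
    """Hypothetical helper: return (times, data) for all points with t0 <= time <= t1.

    `times` must be sorted ascending. Two binary searches, then one slice per list.
    """
    start = bisect.bisect_left(times, t0)           # first index with times[i] >= t0
    end = bisect.bisect_right(times, t1, lo=start)  # first index with times[i] > t1
    return times[start:end], data[start:end]

# Same sample data and range as above
base = datetime(2014, 8, 23, 15, 17, 17, 392943)
times = [base + timedelta(minutes=5 * i) for i in range(10000)]
data = list(range(10000))
t0, t1 = datetime(2014, 8, 30), datetime(2014, 9, 5)

sub_times, sub_data = select_range(times, data, t0, t1)

# Cross-check against a brute-force filter over every point
expected = [d for t, d in zip(times, data) if t0 <= t <= t1]
assert sub_data == expected
print(len(sub_data), sub_times[0], sub_times[-1])
```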