This is a follow-up to the question at the following link.
I have been using the following solution:
# Your input data.
tuples = [(2,3), (3,6), (1,2)]
lists = [[1,2,3,4],[2,3,4,5],[2,3],[4,5,6]]
# Convert to sets just once, rather than repeatedly
# within the nested for-loops.
subsets = {t : set(t) for t in tuples}
mainsets = [set(xs) for xs in lists]
# Same as your algorithm, but written differently.
tallies = {
    tup : sum(s.issubset(m) for m in mainsets)
    for tup, s in subsets.items()
}
print(tallies)
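It works for the example above, but when lists has size 541909 and tuples has size 3363671 it takes a very long time. It has been running for 30 minutes and I still have no output. The elements in each list/tuple are in ascending order, and I am willing to change the data structure that holds them. What is the fastest way to do this?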
I see some performance improvement when building the dictionary with collections.defaultdict:
from collections import defaultdict
# Your input data.
tuples = [(i, i+1) for i in range(1000)]
lists = [[1,2,3,4],[2,3,4,5],[2,3],[4,5,6]] * 1000
def original(tuples, lists):
    subsets = {t : set(t) for t in tuples}
    mainsets = [set(xs) for xs in lists]
    return { tup : sum(s.issubset(m) for m in mainsets) for tup, s in subsets.items() }

def jp(tuples, lists):
    subsets = list(map(frozenset, tuples))
    mainsets = list(map(set, lists))
    d = defaultdict(int)
    for item in mainsets:
        for sub in subsets:
            if sub.issubset(item):
                d[sub] += 1
    return d

%timeit original(tuples, lists) # 707 ms per loop
%timeit jp(tuples, lists) # 431 ms per loop
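
Both versions above still compare every tuple against every list, so at the sizes in the question the number of issubset calls stays enormous. One option that may help is an inverted index: record, for each element, which lists contain it, then intersect those index sets for each tuple. The following is only a sketch under that assumption. The function name count_with_inverted_index is my own, and I have not timed it on data of the size described in the question.

```python
from collections import defaultdict

def count_with_inverted_index(tuples, lists):
    # Map each element to the set of indices of the lists that contain it.
    index = defaultdict(set)
    for i, xs in enumerate(lists):
        for x in set(xs):
            index[x].add(i)

    empty = frozenset()
    tallies = {}
    for tup in tuples:
        # One index set per distinct element; .get avoids creating new keys.
        postings = [index.get(x, empty) for x in set(tup)]
        if not postings:
            # An empty tuple is a subset of every list.
            tallies[tup] = len(lists)
            continue
        # Start from the smallest set so the intersection stays cheap.
        postings.sort(key=len)
        tallies[tup] = len(postings[0].intersection(*postings[1:]))
    return tallies

tuples = [(2, 3), (3, 6), (1, 2)]
lists = [[1, 2, 3, 4], [2, 3, 4, 5], [2, 3], [4, 5, 6]]
print(count_with_inverted_index(tuples, lists))
# {(2, 3): 3, (3, 6): 0, (1, 2): 1}
```

Unlike jp, this keeps the original tuples as keys and also reports tuples whose count is 0. How much it helps depends on the data: the cost per tuple is driven by how many lists contain its rarest element, so it works best when most elements appear in only a small fraction of the lists.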