现实生活中的问题:我有一个API,最多只能返回5000个项目。在任何给定时间,我可以通过按价格范围过滤来查看有多少商品。我还可以找到最昂贵的物品。我需要将这些项目分解为大小范围为5000(节点大小)的价格范围,以便我可以收集所有这些项目。
下面的实际示例。
填充列表。
price_list = []
for i in range(0, 100000):
price_list.append(random.uniform(0, 100000))
实现搜索功能。
def get_prange_count(startp, endp):
return len([r for r in price_list if r <= endp and r >= startp])
通过使用get_prange_count函数,将price_list中的所有项目分解为最大5000个节点。
例如
[{"startp": 0, "endp": 3.5, "item_count": 5000},
{"startp": 3.51, "endp": 4.03, "item_count": 5000}]
我知道这是一个算法问题,但是我找不到理想的问题。我研究了二进制搜索,但我没有寻找位置,我研究了匈牙利算法,但这似乎不是分配问题。
预先感谢您的指导,
编辑:
5000 -> 2 for demonstration purposes.
price_list = [0.1, 0.2, 0.4, 1.3, 1.6, 5.0]
node1 = {"startp": 0, "endp": 0.2, "item_count": 2}
node2 = {"startp": 0.4, "endp": 1.3, "item_count": 2}
node3 = {"startp": 1.6, "endp": 5.0, "item_count": 2}
这个Python解决方案仅使用function get_prange_count()
,并且get_max_price()
正如您所说的那样查找每个5k元素的价格范围。好吧,我还需要两个不同价格之间的最小增量价格-我假设它是一个,但是.001可以用于美分。
项目数量偏离5000,这是因为多个项目在箱的边界附近具有相同的价格,而剩下的最后一个箱是因为我不一定会生成5K的倍数的总价格。
import random
price_list_size = random.choice(range(90_000, 100_000))
price_list = random.choices(range(100_000), k=price_list_size)
delta_price = 1 # Minimum difference between any two different prices.
def get_prange_count(startp, endp):
return len([r for r in price_list if startp <= r <= endp])
def get_max_price():
return max(price_list)
def get_5k(mn=0, mx=get_max_price(), num=5_000):
count = get_prange_count(mn, mx)
delta_mx = (mx - mn) / 2
while count != num and delta_mx >= delta_price / 2:
mx += -delta_mx if count > num else +delta_mx
count, delta_mx = get_prange_count(mn, mx), delta_mx / 2
return mx, count
def get_all_5k(mn=0, mx=get_max_price(), num=5_000):
partmax, partcount = get_5k(mn, mx, num)
result = [(mn, partmax, partcount)]
while partmax < mx:
partmin = partmax + delta_price
partmax, partcount = get_5k(partmin, mx, num)
result.append((partmin, partmax, partcount))
return result
if __name__ == '__main__':
print(f"Using {price_list_size} random prices from 0 to {get_max_price()}")
result = get_all_5k()
print(f"Splits into {len(result)} bins of approx 5000 elements")
for mn, mx, count in result:
print(f" From {mn:8.1f} ... {mx:8.1f} with {count} items.")
样本输出:
Using 96625 random prices from 0 to 99997
Splits into 20 bins of approx 5000 elements
From 0.0 ... 5144.3 with 4998 items.
From 5145.3 ... 10266.7 with 5001 items.
From 10267.7 ... 15504.7 with 5000 items.
From 15505.7 ... 20584.7 with 4999 items.
From 20585.7 ... 25701.0 with 4997 items.
From 25702.0 ... 30907.7 with 5000 items.
From 30908.7 ... 36111.7 with 4999 items.
From 36112.7 ... 41304.5 with 5000 items.
From 41305.5 ... 46588.4 with 5000 items.
From 46589.4 ... 51771.6 with 4999 items.
From 51772.6 ... 56861.7 with 5000 items.
From 56862.7 ... 62156.4 with 5001 items.
From 62157.4 ... 67289.8 with 4999 items.
From 67290.8 ... 72393.2 with 5000 items.
From 72394.2 ... 77545.3 with 5000 items.
From 77546.3 ... 82747.2 with 5003 items.
From 82748.2 ... 87995.3 with 5000 items.
From 87996.3 ... 93270.8 with 4999 items.
From 93271.8 ... 98331.3 with 5001 items.
From 98332.3 ... 101660.9 with 1599 items.
如果您真正的get_prange_count()函数调用了一个慢速DB,那么它将对每个bin最大价格值进行许多二进制搜索。
停止按!
我重新阅读了您的问题,它最多显示5k 。如果对其进行以下更改,get_5k
将像以前一样进行二进制搜索,但是如果它超过5k,并且delta_mx
足够小以退出外部的while循环,则新的内部while循环将减小,mx
直到count
<= 5k。
def get_5k(mn=0, mx=get_max_price(), num=5_000):
count = get_prange_count(mn, mx)
delta_mx = (mx - mn) / 2
while count != num and delta_mx >= delta_price / 2:
mx += -delta_mx if count > num else +delta_mx
count, delta_mx = get_prange_count(mn, mx), delta_mx / 2
# Assure count < num
while count > num and delta_mx < delta_price / 2 and mx > mn:
mx -= delta_price
count = get_prange_count(mn, mx)
return mx, count
样本输出:
Using 95007 random prices from 0 to 99999
Splits into 19 bins of approx 5000 elements
From 0.0 ... 5236.8 with 5000 items.
From 5237.8 ... 10564.6 with 5000 items.
From 10565.6 ... 15872.4 with 4998 items.
From 15873.4 ... 21146.7 with 5000 items.
From 21147.7 ... 26417.0 with 4999 items.
From 26418.0 ... 31730.9 with 5000 items.
From 31731.9 ... 36962.1 with 5000 items.
From 36963.1 ... 42165.8 with 4998 items.
From 42166.8 ... 47382.0 with 5000 items.
From 47383.0 ... 52662.6 with 5000 items.
From 52663.6 ... 57884.3 with 5000 items.
From 57885.3 ... 63183.5 with 5000 items.
From 63184.5 ... 68374.5 with 4999 items.
From 68375.5 ... 73688.2 with 5000 items.
From 73689.2 ... 78915.4 with 4998 items.
From 78916.4 ... 84297.7 with 5000 items.
From 84298.7 ... 89515.5 with 4999 items.
From 89516.5 ... 94709.1 with 5000 items.
From 94710.1 ... 105287.2 with 4991 items.
停止按#2!
进一步的测试显示了以前的代码删除项。以下内容不丢弃任何项目,并保持在5k以下
def get_5k(mn=0, mx=get_max_price(), num=5_000):
"Mainly binary search for num items between mn and mx, adjusting mx"
count = get_prange_count(mn, mx)
delta_mx = (mx - mn) / 2
while count != num and delta_mx >= delta_price / 2:
mx += -delta_mx if count > num else +delta_mx
mx = mx // 1 # Floor
count, delta_mx = get_prange_count(mn, mx), delta_mx / 2
# Assure count < num
# while count > num and delta_mx < delta_price / 2 and mx > mn:
# mx -= delta_price
# count = get_prange_count(mn, mx)
return mx, count
将以下检查添加到末尾:
if __name__ == '__main__':
if len(price_list) != sum(count for mn, mx, count in result):
print("\nWhoops! Some items missing:")
p = price_list.copy()
for mn, mx, count in result:
items = _get_prange(mn, mx)
assert len(items) == get_prange_count(mn, mx)
for i in items:
p.remove(i)
p.sort()
print(p)
现在输出如下:
Using 100941 random prices from 0 to 99999
Splits into 21 bins of approx 5000 elements
From 0.0 ... 4918.0 with 4998 items.
From 4919.0 ... 9863.0 with 5000 items.
From 9864.0 ... 14901.0 with 4998 items.
From 14902.0 ... 19708.0 with 4999 items.
From 19709.0 ... 24605.0 with 5000 items.
From 24606.0 ... 29555.0 with 5000 items.
From 29556.0 ... 34539.0 with 5000 items.
From 34540.0 ... 39507.0 with 5000 items.
From 39508.0 ... 44602.0 with 4997 items.
From 44603.0 ... 49462.0 with 5000 items.
From 49463.0 ... 54458.0 with 4999 items.
From 54459.0 ... 59432.0 with 5000 items.
From 59433.0 ... 64315.0 with 5000 items.
From 64316.0 ... 69416.0 with 4998 items.
From 69417.0 ... 74389.0 with 4998 items.
From 74390.0 ... 79253.0 with 4999 items.
From 79254.0 ... 84149.0 with 5000 items.
From 84150.0 ... 89134.0 with 5000 items.
From 89135.0 ... 94059.0 with 5000 items.
From 94060.0 ... 99055.0 with 5000 items.
From 99056.0 ... 100934.0 with 955 items.
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句