将N个元素的列表分解为大小为S的节点

贝诺尼

现实生活中的问题:我有一个API,最多只能返回5000个项目。在任何给定时间,我可以通过按价格范围过滤来查看有多少商品。我还可以找到最昂贵的物品。我需要将这些项目分解为大小范围为5000(节点大小)的价格范围,以便我可以收集所有这些项目。

下面的实际示例。

填充列表。

price_list = []
for i in range(0, 100000):
    price_list.append(random.uniform(0, 100000))

实现搜索功能。

def get_prange_count(startp, endp):
    return len([r for r in price_list if r <= endp and r >= startp])

通过使用get_prange_count函数,将price_list中的所有项目分解为最大5000个节点。

例如

[{"startp": 0, "endp": 3.5, "item_count": 5000}, 
{"startp": 3.51, "endp": 4.03, "item_count": 5000}]

我知道这是一个算法问题,但是我找不到理想的问题。我研究了二进制搜索,但我没有寻找位置,我研究了匈牙利算法,但这似乎不是分配问题。

预先感谢您的指导,

编辑:

5000 -> 2 for demonstration purposes. 

price_list = [0.1, 0.2, 0.4, 1.3, 1.6, 5.0]

node1 = {"startp": 0, "endp": 0.2, "item_count": 2}
node2 = {"startp": 0.4, "endp": 1.3, "item_count": 2}
node3 = {"startp": 1.6, "endp": 5.0, "item_count": 2}
稻田3118

这个Python解决方案仅使用function get_prange_count(),并且get_max_price()正如您所说的那样查找每个5k元素的价格范围。好吧,我还需要两个不同价格之间的最小增量价格-我假设它是一个,但是.001可以用于美分。

项目数量偏离5000,这是因为多个项目在箱的边界附近具有相同的价格,而剩下的最后一个箱是因为我不一定会生成5K的倍数的总价格。

import random

price_list_size = random.choice(range(90_000, 100_000))
price_list = random.choices(range(100_000), k=price_list_size)

delta_price = 1     # Minimum difference between any two different prices.

def get_prange_count(startp, endp):
    return len([r for r in price_list if startp <= r <= endp])

def get_max_price():
    return max(price_list)

def get_5k(mn=0, mx=get_max_price(), num=5_000):
    count = get_prange_count(mn, mx)
    delta_mx = (mx - mn) / 2
    while count != num and delta_mx >= delta_price / 2:
        mx += -delta_mx if count > num else +delta_mx
        count, delta_mx = get_prange_count(mn, mx), delta_mx / 2
    return mx, count

def get_all_5k(mn=0, mx=get_max_price(), num=5_000):
    partmax, partcount = get_5k(mn, mx, num)
    result = [(mn, partmax, partcount)]
    while partmax < mx:
        partmin = partmax + delta_price
        partmax, partcount = get_5k(partmin, mx, num)
        result.append((partmin, partmax, partcount))
    return result

if __name__ == '__main__':
    print(f"Using {price_list_size} random prices from 0 to {get_max_price()}")
    result = get_all_5k()
    print(f"Splits into {len(result)} bins of approx 5000 elements")
    for mn, mx, count in result:
        print(f"  From {mn:8.1f} ... {mx:8.1f} with {count} items.")

样本输出:

Using 96625 random prices from 0 to 99997
Splits into 20 bins of approx 5000 elements
  From      0.0 ...   5144.3 with 4998 items.
  From   5145.3 ...  10266.7 with 5001 items.
  From  10267.7 ...  15504.7 with 5000 items.
  From  15505.7 ...  20584.7 with 4999 items.
  From  20585.7 ...  25701.0 with 4997 items.
  From  25702.0 ...  30907.7 with 5000 items.
  From  30908.7 ...  36111.7 with 4999 items.
  From  36112.7 ...  41304.5 with 5000 items.
  From  41305.5 ...  46588.4 with 5000 items.
  From  46589.4 ...  51771.6 with 4999 items.
  From  51772.6 ...  56861.7 with 5000 items.
  From  56862.7 ...  62156.4 with 5001 items.
  From  62157.4 ...  67289.8 with 4999 items.
  From  67290.8 ...  72393.2 with 5000 items.
  From  72394.2 ...  77545.3 with 5000 items.
  From  77546.3 ...  82747.2 with 5003 items.
  From  82748.2 ...  87995.3 with 5000 items.
  From  87996.3 ...  93270.8 with 4999 items.
  From  93271.8 ...  98331.3 with 5001 items.
  From  98332.3 ... 101660.9 with 1599 items.

如果您真正的get_prange_count()函数调用了一个慢速DB,那么它将对每个bin最大价格值进行许多二进制搜索。

停止按!

我重新阅读了您的问题,它最多显示5k 如果对其进行以下更改,get_5k将像以前一样进行二进制搜索,但是如果它超过5k,并且delta_mx足够小以退出外部的while循环,则新的内部while循环将减小,mx直到count<= 5k。

def get_5k(mn=0, mx=get_max_price(), num=5_000):
    count = get_prange_count(mn, mx)
    delta_mx = (mx - mn) / 2
    while count != num and delta_mx >= delta_price / 2:
        mx += -delta_mx if count > num else +delta_mx
        count, delta_mx = get_prange_count(mn, mx), delta_mx / 2
        # Assure count < num
        while count > num and delta_mx < delta_price / 2 and mx > mn:
            mx -= delta_price
            count = get_prange_count(mn, mx)
    return mx, count

样本输出:

Using 95007 random prices from 0 to 99999
Splits into 19 bins of approx 5000 elements
  From      0.0 ...   5236.8 with 5000 items.
  From   5237.8 ...  10564.6 with 5000 items.
  From  10565.6 ...  15872.4 with 4998 items.
  From  15873.4 ...  21146.7 with 5000 items.
  From  21147.7 ...  26417.0 with 4999 items.
  From  26418.0 ...  31730.9 with 5000 items.
  From  31731.9 ...  36962.1 with 5000 items.
  From  36963.1 ...  42165.8 with 4998 items.
  From  42166.8 ...  47382.0 with 5000 items.
  From  47383.0 ...  52662.6 with 5000 items.
  From  52663.6 ...  57884.3 with 5000 items.
  From  57885.3 ...  63183.5 with 5000 items.
  From  63184.5 ...  68374.5 with 4999 items.
  From  68375.5 ...  73688.2 with 5000 items.
  From  73689.2 ...  78915.4 with 4998 items.
  From  78916.4 ...  84297.7 with 5000 items.
  From  84298.7 ...  89515.5 with 4999 items.
  From  89516.5 ...  94709.1 with 5000 items.
  From  94710.1 ... 105287.2 with 4991 items.

停止按#2!

进一步的测试显示了以前的代码删除项。以下内容不丢弃任何项目,并保持在5k以下

def get_5k(mn=0, mx=get_max_price(), num=5_000):
    "Mainly binary search for num items between mn and mx, adjusting mx"
    count = get_prange_count(mn, mx)
    delta_mx = (mx - mn) / 2
    while count != num and delta_mx >= delta_price / 2:
        mx += -delta_mx if count > num else +delta_mx
        mx = mx // 1    # Floor
        count, delta_mx = get_prange_count(mn, mx), delta_mx / 2
        # Assure count < num
        # while count > num and delta_mx < delta_price / 2 and mx > mn:
        #     mx -= delta_price
        #     count = get_prange_count(mn, mx)
    return mx, count

将以下检查添加到末尾:

if __name__ == '__main__':
    if len(price_list) != sum(count for mn, mx, count in result):
        print("\nWhoops! Some items missing:")
        p = price_list.copy()
        for mn, mx, count in result:
            items = _get_prange(mn, mx)
            assert len(items) == get_prange_count(mn, mx)
            for i in items:
                p.remove(i)
        p.sort()
        print(p)

现在输出如下:

Using 100941 random prices from 0 to 99999
Splits into 21 bins of approx 5000 elements
  From      0.0 ...   4918.0 with 4998 items.
  From   4919.0 ...   9863.0 with 5000 items.
  From   9864.0 ...  14901.0 with 4998 items.
  From  14902.0 ...  19708.0 with 4999 items.
  From  19709.0 ...  24605.0 with 5000 items.
  From  24606.0 ...  29555.0 with 5000 items.
  From  29556.0 ...  34539.0 with 5000 items.
  From  34540.0 ...  39507.0 with 5000 items.
  From  39508.0 ...  44602.0 with 4997 items.
  From  44603.0 ...  49462.0 with 5000 items.
  From  49463.0 ...  54458.0 with 4999 items.
  From  54459.0 ...  59432.0 with 5000 items.
  From  59433.0 ...  64315.0 with 5000 items.
  From  64316.0 ...  69416.0 with 4998 items.
  From  69417.0 ...  74389.0 with 4998 items.
  From  74390.0 ...  79253.0 with 4999 items.
  From  79254.0 ...  84149.0 with 5000 items.
  From  84150.0 ...  89134.0 with 5000 items.
  From  89135.0 ...  94059.0 with 5000 items.
  From  94060.0 ...  99055.0 with 5000 items.
  From  99056.0 ... 100934.0 with 955 items.

本文收集自互联网,转载请注明来源。

如有侵权,请联系[email protected] 删除。

编辑于
0

我来说两句

0条评论
登录后参与评论

相关文章

来自分类Dev

递归将列表分解为元素

来自分类Dev

将矩阵分解为序言中的第一个元素列表和其余元素的子列表

来自分类Dev

将列表分解为较小的重复元素列表,并保留原始顺序(Python)

来自分类Dev

将列表分解为一组索引列表

来自分类Dev

Python将单个列表分解为多个子列表

来自分类Dev

生成分解为k个集合的n个元素的每种组合的算法

来自分类Dev

将字符串分解为元素

来自分类Dev

将字符串分解为元素

来自分类Dev

如何将多个逗号分隔的列表分解为一个foreach?

来自分类Dev

(PHP)如何将年份列表分解为一个范围

来自分类Dev

将列表分解为字符串的函数

来自分类Dev

将整数分解为数字列表

来自分类Dev

将列表中的值分解为元组

来自分类Dev

将列表分解为多个数组的 Powershell 脚本

来自分类Dev

在序言中将矩阵分解为第一个元素的列表和其余元素的子列表

来自分类常见问题

Spark:将多列分解为一个

来自分类Dev

Spark:将多列分解为一个

来自分类Dev

将一个数分解为质数

来自分类Dev

如何将一个句子(字符串)分解为一个列表([String])?

来自分类Dev

将嵌套映射分解为键值对

来自分类Dev

将匿名功能分解为术语

来自分类Dev

将FD分解为BCNF

来自分类Dev

将this.props分解为Component

来自分类Dev

将序列分解为词汇变量

来自分类Dev

将shell输出分解为文件

来自分类Dev

自动将单词分解为字母?

来自分类Dev

将传入的JSON分解为数组

来自分类Dev

将FD分解为BCNF

来自分类Dev

MATLAB:将矩阵分解为向量