将数字列表拆分为n个块，以使这些块具有（接近）相等的总和并保持原始顺序

Ng Oon-Ee 发表于 Dev

吴O娥

这不是标准的分区问题，因为我需要维护列表中元素的顺序。

例如，如果我有一个列表

[1, 6, 2, 3, 4, 1, 7, 6, 4]

我想要两个块，然后拆分应该给

[[1, 6, 2, 3, 4, 1], [7, 6, 4]]

每边总共17个。对于三块，结果将是

[[1, 6, 2, 3], [4, 1, 7], [6, 4]]

总计为12、12和10。

编辑以获取其他说明

我目前将总和除以块数，然后将其用作目标，然后进行迭代，直到接近该目标为止。问题在于某些数据集可能会使算法混乱，例如，尝试将以下内容分为3：-

[95, 15, 75, 25, 85, 5]

总和为300，目标为100。第一个块的总和为95，第二个块的总和为90，第三个块的总和为110，而5为“剩余”。将其追加到应有的位置将得到95、90、115，而更“合理”的解决方案将是110、100、90。

结束编辑

背景：

我有一个包含高度不同的文本（歌词）的列表，我想将文本分为任意数量的列。目前，我基于所有行的总高度计算目标高度，但是显然这是一个始终被低估的情况，这在某些情况下会导致次优解决方案（最后一列的高度明显更高）。

肖恩·沙利文（Shawn Sullivan）

这种方法定义了分区边界，该边界将数组划分为大致相等数量的元素，然后反复搜索更好的分区，直到找不到更多分区为止。它与大多数其他发布的解决方案的不同之处在于，它通过尝试多个不同的分区来寻找最佳解决方案。其他解决方案尝试在单次通过数组的过程中创建一个良好的分区，但我想不出保证最佳的单次通过算法。

此处的代码是该算法的有效实现，但可能很难理解，因此在末尾包含了更具可读性的版本作为附录。

def partition_list(a, k):
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    partition_between = [(i+1)*len(a)/k for i in range(k-1)]
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0

    while True:
        starts = [0]+partition_between
        ends = partition_between+[len(a)]
        partitions = [a[starts[i]:ends[i]] for i in range(k)]
        heights = map(sum, partitions)

        abs_height_diffs = map(lambda x: abs(average_height - x), heights)
        worst_partition_index = abs_height_diffs.index(max(abs_height_diffs))
        worst_height_diff = average_height - heights[worst_partition_index]

        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1

        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1

        move = -1 if worst_height_diff < 0 else 1
        bound_to_move = 0 if worst_partition_index == 0\
                        else k-2 if worst_partition_index == k-1\
                        else worst_partition_index-1 if (worst_height_diff < 0) ^ (heights[worst_partition_index-1] > heights[worst_partition_index+1])\
                        else worst_partition_index
        direction = -1 if bound_to_move < worst_partition_index else 1
        partition_between[bound_to_move] += move * direction

def print_best_partition(a, k):
    print 'Partitioning {0} into {1} partitions'.format(a, k)
    p = partition_list(a, k)
    print 'The best partitioning is {0}\n    With heights {1}\n'.format(p, map(sum, p))

a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2) 
print_best_partition(a, 3)
print_best_partition(a, 4)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

根据您要执行的操作，可能需要进行一些修改。例如，要确定是否找到了最佳的分区，该算法将在分区之间没有高度差时停止，它找不到比连续5次以上或100次迭代后看到的最佳结果更好的东西总迭代作为万能的停止点。您可能需要调整这些常数或使用其他方案。如果您的身高构成了复杂的价值观，那么知道何时停止就可能陷入试图逃脱局部最大值之类的经典问题。

输出

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 1 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1, 7, 6, 4]]
With heights [34]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 2 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
With heights [17, 17]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 3 partitions
The best partitioning is [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
With heights [12, 12, 10]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 4 partitions
The best partitioning is [[1, 6], [2, 3, 4], [1, 7], [6, 4]]
With heights [7, 9, 8, 10]

Partitioning [1, 10, 10, 1] into 2 partitions
The best partitioning is [[1, 10], [10, 1]]
With heights [11, 11]

Partitioning [7, 17, 17, 1, 8, 8, 12, 0, 10, 20, 17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9, 12, 3, 18, 9, 6, 7, 19, 20, 17, 7, 4, 3, 16, 20, 6, 7, 12, 16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16, 14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5, 13, 16, 0, 16, 7, 3, 8, 1, 20, 16, 11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18, 20, 3, 10, 9, 13, 12, 15, 6, 14, 16, 6, 12, 9, 9, 16, 14, 19, 1] into 10 partitions
The best partitioning is [[7, 17, 17, 1, 8, 8, 12, 0, 10, 20], [17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9], [12, 3, 18, 9, 6, 7, 19, 20], [17, 7, 4, 3, 16, 20, 6, 7, 12], [16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16], [14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5], [13, 16, 0, 16, 7, 3, 8, 1, 20, 16], [11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18], [20, 3, 10, 9, 13, 12, 15, 6, 14], [16, 6, 12, 9, 9, 16, 14, 19, 1]]
With heights [100, 95, 94, 92, 90, 87, 100, 93, 102, 102]

Partitioning [95, 15, 75, 25, 85, 5] into 3 partitions
The best partitioning is [[95, 15], [75, 25], [85, 5]]
With heights [110, 100, 90]

编辑

添加了此方法可以正确处理的新测试用例[95、15、75、25、85、5]。

附录

此版本的算法更易于阅读和理解，但由于较少利用内置Python功能，因此会更长一些。但是，它似乎可以在相当甚至更快的时间内执行。

#partition list a into k partitions
def partition_list(a, k):
    #check degenerate conditions
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    #create a list of indexes to partition between, using the index on the
    #left of the partition to indicate where to partition
    #to start, roughly partition the array into equal groups of len(a)/k (note
    #that the last group may be a different size) 
    partition_between = []
    for i in range(k-1):
        partition_between.append((i+1)*len(a)/k)
    #the ideal size for all partitions is the total height of the list divided
    #by the number of paritions
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0
    no_improvements_count = 0
    #loop over possible partitionings
    while True:
        #partition the list
        partitions = []
        index = 0
        for div in partition_between:
            #create partitions based on partition_between
            partitions.append(a[index:div])
            index = div
        #append the last partition, which runs from the last partition divider
        #to the end of the list
        partitions.append(a[index:])
        #evaluate the partitioning
        worst_height_diff = 0
        worst_partition_index = -1
        for p in partitions:
            #compare the partition height to the ideal partition height
            height_diff = average_height - sum(p)
            #if it's the worst partition we've seen, update the variables that
            #track that
            if abs(height_diff) > abs(worst_height_diff):
                worst_height_diff = height_diff
                worst_partition_index = partitions.index(p)
        #if the worst partition from this run is still better than anything
        #we saw in previous iterations, update our best-ever variables
        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1
        #decide if we're done: if all our partition heights are ideal, or if
        #we haven't seen improvement in >5 iterations, or we've tried 100
        #different partitionings
        #the criteria to exit are important for getting a good result with
        #complex data, and changing them is a good way to experiment with getting
        #improved results
        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1
        #adjust the partitioning of the worst partition to move it closer to the
        #ideal size. the overall goal is to take the worst partition and adjust
        #its size to try and make its height closer to the ideal. generally, if
        #the worst partition is too big, we want to shrink the worst partition
        #by moving one of its ends into the smaller of the two neighboring
        #partitions. if the worst partition is too small, we want to grow the
        #partition by expanding the partition towards the larger of the two
        #neighboring partitions
        if worst_partition_index == 0:   #the worst partition is the first one
            if worst_height_diff < 0: partition_between[0] -= 1   #partition too big, so make it smaller
            else: partition_between[0] += 1   #partition too small, so make it bigger
        elif worst_partition_index == len(partitions)-1: #the worst partition is the last one
            if worst_height_diff < 0: partition_between[-1] += 1   #partition too small, so make it bigger
            else: partition_between[-1] -= 1   #partition too big, so make it smaller
        else:   #the worst partition is in the middle somewhere
            left_bound = worst_partition_index - 1   #the divider before the partition
            right_bound = worst_partition_index   #the divider after the partition
            if worst_height_diff < 0:   #partition too big, so make it smaller
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]):   #the partition on the left is bigger than the one on the right, so make the one on the right bigger
                    partition_between[right_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the left bigger
                    partition_between[left_bound] += 1
            else:   #partition too small, make it bigger
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the left smaller
                    partition_between[left_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the right smaller
                    partition_between[right_bound] += 1

def print_best_partition(a, k):
    #simple function to partition a list and print info
    print '    Partitioning {0} into {1} partitions'.format(a, k)
    p = partition_list(a, k)
    print '    The best partitioning is {0}\n    With heights {1}\n'.format(p, map(sum, p))

#tests
a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2) 
print_best_partition(a, 3)
print_best_partition(a, 4)
print_best_partition(a, 5)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

本文收集自互联网，转载请注明来源。

如有侵权，请联系[email protected] 删除。

编辑于2021-02-25

我来说两句

0条评论

登录后参与评论

上一篇：编译后Visual Studio冻结。当我尝试启动应用程序时，资源管理器也是如此

来自分类Dev

Related 相关文章

文章

将数字列表拆分为n个块，以使这些块具有（接近）相等的总和并保持原始顺序

将数字列表拆分为n个块，以使这些块具有（接近）相等的总和并保持原始顺序

输出

编辑

附录

Python将列表分为N个块，总和大致相等

将具有连续块项目但不连续块的列表拆分为子列表

将具有连续块项目但不连续块的列表拆分为子列表

将数组拆分为具有动态大小的固定 n 个块

将数组拆分为大致相等的块

将列表拆分为块索引列表

将数据拆分为3个有序的块，且大小尽可能接近

将列表拆分为单独但重叠的块

将列表拆分为单独但重叠的块

将numpy数组拆分为不相等的块

将Python列表拆分为重叠块的列表

（序言）检查列表是否可以分为2个具有相等总和的子列表

将数字列表拆分为 N 个偶数小节

将数字向量拆分为R中的连续块

将数组拆分为具有最小块大小的给定大小的块

将列表拆分为随机大小的块的最佳方法？

如何使用跨度将列表拆分为多个块？

如何将代码块拆分为列表？

将具有M个元素的向量列表拆分为2个N和MN元素列表

是否有将大数据帧拆分为大小相等的n个较小数据帧（按行）并具有n + 1个较小大小的数据帧的功能？

根据第二个列表将Python列表拆分为自定义块大小

根据第二个列表将Python列表拆分为自定义块大小

如何使用biopython将多fasta文件拆分为相等序列长度的块并更改标题

将稀疏方阵拆分为块

MongoDB。将嵌入的数组拆分为块

Prolog将数字列表分为两个相等的部分

如何将数组拆分为块并找到块的总和并将输出作为数组存储在pyspark中

在R中将向量拆分为不相等的块

使用macOS版本的拆分包将文件拆分为4个块