Django的ManyToMany字段的bulk_create的正确方法？

Georgy 发表于 Dev

乔治

我有用于表填充的这段代码。

def add_tags(count):
    print "Add tags"
    insert_list = []
    photo_pk_lower_bound = Photo.objects.all().order_by("id")[0].pk
    photo_pk_upper_bound = Photo.objects.all().order_by("-id")[0].pk
    for i in range(count):
        t = Tag( tag = 'tag' + str(i) )
        insert_list.append(t)
    Tag.objects.bulk_create(insert_list)
    for i in range(count):
        random_photo_pk = randint(photo_pk_lower_bound, photo_pk_upper_bound)
        p = Photo.objects.get( pk = random_photo_pk )
        t = Tag.objects.get( tag = 'tag' + str(i) )
        t.photos.add(p)

这是模型：

class Tag(models.Model):
    tag = models.CharField(max_length=20,unique=True)
    photos = models.ManyToManyField(Photo)

据我理解这个答案：Django：此函数的关键字参数无效，我必须先保存标签对象（由于ManyToMany字段），然后通过将照片附加到它们add()。但是，从count长远来看，此过程耗时太长。有什么方法可以重构此代码以使其更快？

通常，我想用随机虚拟数据填充Tag模型。

编辑1（照片模型）

class Photo(models.Model):
    photo = models.ImageField(upload_to="images")
    created_date = models.DateTimeField(auto_now=True)
    user = models.ForeignKey(User)

    def __unicode__(self):
       return self.photo.name

杜丁

TL; DR使用“直通”模型批量插入m2m关系。

Tag.photos.through => Model with 3 fields [ id, tag, photo ]
new_tag_photo = Tag.photos.through(tag_id=1, photo_id=2)
Tag.photos.through.bulk_insert([new_tag_photo, ...])

这是我所知道的最快的方法，我一直在使用它来创建测试数据。我可以在几分钟内生成数百万条记录。

从乔治编辑：

def add_tags(count):
    new_tags = []
    for t in range(count):
        tag = Tag(tag='tag%s' % t)
        new_tags.append(tag)
    Tag.objects.bulk_create(new_tags)

    tag_ids = list(Tag.objects.values_list('id', flat=True))
    photo_ids = Photo.objects.values_list('id', flat=True)
    tag_count = len(tag_ids)
       
    for photo_id in photo_ids:
        tag_to_photo_links = []
        shuffle(tag_ids)

        rand_num_tags = randint(0, tag_count)
        photo_tags = tag_ids[:rand_num_tags]

        for tag_id in photo_tags:
            # through is the table generated by django to link m2m between tag and photo
            photo_tag = Tag.photos.through(tag_id=tag_id, photo_id=photo_id)
            tag_to_photo_links.append(photo_tag)

        Tag.photos.through.objects.bulk_create(tag_to_photo_links, batch_size=7000)