Performance optimization on Django update or create

ygesher

In a Django project, I'm refreshing tens of thousands of lines of data from an external API on a daily basis. The problem is that since I don't know if the data is new or just an update, I can't do a bulk_create operation.

Note: Some, or perhaps many, of the rows do not actually change on a daily basis, but I don't know which, or how many, ahead of time.

So for now I do:

for row in csv_data:
    try:
        MyModel.objects.update_or_create(id=row['id'], defaults={'field1': row['value1']....})
    except Exception:
        print('error!')

And it takes... forever! One or two rows a second at best, sometimes several seconds per row. Each model I'm refreshing has one or more other models connected to it through a foreign key, so I can't just delete them all and reinsert every day. I can't wrap my head around this one -- how can I significantly cut down the number of database operations so the refresh doesn't take hours and hours?

Thanks for any help.

Yariv Katz

The problem is that you are doing a database action for each data row you grabbed from the API. You can avoid that by working out which of the rows are new (and doing one bulk insert for all new rows), which of the rows actually need an update, and which didn't change. To elaborate:

  1. Grab all the relevant rows from the database (meaning all the rows that can possibly be updated):
    old_data = MyModel.objects.all()  # if possible, narrow this with MyModel.objects.filter(...)
  2. Grab all the API data you need to insert or update:
    api_data = [...]
  3. For each row of data, work out whether it is new and put it in an array, or determine whether the row needs to update the DB:
    new_rows_array = []
    for row in api_data:
        if is_new_row(row, old_data):
            new_rows_array.append(row)
        elif is_data_modified(row, old_data):
            ...
            # do the update
        # unchanged rows are skipped
    # a single bulk insert for all new rows; bulk_create expects unsaved MyModel instances
    MyModel.objects.bulk_create(new_rows_array)

is_new_row - decides whether the row is new; new rows are added to an array that will be bulk created.

is_data_modified - looks up the row in the old data and checks whether its data has changed, so the update runs only when something actually changed.
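The classification step above can be sketched in plain Python. This is only an illustration under stated assumptions, not the answer's actual helpers: partition_rows is a hypothetical name, the rows are assumed to be dicts keyed by field name, and the existing data is assumed to have been loaded into a dict keyed by id so that each lookup is a constant-time operation instead of a scan of old_data.

```python
def partition_rows(api_rows, existing_by_id, fields):
    """Split incoming rows into rows to create and rows to update.

    api_rows: iterable of dicts, each with an 'id' key (assumed shape).
    existing_by_id: dict mapping id -> dict of the values currently stored.
    fields: names of the fields to compare.
    """
    to_create, to_update = [], []
    for row in api_rows:
        current = existing_by_id.get(row["id"])
        if current is None:
            # id not seen before: this row is new
            to_create.append(row)
        elif any(current[f] != row[f] for f in fields):
            # id exists and at least one compared field differs
            to_update.append(row)
        # otherwise the row is unchanged and needs no database work
    return to_create, to_update


# Hypothetical sample data standing in for the database and the API feed.
existing = {1: {"field1": "a"}, 2: {"field1": "b"}}
incoming = [
    {"id": 1, "field1": "a"},  # unchanged
    {"id": 2, "field1": "B"},  # changed
    {"id": 3, "field1": "c"},  # new
]
to_create, to_update = partition_rows(incoming, existing, ["field1"])
print(to_create)
print(to_update)
```

In a Django project, existing_by_id could be built once from MyModel.objects.values('id', 'field1'), the new rows turned into MyModel instances and passed to bulk_create, and the changed rows written with bulk_update, which is available from Django 2.2 onward. On older versions the changed rows would still need individual saves, but only for rows that really differ.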
