Loading 5 million rows into Pandas from MySQL

Dervin Thunk

I have 5 million rows in a MySQL DB sitting over the (local) network (so quick connection, not on the internet).

The connection to the DB works fine, but if I try to do

f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col = 'ID')

This takes a really long time. Even chunking with chunksize will be slow. Besides, I don't really know whether it's just hung there or indeed retrieving information.

I would like to ask, for those people working with large data on a DB, how they retrieve their data for their Pandas session?

Would it be "smarter", for example, to run the query, return a csv file with the results and load that into Pandas? Sounds much more involved than it needs to be.

firelynx

The best way of loading all data from a table out of -any-SQL database into pandas is:

  1. Dumping the data out of the database using COPY for PostgreSQL, SELECT INTO OUTFILE for MySQL or similar for other dialects.
  2. Reading the csv file with pandas using the pandas.read_csv function

Use the connector only for reading a few rows. The power of an SQL database is its ability to deliver small chunks of data based on indices.

Delivering entire tables is something you do with dumps.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Optimal MYSQL query for longest prefix matching in a table with 5 million rows

From Dev

Optimal MYSQL query for longest prefix matching in a table with 5 million rows

From Dev

Duplicate Rows Loading Data From MySQL to DataGridView

From Dev

Can MySQL handle 100 million+ rows?

From Dev

INSERT vs UPDATE: MySQL, 7 million rows

From Dev

Mysql database design: table with half a million rows

From Dev

speeding up processing 5 million rows of coordinate data

From Dev

Pandas groupby+transform on 50 million rows is taking 3 hours

From Dev

Pandas groupby+transform on 50 million rows is taking 3 hours

From Dev

PHP/MySQL - Updating 70 million rows every week

From Dev

Slow Query on Medium MySQL Table (1 Million Rows)

From Dev

Using php/MySQL to run queries on 3+million rows

From Dev

Mysql group by won't use index after 1 million rows

From Dev

Slow Query on Medium MySQL Table (1 Million Rows)

From Dev

How to convert about 16 million rows in mysql with least time?

From Dev

Best way to insert ~20 million rows using Python/MySQL

From Dev

update query with 1.5 million rows taking long time to execute mysql

From Dev

MySQL performance issue on ~3million rows containing MEDIUMTEXT?

From Dev

Pandas get specific rows from HDF5 by index

From Dev

can not select ten million rows from a database using python

From Dev

SQL - Delete table with 372 million rows starting from first row

From Dev

How to quickly delete a column from a table containing 600 million rows?

From Java

Loading XML file in MySQL getting extra rows

From Dev

MySQL update every single row from a 3 million row table

From Dev

Problems with creating large MultiIndex (10 million rows) in Python Pandas used to reindex large DataFrame

From Dev

How do I prevent running out of memory when inserting a million rows in mysql with php

From Dev

Very slow SELECT query in MySQL in a 10 Million rows table (single table)

From Dev

Java SQL 1 million rows

From Dev

Java SQL 1 million rows

Related Related

  1. 1

    Optimal MYSQL query for longest prefix matching in a table with 5 million rows

  2. 2

    Optimal MYSQL query for longest prefix matching in a table with 5 million rows

  3. 3

    Duplicate Rows Loading Data From MySQL to DataGridView

  4. 4

    Can MySQL handle 100 million+ rows?

  5. 5

    INSERT vs UPDATE: MySQL, 7 million rows

  6. 6

    Mysql database design: table with half a million rows

  7. 7

    speeding up processing 5 million rows of coordinate data

  8. 8

    Pandas groupby+transform on 50 million rows is taking 3 hours

  9. 9

    Pandas groupby+transform on 50 million rows is taking 3 hours

  10. 10

    PHP/MySQL - Updating 70 million rows every week

  11. 11

    Slow Query on Medium MySQL Table (1 Million Rows)

  12. 12

    Using php/MySQL to run queries on 3+million rows

  13. 13

    Mysql group by won't use index after 1 million rows

  14. 14

    Slow Query on Medium MySQL Table (1 Million Rows)

  15. 15

    How to convert about 16 million rows in mysql with least time?

  16. 16

    Best way to insert ~20 million rows using Python/MySQL

  17. 17

    update query with 1.5 million rows taking long time to execute mysql

  18. 18

    MySQL performance issue on ~3million rows containing MEDIUMTEXT?

  19. 19

    Pandas get specific rows from HDF5 by index

  20. 20

    can not select ten million rows from a database using python

  21. 21

    SQL - Delete table with 372 million rows starting from first row

  22. 22

    How to quickly delete a column from a table containing 600 million rows?

  23. 23

    Loading XML file in MySQL getting extra rows

  24. 24

    MySQL update every single row from a 3 million row table

  25. 25

    Problems with creating large MultiIndex (10 million rows) in Python Pandas used to reindex large DataFrame

  26. 26

    How do I prevent running out of memory when inserting a million rows in mysql with php

  27. 27

    Very slow SELECT query in MySQL in a 10 Million rows table (single table)

  28. 28

    Java SQL 1 million rows

  29. 29

    Java SQL 1 million rows

HotTag

Archive