I have 5 million rows in a MySQL database sitting on the local network (so a fast connection, not over the internet). The connection to the DB works fine, but when I try to run
f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col='ID')
it takes a really long time. Even chunking with chunksize is slow, and I can't tell whether the query is hung or is actually retrieving data.
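For reference, something like the following is what I mean by chunking (the connection string is a placeholder; substitute your own credentials and host). Printing per-chunk progress at least makes it visible whether rows are arriving:

    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string; adjust user/password/host/db.
    engine = create_engine('mysql+pymysql://user:password@host/mydb')

    chunks = []
    # With chunksize set, read_sql_query returns an iterator of
    # DataFrames, so progress can be printed as each chunk arrives.
    for i, chunk in enumerate(pd.read_sql_query(
            'SELECT * FROM mytable', engine,
            index_col='ID', chunksize=100000)):
        print('chunk %d: %d rows' % (i, len(chunk)))
        chunks.append(chunk)

    df = pd.concat(chunks)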
For those of you who work with large datasets in a database: how do you retrieve the data for your Pandas session? Would it be "smarter", for example, to run the query, dump the results to a CSV file, and load that into Pandas? That sounds more involved than it needs to be.
The best way of loading all the data from a table out of any SQL database into pandas is the pandas.read_csv function. Use the connector only for reading a few rows. The power of an SQL database is its ability to deliver small chunks of data based on indices; delivering entire tables is something you do with dumps.
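A rough sketch of that workflow (host, credentials, file names, and database names are placeholders; the mysql command-line client's --batch mode writes tab-separated output with a header row):

    import subprocess
    import pandas as pd

    # Dump the whole table client-side via the mysql CLI.
    # --batch prints tab-separated values with a header line.
    with open('mytable.tsv', 'w') as out:
        subprocess.run(
            ['mysql', '--host=dbhost', '--user=me', '--password=secret',
             '--batch', '-e', 'SELECT * FROM mytable', 'mydb'],
            stdout=out, check=True)

    # read_csv parses the flat file far faster than row-by-row
    # retrieval through the connector.
    df = pd.read_csv('mytable.tsv', sep='\t', index_col='ID')

If you have FILE privileges on the server, SELECT ... INTO OUTFILE writes the dump on the server side instead, which avoids piping the rows through the client.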