Out of Memory: Transferring Large Data from Amazon Redshift to Pandas

RC0706

I have a large table (about 10M rows) in Amazon Redshift that I want to load into a Pandas DataFrame and store in a pickle file. However, it throws an "Out of Memory" exception, for obvious reasons, because of the size of the data. I have also tried other things such as SQLAlchemy, but have not been able to crack the problem. Can anyone suggest a better way or code to get through it?

My current (simple) code snippet is as follows:

import psycopg2
import pandas as pd

cnxn = psycopg2.connect(dbname=<mydatabase>, host='my_redshift_Server_Name',
                        port='5439', user=<username>, password=<pwd>)

sql = "Select * from mydatabase.mytable"

# read_sql pulls the entire result set into memory at once
df = pd.read_sql(sql, cnxn)
pd.to_pickle(df, 'Base_Data.pkl')
print(df.head(50))

cnxn.close()
AlexYes

1) Find the row count of the table and the largest chunk you can pull in one query by adding order by [column] limit [number] offset 0 and increasing the limit to a reasonable number.

2) Add a loop that produces the SQL with the limit you found and an increasing offset. For example, if you can pull 10k rows at a time, your statements would be:

... limit 10000 offset 0;
... limit 10000 offset 10000;
... limit 10000 offset 20000;

until you reach the table row count

3) In the same loop, append each newly obtained set of rows to your dataframe.

P.S. This will work assuming you don't run into any memory/disk issues on the client end, which I can't guarantee, since you are already hitting such an issue on a cluster that likely has higher-grade hardware. To avoid that problem, write a new file on every iteration instead of appending; see the sketch below.
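A minimal sketch of that loop, assuming a chunk size of 10,000 rows, a sortable column named id for a stable ordering (both assumptions), and one pickle file per chunk so client memory stays bounded:

import psycopg2
import pandas as pd

cnxn = psycopg2.connect(dbname='mydatabase', host='my_redshift_Server_Name',
                        port='5439', user='username', password='pwd')

# 1) find the total row count so the loop knows when to stop
total_rows = pd.read_sql("select count(*) from mydatabase.mytable", cnxn).iloc[0, 0]

chunk_size = 10000   # the largest chunk the client can comfortably hold
offset = 0
part = 0

# 2) + 3) pull the table chunk by chunk with limit/offset
while offset < total_rows:
    sql = ("select * from mydatabase.mytable "
           "order by id "
           "limit {} offset {}".format(chunk_size, offset))
    chunk = pd.read_sql(sql, cnxn)
    # write each chunk to its own pickle instead of appending,
    # so memory use is bounded by a single chunk
    chunk.to_pickle('Base_Data_part_{}.pkl'.format(part))
    offset += chunk_size
    part += 1

cnxn.close()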

Also, the whole approach is probably not the right one. You would be better off using UNLOAD to export the table to S3, which is quite fast because the data is copied from every node in parallel, and then doing whatever is needed against the flat files on S3 to transform them into the final format you need.
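A rough sketch of the UNLOAD route, assuming a bucket named my-bucket and an IAM role ARN with write access to it (both placeholder values):

import psycopg2

cnxn = psycopg2.connect(dbname='mydatabase', host='my_redshift_Server_Name',
                        port='5439', user='username', password='pwd')

# UNLOAD writes the query result to S3 from every node in parallel
unload_sql = """
    unload ('select * from mydatabase.mytable')
    to 's3://my-bucket/mytable/part_'
    iam_role 'arn:aws:iam::123456789012:role/myRedshiftRole'
    delimiter '|' gzip allowoverwrite;
"""
with cnxn.cursor() as cur:
    cur.execute(unload_sql)
    cnxn.commit()
cnxn.close()

# Each exported file can then be processed one at a time, for example with
# pandas (plus s3fs) or after downloading it locally:
# pd.read_csv('s3://my-bucket/mytable/<unloaded file>', sep='|',
#             header=None, compression='gzip')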
