Plotting a large number of points using matplotlib and running out of memory

Hooked

I have a large (~6GB) text file in a simple format:

x1 y1 z1
x2 y2 z2
...

Since I may load this data more than once, I've created a np.memmap file for efficiency reasons:

X,Y,Z = np.memmap(f_np_mmap,dtype='float32',mode='r',shape=shape).T

What I'm trying to do is plot:

plt.scatter(X, Y, 
           color=custom_colorfunction(Z), 
           alpha=.01, s=.001, marker='s', linewidth=0)

This works perfectly for smaller datasets. However, for this larger dataset I run out of memory. I've checked that plt.scatter is what consumes all the memory; I can step through X, Y, Z just fine. Is there a way I can "rasterize" the canvas so that I do not run out of memory? I do not need to zoom and pan around the image; it is going straight to disk. I realize that I can bin the data and plot that, but I'm not sure how to do this with a custom colormap and an alpha value.

Joe Kington

@tcaswell's suggestion to override the Axes.draw method is definitely the most flexible way to approach this.

However, you can use/abuse blitting to do this without subclassing Axes. Just use draw_artist each time without restoring the canvas.

There's one additional trick: We need to have a special save method, as all of the others draw the canvas before saving, which will wipe out everything we've drawn on it previously.

Also, as tcaswell notes, calling draw_artist for every item is rather slow, so for a large number of points, you'll want to chunk your input data. Chunking will give a significant speedup, but this method is always going to be slower than drawing a single PathCollection.

At any rate, either one of these answers should alleviate your memory problems. Here's a simplistic example.

import numpy as np
import matplotlib
matplotlib.use('Agg')  # Assumption: the pixel-buffer trick in save() needs an Agg canvas.
import matplotlib.pyplot as plt

def main():
    # We'll be saving the figure's background, so let's make it transparent.
    fig, ax = plt.subplots(facecolor='none')

    # You'll have to know the extent of the input beforehand with this method.
    ax.axis([0, 10, 0, 10])

    # We need to draw the canvas before we start adding points.
    fig.canvas.draw()

    # This won't actually ever be drawn. We just need an artist to update.
    col = ax.scatter([5], [5], color=[0.1, 0.1, 0.1], alpha=0.3)

    for xy, color in datastream(int(1e6), chunksize=int(1e4)):
        col.set_offsets(xy)
        col.set_color(color)
        ax.draw_artist(col)

    save(fig, 'test.png')

def datastream(n, chunksize=1):
    """Returns a generator over "n" random xy positions and rgb colors."""
    for _ in range(n // chunksize):
        xy = 10 * np.random.random((chunksize, 2))
        color = np.random.random((chunksize, 3))
        yield xy, color

def save(fig, filename):
    """We have to work around `fig.canvas.print_png`, etc calling `draw`."""
    # The original version used the private `matplotlib._png.write_png`, which
    # newer matplotlib releases no longer ship. Reading the Agg pixel buffer
    # and writing it with `plt.imsave` likewise avoids redrawing the canvas.
    rgba = np.asarray(fig.canvas.buffer_rgba())
    plt.imsave(filename, rgba, dpi=fig.dpi)

main()


Also, in the saved image you might notice that the top and left spines are getting drawn over. You could work around this by re-drawing those two spines (ax.draw_artist(ax.spines['top']), etc.) before saving.
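
To feed the memmap from the question through the same loop, the random datastream generator could be swapped for one that slices the memory-mapped array in chunks. This is only a minimal sketch under some assumptions: the file holds an (N, 3) float32 array of x, y, z rows (as the .T unpacking in the question suggests), and the names memmap_stream and colorfunc are made up for illustration, with colorfunc standing in for custom_colorfunction and returning one RGB(A) row per point.

```python
import numpy as np

def memmap_stream(filename, shape, colorfunc, chunksize=10000):
    """Yield (xy, color) chunks from an (N, 3) float32 memmap of x, y, z rows.

    `colorfunc` is a hypothetical stand-in for the question's
    custom_colorfunction: it takes a 1-D array of z values and returns
    an (n, 3) or (n, 4) array of colors.
    """
    data = np.memmap(filename, dtype='float32', mode='r', shape=shape)
    for start in range(0, shape[0], chunksize):
        # Slicing the memmap touches only this block of the file;
        # np.array copies the block into ordinary memory.
        block = np.array(data[start:start + chunksize])
        yield block[:, :2], colorfunc(block[:, 2])
```

In main, the loop would then read something like "for xy, color in memmap_stream(f_np_mmap, shape, custom_colorfunction):", with ax.axis set to the real extent of the data and the placeholder scatter created with the alpha, s, and marker values from the question.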


