I have a large (~6GB) text file in a simple format:
x1 y1 z1
x2 y2 z2
...
Since I may load this data more than once, I've created a np.memmap file for efficiency reasons:
X, Y, Z = np.memmap(f_np_mmap, dtype='float32', mode='r', shape=shape).T
What I'm trying to do is plot:
plt.scatter(X, Y,
            color=custom_colorfunction(Z),
            alpha=.01, s=.001, marker='s', linewidth=0)
This works perfectly for smaller datasets. However, for this larger dataset I run out of memory. I've checked that plt.scatter is what takes all the memory; I can step through X, Y, Z just fine. Is there a way I can "rasterize" the canvas so that I do not run out of memory? I do not need to zoom and pan around the image; it is going straight to disk. I realize that I can bin the data and plot that, but I'm not sure how to do this with a custom colormap and an alpha value.
@tcaswell's suggestion to override the Axes.draw method is definitely the most flexible way to approach this.
However, you can use/abuse blitting to do this without subclassing Axes. Just use draw_artist each time without restoring the canvas.
There's one additional trick: We need to have a special save method, as all of the others draw the canvas before saving, which will wipe out everything we've drawn on it previously.
Also, as tcaswell notes, calling draw_artist for every item is rather slow, so for a large number of points, you'll want to chunk your input data. Chunking will give a significant speedup, but this method is always going to be slower than drawing a single PathCollection.
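For the memmapped data in the question, the chunking could look something like the following. This is only a sketch: `iter_chunks` is a hypothetical helper name, and it assumes the data is an (n, 3) array or np.memmap with x, y, z columns, as described in the question.

```python
import numpy as np


def iter_chunks(data, chunksize):
    """Yield (xy, z) blocks from an (n, 3) array or np.memmap.

    Hypothetical helper: each block holds at most `chunksize` rows,
    and the final block may be shorter.
    """
    n = data.shape[0]
    for start in range(0, n, chunksize):
        # Slicing a memmap touches only the rows of this block.
        block = np.asarray(data[start:start + chunksize])
        yield block[:, :2], block[:, 2]
```

Each `xy` block can be passed to `set_offsets`, and each `z` block to whatever color function you use, before calling draw_artist.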
At any rate, either one of these answers should alleviate your memory problems. Here's a simplistic example.
import numpy as np
import matplotlib.pyplot as plt
# Private PNG writer; only available in older Matplotlib releases.
from matplotlib import _png

def main():
    # We'll be saving the figure's background, so let's make it transparent.
    fig, ax = plt.subplots(facecolor='none')
    # You'll have to know the extent of the input beforehand with this method.
    ax.axis([0, 10, 0, 10])

    # We need to draw the canvas before we start adding points.
    fig.canvas.draw()

    # This won't actually ever be drawn. We just need an artist to update.
    col = ax.scatter([5], [5], color=[0.1, 0.1, 0.1], alpha=0.3)

    for xy, color in datastream(int(1e6), chunksize=int(1e4)):
        col.set_offsets(xy)
        col.set_color(color)
        ax.draw_artist(col)

    save(fig, 'test.png')

def datastream(n, chunksize=1):
    """Returns a generator over "n" random xy positions and rgb colors."""
    for _ in range(n // chunksize):
        xy = 10 * np.random.random((chunksize, 2))
        color = np.random.random((chunksize, 3))
        yield xy, color

def save(fig, filename):
    """We have to work around `fig.canvas.print_png`, etc calling `draw`."""
    renderer = fig.canvas.renderer
    # PNG data is binary, so the file must be opened in binary mode.
    with open(filename, 'wb') as outfile:
        _png.write_png(renderer._renderer.buffer_rgba(),
                       renderer.width, renderer.height,
                       outfile, fig.dpi)

main()
Also, you might notice that the top and left spines are getting drawn over. You could work around this by re-drawing those two spines (ax.draw_artist(ax.spines['top']), etc.) before saving.
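The `matplotlib._png` module used in the example is private and, as far as I know, is no longer shipped with recent Matplotlib releases. Below is a rough sketch of the same draw_artist approach that reads the Agg canvas buffer directly and writes it with Pillow, and that also re-draws the spines before saving. The assumptions here are not from the original example: the Agg backend, Pillow being installed, and the hypothetical function name `render_chunks`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen Agg backend, output goes to disk
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image  # assumption: Pillow is available for writing the PNG


def render_chunks(chunks, extent, filename):
    """Draw (xy, color) chunks onto one canvas and save it without a redraw.

    `chunks` is any iterable of (xy, color) pairs; `extent` is
    [xmin, xmax, ymin, ymax] and must be known beforehand.
    """
    fig, ax = plt.subplots(facecolor='none')
    ax.axis(extent)

    # Initial draw: renders the empty axes and sets up the renderer.
    fig.canvas.draw()

    # Placeholder artist; it is created after the draw, so the dummy
    # point itself never appears in the output.
    col = ax.scatter([extent[0]], [extent[2]], alpha=0.3, linewidth=0)

    for xy, color in chunks:
        col.set_offsets(xy)
        col.set_color(color)
        ax.draw_artist(col)  # paints on top of what is already there

    # Re-draw the spines that the points may have covered.
    for side in ('top', 'bottom', 'left', 'right'):
        ax.draw_artist(ax.spines[side])

    # Copy the pixels straight out of the canvas. Calling fig.savefig
    # here would redraw the figure and discard the accumulated points.
    rgba = np.asarray(fig.canvas.buffer_rgba())
    Image.fromarray(rgba).save(filename)
    plt.close(fig)
```

With the `datastream` generator from the example, this would be called as `render_chunks(datastream(int(1e6), chunksize=int(1e4)), [0, 10, 0, 10], 'test.png')`.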