Python - Convert Very Large (6.4GB) XML files to JSON

Dustin

Essentially, I have a 6.4GB XML file that I'd like to convert to JSON and then save to disk. I'm currently running OS X 10.8.4 with an i7 2700k and 16 GB of RAM, and running 64-bit Python (double checked). I'm getting an error that I don't have enough memory to allocate. How do I go about fixing this?

import json
import xmltodict

print 'Opening'
f = open('large.xml', 'r')
data = f.read()
f.close()

print 'Converting'
newJSON = xmltodict.parse(data)

print 'Json Dumping'
newJSON = json.dumps(newJSON)

print 'Saving'
f = open('newjson.json', 'w')
f.write(newJSON)
f.close()

The Error:

Python(2461) malloc: *** mmap(size=140402048315392) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "/Users/user/Git/Resources/largexml2json.py", line 10, in <module>
    data = f.read()
MemoryError
Leonardo.Z

Many Python XML libraries support parsing XML sub-elements incrementally, e.g. xml.etree.ElementTree.iterparse and xml.sax.parse in the standard library. These are usually called "XML stream parsers".
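As a sketch of the iterparse approach: stream the file element by element and write one JSON object per record, so neither the XML document nor the JSON output ever has to fit in memory at once. The flat `<record>` structure below is a made-up stand-in for your real file's layout.

```python
import json
import xml.etree.ElementTree as ET

# Build a tiny sample file standing in for the 6.4GB input
# (hypothetical structure: a flat list of <record> elements).
with open('large.xml', 'w') as f:
    f.write('<root>')
    for i in range(3):
        f.write('<record><id>%d</id><name>item%d</name></record>' % (i, i))
    f.write('</root>')

# iterparse yields each element as its closing tag is read, so only
# one <record> subtree needs to be in memory at a time.
with open('newjson.json', 'w') as out:
    for event, elem in ET.iterparse('large.xml', events=('end',)):
        if elem.tag == 'record':
            rec = {child.tag: child.text for child in elem}
            out.write(json.dumps(rec) + '\n')
            elem.clear()  # release the subtree's memory immediately
```

Note this writes JSON Lines (one object per line) rather than one giant JSON document; a single 6.4GB json.dumps result would put you right back into the memory problem.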

The xmltodict library you are already using also has a streaming mode; I think it may solve your problem:

https://github.com/martinblech/xmltodict#streaming-mode
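In streaming mode, xmltodict hands each sub-element at a given depth to a callback instead of building a dict for the whole document. A minimal sketch (assuming the records sit at depth 2, i.e. are direct children of the root element):

```python
import json
import xmltodict  # third-party: pip install xmltodict

def handle_record(path, item):
    # Called once per element at item_depth; `item` is the dict for
    # a single record, so the full document is never held in memory.
    out.write(json.dumps(item) + '\n')
    return True  # return True to keep parsing

with open('newjson.json', 'w') as out:
    with open('large.xml', 'rb') as f:
        # item_depth=2 emits each direct child of the root element
        xmltodict.parse(f, item_depth=2, item_callback=handle_record)
```

As with iterparse, this produces one JSON object per record rather than a single monolithic JSON document.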

