Process large JSON stream with jq

Martin Preusse

I get a very large JSON stream (several GB) from curl and am trying to process it with jq.

The relevant output I want to parse with jq is packed inside a wrapper document that describes the result structure:

{
  "results":[
    {
      "columns": ["n"],

      // get this
      "data": [    
        {"row": [{"key1": "row1", "key2": "row1"}], "meta": [{"key": "value"}]},
        {"row": [{"key1": "row2", "key2": "row2"}], "meta": [{"key": "value"}]}
      //  ... millions of rows      

      ]
    }
  ],
  "errors": []
}

I want to extract the row data with jq. This is simple:

curl XYZ | jq -r -c '.results[0].data[].row[]'

Result:

{"key1": "row1", "key2": "row1"}
{"key1": "row2", "key2": "row2"}

However, this always waits until curl has finished, since jq must read the entire document before it can apply the filter.

I played with the --stream option, which is made for dealing with this, and tried the following command, but it also waits until the full object is returned from curl:

curl XYZ | jq -n --stream 'fromstream(1|truncate_stream(inputs)) | .[].data[].row[]'

Is there a way to 'jump' to the data field and start parsing the rows one by one, without waiting for the closing brackets?

peak

(1) The vanilla filter you would use is as follows:

jq -r -c '.results[0].data[].row'
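Applied to the sample document, this prints each row entry (here a one-element array) on its own line:

[{"key1":"row1","key2":"row1"}]
[{"key1":"row2","key2":"row2"}]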

(2) One way to use the streaming parser here would be to apply it to the output of .results[0].data, but combining the two steps will probably be slower than the vanilla approach.
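A sketch of what that two-step pipeline might look like (with XYZ standing in for the real endpoint, as in the question):

curl XYZ | jq -c '.results[0].data' | jq -nc --stream '
  # strip the outer array index from each path, rebuild each data
  # entry, then extract its row objects
  fromstream(1 | truncate_stream(inputs)) | .row[]'

This prints the bare row objects the question asks for, but the first jq still has to read the whole document before emitting anything, which is why the combination tends to be slower.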

(3) To produce the output you want, you could run:

jq -nc --stream '
  fromstream(inputs
    # keep only events whose paths look like results/<i>/data/<j>/row/...
    | select( [.[0][0,2,4]] == ["results", "data", "row"])
    # drop the five leading path components so fromstream
    # rebuilds each row entry as a small top-level value
    | del(.[0][0:5]) )'
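To see why the select works, it helps to look at the events --stream emits for the sample document. The first row arrives as:

[["results",0,"data",0,"row",0,"key1"],"row1"]
[["results",0,"data",0,"row",0,"key2"],"row1"]
[["results",0,"data",0,"row",0,"key2"]]
[["results",0,"data",0,"row",0]]

Two-element events carry a [path, leaf-value] pair; one-element events mark the end of an object or array. In each path, positions 0, 2 and 4 hold the object keys while positions 1, 3 and 5 are array indices, so comparing [.[0][0,2,4]] against ["results", "data", "row"] keeps exactly the row events, and del(.[0][0:5]) strips the leading path components down to what fromstream needs to rebuild each row entry.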

(4) Alternatively, you may wish to try something along these lines:

jq -nc --stream 'inputs
      # keep only [path, value] events (close events have length 1)
      | select(length==2)
      # restrict to paths of the form results/<i>/data/<j>/row/<k>/<key>
      | select( [.[0][0,2,4]] == ["results", "data", "row"])
      # emit the leaf key and its value
      | [ .[0][6], .[1]] '

For the illustrative input, the output from the last invocation would be:

["key1","row1"] ["key2","row1"] ["key1","row2"] ["key2","row2"]

