I have a table (Table A) that includes a text column containing JSON-encoded data. The JSON data is always an array of between one and a few thousand plain objects.
I have another table (Table B) with a few columns, including a column with a datatype of 'JSON'.
I want to select all the rows from Table A, split the JSON array into its elements, and insert each element into Table B.
Bonus objective: Each object (almost) always has a key, x. I want to pull the value of x out into its own column and delete x from the original object (if it exists).
E.g.: Table A
| id | json_array (text) |
+----+--------------------------------+
| 1 | '[{"x": 1}, {"y": 8}]' |
| 2 | '[{"x": 2, "y": 3}, {"x": 1}]' |
| 3 | '[{"x": 8, "z": 2}, {"z": 3}]' |
| 4 | '[{"x": 5, "y": 2, "z": 3}]' |
...would become: Table B
| id | a_id | x | json (json) |
+----+------+------+--------------------+
| 0 | 1 | 1 | '{}' |
| 1 | 1 | NULL | '{"y": 8}' |
| 2 | 2 | 2 | '{"y": 3}' |
| 3 | 2 | 1 | '{}' |
| 4 | 3 | 8 | '{"y": 2}' |
| 5 | 3 | NULL | '{"z": 3}' |
| 6 | 4 | 5 | '{"y": 2, "z": 3}' |
This initially has to work on a few million rows, and would then need to be run at regular intervals, so making it efficient would be a priority.
Is it possible to do this without using a loop and PL/PgSQL? I haven't been making much progress.
The json data type is not particularly suitable (or intended) for modification at the database level. Extracting the "x" keys from the JSON objects is therefore cumbersome, although it can be done.
You should create your table B (with hopefully a more creative column name than "json"; I am using item here) and make the id column a serial that starts at 0. A pure json solution then looks like this:
INSERT INTO b (a_id, x, item)
SELECT sub.a_id, sub.x,
('{' ||
string_agg(
CASE WHEN i.k IS NULL THEN '' ELSE '"' || i.k || '":' || i.v END,
', ') ||
'}')::json
FROM (
SELECT a.id AS a_id, (j.items->>'x')::integer AS x, j.items
FROM a, json_array_elements(json_array) j(items) ) sub
LEFT JOIN json_each(sub.items) i(k,v) ON i.k <> 'x'
GROUP BY sub.a_id, sub.x
ORDER BY sub.a_id;
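For reference, the target table assumed by this INSERT could be set up like so (a sketch; the ALTER SEQUENCE is what makes the serial start at 0 as in the sample data):

CREATE TABLE b (
    id   serial PRIMARY KEY,
    a_id integer,
    x    integer,
    item json
);
-- serial sequences start at 1 by default; match the 0-based ids in the example
ALTER SEQUENCE b_id_seq MINVALUE 0 RESTART WITH 0;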
In the sub-query this extracts the a_id and x values, as well as the JSON object. In the outer query the JSON object is broken into its individual pieces and the pieces with key x are thrown out (the LEFT JOIN ON i.k <> 'x'). In the select list the pieces are put back together again with string concatenation and grouped into compound objects. This necessarily has to be like this because json has no built-in manipulation functions of any consequence. This works on PG versions 9.3+, i.e. since time immemorial insofar as JSON support is concerned.
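To see the decomposition step in isolation: json_each breaks a single object into one row per key, which is what the outer query then filters and re-aggregates (a quick check you can run as-is):

SELECT * FROM json_each('{"x": 2, "y": 3}'::json);
--  key | value
-- -----+-------
--  x   | 2
--  y   | 3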
If you are using PG 9.5+, the solution is much simpler through a cast to jsonb:
INSERT INTO b (a_id, x, item)
SELECT a.id, (j.items->>'x')::integer, j.items #- '{x}'
FROM a, jsonb_array_elements(json_array::jsonb) j(items);
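The effect of the path-deletion operator on a single object can be verified directly (using a value from the example data):

SELECT '{"x": 5, "y": 2, "z": 3}'::jsonb #- '{x}';
-- {"y": 2, "z": 3}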
The #- operator on the jsonb data type does all the dirty work here. Obviously, there is a lot of work going on behind the scenes converting json to jsonb, so if you find that you need to manipulate your JSON objects more frequently, you are better off using the jsonb type to begin with. In your case I suggest you do some benchmarking with EXPLAIN ANALYZE SELECT ... (you can safely forget about the INSERT while testing) on perhaps 10,000 rows to see which works best for your setup.
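Such a benchmarking run might look like this (a sketch; the LIMIT sub-query is just one way to restrict the sample size):

EXPLAIN ANALYZE
SELECT a.id, (j.items->>'x')::integer, j.items #- '{x}'
FROM   (SELECT * FROM a LIMIT 10000) a,
       jsonb_array_elements(a.json_array::jsonb) j(items);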