I have a problem with the executing speed of my sql query to postgres database.
I have 2 tables:
table 1: DEVICES
ID | NAME
------------------
1 | first device
2 | second device
table 2: DATA
ID | DEVICE_ID | TIME | DATA
--------------------------------------------
1 | 1 | 2016-07-14 2:00:00 | data1
2 | 1 | 2016-07-14 1:00:00 | data2
3 | 2 | 2016-07-14 4:00:00 | data3
4 | 1 | 2016-07-14 3:00:00 | data4
5 | 2 | 2016-07-14 6:00:00 | data5
6 | 2 | 2016-07-14 5:00:00 | data6
I need get this select's result table:
ID | DEVICE_ID | TIME | DATA
-------------------------------------------
4 | 1 | 2016-07-14 3:00:00 | data4
5 | 2 | 2016-07-14 6:00:00 | data5
i.e. for each device in devices table I need to get only one data record with the last TIME value.
This is my sql query:
SELECT * FROM db.data d
WHERE d.time = (
SELECT MAX(d2.time) FROM db.data d2
WHERE d2.device_id = d.device_id);
This is HQL query equivalent:
SELECT d FROM Data d
WHERE d.time = (
SELECT MAX(d2.time) FROM Data d2
WHERE d2.device.id = t2.device.id)
Yes, I use Hibernate ORM in my project - may this info will be useful for someone.
I got correct answer on my queries, BUT it's too long - about 5-10 seconds on 10k records in data table and only 2 devices in devices table. It's terrible.
First of all, I thought that problem is in Hibernate. But native sql query from psql in linux terminal execute the same time as through hibernate.
How can I optimize my query? This query is too complexity:
O(device_count * data_count^2)
Since you're using Postgres, you could use window functions to achieve this, like so:
select
sq.id,
sq.device_id,
sq.time,
sq.data
from (
select
data.*,
row_number() over (partition by data.device_id order by data.time desc) as rnk
from
data
) sq
where
sq.rnk = 1
The row_number()
window function first ranks the rows in the data
table on the basis of the device_id
and time
columns, and the outer query then picks the highest-ranked rows.
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments