The query time for a very simple min-max no-join no-nesting SQL is taking more than 2 seconds.
THE TABLE STRUCTURE:::
> DESCRIBE tbl;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ... | ... | ... | ... | ... | ... |
| created_at | datetime | YES | MUL | NULL | |
+-------------+--------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
The table contains 10,000,000+ rows
THE INDEXES IN THE TABLE:::
> SHOW INDEX IN tbl;
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tbl | 0 | PRIMARY | 1 | id | A | 10000545 | NULL | NULL | | BTREE | | |
| tbl | 1 | created_at | 1 | created_at | A | 18 | NULL | NULL | YES | BTREE | | |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
THE SQL UNDER CONCERN::: Finding min-max DateTime of last 10k entries
SELECT
min(created_at),
max(created_at)
FROM tbl
ORDER BY id DESC
LIMIT 10000
THE CONCERN::: First query is always taking 2+ seconds to complete. After the first select, all subsequent query calls take less than 0.001 seconds to complete unless a new row gets inserted into the table.
First call in 2.06 seconds:
> SELECT min(created_at), max(created_at) FROM tbl USE INDEX (created_at) ORDER BY id DESC LIMIT 10000;
+---------------------+---------------------+
| min(created_at) | max(created_at) |
+---------------------+---------------------+
| 2010-01-01 00:00:00 | 2015-12-28 00:00:00 |
+---------------------+---------------------+
1 row in set (2.06 sec)
Subsequent calls in 0.00 seconds:
> SELECT min(created_at), max(created_at) FROM tbl USE INDEX (created_at) ORDER BY id DESC LIMIT 10000;
+---------------------+---------------------+
| min(created_at) | max(created_at) |
+---------------------+---------------------+
| 2010-01-01 00:00:00 | 2015-12-28 00:00:00 |
+---------------------+---------------------+
1 row in set (0.00 sec)
Once a new row gets added into the table, it again takes 2+ seconds to complete and then all subsequent query calls take less than 0.001 seconds to complete.
I understand that the indexes get reshuffled every time a new row is inserted. So that's ok. However, my goal is to improve the first query time to less than a few milliseconds as taking 2+ seconds per query in a frequently updating system degrades the performance way too heavily.
THE EXPLANATION OF THE QUERY PLAN::: The explain statement shows that the query is going through almost all the rows of the table. So i am guessing i have room to improve by indexes. But what should i index?
> EXPLAIN SELECT min(created_at), max(created_at) FROM tbl ORDER BY id DESC LIMIT 10000;
+----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+
| 1 | SIMPLE | tbl | index | NULL | created_at | 9 | NULL | 10000545 | Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+
1 row in set (0.00 sec)
Your original query does not return "the minimum/maximum date from the last 10000 entries"; the LIMIT
applies after the aggregate functions are processed, so you're asking "Give me the max/min date, then limit that to the first 10k"... And there is only one row.
You must use a subquery for this:
SELECT min(created_at), max(created_at)
FROM (SELECT created_at
FROM my_table
ORDER BY id
LIMIT 10000) subtable;
Your best bet is an index on (id, created_at)
, because the subquery will then only need to traverse the index, and then the min/max query only needs to sort over 10k elements.
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments