我有一个查询,对于特定的行来说很慢。我假设Postgres选择对某些行Seq Scan
使用a而不是使用an Index Scan
,因为它实际上比使用索引要快。
以下是针对正常工作量使用索引的查询计划:http : //explain.depesz.com/s/1A2o:
EXPLAIN (ANALYZE, BUFFERS) SELECT "blocks".* FROM "blocks" INNER JOIN "jobs" ON "blocks"."job_id" = "jobs"."id" WHERE "jobs"."project_id" = 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.71..166.27 rows=19 width=130) (actual time=0.092..4.247 rows=2421 loops=1)
Buffers: shared hit=350
-> Index Scan using index_jobs_on_project_id on jobs (cost=0.29..18.81 rows=4 width=4) (actual time=0.044..0.099 rows=15 loops=1)
Index Cond: (project_id = 1)
Buffers: shared hit=17
-> Index Scan using index_blocks_on_job_id on blocks (cost=0.42..36.67 rows=19 width=130) (actual time=0.021..0.133 rows=161 loops=15)
Index Cond: (job_id = jobs.id)
Buffers: shared hit=333
Total runtime: 4.737 ms
(9 rows)
下面是查询计划,选择对不太常见的工作量进行顺序扫描:http : //explain.depesz.com/s/cJOd:
EXPLAIN (ANALYZE, BUFFERS) SELECT "blocks".* FROM "blocks" INNER JOIN "jobs" ON "blocks"."job_id" = "jobs"."id" WHERE "jobs"."project_id" = 2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1138.64..11236.94 rows=10421 width=130) (actual time=5.212..72.604 rows=2516 loops=1)
Hash Cond: (blocks.job_id = jobs.id)
Buffers: shared hit=5671
-> Seq Scan on blocks (cost=0.00..8478.06 rows=303206 width=130) (actual time=0.008..24.573 rows=298084 loops=1)
Buffers: shared hit=5446
-> Hash (cost=1111.79..1111.79 rows=2148 width=4) (actual time=3.346..3.346 rows=2164 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 77kB
Buffers: shared hit=225
-> Bitmap Heap Scan on jobs (cost=40.94..1111.79 rows=2148 width=4) (actual time=0.595..2.158 rows=2164 loops=1)
Recheck Cond: (project_id = 2)
Buffers: shared hit=225
-> Bitmap Index Scan on index_jobs_on_project_id (cost=0.00..40.40 rows=2148 width=0) (actual time=0.516..0.516 rows=2164 loops=1)
Index Cond: (project_id = 2)
Buffers: shared hit=8
Total runtime: 72.767 ms
(15 rows)
在第一种情况下,该项目有15个工作和2421个块。在第二种情况下,该项目有2164个工作和2516个块。
有没有一种方法可以查询此数据,以便第二个工作负载不会那么慢?还是我正在处理某种最坏情况下的性能工作负载?
编辑
将random_page_cost更新为1.1并针对慢速查询重新运行EXPLAIN后:http ://explain.depesz.com/s/xKdd
EXPLAIN (ANALYZE, BUFFERS) SELECT "blocks".* FROM "blocks" INNER JOIN "jobs" ON "blocks"."job_id" = "jobs"."id" WHERE "jobs"."project_id" = 2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.71..7634.08 rows=10421 width=130) (actual time=0.025..10.597 rows=2516 loops=1)
Buffers: shared hit=9206
-> Index Scan using index_jobs_on_project_id on jobs (cost=0.29..1048.99 rows=2148 width=4) (actual time=0.015..1.239 rows=2164 loops=1)
Index Cond: (project_id = 32357)
Buffers: shared hit=225
-> Index Scan using index_blocks_on_job_id on blocks (cost=0.42..2.88 rows=19 width=130) (actual time=0.003..0.003 rows=1 loops=2164)
Index Cond: (job_id = jobs.id)
Buffers: shared hit=8981
Total runtime: 10.925 ms
(9 rows)
好多了!看来我需要花费一些时间来调整服务器配置。
由于两次索引扫描的嵌套循环比位图索引扫描的hashjoin快得多random_page_cost
,因此,至少在数据缓存在RAM或中时,您不能准确反映您的实际性能shared_buffers
。
尝试设置SET random_page_cost = 1.1
并在该会话中重新运行。您可能还想更多work_mem
地解决这个问题。
如果random_page_cost
调整有效,您可能需要更新postgresql.conf
以反映该调整。注意1.1是一个非常极端的设置;默认值为4,seq_page_cost
为1,因此在配置文件中,我将从2或1.5开始,以避免使其他计划变得更糟。
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句