How is an index used in an aggregate query?

Posted by Chris Middleton in Dev

Chris Middleton

Given the query

SELECT franchise, MAX(worth)
FROM figurines
GROUP BY franchise

What kind of index would speed up this query, and how would the database use that index?

If more detail is needed, assume that the column franchise has low cardinality and that worth has high cardinality.

I personally use MySQL, but I am looking for a general understanding of the algorithm rather than vendor-specific implementation details.

Amy B

Scenario 1: No index (read the whole table)

foreach(page in table.pages)
{
  foreach(row in page.rows)
  {
    Compare and accumulate franchise and worth from row
  }
}
-- Total IO = table.pages
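The "compare and accumulate" step above can be sketched in Python as a single pass that keeps a running maximum per group. This is only an illustration of the idea, not how any particular database implements it; the function name `scan_group_max` and the sample franchise names are hypothetical.

```python
def scan_group_max(rows):
    """Visit every (franchise, worth) row once, keeping the largest worth per franchise."""
    best = {}
    for franchise, worth in rows:
        # Accumulate: remember the highest worth seen so far for this franchise.
        if franchise not in best or worth > best[franchise]:
            best[franchise] = worth
    return best


# Hypothetical sample data standing in for the figurines table.
rows = [("Marvel", 30), ("DC", 12), ("Marvel", 45), ("DC", 20)]
result = scan_group_max(rows)
print(result)
```

Every row is touched exactly once, which is why the cost is proportional to the number of table pages.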

Scenario 2: Index on franchise only

foreach(page in index.pages)
{
  foreach(indexRow in page.rows)
  {
    tableRow = table.fetchRow(indexRow); // + 1 page of IO for each row
    Compare and accumulate franchise from indexRow and worth from tableRow
  }
}
-- Total IO = index.pages + table.rows
-- this is likely to be greater than Scenario 1...
--  so optimizer should prefer that plan instead.
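To see why Scenario 2 tends to lose, the two IO formulas can be plugged with some numbers. The figures below (one million rows, 100 rows per page, a 2,500-page index) are made-up assumptions chosen only to show the order of magnitude, and the helper names are hypothetical.

```python
def full_scan_io(table_pages):
    # Scenario 1: Total IO = table.pages
    return table_pages


def noncovering_index_io(index_pages, table_rows):
    # Scenario 2: Total IO = index.pages + table.rows (one row fetch per index entry)
    return index_pages + table_rows


# Assumed sizes, for illustration only.
table_rows = 1_000_000
rows_per_page = 100
table_pages = table_rows // rows_per_page
index_pages = 2_500

scenario1 = full_scan_io(table_pages)
scenario2 = noncovering_index_io(index_pages, table_rows)
print(scenario1, scenario2)
```

Under these assumptions the non-covering index costs roughly a hundred times more page reads than the plain table scan, because each row fetch is charged as a separate page of IO.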

Scenario 3: Covering index on (franchise, worth), in that order.

foreach(page in index.pages)
{
  foreach(row in page.rows)
  {
    Compare and accumulate franchise and worth from row
  }
}
-- Total IO = index.pages
-- Assuming that index is thinner than table, a win!
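As a rough way to try this out, the sketch below builds such an index in an in-memory SQLite database (used here only as a convenient stand-in, since the question is about MySQL) and asks the planner how it would run the query. The index name `idx_franchise_worth`, the extra `id` and `name` columns, and the sample rows are assumptions for the demo; other databases report their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE figurines ("
    "id INTEGER PRIMARY KEY, name TEXT, franchise TEXT, worth INTEGER)"
)
conn.executemany(
    "INSERT INTO figurines (name, franchise, worth) VALUES (?, ?, ?)",
    [("a", "Marvel", 30), ("b", "DC", 12), ("c", "Marvel", 45), ("d", "DC", 20)],
)
# The index holds every column the query needs, so the table itself is never read.
conn.execute("CREATE INDEX idx_franchise_worth ON figurines (franchise, worth)")

query = "SELECT franchise, MAX(worth) FROM figurines GROUP BY franchise"
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
results = conn.execute(query).fetchall()
print(plan)
print(results)
```

The plan text should mention the index by name, which indicates the query is answered from the index alone.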

Scenario 4: A different query, driven by a known list of franchises, using the index from Scenario 3

foreach(franchise in franchises)
{
  SELECT MAX(worth) FROM figurines WHERE franchise = :franchise  -- :franchise is the loop variable
}

...

foreach(franchise in franchises)
{
  search into the index looking for the last record with this franchise
  // this is usually less than 10 pages of IO in my experience.
}
-- Total IO = count of franchise * 10
-- super win!

Scenario 4 is different because it relies on index seeks rather than a scan.
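The seek in Scenario 4 can be imitated with a sorted list and a binary search. This is a simplified model: a real B-tree descends through pages rather than bisecting an array, but the idea of jumping straight to the last entry for a franchise is the same. The function name `max_worth_by_seek` and the sample data are hypothetical.

```python
import bisect


def max_worth_by_seek(index, franchise):
    """Find the last entry for franchise in a list sorted by (franchise, worth)."""
    # Seek to the position just past every (franchise, *) entry.
    pos = bisect.bisect_right(index, (franchise, float("inf")))
    if pos == 0 or index[pos - 1][0] != franchise:
        return None  # no entry for this franchise
    # The entry just before that position carries the largest worth.
    return index[pos - 1][1]


# A sorted list standing in for the (franchise, worth) index.
index = sorted([("Marvel", 30), ("DC", 12), ("Marvel", 45), ("DC", 20)])
franchises = ["DC", "Marvel"]
maxima = {f: max_worth_by_seek(index, f) for f in franchises}
print(maxima)
```

Each lookup inspects only a logarithmic number of entries, so the total work grows with the number of franchises rather than with the number of rows. With a low-cardinality franchise column that is a small number of seeks.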
