“ WHERE”子句中具有NEWID()的奇数SQL Server(TSQL)查询结果

oz

可以肯定地说,这是一个奇怪的问题,但是对于这种行为的解释我有些困惑:

背景:(不需要知道)

因此,开始时,我正在编写一个快速查询,并粘贴的列表,UNIQUERIDENTIFIER并希望它们在WHERE X IN (...)子句中统一过去,我UNIQUERIDENTIFIER在列表的顶部使用了一个空(全零),因此我可以粘贴以下形式的统一集UNIQUERIDENTIFIER,'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'这次,为避免碰到零,我插入了一个NEWID()想法,认为几乎不可能发生碰撞的可能性,这令我惊讶地产生了成千上万的额外结果,例如桌子的50%以上。

开始提问:(您确实需要了解的部分)

该查询:

-- SETUP: (i boiled this down to the bare minimum)
-- just creating a table with 500 PK UNIQUERIDENTIFIERs
IF (OBJECT_ID('tempdb..#wtfTable') IS NOT NULL) DROP TABLE #wtfTable; 

CREATE TABLE #wtfTable (WtfId UNIQUEIDENTIFIER PRIMARY KEY);

INSERT INTO #wtfTable 
SELECT TOP(500) NEWID()
FROM master.sys.all_objects o1 (NOLOCK)
CROSS JOIN master.sys.all_objects o2 (NOLOCK);

-- ACTUAL QUERY:
SELECT *
FROM #wtfTable
WHERE [WtfId] IN ('00000000-0000-0000-0000-000000000000', NEWID());

...应该从统计上产生bupkis。但是,如果您运行十次左右,有时您会得到大量选择。例如,在上一次运行中,我收到465/500行,这意味着返回了93%以上的行。

结果画面

虽然我知道NEWID()将按行重新计算,但在地狱中并没有统计到如此之高的机会。我在这里写的所有内容都是产生细微差别所必需的SELECT,删除任何东西都可以防止它的发生。顺便说一句,您可以将替换INWHERE WtfId = '...' OR WtfId = NEWID(),仍然会得到相同的结果。我正在使用最新的SQL SERVER 2014 Standard修补程序,没有我知道的任何奇特设置被激活。

那么外面有人知道这是怎么回事吗?提前致谢。


编辑:

'00000000-0000-0000-0000-000000000000'是一个红色鲱鱼,这是一个适用于整数的版本:(有趣的是,我需要将表的大小用整数提高到1000,以产生有问题的查询计划...)

IF (OBJECT_ID('tempdb..#wtfTable') IS NOT NULL) DROP TABLE #wtfTable; 

CREATE TABLE #wtfTable (WtfId INT PRIMARY KEY);

INSERT INTO #wtfTable 
SELECT DISTINCT TOP(1000) CAST(CAST('0x' + LEFT(NEWID(), 8) AS VARBINARY) AS INT)
FROM sys.tables o1 (NOLOCK)
CROSS JOIN sys.tables o2 (NOLOCK);

SELECT *
FROM #wtfTable
WHERE [WtfId] IN (0, CAST(CAST('0x' + LEFT(NEWID(), 8) AS VARBINARY) AS INT));

或者您可以只替换文字UNIQUEIDENTIFIER并执行以下操作:

DECLARE @someId UNIQUEIDENTIFIER = NEWID();

SELECT *
FROM #wtfTable
WHERE [WtfId] IN (@someId, NEWID());

两者都产生相同的结果...问题是为什么会这样?

弗拉基米尔·巴拉诺夫(Vladimir Baranov)

让我们看一下执行计划。

计划

寻找

在此查询的特定运行中,Seek返回了51行,而不是估计的1行。


以下实际查询生成的形状相同的计划,但是更容易分析,因为我们有两个变量@ID1@ID2,您可以在计划中对其进行跟踪。

CREATE TABLE #wtfTable (WtfId UNIQUEIDENTIFIER PRIMARY KEY);

INSERT INTO #wtfTable 
SELECT TOP(500) NEWID()
FROM master.sys.all_objects o1 (NOLOCK)
CROSS JOIN master.sys.all_objects o2 (NOLOCK);

DECLARE @ID1 UNIQUEIDENTIFIER;
DECLARE @ID2 UNIQUEIDENTIFIER;

SELECT TOP(1) @ID1 = WtfId
FROM #wtfTable
ORDER BY WtfId;

SELECT TOP(1) @ID2 = WtfId
FROM #wtfTable
ORDER BY WtfId DESC;

-- ACTUAL QUERY:
SELECT *
FROM #wtfTable
WHERE WtfId IN (@ID1, @ID2);

DROP TABLE #wtfTable;

如果仔细检查该计划中的运算符,您会看到IN查询的一部分被转换为具有两行三列的表。Concatenation运营商将返回该表。该帮助器表中的每一行都定义了索引范围。

ExpFrom    ExpTo    ExpFlags
@ID1       @ID1     62
@ID2       @ID2     62

Internal ExpFlags specify what kind of range seek is needed (<, <=, >, >=). If you add more variables to IN clause you'll see them in the plan concatenated to this helper table.

Sort and Merge Interval operators make sure that any possible overlapping ranges are merged. See detailed post about Merge Interval operator by Fabiano Amorim which examines the plans with this shape. Here is another good post about this plan shape by Paul White.

In the end the helper table with two rows is joined with the main table and for each row in the helper table there is a range seek in the clustered index from ExpFrom to ExpTo, which is shown in the Index Seek operator. The Seek operator shows < and >, but it is misleading. The actual comparison is defined internally by the Flags value.

If you had some different set of ranges, for example:

WHERE 
    ([WtfId] >= @ID1 AND [WtfId] < @ID2)
    OR [WtfId] = @ID3

, you would still see the same shape of the plan with the same seek predicate, but different Flags values.

So, there are two seeks:

from @ID1 to @ID1, which returns one row
from @ID2 to @ID2, which returns one row

In the query with variables internal expressions lead to getting values from the variables when needed. The value of the variable doesn't change during the query execution and everything behaves correctly as expected.

How NEWID() affects it

When we use NEWID as in your example:

SELECT *
FROM #wtfTable
WHERE WtfId IN ('00000000-0000-0000-0000-000000000000', NEWID());

the plan and all internal processing is the same as for variables.

The difference is that this internal table effectively becomes:

ExpFrom    ExpTo    ExpFlags
0...0      0...0    62
NEWID()    NEWID()  62

NEWID() is called two times. Naturally, each call produces a different value, which by chance results in a range that covers some existing values in the table.

There are two range scans of the clustered index with ranges

from `0...0`     to `0...0`
from `some_id_1` to `some_id_2`

Now it is easy to see how such query can return some rows, even though the chances of NEWID collision is very small.


Apparently, optimiser thinks that it can call NEWID twice instead of remembering the first generated random value and using it further in the query. There have been other cases when optimiser called NEWID more times than expected producing similar seemingly impossible results.

For example:

Is it legal for SQL Server to fill PERSISTED columns with data that does not match the definition?

Inconsistent results with NEWID() and PERSISTED computed column

Optimiser should know that NEWID() is non-deterministic. Overall, it feels like a bug.

I don't know anything about SQL Server internals, but my wild guess looks like this: There are runtime constant functions like RAND(). NEWID() was put into this category by mistake. Then somebody noticed that people do not expect it to return the same ID in the same fashion as RAND() returns the same random number for each invocation. And they patched it by actually regenerating new ID each time NEWID() appears in expressions. But overall rules for optimiser remained the same as for RAND(), so higher level optimiser thinks that all invocations of NEWID() return the same value and freely rearranges expressions with NEWID() which leads to unexpected results.


There is another question about a similar strange behaviour of NEWID():

联接的虚拟表中的NEWID()导致意外的交叉应用行为

答案是说,有一个Connect错误报告,并且将其关闭为“无法修复”。Microsoft的评论实质上说这种行为是设计使然。

优化器不保证标量函数的执行时间或执行次数。这是一个长久以来的宗旨。这是基本的“余地”,它使优化程序有足够的自由来获得查询计划执行的重大改进。

本文收集自互联网,转载请注明来源。

如有侵权,请联系[email protected] 删除。

编辑于
0

我来说两句

0条评论
登录后参与评论

相关文章

来自分类Dev

“ WHERE”子句中具有NEWID()的奇数SQL Server(TSQL)查询结果

来自分类Dev

在where子句中具有相关子查询的SQL查询

来自分类Dev

SQL Server:WHERE子句中具有IN条件的CASE语句

来自分类Dev

SQL Server:如何提高WHERE子句中具有多个CTE和子查询的查询的性能

来自分类Dev

在where in子句中返回所有结果(SQL Server)

来自分类Dev

在where子句中具有解码和比较的SQL查询

来自分类Dev

where子句中具有聚合的复杂查询

来自分类Dev

具有替换的子查询无法在WHERE子句中识别

来自分类Dev

在where子句中具有多列的子查询

来自分类Dev

SQL Server上带有IN的WHERE子句中的CASE语法

来自分类Dev

SQL查询结果需要返回WHERE子句中的所有记录,甚至重复

来自分类Dev

当where子句中使用的索引列具有特定值时,Azure sql查询速度很慢

来自分类Dev

当where子句中使用的索引列具有特定值时,Azure sql查询速度很慢

来自分类Dev

使用查询结果确定SQL的where子句中的值?

来自分类Dev

在SQL查询的'WHERE'子句中的'SELECT'中使用'IF'的结果

来自分类Dev

如何在具有 Group By 子句的 SQL 查询的 where 子句中使用算术表达式而不会出现算术溢出?

来自分类Dev

WHERE子句中嵌套SQL语句的结果

来自分类Dev

WHERE子句中嵌套SQL语句的结果

来自分类Dev

SQL查询:在WHERE子句中使用AND / OR

来自分类Dev

WHERE子句中的HIVE SQL子查询

来自分类Dev

sql查询-如果WHERE子句中存在

来自分类Dev

SQL查询的WHERE子句中的CASE语句

来自分类Dev

SQL在select或where子句中的插入查询

来自分类Dev

Where子句中的PDO SQL查询错误

来自分类Dev

WHERE子句中的SQL查询聚合函数

来自分类Dev

在sqlite的Where子句中查询结果

来自分类Dev

WHERE子句中具有复杂查询的MySQL查询优先级

来自分类Dev

SQL Server:WHERE子句中的IF条件

来自分类Dev

WHERE子句中的SQL Server浮点

Related 相关文章

  1. 1

    “ WHERE”子句中具有NEWID()的奇数SQL Server(TSQL)查询结果

  2. 2

    在where子句中具有相关子查询的SQL查询

  3. 3

    SQL Server:WHERE子句中具有IN条件的CASE语句

  4. 4

    SQL Server:如何提高WHERE子句中具有多个CTE和子查询的查询的性能

  5. 5

    在where in子句中返回所有结果(SQL Server)

  6. 6

    在where子句中具有解码和比较的SQL查询

  7. 7

    where子句中具有聚合的复杂查询

  8. 8

    具有替换的子查询无法在WHERE子句中识别

  9. 9

    在where子句中具有多列的子查询

  10. 10

    SQL Server上带有IN的WHERE子句中的CASE语法

  11. 11

    SQL查询结果需要返回WHERE子句中的所有记录,甚至重复

  12. 12

    当where子句中使用的索引列具有特定值时,Azure sql查询速度很慢

  13. 13

    当where子句中使用的索引列具有特定值时,Azure sql查询速度很慢

  14. 14

    使用查询结果确定SQL的where子句中的值?

  15. 15

    在SQL查询的'WHERE'子句中的'SELECT'中使用'IF'的结果

  16. 16

    如何在具有 Group By 子句的 SQL 查询的 where 子句中使用算术表达式而不会出现算术溢出?

  17. 17

    WHERE子句中嵌套SQL语句的结果

  18. 18

    WHERE子句中嵌套SQL语句的结果

  19. 19

    SQL查询:在WHERE子句中使用AND / OR

  20. 20

    WHERE子句中的HIVE SQL子查询

  21. 21

    sql查询-如果WHERE子句中存在

  22. 22

    SQL查询的WHERE子句中的CASE语句

  23. 23

    SQL在select或where子句中的插入查询

  24. 24

    Where子句中的PDO SQL查询错误

  25. 25

    WHERE子句中的SQL查询聚合函数

  26. 26

    在sqlite的Where子句中查询结果

  27. 27

    WHERE子句中具有复杂查询的MySQL查询优先级

  28. 28

    SQL Server:WHERE子句中的IF条件

  29. 29

    WHERE子句中的SQL Server浮点

热门标签

归档