# BINARY_CHECKSUM – different result depending on number of rows

• A+
Category：Languages

I wonder why the `BINARY_CHECKSUM` function returns different result for the same:

``SELECT *, BINARY_CHECKSUM(a,b) AS bc FROM (VALUES(1, NULL, 100),             (2, NULL, NULL),             (3, 1, 2)) s(id,a,b);  SELECT *, BINARY_CHECKSUM(a,b) AS bc FROM (VALUES(1, NULL, 100),             (2, NULL, NULL)) s(id,a,b); ``

Ouput:

``+-----+----+------+-------------+ | id  | a  |  b   |     bc      | +-----+----+------+-------------+ |  1  |    | 100  |        -109 | |  2  |    |      | -2147483640 | |  3  | 1  |   2  |          18 | +-----+----+------+-------------+  -- -109 vs 100 +-----+----+------+------------+ | id  | a  |  b   |     bc     | +-----+----+------+------------+ |  1  |    | 100  |        100 | |  2  |    |      | 2147483647 | +-----+----+------+------------+ ``

And for second sample I get what I would anticipate:

``SELECT *, BINARY_CHECKSUM(a,b) AS bc FROM (VALUES(1, 1, 100),             (2, 3, 4),             (3,1,1)) s(id,a,b);  SELECT *, BINARY_CHECKSUM(a,b) AS bc FROM (VALUES(1, 1, 100),             (2, 3, 4)) s(id,a,b); ``

Ouptut for both first two rows:

``+-----+----+------+-----+ | id  | a  |  b   | bc  | +-----+----+------+-----+ |  1  | 1  | 100  | 116 | |  2  | 3  |   4  |  52 | +-----+----+------+-----+ ``

db<>fiddle demo

It has strange consequences when I want to compare two tables/queries:

``WITH t AS (   SELECT 1 AS id, NULL AS a, 100 b   UNION ALL SELECT 2, NULL, NULL   UNION ALL SELECT 3, 1, 2           -- comment this out ), s AS (   SELECT 1 AS id ,100 AS a, NULL as b   UNION ALL SELECT 2, NULL, NULL   UNION ALL SELECT 3, 2, 1           -- comment this out ) SELECT t.*,s.*   ,BINARY_CHECKSUM(t.a, t.b) AS bc_t, BINARY_CHECKSUM(s.a, s.b) AS bc_s FROM t JOIN s   ON s.id = t.id WHERE BINARY_CHECKSUM(t.a, t.b) = BINARY_CHECKSUM(s.a, s.b); ``

db<>fiddle demo2

For 3 rows I get single result:

``+-----+----+----+-----+----+----+--------------+-------------+ | id  | a  | b  | id  | a  | b  |    bc_t      |    bc_s     | +-----+----+----+-----+----+----+--------------+-------------+ |  2  |    |    |  2  |    |    | -2147483640  | -2147483640 | +-----+----+----+-----+----+----+--------------+-------------+ ``

but for 2 rows I get also id = 1:

``+-----+----+------+-----+------+----+-------------+------------+ | id  | a  |  b   | id  |  a   | b  |    bc_t     |    bc_s    | +-----+----+------+-----+------+----+-------------+------------+ |  1  |    | 100  |  1  | 100  |    |        100  |        100 | |  2  |    |      |  2  |      |    | 2147483647  | 2147483647 | +-----+----+------+-----+------+----+-------------+------------+ ``

Remarks:

• I am not searching for alternatives like(HASH_BYTES/MD5/CHECKSUM)

• I am aware that `BINARY_CHECKSUM` could lead to collisions(two different calls produce the same output) here scenario is a bit different

For this definition, we say that null values, of a specified type, compare as equal values. If at least one of the values in the expression list changes, the expression checksum can also change. However, this is not guaranteed. Therefore, to detect whether values have changed, we recommend use of BINARY_CHECKSUM only if your application can tolerate an occasional missed change.

It is strange for me that hash function returns different result for the same input arguments. Is this behaviour by design or it is some kind of glitch?

EDIT:

As @scsimon points out it works for materialized tables but not for cte. db<>fiddle actual table

``SELECT name, system_type_name FROM sys.dm_exec_describe_first_result_set(' SELECT * FROM (VALUES(1, NULL, 100),             (2, NULL, NULL),             (3, 1, 2)) s(id,a,b)', NULL,0);  SELECT name, system_type_name FROM sys.dm_exec_describe_first_result_set(' SELECT * FROM (VALUES(1, NULL, 100),             (2, NULL, NULL)) s(id,a,b)', NULL,0)  -- working workaround SELECT name, system_type_name FROM sys.dm_exec_describe_first_result_set(' SELECT * FROM (VALUES(1, cast(NULL as int), 100),             (2, NULL, NULL)) s(id,a,b)', NULL,0) ``

For all cases all columns are `INT` but with explicit `CAST` it behaves as it should.

For the literal `NULL` without the `CAST` (and without any typed values in the column) it entirely ignores it and just gives you the same result as `BINARY_CHECKSUM(b)`.

This seems to happen very early on. The initial tree representation output from

``SELECT *, BINARY_CHECKSUM(a,b) AS bc FROM (VALUES(1, NULL, 100),             (2, NULL, NULL)) s(id,a,b) OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 3604); ``

Already shows that it has decided to just use one column as input to the function

``ScaOp_Intrinsic binary_checksum      ScaOp_Identifier COL: Union1008  ``

This compares with the following output for your first query

``ScaOp_Intrinsic binary_checksum      ScaOp_Identifier COL: Union1011       ScaOp_Identifier COL: Union1010  ``

If you try and get the `BINARY_CHECKSUM` with

``SELECT *, BINARY_CHECKSUM(a) AS bc FROM (VALUES(1, NULL, 100)) s(id,a,b) ``

It gives the error

Msg 8184, Level 16, State 1, Line 8 Error in binarychecksum. There are no comparable columns in the binarychecksum input.

This is not the only place where an untyped `NULL` constant is treated differently from an explicitly typed one.

Another case is

``SELECT COALESCE(CAST(NULL AS INT),CAST(NULL AS INT)) ``

vs

``SELECT COALESCE(NULL,NULL) ``

I'd err on the side of "glitch" in this case rather than "by design" though as the columns from the derived table are supposed to be `int` before they get to the checksum function.

``SELECT COALESCE(a,b) FROM (VALUES(NULL, NULL)) s(a,b) ``

Does work as expected without this glitch.