STRING_AGG not behaving as expected


I have the following query:

WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
        VALUES
            ('Spain', 'English'),
            ('Spain', 'Spanish'),
            ('Sweden', 'English'),
            ('Switzerland', 'English'),
            ('Switzerland', 'French'),
            ('Switzerland', 'German'),
            ('Switzerland', 'Italian')
    ) x ([Country], [Language])
)
SELECT
    [Country],
    CASE COUNT([Language])
        WHEN 1 THEN MAX([Language])
        WHEN 2 THEN STRING_AGG([Language], ' and ')
        ELSE STRING_AGG([Language], ', ')
    END AS [Languages],
    COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]

I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:

  | Country     | Languages                                 | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain       | Spanish and English                       | 2
2 | Sweden      | English                                   | 1
3 | Switzerland | French, German, Italian, English          | 4

Instead I am getting the output below (all four values are separated by " and "):

  | Country     | Languages                                 | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain       | Spanish and English                       | 2
2 | Sweden      | English                                   | 1
3 | Switzerland | French and German and Italian and English | 4

What am I missing?


Here is another example:

SELECT
    y,
    STRING_AGG(z, '+') AS STRING_AGG_PLUS,
    STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
    VALUES
        (1, 'a'),
        (1, 'b')
) x (y, z)
GROUP BY y

  | y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b             | a+b

Is this a bug in SQL Server?

 


Yes, this is a Bug (tm), present in (as of writing) versions up to SQL Server 2017 CU11 (but not, according to @DanGuzman, in Azure SQL Database, so apparently it's already fixed and the fix could land in the next CU). Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.

One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:

SELECT
    y,
    STRING_AGG(z, '+') AS STRING_AGG_PLUS,
    STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
    VALUES
        (1, 'a'),
        (1, 'b')
) x (y, z)
GROUP BY y
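
The same trick should carry over to the original country/language query. The following is a sketch under that assumption, not something verified against every affected build: only the ELSE branch is changed, so that its aggregated expression no longer matches the one in the WHEN 2 branch.

```sql
WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
        VALUES
            ('Spain', 'English'),
            ('Spain', 'Spanish'),
            ('Sweden', 'English'),
            ('Switzerland', 'English'),
            ('Switzerland', 'French'),
            ('Switzerland', 'German'),
            ('Switzerland', 'Italian')
    ) x ([Country], [Language])
)
SELECT
    [Country],
    CASE COUNT([Language])
        WHEN 1 THEN MAX([Language])
        WHEN 2 THEN STRING_AGG([Language], ' and ')
        -- '' + [Language] is a near-identity transformation: the values are
        -- unchanged, but the expression differs from the plain [Language]
        -- above, so the two STRING_AGG calls should not be unified.
        ELSE STRING_AGG('' + [Language], ', ')
    END AS [Languages],
    COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country];
```

On a build where the bug is already fixed, the extra concatenation should be harmless and can simply be dropped again.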
