regex_replace, why does it lose the $1?

string s = " 'I'd go.' ";
s = std::regex_replace(s, std::regex("((^| )')|('($| ))"), "$1(Quotation, )");
cout << s; // '(Quotation, )I'd go.(Quotation, )

I want to add (Quotation, ) after each quotation mark ' without losing the original ', so I use $1 to refer to the original '. I don't want to touch the ' in I'd.

The ^ lets the pattern match a ' at the start of the string, and the $ does the same for a ' at the end of the string.

The result is supposed to be:

'(Quotation, )I'd go.' (Quotation, )

But the actual result is:

'(Quotation, )I'd go.(Quotation, )

The left quotation replacement works fine, but the right loses the '. Why?

 


It happens because the ' at the end of the string is captured in Group 3, not Group 1, so $1 expands to an empty string for that match:

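  • Group 1: ((^| )') - start of string or a space, followed by a '
  • Group 2: (^| ) - start of string or a space (nested in Group 1)
  • Group 3: ('($| )) - a ' followed by end of string or a space
  • Group 4: ($| ) - end of string or a space (nested in Group 3)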

You may refer to each of the groups with $1, $2, $3 and $4, and you may also refer to the whole match with the $& replacement backreference.
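
To see which groups take part in each match, a small sketch like the following may help. The helper name describe_groups is hypothetical and only used for this illustration; it relies on std::sregex_iterator and the matched flag of each sub-match.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original question or answer):
// for every match of `pattern` in `text`, list the capture groups that participated.
std::vector<std::string> describe_groups(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> report;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        std::string line = "match [" + m.str(0) + "]";
        for (std::size_t g = 1; g < m.size(); ++g) {
            if (m[g].matched) {  // groups that did not participate are skipped
                line += " $" + std::to_string(g) + "=[" + m[g].str() + "]";
            }
        }
        report.push_back(line);
    }
    return report;
}
```

For the input " 'I'd go.' " and the pattern ((^| )')|('($| )), the first match should report only $1 and $2, and the second only $3 and $4, which is why $1 alone drops the closing '.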

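So adding $3 solves the issue, because in each match only one of Group 1 and Group 3 participates and the other expands to an empty string: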

s = std::regex_replace(s, std::regex("((^| )')|('($| ))"), "$1$3(Quotation, )"); // =>  '(Quotation, )I'd go.' (Quotation, ) 
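
As a rough side-by-side check, the two replacement strings can be compared with a small wrapper. The function name annotate_quotes is an assumption made for this sketch; the pattern and replacement strings are the ones from the question and the fix.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper for illustration: applies the question's pattern
// with whatever replacement string the caller passes in.
std::string annotate_quotes(const std::string& text, const std::string& replacement) {
    static const std::regex quote_re("((^| )')|('($| ))");
    return std::regex_replace(text, quote_re, replacement);
}
```

Calling annotate_quotes(" 'I'd go.' ", "$1(Quotation, )") loses the closing ' and the space after it, while annotate_quotes(" 'I'd go.' ", "$1$3(Quotation, )") keeps both.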

See the C++ demo

An alternative solution might look like

s = std::regex_replace(s, std::regex("(?:^|\\s)'|'(?!\\S)"), "$&(Quotation, )");

The (?:^|\s)'|'(?!\S) regex matches:

  • (?:^|\s)' - start of string or a whitespace char, and a ' after it
  • | - or
  • '(?!\S) - a ' that is not followed by a non-whitespace char, i.e. one that is followed by a whitespace or the end of string.

The $& inserts the whole match back into the result upon replacement. See this regex demo online (pay no attention to the replacement there, the site does not support the $& backreference).
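
A minimal sketch of this alternative, wrapped in a function whose name (annotate_quotes_whole_match) is made up for the example. One thing worth noting: the lookahead (?!\S) does not consume the whitespace after the closing ', so the marker is inserted directly after the ' rather than after the following space.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper for illustration: re-inserts the whole match with $&
// and appends the marker after it.
std::string annotate_quotes_whole_match(const std::string& text) {
    // (?:^|\s)'  : a ' at the start of the string or after a whitespace char
    // '(?!\S)    : a ' not followed by a non-whitespace char
    static const std::regex quote_re("(?:^|\\s)'|'(?!\\S)");
    return std::regex_replace(text, quote_re, "$&(Quotation, )");
}
```

With the input " 'I'd go.' " the ' in I'd is left alone, because it is neither preceded by whitespace nor followed by whitespace or the end of the string.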

NOTE: If your compiler supports C++11 raw string literals, you may use them when defining regexps to avoid doubling the backslashes: R"((?:^|\s)'|'(?!\S))".
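
To illustrate that the two spellings denote the same pattern, here is a short sketch. The variable and function names are assumptions for the example only.

```cpp
#include <string>

// The same regex written twice (names are hypothetical):
// once with doubled backslashes, once as a C++11 raw string literal.
const std::string escaped_pattern = "(?:^|\\s)'|'(?!\\S)";
const std::string raw_pattern = R"((?:^|\s)'|'(?!\S))";

// Returns true when both literals produce the same characters.
bool patterns_are_identical() {
    return escaped_pattern == raw_pattern;
}
```

Either string can be passed to the std::regex constructor; the raw form is simply easier to compare against the regex as written in the explanation.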
