Regex issue in gsub

I have defined

vec <- "5f 110y, Fast"
gsub("[\\s0-9a-z]+,", "", vec)

gives "5f  Fast"

I would have expected it to give "Fast" since everything before the comma should get matched by the regex.

Can anyone explain to me why this is not the case?


You should keep in mind that, in TRE regex patterns (the default engine in R), you cannot use regex escapes like \s, \d, \w inside bracket expressions.

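So, the regex in your case, "[\\s0-9a-z]+,", matches one or more occurrences of a literal \, the letter s, digits, or lowercase ASCII letters, followed by a single comma. The space is not in that set, so the match starts at 110y rather than at 5f.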
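A small sketch of this behaviour, using a made-up test string ("yes no" is only an illustration, not from the question). With the default engine the bracket expression replaces the letter s and leaves the space alone; with perl=TRUE it replaces the space instead:

```r
x <- "yes no"  # hypothetical test string: one "s" and one space

# Default TRE engine: inside [...], "\\s" is a literal backslash plus "s"
tre_result <- gsub("[\\s]", "_", x)
print(tre_result)   # [1] "ye_ no"

# PCRE engine: inside [...], "\\s" is the whitespace shorthand
pcre_result <- gsub("[\\s]", "_", x, perl = TRUE)
print(pcre_result)  # [1] "yes_no"
```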

You may use POSIX character classes inside the bracket expression instead, like [:space:] (any whitespace) or [:blank:] (horizontal whitespace only):

> gsub("[[:space:]0-9a-z]+,", "", vec)
[1] " Fast"

Or use a PCRE regex, where \s is supported inside bracket expressions, by passing the perl=TRUE argument:

> gsub("[\\s0-9a-z]+,", "", vec, perl=TRUE)
[1] " Fast"

To make \s match all Unicode whitespace, add the (*UCP) PCRE verb at the start of the pattern: gsub("(*UCP)[\\s0-9a-z]+,", "", vec, perl=TRUE).
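Both working patterns still leave the space that follows the comma, so the result is " Fast" rather than "Fast". If the goal is exactly "Fast", one possible sketch (an addition, not part of the patterns above) is to anchor the pattern at the start of the string and consume any whitespace after the comma, or to trim the result with trimws():

```r
vec <- "5f 110y, Fast"

# Match everything up to the first comma, plus any whitespace after it
result <- sub("^[[:space:]0-9a-z]+,[[:space:]]*", "", vec)
print(result)   # [1] "Fast"

# Alternative: keep the original pattern and trim the leftover space
trimmed <- trimws(gsub("[[:space:]0-9a-z]+,", "", vec))
print(trimmed)  # [1] "Fast"
```

sub() is used in the first variant because the ^ anchor means only one match is possible anyway.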

