regexpr syntax in R

Going crazy here over syntax of regexpr in R

I am trying the following which should allow me to get everything between productUrl:// and the following ?


The above works on

I am then trying to escape the backslashes to fit that string into the grep function, but with no luck. What is the proper way of doing it?


Added link to example

EDIT2: I actually need to extract the substrings that match my pattern so grep may be used in conjunction with another function.


Note you do not need to escape / in R regex patterns: they are defined with string literals, and / is not a special regex metacharacter. If you want to write a " inside a "..." string literal, you should escape it with a single \, as you are already doing.
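As a quick check of the quoting rules above, the sketch below builds the same pattern text with both kinds of string literal. The variable names are illustrative only and do not come from the question.

```r
# Double-quoted literal: each inner " must be escaped with a backslash
p_double <- "\"productUrl\":\"//"
# Single-quoted literal: inner " needs no escaping, and / never does
p_single <- '"productUrl":"//'

same <- identical(p_double, p_single)
same                 # TRUE: both literals hold the same text
cat(p_single, "\n")  # prints the raw text the regex engine will see
```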

You may avoid overescaping here if you use single quotes to define the string literal and if you turn .*?(?=\?) into a negated character class:

grep('(?<="productUrl":"//)([^?]*)', x, perl=TRUE) 

The [^?]* negated character class matches any 0 or more chars other than ?.

If the string you are checking against has no double quotes remove them from the lookbehind:

grep('(?<=productUrl://)([^?]*)', x, perl=TRUE) 
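To see what such a call returns, here is a minimal sketch on a made-up character vector (the URL is hypothetical, not taken from the question). It only reports which elements match; it does not return the matched text.

```r
# Hypothetical input: only the first element contains the prefix
x <- c('productUrl://example.com/item?id=1', 'nothing to see here')

# grep() returns the indices of the matching elements
idx <- grep('(?<=productUrl://)[^?]*', x, perl = TRUE)
idx   # 1

# grepl() returns one logical value per element
hits <- grepl('(?<=productUrl://)[^?]*', x, perl = TRUE)
hits  # TRUE FALSE
```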

Instead of the lookbehind, you may also use \K to omit the part of text matched so far:

grep('productUrl://\\K[^?]*', x, perl=TRUE)

Actually, you do not even need the capturing group in your pattern.
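A rough sketch of what the match reset does, again on a hypothetical string: regexpr reports the match as starting after the prefix, so the prefix is consumed but not included in the match.

```r
# Hypothetical input string
x <- 'productUrl://example.com/item?id=1'

# With \K, everything matched before it is dropped from the reported match
m <- regexpr('productUrl://\\K[^?]*', x, perl = TRUE)
start <- as.integer(m)            # position right after "productUrl://"
len   <- attr(m, "match.length")  # length of the part before the "?"

matched <- substring(x, start, start + len - 1)
matched   # "example.com/item"
```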

Solving the actual task

You cannot extract substrings with grep in R; grep only identifies which elements of a character vector match (or returns those whole elements with value=TRUE). To extract the matched substrings, use base R regmatches together with regexpr/gregexpr, or stringr's str_extract/str_extract_all, or a similar match function.
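The difference can be sketched on a small made-up vector (the values below are hypothetical): grep gives positions or whole elements, while regmatches gives only the matched text.

```r
# Hypothetical input: two JSON-like fragments, one with a productUrl key
x <- c('"productUrl":"//example.com/a?x=1"', '"image":"//example.com/a.png"')
pat <- '"productUrl":"\\K[^?"]*'

which_match <- grep(pat, x, perl = TRUE)                # indices of matching elements
whole_elems <- grep(pat, x, perl = TRUE, value = TRUE)  # the matching elements, unchanged
found <- regmatches(x, regexpr(pat, x, perl = TRUE))    # only the matched substrings

found   # "//example.com/a"
```

regexpr returns the first match per element; gregexpr returns all of them, which is why the answer's example uses it on a single long string.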

Example with base R:

> x <- '":"ppath","value":[],"hidden":false,"locked":false}],"bizData":"","pos":0},"listItems":[{"name":"BRAND\'S® Lutein Essence 6 Bottles x 60ml","nid":"66765568","icons":[{"domClass":"lazMall","text":"LazMall","alias":"LazMallAlias","type":"img","group":"1","showType":"0","order":0}],\n"productUrl":"//","image":"",\n"productUrl":"//","sku":"BR924HBAB3R0N4SGAMZ","skuId":"167303363"}],"restrictedAge":0,"categories":[1438,1565,4776,7305'
> regmatches(x, gregexpr('"productUrl":"\\K[^?"]*', x, perl=TRUE))
[[1]]
[1] "//"
[2] "//"

With stringr:

> library(stringr)
> str_extract_all(x, '(?<="productUrl":")[^?"]*')
[[1]]
[1] "//"
[2] "//"


