Find strings that contain a sequence of characters regardless of the order in R


I have a data frame (df):

         V1  V2
    1 "BCC" Yes
    2 "ABB" Yes

I want to find all the strings that contain a certain sequence of characters, regardless of their order. For example, if I search for "CBC" or "CCB", I would like to get:

         V1  V2
    1 "BCC" Yes

I've tried grep, but it only finds exact matches of the pattern, in the given order:

    > df[grep("CBC", df$V1), ]
    [1] V1 V2
    <0 rows> (or 0-length row.names)
    > df[grep("BCC", df$V1), ]
         V1  V2
    1 "BCC" Yes
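grep() treats "CBC" as an ordered regular expression, so it only matches that exact substring. If you would rather stay with a grep-style solution, one option (not from the original thread) is a PCRE pattern with one lookahead per distinct character. The helper name order_free_regex is hypothetical, and the sketch assumes the search string contains no regex metacharacters. Each lookahead requires at least as many occurrences of its character as the search string has, in any order:

```r
# Hypothetical helper: build a PCRE pattern with one lookahead per distinct
# character, each requiring at least as many occurrences as in `pattern`.
# Assumes `pattern` contains no regex metacharacters (e.g. only letters).
order_free_regex <- function(pattern) {
  counts <- table(strsplit(pattern, "")[[1]])
  paste0("^", paste0("(?=(?:.*", names(counts), "){", as.integer(counts), "})",
                     collapse = ""))
}

df <- data.frame(V1 = c("BCC", "ABB"), V2 = c("Yes", "Yes"),
                 stringsAsFactors = FALSE)

rx <- order_free_regex("CBC")   # "^(?=(?:.*B){1})(?=(?:.*C){2})"
df[grepl(rx, df$V1, perl = TRUE), , drop = FALSE]
```

Lookaheads need perl = TRUE. Because the counts are enforced, "ABC" would not match "CBC" (it has only one C), but longer strings such as "BCCA" would.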

 


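We can create a logical index by splitting each string in the column into single characters and checking that all of the required characters are present: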

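    # TRUE for each string that contains both "B" and "C" (membership only,
    # character counts are not checked); df$V1 must be character, not factor
    i1 <- sapply(strsplit(df$V1, ""), function(x) all(c("B", "C") %in% x))
    df[i1, , drop = FALSE]
    #   V1  V2
    #1 BCC Yes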
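The same idea can be wrapped so the required characters come from the search string instead of being hard-coded as c("B", "C"). This is a minimal sketch; contains_chars is a hypothetical name, and it keeps the answer's membership-only semantics:

```r
# Hypothetical helper: TRUE where every distinct character of `pattern`
# occurs somewhere in the string (repeats in `pattern` are not counted).
contains_chars <- function(x, pattern) {
  needed <- unique(strsplit(pattern, "")[[1]])
  vapply(strsplit(x, ""), function(chars) all(needed %in% chars), logical(1))
}

df <- data.frame(V1 = c("BCC", "ABB"), V2 = c("Yes", "Yes"),
                 stringsAsFactors = FALSE)

df[contains_chars(df$V1, "CBC"), , drop = FALSE]
```

vapply() is used instead of sapply() so the result is always a logical vector, even for zero-length input. Because only membership is tested, "ABC" would also match "CBC"; if the characters must match exactly (an anagram), the sorted-characters approach in the second part of the answer is the stricter test.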

If we have two datasets and one of them is a lookup table ('df2'), split each string into characters, paste the sorted characters back together, and use %in% to create the logical vector for filtering the rows:

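    # canonical key for each string: its characters sorted and pasted together
    v1n <- sapply(strsplit(df1$v1, ""), function(x) paste(sort(x), collapse = ""))
    v1l <- sapply(strsplit(df2$v1, ""), function(x) paste(sort(x), collapse = ""))
    # keep rows of df1 whose key also appears among the lookup keys
    df1[v1n %in% v1l, , drop = FALSE]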

Data

    df1 <- data.frame(v1 = c("BCC", "CAB", "ABB", "CBC", "CCB", "BAB", "CDB"),
                      stringsAsFactors = FALSE)
    df2 <- data.frame(v1 = c("CBC", "ABB"), stringsAsFactors = FALSE)
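The sorted-key step can be factored into one function and reused for both tables. In this sketch, sort_chars and the lookup column are hypothetical names. match() is added to show which lookup entry each row corresponds to (NA when there is no anagram in df2):

```r
# Hypothetical helper: canonical key = characters sorted and pasted back,
# so anagrams ("BCC", "CBC", "CCB") share the same key.
# method = "radix" sorts in C-locale order, independent of the session locale.
sort_chars <- function(x) {
  vapply(strsplit(x, ""),
         function(chars) paste(sort(chars, method = "radix"), collapse = ""),
         character(1))
}

df1 <- data.frame(v1 = c("BCC", "CAB", "ABB", "CBC", "CCB", "BAB", "CDB"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(v1 = c("CBC", "ABB"), stringsAsFactors = FALSE)

keep <- sort_chars(df1$v1) %in% sort_chars(df2$v1)
df1[keep, , drop = FALSE]

# Which lookup entry matches each row of df1 (NA = no anagram in df2)
df1$lookup <- df2$v1[match(sort_chars(df1$v1), sort_chars(df2$v1))]
```

Comparing whole sorted keys requires the same characters with the same counts, so "CAB" and "CDB" are dropped, while "BAB" is kept as an anagram of "ABB".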
