Subset string by counting specific characters


I have the following strings:

strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")  

I want to truncate each string as soon as the combined number of occurrences of A, G and N reaches a certain value, say 3. In that case, the result should be:

some_function(strings)
# expected result: c("ABBSDGN", "AABSDG", "AGN", "GGG")

I tried using stringi, stringr and regular expressions, but I can't figure it out.
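Since the question mentions regular expressions: one possible regex-only route in base R is a single sub() call. It matches n repetitions of "any run of non-target characters, then one target character", keeps that prefix, and drops the rest. This is a minimal sketch that assumes perl = TRUE (PCRE syntax) and that the target characters include nothing special inside a [...] bracket expression (such as ], ^, - or \). truncate_regex() is a hypothetical name, not a function from any package.

```r
# Hypothetical helper: keep everything up to and including the n-th character
# drawn from `chars`. Assumes `chars` holds no bracket-special characters (] ^ - \).
truncate_regex <- function(strings, chars = "AGN", n = 3) {
  # Builds e.g. "^((?:[^AGN]*[AGN]){3}).*": n repetitions of
  # "non-target characters, then one target character", captured as group 1
  pattern <- paste0("^((?:[^", chars, "]*[", chars, "]){", n, "}).*")
  # Replace the whole match with the captured prefix; strings with fewer than
  # n target characters do not match and are returned unchanged
  sub(pattern, "\\1", strings, perl = TRUE)
}

strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")
truncate_regex(strings)
#> [1] "ABBSDGN" "AABSDG"  "AGN"     "GGG"
```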

 


Here is a base R option using strsplit:

sapply(strsplit(strings, ""), function(x)
    # keep characters up to the position where the running count of A/G/N first equals 3
    paste(x[1:which.max(cumsum(x %in% c("A", "G", "N")) == 3)], collapse = ""))
#[1] "ABBSDGN" "AABSDG"  "AGN"     "GGG"
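To see how the one-liner works, here is the same pipeline unrolled for the first string, printing each intermediate vector. The variable names are arbitrary and only for illustration. which.max() applied to a logical vector returns the position of the first TRUE, which is what turns "count equals 3" into a cut-off index.

```r
x <- strsplit("ABBSDGNHNGA", "")[[1]]   # split into single characters
hits <- x %in% c("A", "G", "N")         # TRUE where the character is A, G or N
hits
#> [1]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
counts <- cumsum(hits)                  # running number of A/G/N seen so far
counts
#> [1] 1 1 1 1 1 2 3 3 4 5 6
stop_at <- which.max(counts == 3)       # first position where the count equals 3
stop_at
#> [1] 7
paste(x[1:stop_at], collapse = "")      # glue the kept characters back together
#> [1] "ABBSDGN"
```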

Or, using the tidyverse:

library(tidyverse)
map_chr(str_split(strings, ""),
        ~str_c(.x[1:which.max(cumsum(.x %in% c("A", "G", "N")) == 3)], collapse = ""))
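Both one-liners assume every string contains at least three of the target characters. Otherwise the comparison with 3 is all FALSE, which.max() returns 1, and only the first character survives (for example, "BBBA" would become "B"). Below is a minimal base R sketch that keeps such strings whole and takes the character set and the count as arguments. truncate_at_count() is a hypothetical name, and keeping short strings intact is an assumption about the desired behaviour.

```r
# Hypothetical helper: truncate each string right after the n-th character
# from `chars`; strings with fewer than n such characters are kept whole.
truncate_at_count <- function(strings, chars = c("A", "G", "N"), n = 3) {
  vapply(strsplit(strings, ""), function(x) {
    counts <- cumsum(x %in% chars)            # running count of target characters
    stop_at <- match(n, counts)               # first position where the count equals n
    if (is.na(stop_at)) stop_at <- length(x)  # count never reaches n: keep everything
    paste(x[seq_len(stop_at)], collapse = "")
  }, character(1))
}

strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG", "BBBA")
truncate_at_count(strings)
#> [1] "ABBSDGN" "AABSDG"  "AGN"     "GGG"     "BBBA"
```

vapply() is used instead of sapply() so the result is always a character vector, even when the input is empty.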
