How to Filter Out Rows per Group After a Condition Occurs


I am new to R and am trying to remove rows within each group once a filtering criterion has been met.

Scenario: For each GROUP, if two TYPE "B" rows occur consecutively, remove all the rows that follow them in that GROUP. The "Include in DataSet?" column shows what the output should be.

Here is my example input:

GROUP   TYPE    Include in DataSet?
--------------------------------------------
1       A       yes
1       A       yes
1       B       yes
1       B       yes
1       B       no
2       A       yes
2       B       yes
2       B       yes
2       A       no
2       B       no
2       B       no

DF = structure(list(GROUP = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
    TYPE = c("A", "A", "B", "B", "B", "A", "B", "B", "A", "B", "B"),
    inc = c("yes", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "no", "no")),
    .Names = c("GROUP", "TYPE", "inc"), row.names = c(NA, -11L), class = "data.frame")

Expected Output:

GROUP   TYPE    Include in DataSet?
--------------------------------------------
1       A       yes
1       A       yes
1       B       yes
1       B       yes
2       A       yes
2       B       yes
2       B       yes

I tried the following code, without success; I could not work out how to handle the grouping.

i=1
j=2
x <- allrows
for (i in x){
  for(j in x){
    if(i==j){
      a$REMOVE=1
    }
    else{
      a$REMOVE=2
    }
  }
}

 


You could do this by creating a new variable that identifies "double B" rows, then filter out rows after the first "double B" row in the group:

library(dplyr)

DF %>%
    group_by(GROUP) %>%
    # Flag rows where both this row and the one above it have TYPE == 'B'
    mutate(double_B = (TYPE == 'B' & lag(TYPE) == 'B')) %>%
    # Find the first `double_B` row in each group and drop the rows after it
    filter(row_number() <= min(which(double_B == TRUE))) %>%
    # Optionally, remove the `double_B` column when done with it
    select(-double_B)

# A tibble: 7 x 3
# Groups:   GROUP [2]
  GROUP TYPE  inc
  <int> <chr> <chr>
1     1 A     yes
2     1 A     yes
3     1 B     yes
4     1 B     yes
5     2 A     yes
6     2 B     yes
7     2 B     yes
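To see what the filter condition evaluates to, here is a minimal sketch that runs the same test on the TYPE values of GROUP 2 alone, outside of any pipeline. The names type, double_b, first_double_b and keep are made up for this illustration and are not part of the answer's code.

```r
library(dplyr)

# TYPE values of GROUP 2 from the question's data
type <- c("A", "B", "B", "A", "B", "B")

# lag() shifts the vector down by one, so lag(type)[i] is the previous row's
# TYPE (NA for the first row)
double_b <- type == "B" & lag(type) == "B"
double_b
# FALSE FALSE  TRUE FALSE FALSE  TRUE

# Position of the first row that completes a "double B"
first_double_b <- min(which(double_b))
first_double_b
# 3

# Keep every row up to and including that position
keep <- seq_along(type) <= first_double_b
type[keep]
# "A" "B" "B"
```

Inside the grouped pipeline, row_number() plays the role of seq_along(type), and the same comparison is made once per GROUP.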

As @Frank points out in the comments, you don't need to create the double_B variable: you can test for the "double B" condition directly in the which() call inside the filter:

DF %>%
    group_by(GROUP) %>%
    # Find the first "double B" row in each group and drop the rows after it
    filter(row_number() <= min(which(TYPE == 'B' & lag(TYPE) == 'B')))

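Note that min(which(...)) raises a warning for any group that contains no "double B", because min() of an empty vector returns Inf. The filter still works properly in that case: every row number is less than Inf, so all rows of that group are kept.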
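If the warning is a nuisance, one possible way to avoid it (a sketch, not part of the original answer) is to replace min(which(...)) with match(TRUE, ..., nomatch = n()), which falls back to the group's last row number when no "double B" exists. DF2 and its GROUP 3 are hypothetical data added here only to exercise that case.

```r
library(dplyr)

# Hypothetical data: GROUP 1 is taken from the question,
# GROUP 3 is invented and never has two "B" rows in a row
DF2 <- data.frame(
  GROUP = c(1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L),
  TYPE  = c("A", "A", "B", "B", "B", "B", "A", "B"),
  stringsAsFactors = FALSE
)

result <- DF2 %>%
  group_by(GROUP) %>%
  # match() gives the position of the first TRUE in the condition;
  # if there is none, nomatch = n() keeps the whole group without a warning
  filter(row_number() <= match(TRUE, TYPE == "B" & lag(TYPE) == "B",
                               nomatch = n())) %>%
  ungroup()

result
```

With this data, GROUP 1 is cut after its second "B" (4 rows) and GROUP 3 is kept whole (3 rows).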
