Find all records which have multiple values in a column in R

  • A+
Category:Languages

For a sample dataframe:

df <- structure(list(code = c("a1", "a1", "b2", "v4", "f5", "f5", "h7",         "a1"), name = c("katie", "katie", "sally", "tom", "amy", "amy",         "ash", "james"), number = c(3.5, 3.5, 2, 6, 4, 4, 7, 3)), .Names = c("code",         "name", "number"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,         -8L), spec = structure(list(cols = structure(list(code = structure(list(), class = c("collector_character",         "collector")), name = structure(list(), class = c("collector_character",         "collector")), number = structure(list(), class = c("collector_double",         "collector"))), .Names = c("code", "name", "number")), default = structure(list(), class = c("collector_guess",         "collector"))), .Names = c("cols", "default"), class = "col_spec")) 

I want to highlight all the records which are have two or more values of 'code' which are the same. I know I could use:

df[duplicated(df$name), ] 

But this only highlights the duplicated records, but I want all of the code values which are duplicated (i.e. 3 a1s and 2 f5s).

Any ideas?

 


df[duplicated(df$code) | duplicated(df$code, fromLast=TRUE), ]   code  name number 1   a1 katie    3.5 2   a1 katie    3.5 5   f5   amy    4.0 6   f5   amy    4.0 8   a1 james    3.0 

Another solution inspired by Alok VS:

ta <- table(df$code) df[df$code %in% names(ta[ta > 1]), ] 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: