dplyr::select() with some variables that may not exist in the data frame?


I have a helper function (say foo()) that will be run on various data frames that may or may not contain specified variables. Suppose I have

library(dplyr)
d1 <- data_frame(taxon=1,model=2,z=3)
d2 <- data_frame(taxon=2,pss=4,z=3)

The variables I want to select are

vars <- intersect(names(data),c("taxon","model","z")) 

that is, I'd like foo(d1) to return the taxon, model, and z columns, while foo(d2) returns just taxon and z.

If foo contains select(data,c(taxon,model,z)) then foo(d2) fails (because d2 doesn't contain model). If I use select(data,-pss) then foo(d1) fails similarly.

I know how to do this if I retreat from the tidyverse (just return data[vars]), but I'm wondering if there's a handy way to do this either (1) with a select() helper of some sort (tidyselect::select_helpers) or (2) with tidyeval (which I still haven't found time to get my head around!)
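
For concreteness, the base-R fallback can be sketched roughly as follows. The names foo_base and wanted are hypothetical, chosen only for illustration, and the example uses plain data.frame() so that it runs without any packages.

```r
# Hypothetical base-R helper: keep only the wanted columns that are present.
foo_base <- function(data, wanted = c("taxon", "model", "z")) {
  # intersect() keeps the names found in both vectors, so absent columns drop out
  vars <- intersect(names(data), wanted)
  data[vars]
}

d1 <- data.frame(taxon = 1, model = 2, z = 3)
d2 <- data.frame(taxon = 2, pss = 4, z = 3)

names(foo_base(d1))  # "taxon" "model" "z"
names(foo_base(d2))  # "taxon" "z"
```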


Another option is select_if:

d2 %>% select_if(names(.) %in% c('taxon', 'model', 'z'))
# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3
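
A select() helper may also cover this case. Assuming dplyr 1.0.0 or later, any_of() takes a character vector of column names and skips any that are missing from the data. The sketch below wraps it in foo(); the wanted argument is a hypothetical name introduced here for illustration.

```r
library(dplyr)

# Hypothetical helper: `wanted` is an illustrative argument name.
foo <- function(data, wanted = c("taxon", "model", "z")) {
  # any_of() selects the listed columns that exist and ignores the rest,
  # so a missing column does not raise an error
  select(data, any_of(wanted))
}

d1 <- tibble(taxon = 1, model = 2, z = 3)
d2 <- tibble(taxon = 2, pss = 4, z = 3)

names(foo(d1))  # "taxon" "model" "z"
names(foo(d2))  # "taxon" "z"
```

If a missing column should be an error instead, all_of() is the strict counterpart.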

