Dynamic variable names in R regressions


Being aware of the danger of using dynamic variable names, I am trying to loop over various regression models in which different variable specifications are chosen. Usually !!rlang::sym() solves this kind of problem for me just fine, but it somehow fails in regressions. A minimal example would be the following:

    y = runif(1000)
    x1 = runif(1000)
    x2 = runif(1000)
    df2 = data.frame(y, x1, x2)
    summary(lm(y ~ x1 + x2, data = df2)) ## works

    var = "x1"
    summary(lm(y ~ !!rlang::sym(var) + x2, data = df2)) # gives an error

My understanding was that !!rlang::sym(var) takes the value of var (namely x1) and inserts it into the code so that R treats it as a variable name (not a character string). But I seem to be wrong. Can anyone enlighten me?

 


Personally, I like to do this with some computing on the language. For me, a combination of bquote with eval is easiest (to remember).

    var <- as.symbol(var)
    eval(bquote(summary(lm(y ~ .(var) + x2, data = df2))))
    #Call:
    #lm(formula = y ~ x1 + x2, data = df2)
    #
    #Residuals:
    #     Min       1Q   Median       3Q      Max
    #-0.49298 -0.26248 -0.00046  0.24111  0.51988
    #
    #Coefficients:
    #            Estimate Std. Error t value Pr(>|t|)
    #(Intercept)  0.50244    0.02480  20.258   <2e-16 ***
    #x1          -0.01468    0.03161  -0.464    0.643
    #x2          -0.01635    0.03227  -0.507    0.612
    #---
    #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    #
    #Residual standard error: 0.2878 on 997 degrees of freedom
    #Multiple R-squared:  0.0004708,    Adjusted R-squared:  -0.001534
    #F-statistic: 0.2348 on 2 and 997 DF,  p-value: 0.7908
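Since the question is about looping over several specifications, the same bquote/eval idea can be wrapped in lapply. This is only a minimal sketch: the seed, the extra column x3, and the names candidates and fits are assumptions made for illustration and are not part of the original example.

```r
# Simulated data as in the question; the seed and the x3 column are
# assumptions added here so that there is more than one candidate variable.
set.seed(42)
df2 <- data.frame(y = runif(1000), x1 = runif(1000),
                  x2 = runif(1000), x3 = runif(1000))

# Hypothetical list of variable names to swap into the model
candidates <- c("x1", "x3")

fits <- lapply(candidates, function(v) {
  v <- as.symbol(v)                            # string -> symbol
  eval(bquote(lm(y ~ .(v) + x2, data = df2)))  # splice symbol, then evaluate
})
names(fits) <- candidates

# Each stored call contains the actual variable name
lapply(fits, function(f) f$call)
```

Each element of fits is an ordinary lm object, so summary(fits$x1) or coef(fits$x3) can be used as usual.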

I find this superior to any approach that doesn't show the same call as summary(lm(y ~ x1+x2, data=df2)).
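To illustrate what "the same call" means, here is a small sketch comparing the bquote approach with a formula built by reformulate() from base R. The seed and the object names f, fit_formula and fit_bquote are assumptions for illustration. Both fits should give the same coefficients; what differs is the call stored in the model object.

```r
# Simulated data as in the question (the seed is an assumption added here)
set.seed(1)
df2 <- data.frame(y = runif(1000), x1 = runif(1000), x2 = runif(1000))
var <- "x1"

# Alternative: build the formula from strings with reformulate()
f <- reformulate(c(var, "x2"), response = "y")
fit_formula <- lm(f, data = df2)

# bquote/eval: splice the symbol into the call before it is evaluated
fit_bquote <- eval(bquote(lm(y ~ .(as.symbol(var)) + x2, data = df2)))

fit_formula$call  # records the name of the formula object, f
fit_bquote$call   # records the spelled-out formula y ~ x1 + x2

# The estimates themselves agree
all.equal(coef(fit_formula), coef(fit_bquote))
```

The difference matters mostly for printed output and for functions such as update() that re-evaluate the stored call.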
