Being aware of the danger of using dynamic variable names, I am trying to loop over various regression models in which different variable specifications are chosen. Usually !!rlang::sym() solves this kind of problem for me just fine, but it somehow fails in regressions. A minimal example would be the following:
    y = runif(1000)
    x1 = runif(1000)
    x2 = runif(1000)
    df2 = data.frame(y, x1, x2)
    summary(lm(y ~ x1 + x2, data = df2)) ## works
    var = "x1"
    summary(lm(y ~ !!rlang::sym(var) + x2, data = df2)) # gives an error
My understanding was that !!rlang::sym(var) takes the value of var (namely x1) and puts it into the code in such a way that R treats it as a variable (not a character string). But I seem to be wrong. Can anyone enlighten me?
Personally, I like to do this with some computing on the language. For me, a combination of bquote and eval is easiest (to remember).
    var <- as.symbol(var)
    eval(bquote(summary(lm(y ~ .(var) + x2, data = df2))))
    #Call:
    #lm(formula = y ~ x1 + x2, data = df2)
    #
    #Residuals:
    #     Min       1Q   Median       3Q      Max
    #-0.49298 -0.26248 -0.00046  0.24111  0.51988
    #
    #Coefficients:
    #            Estimate Std. Error t value Pr(>|t|)
    #(Intercept)  0.50244    0.02480  20.258   <2e-16 ***
    #x1          -0.01468    0.03161  -0.464    0.643
    #x2          -0.01635    0.03227  -0.507    0.612
    #---
    #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    #
    #Residual standard error: 0.2878 on 997 degrees of freedom
    #Multiple R-squared:  0.0004708, Adjusted R-squared:  -0.001534
    #F-statistic: 0.2348 on 2 and 997 DF,  p-value: 0.7908
I find this superior to any approach that doesn't show the same call as summary(lm(y ~ x1+x2, data=df2)).
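Since the question is about looping over several specifications, the same bquote/eval idea can be wrapped in lapply. The following is only a minimal sketch under some assumptions: the data frame df3, the extra column x3, the vector candidates and the list fits are hypothetical names introduced for illustration, and the seed is there only to make the example reproducible.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
n <- 1000
# Hypothetical data: same layout as df2 in the question, plus an extra column x3
df3 <- data.frame(y = runif(n), x1 = runif(n), x2 = runif(n), x3 = runif(n))

# Hypothetical set of variables to swap in next to x2
candidates <- c("x1", "x3")

fits <- lapply(candidates, function(v) {
  # bquote() replaces .(...) with the symbol; eval() then runs the completed call
  eval(bquote(lm(y ~ .(as.symbol(v)) + x2, data = df3)))
})
names(fits) <- candidates

# The stored call contains the real variable name, not a placeholder
fits$x3$call
```

Because the symbol is substituted before lm() is called, each fitted model records a call with the actual variable name, which is the property argued for above. summary() can then be applied to each element, for example with lapply(fits, summary).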