How to adjust a data table manipulation so that, besides
sum per category of several colums, it would also calculate other functions at the same time such as
mean and counts (
.N) and automatically create column names: "sum c1" , "sum c2", "sum c4" , "mean c1", " mean c2", "mean c4" and preferably also 1 column "counts"?
My old solution was to write out
mean col1 = .... mean col2 = ....
Etc, Inside the data.table command
Which worked, but horribly inefficient I think, and it won't work anymore to precode it if in the new app version, the calculations depend on user choices in an R Shiny app what to calculate for which columns.
I've read my way through a bunch of posts and blog articles but haven't quite figured out how to best do this. I read that in some cases the manipulation can become quite slow on large data tables depending on what approach you use (.sdcols, get, lapply, and or by =). Therefore I added a 'sizeable' dummy data set
My real data is around 100k rows by 100 columns and 1-100 groups roughly.
library(data.table) n = 100000 dt = data.table(index=1:100000, category = sample(letters[1:25], n, replace = T), c1=rnorm(n,10000), c2=rnorm(n,1000), c3=rnorm(n,100), c4 = rnorm(n,10) ) # add more columns to test for big data tables lapply(c(paste('c', 5:100, sep ='')), function(addcol) dt[[addcol]] <<- rnorm(n,1000) ) # Simulate columns selected by shiny app user Colchoice <- c("c1", "c4") FunChoice <- c(".N", "mean", "sum") # attempt which now does just one function and doesn't add names dt[, lapply(.SD, sum, na.rm=TRUE), by=category, .SDcols=Colchoice ]
Expected output is a row per group and a column for each function per each selected column.
Category Mean c1 Sum c1 Mean c4 ... A B C D E ......
Possibly a duplicate but I haven't found the exact answer that I need
Here's a data.table answer:
funs_list <- lapply(FunChoice, as.symbol) dcast(dt, category~1, fun=list(mean, sum), value.var = Colchoice)
It's super fast and does what you want.