Randomise order of groups in R data table while preserving internal order of groups


In R, I have the following sample data table:

    library(data.table)
    x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5",
                              "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
    x[, InternalOrder := seq(.N), by = Group]

Which looks like this:

    # Input:
        Group InternalOrder
     1:    d1             1
     2:    d1             2
     3:    d1             3
     4:    d1             4
     5:    d2             1
     6:    d3             1
     7:    d3             2
     8:    d4             1
     9:    d5             1
    10:    d5             2
    11:    d5             3
    12:    d6             1
    13:    d7             1
    14:    d7             2
    15:    d7             3
    16:    d7             4
    17:    d7             5

My goal is to randomise the order of groups in the data table x while preserving the internal order of each group.

I have already worked out a solution

    # Get number of elements (= rows) for each group
    groupsizes <- x[, .N, by = Group]$N
    set.seed(10)
    # Make new column with random ID for each group
    x[, RandomGroupID := rep(sample(c(1:length(unique(x$Group))), replace = F), groupsizes)]
    # Re-order data by random group ID and internal order
    setorder(x, RandomGroupID, InternalOrder)

that gives the desired output:

    # Output (as desired):
        Group InternalOrder RandomGroupID
     1:    d5             1             1
     2:    d5             2             1
     3:    d5             3             1
     4:    d2             1             2
     5:    d3             1             3
     6:    d3             2             3
     7:    d1             1             4
     8:    d1             2             4
     9:    d1             3             4
    10:    d1             4             4
    11:    d4             1             5
    12:    d7             1             6
    13:    d7             2             6
    14:    d7             3             6
    15:    d7             4             6
    16:    d7             5             6
    17:    d6             1             7

Since I am trying to improve my data.table skills, I would like to know whether there is a nicer, more idiomatic solution that does not need the intermediate vector groupsizes and instead assigns the new column with typical data.table syntax, using the by argument in combination with .GRP, .I or the like. I had something like this in mind:

    x[, RandomGroupIDAlternative := rep(sample(c(1:length(unique(x$Group))), replace = F), .GRP), by = Group]

which obviously does not give the desired output.

I am looking forward to your comments and to seeing alternative solutions to this problem.

 


You can also do it using split and rbindlist:

    x_new <- rbindlist(sample(split(x, by='Group')))

        Group InternalOrder
     1:    d4             1
     2:    d1             1
     3:    d1             2
     4:    d1             3
     5:    d1             4
     6:    d5             1
     7:    d5             2
     8:    d5             3
     9:    d6             1
    10:    d7             1
    11:    d7             2
    12:    d7             3
    13:    d7             4
    14:    d7             5
    15:    d3             1
    16:    d3             2
    17:    d2             1
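If you would rather keep the random group ID as a column, as in the question, here is a minimal sketch of two other ways to assign it without the groupsizes vector. The key point in both is that sample() is called once, outside of by, because an expression in j is re-evaluated for every group. The names shuffled, perm and RandomGroupID2 are only illustrative, and the resulting IDs will not necessarily match the output shown in the question.

```r
library(data.table)

x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5",
                          "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
x[, InternalOrder := seq_len(.N), by = Group]

set.seed(10)

# Variant 1: shuffle the group labels once, then use each row's position
# in the shuffled labels as its random group ID (no `by` needed).
shuffled <- sample(unique(x$Group))
x[, RandomGroupID := match(Group, shuffled)]

# Variant 2: shuffle the integers 1..(number of groups) once, then pick
# the entry for the current group via the group counter .GRP.
perm <- sample(uniqueN(x$Group))
x[, RandomGroupID2 := perm[.GRP], by = Group]

# Re-order by random group ID, keeping the internal order within groups.
setorder(x, RandomGroupID, InternalOrder)
```

Variant 1 does not rely on the rows of a group being contiguous, since match() looks up every row individually.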
