Calculations based on a dynamic subgroup of a data.table

  • A+
Category:Languages

My question is related to Subset by group with data.table but different.

Imagine a data set like this:

tmp <- data.table(x = 1:10, y = c(27, 70, 54, 18, 50, 44, 22, 73, 6, 5)) 

For each row of the data, I want to calculate a new value, z, which is the min(y) for all rows with a larger value of x. For instance, for the third row of the data where x is 3, I want min(y) among rows with x > 3 (which would be the value 5). For our intents and purposes, you can assume the data is already ordered by x.

At first I thought of using a function like this:

min.y <- function(val, dt) {   dt[x > val, min(y)] } 

But calling tmp[, z:= fun(x, tmp)] will result in a warning message:

In min(y) : no non-missing arguments to min; returning Inf 

What is the proper way to do this?

PS: Obviously, for the last row I expect to get NA as the result

 


Method 1:

Since you said we can assume the data is sorted on x you could use a cumulative minimum starting from the end of y. We cut out the first observation so that we do a > search rather than >=:

tmp$min_y <- c(rev(cummin(rev(tmp$y[-1]))), NA) 

Update: The old method effectively did a >= search rather than >. Updated to do >.


Method 2: Data.table

If you wanted to use data.table you could try to group by each row and then subset within J. The ifelse is needed so that when we are at our last row we don't take the min of no values:

tmp[, "min_y" := {curr_x <- x                   tmp_subs <- tmp[x > curr_x]                   ifelse(nrow(tmp_subs)>0, min(tmp[x > curr_x][["y"]]), NA_real_)},     by = 1:nrow(tmp)]  tmp #     x  y min_y # 1:  1 27     5 # 2:  2 70     5 # 3:  3 54     5 # 4:  4 18     5 # 5:  5 50     5 # 6:  6 44     5 # 7:  7 22     5 # 8:  8 73     5 # 9:  9  6     5 #10: 10  5     NA 

Since 5 is the smallest value which is at the end everything will be 5. Let's make this a little more interesting:

tmp <- data.table(x = 1:10, y = c(27, 70, 54, 18, 50, 44, 22, 73, 47, 58)) 

Our result will be:

#     x  y min_y # 1:  1 27    18 # 2:  2 70    18 # 3:  3 54    18 # 4:  4 18    22 # 5:  5 50    22 # 6:  6 44    22 # 7:  7 22    47 # 8:  8 73    47 # 9:  9 47    58 #10: 10 58    NA 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: