# Calculations based on a dynamic subgroup of a data.table

• A+
Category：Languages

My question is related to Subset by group with data.table but different.

Imagine a data set like this:

``tmp <- data.table(x = 1:10, y = c(27, 70, 54, 18, 50, 44, 22, 73, 6, 5)) ``

For each row of the data, I want to calculate a new value, z, which is the `min(y)` for all rows with a larger value of x. For instance, for the third row of the data where x is 3, I want `min(y)` among rows with x > 3 (which would be the value 5). For our intents and purposes, you can assume the data is already ordered by x.

At first I thought of using a function like this:

``min.y <- function(val, dt) {   dt[x > val, min(y)] } ``

But calling `tmp[, z:= fun(x, tmp)]` will result in a warning message:

``In min(y) : no non-missing arguments to min; returning Inf ``

What is the proper way to do this?

PS: Obviously, for the last row I expect to get NA as the result

Method 1:

Since you said we can assume the data is sorted on `x` you could use a cumulative minimum starting from the end of `y`. We cut out the first observation so that we do a `>` search rather than `>=`:

``tmp\$min_y <- c(rev(cummin(rev(tmp\$y[-1]))), NA) ``

Update: The old method effectively did a `>=` search rather than `>`. Updated to do `>`.

Method 2: Data.table

If you wanted to use `data.table` you could try to group by each row and then subset within `J`. The ifelse is needed so that when we are at our last row we don't take the min of no values:

``tmp[, "min_y" := {curr_x <- x                   tmp_subs <- tmp[x > curr_x]                   ifelse(nrow(tmp_subs)>0, min(tmp[x > curr_x][["y"]]), NA_real_)},     by = 1:nrow(tmp)]  tmp #     x  y min_y # 1:  1 27     5 # 2:  2 70     5 # 3:  3 54     5 # 4:  4 18     5 # 5:  5 50     5 # 6:  6 44     5 # 7:  7 22     5 # 8:  8 73     5 # 9:  9  6     5 #10: 10  5     NA ``

Since `5` is the smallest value which is at the end everything will be `5`. Let's make this a little more interesting:

``tmp <- data.table(x = 1:10, y = c(27, 70, 54, 18, 50, 44, 22, 73, 47, 58)) ``

Our result will be:

``#     x  y min_y # 1:  1 27    18 # 2:  2 70    18 # 3:  3 54    18 # 4:  4 18    22 # 5:  5 50    22 # 6:  6 44    22 # 7:  7 22    47 # 8:  8 73    47 # 9:  9 47    58 #10: 10 58    NA ``