# Transform NA values based on first registration and nearest values


I already asked a similar question, but now I just want to restrict how the NA values are replaced.

I have some data like this:

```
   Date 1   Date 2   Date 3   Date 4   Date 5   Date 6
A  NA       0.1      0.2      NA       0.3      0.2
B  0.1      NA       NA       0.3      0.2      0.1
C  NA       NA       NA       NA       0.3      NA
D  0.1      0.2      0.3      NA       0.1      NA
E  NA       NA       0.1      0.2      0.1      0.3
```

I would like to change the NA values of my data based on the first date on which a value is registered. For example, for A the first registration is at Date 2. NA values before that registration should become 0, and NA values after it should become the mean of the nearest registered values on either side (for Date 4, the mean of Date 3 and Date 5).

In case the last value is an NA, transform it into the last registered value (as in C and D). In the case of E all NA values will become 0.

The result should look like this:

```
   Date 1   Date 2   Date 3   Date 4   Date 5   Date 6
A  0        0.1      0.2      0.25     0.3      0.2
B  0.1      0.2      0.2      0.3      0.2      0.1
C  0        0        0        0        0.3      0.3
D  0.1      0.2      0.3      0.2      0.1      0.1
E  0        0        0.1      0.2      0.1      0.3
```

Can you help me? I'm not sure how to do it in R.

Here is a way using `na.approx` from the `zoo` package and `apply` with `MARGIN = 1` (so this is probably not very efficient, but it gets the job done).

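```r
library(zoo)  # provides na.approx

# Interpolate row by row; f = .5 averages the neighbouring values
df1 <- as.data.frame(t(apply(dat, 1, na.approx, method = "constant", f = .5, na.rm = FALSE)))
```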

This results in

```r
df1
#   V1  V2  V3   V4  V5
#A  NA 0.1 0.2 0.25 0.3
#B 0.1 0.2 0.2 0.30 0.2
#C  NA  NA  NA   NA 0.3
#E  NA  NA 0.1 0.20 0.1
```

Replace the remaining `NA`s with 0 and rename the columns.

```r
df1[is.na(df1)] <- 0
names(df1) <- names(dat)
df1
#  Date_1 Date_2 Date_3 Date_4 Date_5
#A    0.0    0.1    0.2   0.25    0.3
#B    0.1    0.2    0.2   0.30    0.2
#C    0.0    0.0    0.0   0.00    0.3
#E    0.0    0.0    0.1   0.20    0.1
```

explanation

Given a vector

```r
x <- c(0.1, NA, NA, 0.3, 0.2)
na.approx(x)
```

returns `x` with linearly interpolated values

``#[1] 0.1000000 0.1666667 0.2333333 0.3000000 0.2000000 ``

But OP asked for constant values, so we need the argument `method = "constant"`, which `na.approx` passes on to the `approx` function.

```r
na.approx(x, method = "constant")
# [1] 0.1 0.1 0.1 0.3 0.2
```

But this is still not what OP asked for, because it carries the last observation forward, while OP wants the mean of the closest non-`NA` values. Therefore we need the argument `f` (also from `approx`)

```r
na.approx(x, method = "constant", f = .5)
# [1] 0.1 0.2 0.2 0.3 0.2 # looks good
```

From `?approx`

f : for method = "constant" a number between 0 and 1 inclusive, indicating a compromise between left- and right-continuous step functions. If y0 and y1 are the values to the left and right of the point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f for intermediate values. In this way the result is right-continuous for f == 0 and left-continuous for f == 1, even for non-finite y values.
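To see the effect of `f` in isolation, here is a small sketch that calls `approx` from base R directly on two known points with a gap between them. The variable names (`x_known`, `y_known`, `left`, `right`, `mid`) are only illustrative.

```r
# Two known points (positions 1 and 4) with a gap in between.
x_known <- c(1, 4)
y_known <- c(0.1, 0.3)

# Evaluate the step function inside the gap for three values of f.
left  <- approx(x_known, y_known, xout = 2, method = "constant", f = 0)$y    # value to the left (y0)
right <- approx(x_known, y_known, xout = 2, method = "constant", f = 1)$y    # value to the right (y1)
mid   <- approx(x_known, y_known, xout = 2, method = "constant", f = 0.5)$y  # y0 * 0.5 + y1 * 0.5

c(left = left, right = right, mid = mid)
```

With `f = 0.5` the weights on `y0` and `y1` are equal, which is why it gives the mean of the two neighbouring values.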

Lastly, to take care of the `NA`s at the start of each row, i.e. to keep them in the result instead of dropping them, use `na.rm = FALSE`.

From `?na.approx`

na.rm : logical. If the result of the (spline) interpolation still results in NAs, should these be removed?

data

```r
dat <- structure(list(
  Date_1 = c(NA, 0.1, NA, NA),
  Date_2 = c(0.1, NA, NA, NA),
  Date_3 = c(0.2, NA, NA, 0.1),
  Date_4 = c(NA, 0.3, NA, 0.2),
  Date_5 = c(0.3, 0.2, 0.3, 0.1)),
  .Names = c("Date_1", "Date_2", "Date_3", "Date_4", "Date_5"),
  class = "data.frame",
  row.names = c("A", "B", "C", "E"))
```

EDIT

If there are `NA`s in the last column (this assumes `dat` also contains a `Date_6` column), we can replace these with the last non-`NA` value of each row before we apply `na.approx` as shown above.

```r
# Last non-NA value of each row
y <- apply(dat, 1, function(x) tail(na.omit(x), 1))
dat$Date_6[is.na(dat$Date_6)] <- y[is.na(dat$Date_6)]
```
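For reference, the three rules (leading `NA`s become 0, interior `NA`s become the mean of their neighbours, trailing `NA`s take the last registered value) can also be sketched in one pass with base R's `approx`, without `zoo`. This is only a sketch under some assumptions: `fill_row` and `dat6` are hypothetical names, `dat6` is typed in from the six-column table in the question, and an all-`NA` row is assumed to become all zeros. Here `rule = c(1, 2)` leaves positions before the first value as `NA` and extends the last value to the right.

```r
# Hypothetical helper: fill one row according to the rules in the question.
fill_row <- function(v) {
  idx <- seq_along(v)
  ok  <- !is.na(v)
  if (!any(ok)) return(rep(0, length(v)))  # assumption: an all-NA row becomes 0
  out <- approx(idx[ok], v[ok], xout = idx,
                method = "constant", f = 0.5,  # mean of the two neighbours
                rule = c(1, 2))$y              # left: NA, right: last value
  out[is.na(out)] <- 0                         # leading NAs become 0
  out
}

# The six-column data from the question (rows A to E).
dat6 <- data.frame(
  Date_1 = c(NA, 0.1, NA, 0.1, NA),
  Date_2 = c(0.1, NA, NA, 0.2, NA),
  Date_3 = c(0.2, NA, NA, 0.3, 0.1),
  Date_4 = c(NA, 0.3, NA, NA, 0.2),
  Date_5 = c(0.3, 0.2, 0.3, 0.1, 0.1),
  Date_6 = c(0.2, 0.1, NA, NA, 0.3),
  row.names = c("A", "B", "C", "D", "E")
)

res <- as.data.frame(t(apply(dat6, 1, fill_row)))
names(res) <- names(dat6)
res
```

On this input the result should agree with the desired table in the question, up to floating-point rounding.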