
I already asked a similar question, but now I just want to restrict how the NA values are replaced.

I have some data like this:

```
  Date 1 Date 2 Date 3 Date 4 Date 5 Date 6
A     NA    0.1    0.2     NA    0.3    0.2
B    0.1     NA     NA    0.3    0.2    0.1
C     NA     NA     NA     NA    0.3     NA
D    0.1    0.2    0.3     NA    0.1     NA
E     NA     NA    0.1    0.2    0.1    0.3
```

I would like to replace the NA values based on the first date on which a value is registered. For example, for A the first registration is Date 2. Before that registration, the NA values in A should become 0; after it, each NA should become the mean of the nearest registered values (here, the mean of Date 3 and Date 5).

If the last value is NA, it should take the last registered value (as in C and D). In the case of E, all NA values become 0.

The result should look something like this:

```
  Date 1 Date 2 Date 3 Date 4 Date 5 Date 6
A    0      0.1    0.2   0.25    0.3    0.2
B    0.1    0.2    0.2   0.3     0.2    0.1
C    0      0      0     0       0.3    0.3
D    0.1    0.2    0.3   0.2     0.1    0.1
E    0      0      0.1   0.2     0.1    0.3
```

Can you help me? I'm not sure how to do it in R.

Here is a way using `na.approx` (from the `zoo` package) and `apply` with `MARGIN = 1` (so this is probably not very efficient, but it gets the job done).

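```r
library(zoo)  # provides na.approx

# Interpolate each row; na.rm = FALSE keeps the leading NAs in place
df1 <- as.data.frame(t(apply(dat, 1, na.approx, method = "constant", f = .5, na.rm = FALSE)))
```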

This results in

```r
df1
#   V1  V2  V3   V4  V5
#A  NA 0.1 0.2 0.25 0.3
#B 0.1 0.2 0.2 0.30 0.2
#C  NA  NA  NA   NA 0.3
#E  NA  NA 0.1 0.20 0.1
```

Replace the remaining `NA`s with 0 and rename the columns.

```r
df1[is.na(df1)] <- 0
names(df1) <- names(dat)
df1
#  Date_1 Date_2 Date_3 Date_4 Date_5
#A    0.0    0.1    0.2   0.25    0.3
#B    0.1    0.2    0.2   0.30    0.2
#C    0.0    0.0    0.0   0.00    0.3
#E    0.0    0.0    0.1   0.20    0.1
```

**explanation**

Given a vector `x`, `na.approx(x)` returns `x` with linearly interpolated values:

```r
x <- c(0.1, NA, NA, 0.3, 0.2)
na.approx(x)
#[1] 0.1000000 0.1666667 0.2333333 0.3000000 0.2000000
```

But OP asked for constant values, so we need the argument `method = "constant"`, which `na.approx` passes on to the `approx` function.

```r
na.approx(x, method = "constant")
# [1] 0.1 0.1 0.1 0.3 0.2
```

But this is still not what OP asked for, because it carries the last observation forward, whereas OP wants the mean of the closest non-`NA` values. Therefore we need the argument `f` (also from `approx`):

```r
na.approx(x, method = "constant", f = .5)
# [1] 0.1 0.2 0.2 0.3 0.2 # looks good
```

From `?approx`

f : for method = "constant" a number between 0 and 1 inclusive, indicating a compromise between left- and right-continuous step functions. If y0 and y1 are the values to the left and right of the point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f for intermediate values. In this way the result is right-continuous for f == 0 and left-continuous for f == 1, even for non-finite y values.
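To see the effect of `f` in isolation, here is a minimal sketch that calls base R's `approx` directly. The two known points (`x_known`, `y_known`) are made up for illustration and are not part of OP's data.

```r
# Hypothetical toy data: known values at positions 1 and 4, with a gap at 2 and 3
x_known <- c(1, 4)
y_known <- c(0.1, 0.3)

# Evaluate the step function inside the gap for three values of f
step_at_gap <- sapply(c(0, 0.5, 1), function(f) {
  approx(x_known, y_known, xout = 2, method = "constant", f = f)$y
})
step_at_gap
# f = 0 returns the left value (0.1), f = 1 the right value (0.3),
# and f = 0.5 their mean (0.2), following y0*(1-f) + y1*f
```

With `f = .5` every gap is filled with the mean of its two neighbouring observations, which is the behaviour OP described.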

Lastly, to take care of the `NA`s at the start of each row, i.e. to keep them in place rather than drop them, use `na.rm = FALSE`.

From `?na.approx`

na.rm : logical. If the result of the (spline) interpolation still results in NAs, should these be removed?

**data**

```r
dat <- structure(list(Date_1 = c(NA, 0.1, NA, NA),
                      Date_2 = c(0.1, NA, NA, NA),
                      Date_3 = c(0.2, NA, NA, 0.1),
                      Date_4 = c(NA, 0.3, NA, 0.2),
                      Date_5 = c(0.3, 0.2, 0.3, 0.1)),
                 .Names = c("Date_1", "Date_2", "Date_3", "Date_4", "Date_5"),
                 class = "data.frame",
                 row.names = c("A", "B", "C", "E"))
```

**EDIT**

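If there are `NA`s in the last column, we can replace them with the last non-`NA` value of each row before we apply `na.approx` as shown above.

```r
# Assumes dat has a Date_6 column (the sample data above stops at Date_5)
# Last non-NA value of each row
y <- apply(dat, 1, function(x) tail(na.omit(x), 1))
# Fill only the rows whose final value is missing
dat$Date_6[is.na(dat$Date_6)] <- y[is.na(dat$Date_6)]
```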
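Putting the steps together on the six-column data from the question, a sketch of the whole pipeline might look like the following. The object names `dat6`, `last_obs`, `trailing` and `res` are my own, the data frame is typed in from the question's table, and the `zoo` package is assumed to be installed.

```r
library(zoo)  # provides na.approx

# Data from the question (rows A to E, six dates)
dat6 <- data.frame(
  Date_1 = c(NA, 0.1, NA, 0.1, NA),
  Date_2 = c(0.1, NA, NA, 0.2, NA),
  Date_3 = c(0.2, NA, NA, 0.3, 0.1),
  Date_4 = c(NA, 0.3, NA, NA, 0.2),
  Date_5 = c(0.3, 0.2, 0.3, 0.1, 0.1),
  Date_6 = c(0.2, 0.1, NA, NA, 0.3),
  row.names = c("A", "B", "C", "D", "E")
)

# Step 1: fill a missing final value with the last registered value of its row
last_obs <- apply(dat6, 1, function(x) tail(na.omit(x), 1))
trailing <- is.na(dat6$Date_6)
dat6$Date_6[trailing] <- last_obs[trailing]

# Step 2: fill interior gaps with the mean of the neighbouring values,
# keeping the leading NAs in place (na.rm = FALSE)
res <- as.data.frame(t(apply(dat6, 1, na.approx,
                             method = "constant", f = 0.5, na.rm = FALSE)))

# Step 3: the NAs left over are the ones before the first registration
res[is.na(res)] <- 0
names(res) <- names(dat6)
res
```

If everything runs as intended, `res` should match the table OP asked for. Note that Step 1 assumes every row has at least one non-`NA` value.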