Transform NA values based on first registration and nearest values


I already asked a similar question, but now I just want to restrict the new values that replace the NAs.

I have some data like this:

   Date 1   Date 2   Date 3   Date 4   Date 5   Date 6
A  NA       0.1      0.2      NA       0.3      0.2
B  0.1      NA       NA       0.3      0.2      0.1
C  NA       NA       NA       NA       0.3      NA
D  0.1      0.2      0.3      NA       0.1      NA
E  NA       NA       0.1      0.2      0.1      0.3

I would like to change the NA values in my data based on the first date on which a value is registered. For example, for A the first registration is Date 2. I want the NA values before that first registration to be 0, and the NA values after it to become the mean of the nearest registered values (for A, Date 4 becomes the mean of Date 3 and Date 5).

In case the last value is an NA, transform it into the last registered value (as in C and D). In the case of E all NA values will become 0.

Get something like this:

   Date 1   Date 2   Date 3   Date 4   Date 5   Date 6
A  0        0.1      0.2      0.25     0.3      0.2
B  0.1      0.2      0.2      0.3      0.2      0.1
C  0        0        0        0        0.3      0.3
D  0.1      0.2      0.3      0.2      0.1      0.1
E  0        0        0.1      0.2      0.1      0.3

Can you help me? I'm not sure how to do it in R.

 


Here is a way using na.approx from the zoo package and apply with MARGIN = 1 (so this is probably not very efficient, but it gets the job done).

library(zoo)  # provides na.approx
df1 <- as.data.frame(t(apply(dat, 1, na.approx, method = "constant", f = .5, na.rm = FALSE)))

This results in

df1
#   V1  V2  V3   V4  V5
#A  NA 0.1 0.2 0.25 0.3
#B 0.1 0.2 0.2 0.30 0.2
#C  NA  NA  NA   NA 0.3
#E  NA  NA 0.1 0.20 0.1

Replace NAs and rename columns.

df1[is.na(df1)] <- 0
names(df1) <- names(dat)
df1
#  Date_1 Date_2 Date_3 Date_4 Date_5
#A    0.0    0.1    0.2   0.25    0.3
#B    0.1    0.2    0.2   0.30    0.2
#C    0.0    0.0    0.0   0.00    0.3
#E    0.0    0.0    0.1   0.20    0.1

Explanation

Given a vector

x <- c(0.1, NA, NA, 0.3, 0.2)
na.approx(x)

returns x with linearly interpolated values

#[1] 0.1000000 0.1666667 0.2333333 0.3000000 0.2000000 

But OP asked for constant values, so we need the argument method = "constant" from the approx function.

na.approx(x, method = "constant")
# [1] 0.1 0.1 0.1 0.3 0.2

But this is still not what OP asked for, because it carries the last observation forward, while OP wants the mean of the closest non-NA values. Therefore we need the argument f (also from approx).

na.approx(x, method = "constant", f = .5)
# [1] 0.1 0.2 0.2 0.3 0.2 # looks good

From ?approx

f : for method = "constant" a number between 0 and 1 inclusive, indicating a compromise between left- and right-continuous step functions. If y0 and y1 are the values to the left and right of the point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f for intermediate values. In this way the result is right-continuous for f == 0 and left-continuous for f == 1, even for non-finite y values.
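
To make the role of f concrete, here is a small sketch that calls approx from base R directly on two known points (positions 1 and 4, with values 0.1 and 0.3, as at the start of x above). The names pos, val, left, mid and right are only illustrative.

```r
# Two known points with a gap of two positions between them
pos <- c(1, 4)
val <- c(0.1, 0.3)

# Evaluate the step function at every position for three choices of f
left  <- approx(pos, val, xout = 1:4, method = "constant", f = 0)$y
mid   <- approx(pos, val, xout = 1:4, method = "constant", f = 0.5)$y
right <- approx(pos, val, xout = 1:4, method = "constant", f = 1)$y

left   # 0.1 0.1 0.1 0.3  (gap takes the value to the left, y0)
mid    # 0.1 0.2 0.2 0.3  (gap takes y0 * 0.5 + y1 * 0.5)
right  # 0.1 0.3 0.3 0.3  (gap takes the value to the right, y1)
```

With f = 0.5 both gap positions receive the same value, the mean of the two neighbours, which is the behaviour OP asked for.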

Lastly, to take care of the NAs at the start of each row, i.e. to keep them in place rather than drop them, use na.rm = FALSE.

From ?na.approx

na.rm : logical. If the result of the (spline) interpolation still results in NAs, should these be removed?

data

dat <- structure(list(Date_1 = c(NA, 0.1, NA, NA),
                      Date_2 = c(0.1, NA, NA, NA),
                      Date_3 = c(0.2, NA, NA, 0.1),
                      Date_4 = c(NA, 0.3, NA, 0.2),
                      Date_5 = c(0.3, 0.2, 0.3, 0.1)),
                 .Names = c("Date_1", "Date_2", "Date_3", "Date_4", "Date_5"),
                 class = "data.frame",
                 row.names = c("A", "B", "C", "E"))

EDIT

If there are NAs in the last column we can replace these with the last non-NAs before we apply na.approx as shown above.

y <- apply(dat, 1, function(x) tail(na.omit(x), 1))
dat$Date_6[is.na(dat$Date_6)] <- y[is.na(dat$Date_6)]
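
For reference, the three rules from the question (0 before the first registration, mean of the neighbours inside gaps, last registered value at the end) can also be sketched in one pass with approx from base R instead of na.approx. This is an alternative sketch, not the method above: dat6 is the question's six-column data retyped with underscores in the column names, fill_row is a hypothetical helper, and turning an all-NA row into zeros is an assumption the question does not cover.

```r
# The question's data (rows A to E, six dates)
dat6 <- data.frame(
  Date_1 = c(NA, 0.1, NA, 0.1, NA),
  Date_2 = c(0.1, NA, NA, 0.2, NA),
  Date_3 = c(0.2, NA, NA, 0.3, 0.1),
  Date_4 = c(NA, 0.3, NA, NA, 0.2),
  Date_5 = c(0.3, 0.2, 0.3, 0.1, 0.1),
  Date_6 = c(0.2, 0.1, NA, NA, 0.3),
  row.names = c("A", "B", "C", "D", "E")
)

# Hypothetical helper: fill the NAs of one row
fill_row <- function(x) {
  x   <- as.numeric(x)
  idx <- seq_along(x)
  obs <- which(!is.na(x))                       # positions with a registered value
  if (length(obs) == 0) return(rep(0, length(x)))  # assumption: all-NA row becomes 0
  if (length(obs) == 1) return(ifelse(idx < obs, 0, x[obs]))
  # f = 0.5 gives the mean of the neighbours inside gaps;
  # rule = 1:2 leaves leading NAs as NA and repeats the last value at the end
  out <- approx(obs, x[obs], xout = idx, method = "constant",
                f = 0.5, rule = 1:2)$y
  out[is.na(out)] <- 0                          # NAs before the first registration
  out
}

res <- as.data.frame(t(apply(dat6, 1, fill_row)))
names(res) <- names(dat6)
res
```

On this data, res should match the table OP wants, including rows C and D where the last column is NA.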
