
I'm new to R and have been struggling with the following for a while now so I was hoping someone would be able to help me out.

The sample data represents stock price returns (each row is a monthly period). The real data set is much bigger and is structured like the input below:

**Input:**

```
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- cbind(stock1,stock2,stock3,stock4)

     stock1 stock2 stock3 stock4
[1,]   0.01   0.00   0.00   0.00
[2,]  -0.02   0.00   0.00  -0.02
[3,]   0.01   0.02   0.02   0.01
[4,]   0.05   0.04   0.00   0.00
[5,]   0.04  -0.03  -0.01   0.00
[6,]  -0.02   0.02   0.03  -0.02
```

Any zeroes that precede the first non-zero value for a given stock represent missing data rather than a return of zero for the period. I would like to set these values to NA, so the output I would like to achieve is the following:

**Desired Output:**

```
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(NA, NA, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(NA, NA, 0.02, 0, -0.01, 0.03)
stock4 <- c(NA, -0.02, 0.01, 0, 0, -0.02)
df <- cbind(stock1,stock2,stock3,stock4)

     stock1 stock2 stock3 stock4
[1,]   0.01     NA     NA     NA
[2,]  -0.02     NA     NA  -0.02
[3,]   0.01   0.02   0.02   0.01
[4,]   0.05   0.04   0.00   0.00
[5,]   0.04  -0.03  -0.01   0.00
[6,]  -0.02   0.02   0.03  -0.02
```

I've tried a few things but they only seem to work for a single vector as opposed to a data set with multiple columns. I've tried using lapply to get around this but haven't had any luck so far. The closest I've gotten is shown below.

My single vector solution:

`stock1[1:min(which(stock1!=0))-1] <- NA`

My multiple vector solution which *does not work*:

`lapply(df,function(x) x[1:min(which(x!=0))-1] <- NA)`

Would greatly appreciate any guidance! Thanks!

There are three issues. First, writing:

`df <- cbind(stock1,stock2,stock3,stock4) `

doesn't create a data frame. It creates a matrix. This is an issue when you try to use `lapply`, which will operate over the columns of a *data frame* but over the elements of a *matrix*. Instead, you should write:

`df <- data.frame(stock1,stock2,stock3,stock4) `
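To see the difference concretely, here is a small sketch using shortened versions of the question's vectors. The names `m`, `d`, `n_matrix` and `n_df` are only for illustration. Counting the pieces that `lapply` visits shows that a matrix is walked element by element, while a data frame is walked column by column:

```r
stock1 <- c(0.01, -0.02, 0.01)
stock2 <- c(0, 0, 0.02)

m <- cbind(stock1, stock2)       # a 3 x 2 numeric matrix
d <- data.frame(stock1, stock2)  # a data frame with 2 columns

is.matrix(m)       # TRUE
is.data.frame(d)   # TRUE

# lapply treats the matrix as a plain vector of 6 numbers...
n_matrix <- length(lapply(m, identity))  # 6

# ...but treats the data frame as a list of 2 columns
n_df <- length(lapply(d, identity))      # 2
```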

Second, the function you're using in `lapply` needs to return the modified vector. Otherwise, the return value will be something unexpected (in this case, the assignment evaluates to a single `NA`, so `lapply` will return a list of single `NA`s, which converts to a data frame with one row of `NA`s instead of the data frame you want).
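A minimal sketch of this second point, with two illustrative functions (`no_return` and `with_return` are made-up names, and the fixed index `1:2` is just for the demonstration). A function whose last expression is an assignment returns the assigned value, not the modified vector:

```r
x <- c(0, 0, 0.02, 0.04)

# Last expression is an assignment, so the function's value is the
# right-hand side (NA), returned invisibly
no_return <- function(v) {
  v[1:2] <- NA
}

# Same assignment, but the modified vector is the last expression
with_return <- function(v) {
  v[1:2] <- NA
  v
}

r1 <- no_return(x)    # a single NA
r2 <- with_return(x)  # NA NA 0.02 0.04
```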

Third, you need to take care with `1:n` when `n` can be zero (i.e., when the first stock quote is non-zero) because `1:0` gives the sequence `c(1,0)` instead of an empty sequence. (This is arguably one of R's stupidest features.)
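The pitfall is easy to reproduce. The sketch below also shows `seq_len()`, which is not used in this answer but is a base R function that returns an empty sequence for zero; the variable names are only for illustration:

```r
n <- 0

bad  <- 1:n         # counts down: 1 0
safe <- seq_len(n)  # integer(0), an empty sequence

x <- c(0.01, -0.02, 0.01)

# With 1:n the index is c(1, 0); the 0 is ignored, so x[1] is overwritten
z <- x
z[1:n] <- NA        # z is now NA -0.02 0.01

# With seq_len(n) the index is empty, so nothing is overwritten
y <- x
y[seq_len(n)] <- NA # y is unchanged
```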

Therefore, the following will give you what you want:

```r
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4)

as.data.frame(lapply(df, function(x) {
    n <- min(which(x != 0)) - 1
    if (n > 0) x[1:n] <- NA
    x
}))
```

The output is as expected:

```
  stock1 stock2 stock3 stock4
1   0.01     NA     NA     NA
2  -0.02     NA     NA  -0.02
3   0.01   0.02   0.02   0.01
4   0.05   0.04   0.00   0.00
5   0.04  -0.03  -0.01   0.00
6  -0.02   0.02   0.03  -0.02
```

*Update:* As @Daniel_Fischer notes, there's a clever trick to avoid the `1:0` problem. You can instead write:

```r
as.data.frame(lapply(df, function(x) {
    n <- min(which(x != 0)) - 1
    x[0:n] <- NA    # use 0:n instead of 1:n
    x
}))
```

This takes advantage of the fact that R ignores zeros in this type of indexing operation, so:

```r
x[0:0] <- NA   # same as x[0] <- NA and does nothing
x[0:1] <- NA   # same as x[1] <- NA
x[0:2] <- NA   # same as x[1:2] <- NA, etc.
```
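As a possible alternative (not part of the approach above, just a sketch), the index arithmetic can be avoided with a logical mask built from `cumsum()`: the running count of non-zero values stays at zero exactly for the leading zeroes. The helper name `na_leading_zeros` is made up for this example, and it assumes the input columns contain no `NA`s to begin with. A side effect is that a column with no non-zero values at all, which leaves `which()` empty in the `min()`-based version, simply becomes all `NA`:

```r
df <- data.frame(
  stock1 = c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02),
  stock2 = c(0, 0, 0.02, 0.04, -0.03, 0.02),
  stock3 = c(0, 0, 0.02, 0, -0.01, 0.03),
  stock4 = c(0, -0.02, 0.01, 0, 0, -0.02)
)

# Hypothetical helper: set zeroes before the first non-zero value to NA.
# cumsum(x != 0) counts the non-zero values seen so far, so it equals 0
# only at positions that come before the first non-zero value.
na_leading_zeros <- function(x) {
  x[cumsum(x != 0) == 0] <- NA
  x
}

result <- as.data.frame(lapply(df, na_leading_zeros))
```

Later zeroes, such as the fourth value of `stock3`, are left alone because the running count is already above zero by then.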