# Why does “vectorizing” this simple R loop give the wrong result?

• A+
Category：Languages

Perhaps a very dumb question.

I am trying to "vectorize" the following loop:

``set.seed(0) x <- round(runif(10), 2) # [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63 sig <- sample.int(10) # [1]  1  2  9  5  3  4  8  6  7 10 for (i in seq_along(sig)) x[i] <- x[sig[i]] x # [1] 0.90 0.27 0.66 0.91 0.66 0.91 0.94 0.91 0.94 0.63 ``

I think it is simply `x[sig]` but the result does not match.

``set.seed(0) x <- round(runif(10), 2) x[sig] # [1] 0.90 0.27 0.66 0.91 0.37 0.57 0.94 0.20 0.90 0.63 ``

What's wrong?

There is actually a trap here: address aliasing. The loop reads `x` and writes to `x`. The memory block it reads overlaps the memory block it writes to. Such self-reference introduces loop dependency and is a hazard for "vectorization".

By contrast, `x[sig]` creates a new memory block for writing, eliminating address aliasing.

Which piece of code is correct depends on what we want to do.

If we want to perform a shuffling / permutation of `x`, then `x[sig]` is the right one. The loop hopes to do "in-place" permutation without using extra memory, but "in-place" permutation is in fact a more complicated operation: not only entries of `x` need be swapped, entries of `sig` also need be swapped along the iteration.

If we deem the loop as the correct thing, then there is no way to "vectorize" it. Well, if implementing the loop in Rcpp is seen as a "vectorization" then let it be. But there is no chance to further "vectorize" the C / C++ loop with SIMD.

Remark:

This Q & A is motivated by this Q & A. OP originally presented a loop

``for (i in 1:num) {   for (j in 1:num) {     mat[i, j] <- mat[i, mat[j, "rm"]]   } } ``

It is tempting to "vectorize" it as

``mat[1:num, 1:num] <- mat[1:num, mat[1:num, "rm"]] ``

but it is actually wrong. Later OP changed the loop to

``for (i in 1:num) {   for (j in 1:num) {     mat[i, j] <- mat[i, 1 + num + mat[j, "rm"]]   } } ``

which eliminates the address aliasing issue, because the columns to be replaced are the first `num` columns, while the columns to be looked up are after the first `num` columns.