Why is this naive matrix multiplication faster than base R's?

• A+
Category：Languages

In R, matrix multiplication is very optimized, i.e. is really just a call to BLAS/LAPACK. However, I'm surprised this very naive C++ code for matrix multiplication seems reliably 30% faster.

library(Rcpp)   # Simple C++ code for matrix multiplication  mm_code =   "NumericVector my_mm(NumericMatrix m, NumericVector v){    int nRow = m.rows();    int nCol = m.cols();    NumericVector ans(nRow);    double v_j;    for(int j = 0; j < nCol; j++){      v_j = v[j];      for(int i = 0; i < nRow; i++){        ans[i] += m(i,j) * v_j;      }    }    return(ans);  }  "  # Compiling  my_mm = cppFunction(code = mm_code)   # Simulating data to use  nRow = 10^4  nCol = 10^4   m = matrix(rnorm(nRow * nCol), nrow = nRow)  v = rnorm(nCol)   system.time(my_ans <- my_mm(m, v)) #>    user  system elapsed  #>   0.103   0.001   0.103   system.time(r_ans <- m %*% v) #>   user  system elapsed  #>  0.154   0.001   0.154    # Double checking answer is correct  max(abs(my_ans - r_ans))  #>  0

Does base R's %*% perform some type of data check that I'm skipping over?

A quick glance in names.c (here in particular) points you to do_matprod, the C function that is called by %*% and which is found in the file array.c. (Interestingly, it turns out, that both crossprod and tcrossprod dispatch to that same function as well). Here is a link to the code of do_matprod.

Scrolling through the function, you can see that it takes care of a number of things your naive implementation does not, including:

1. Keeps row and column names, where that makes sense.
2. Allows for dispatch to alternative S4 methods when the two objects being operated on by a call to %*% are of classes for which such methods have been provided. (That's what's happening in this portion of the function.)
3. Handles both real and complex matrices.
4. Implements a series of rules for how to handle multiplication of a matrix and a matrix, a vector and a matrix, a matrix and a vector, and a vector and a vector. (Recall that under cross-multiplication in R, a vector on the LHS is treated as a row vector, whereas on the RHS, it is treated as a column vector; this is the code that makes that so.)

Near the end of the function, it dispatches to either of matprod or or cmatprod. Interestingly (to me at least), in the case of real matrices, if either matrix might contain NaN or Inf values, then matprod dispatches (here) to a function called simple_matprod which is about as simple and straightforward as your own. Otherwise, it dispatches to one of a couple of BLAS Fortran routines which, presumably are faster, if uniformly 'well-behaved' matrix elements can be guaranteed.