# Why is this naive matrix multiplication faster than base R's?

In R, matrix multiplication is highly optimized, i.e. it is really just a call to BLAS/LAPACK. However, I'm surprised that this very naive C++ code for matrix-vector multiplication seems reliably 30% faster.

```r
library(Rcpp)

# Simple C++ code for matrix multiplication
mm_code =
  "NumericVector my_mm(NumericMatrix m, NumericVector v){
    int nRow = m.rows();
    int nCol = m.cols();
    NumericVector ans(nRow);
    double v_j;
    for(int j = 0; j < nCol; j++){
      v_j = v[j];
      for(int i = 0; i < nRow; i++){
        ans[i] += m(i,j) * v_j;
      }
    }
    return(ans);
  }"

# Compiling
my_mm = cppFunction(code = mm_code)

# Simulating data to use
nRow = 10^4
nCol = 10^4

m = matrix(rnorm(nRow * nCol), nrow = nRow)
v = rnorm(nCol)

system.time(my_ans <- my_mm(m, v))
#>    user  system elapsed
#>   0.103   0.001   0.103

system.time(r_ans <- m %*% v)
#>    user  system elapsed
#>   0.154   0.001   0.154

# Double checking answer is correct
max(abs(my_ans - r_ans))
#> [1] 0
```

Does base R's `%*%` perform some type of data check that I'm skipping over?

A quick glance in `names.c` (here in particular) points you to `do_matprod`, the C function that is called by `%*%` and which is found in the file `array.c`. (Interestingly, it turns out that both `crossprod` and `tcrossprod` dispatch to that same function as well.) Here is a link to the code of `do_matprod`.

Scrolling through the function, you can see that it takes care of a number of things your naive implementation does not, including:

1. Keeps row and column names, where that makes sense.
2. Allows for dispatch to alternative S4 methods when the two objects being operated on by a call to `%*%` are of classes for which such methods have been provided. (That's what's happening in this portion of the function.)
3. Handles both real and complex matrices.
4. Implements a series of rules for how to handle multiplication of a matrix and a matrix, a vector and a matrix, a matrix and a vector, and a vector and a vector. (Recall that under matrix multiplication in R, a vector on the LHS is treated as a row vector, whereas on the RHS it is treated as a column vector; this is the code that makes that so.)
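
The promotion rules in point 4 can be sketched as follows. This is a hypothetical simplification, not R's actual source: given a vector of length `n` meeting a matrix in `%*%`, a LHS vector is first tried as a 1 × n row vector, a RHS vector as an n × 1 column vector, with a fallback when the matrix has a single row (or column).

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical sketch of R's vector-promotion rules for %*%
// (a simplification for illustration, not code from array.c).
// Returned pair is (nrow, ncol) of the promoted vector.

std::pair<int, int> promote_lhs(int n, int nrow_rhs) {
    if (n == nrow_rhs) return {1, n};   // row vector conforms
    if (nrow_rhs == 1) return {n, 1};   // fall back to column vector
    throw std::invalid_argument("non-conformable arguments");
}

std::pair<int, int> promote_rhs(int n, int ncol_lhs) {
    if (n == ncol_lhs) return {n, 1};   // column vector conforms
    if (ncol_lhs == 1) return {1, n};   // fall back to row vector
    throw std::invalid_argument("non-conformable arguments");
}
```

Under this sketch, `c(1,2,3) %*% m` with a 3-row `m` treats the vector as 1 × 3, while `c(1,2,3) %*% m` with a 1-row `m` treats it as 3 × 1, matching R's behaviour.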

Near the end of the function, it dispatches to either `matprod` or `cmatprod`. Interestingly (to me at least), in the case of real matrices, if either matrix might contain `NaN` or `Inf` values, then `matprod` dispatches (here) to a function called `simple_matprod` which is about as simple and straightforward as your own. Otherwise, it dispatches to one of a couple of BLAS Fortran routines which, presumably, are faster if uniformly 'well-behaved' matrix elements can be guaranteed.
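
That dispatch logic might be sketched like this. This is a loose paraphrase of the idea, not R's actual `array.c`: scan the inputs for non-finite values and take the safe naive loop if any are found, otherwise hand off to BLAS (stubbed here with the same loop so the example is self-contained; in R this would be a call to a routine such as `dgemm`).

```cpp
#include <cmath>
#include <vector>

// Column-major storage, as R uses.
using Mat = std::vector<double>;

// Naive triple loop, analogous to R's simple_matprod:
// z (nrx x ncy) = x (nrx x ncx) * y (ncx x ncy).
void simple_matprod(const Mat& x, int nrx, int ncx,
                    const Mat& y, int ncy, Mat& z) {
    for (int i = 0; i < nrx; i++)
        for (int k = 0; k < ncy; k++) {
            double sum = 0.0;
            for (int j = 0; j < ncx; j++)
                sum += x[i + j * nrx] * y[j + k * ncx];
            z[i + k * nrx] = sum;
        }
}

bool all_finite(const Mat& m) {
    for (double v : m)
        if (!std::isfinite(v)) return false;
    return true;
}

// Sketch of the dispatch: NaN/Inf anywhere -> safe naive path;
// otherwise the fast path (in R, a BLAS Fortran routine).
void matprod(const Mat& x, int nrx, int ncx,
             const Mat& y, int ncy, Mat& z) {
    if (!all_finite(x) || !all_finite(y)) {
        simple_matprod(x, nrx, ncx, y, ncy, z);  // NaN/Inf-safe path
    } else {
        // R would call BLAS here; stubbed with the same loop.
        simple_matprod(x, nrx, ncx, y, ncy, z);
    }
}
```

The scan itself costs an extra pass over both operands, which is one reason `%*%` can lose to a bare loop on inputs that are known in advance to be finite.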