writing an object to disk in R through C++ vs. fst


I was inspired by the fst package to try to write a C++ function to quickly serialize some data structures I have in R to disk.

But I am having trouble achieving the same write speed even on very simple objects. The code below is a simple example of writing a large 1 GB vector to disk.

Using custom C++ code, I achieve a write speed of 135 MB/s, which is the limit of my disk according to CrystalBench.

On the same data, write_fst achieves a write speed of 223 MB/s, which seems impossible, since my disk can't write that fast. (Note: I am using fst::threads_fst(1) and compress = 0, and the two files are the same size.)

What am I missing?

How can I get the C++ function to write to disk faster?

C++ Code:

#include <Rcpp.h>
#include <fstream>
#include <cstring>
#include <iostream>

// [[Rcpp::plugins(cpp11)]]

using namespace Rcpp;

// [[Rcpp::export]]
void test(SEXP x) {
  // treat the numeric vector's payload as raw bytes
  char* d = reinterpret_cast<char*>(REAL(x));
  long dl = Rf_xlength(x) * 8;  // 8 bytes per double

  std::ofstream OutFile;
  OutFile.open("/tmp/test.raw", std::ios::out | std::ios::binary);
  OutFile.write(d, dl);
  OutFile.close();
}

R Code:

library(microbenchmark)
library(Rcpp)
library(dplyr)
library(fst)
fst::threads_fst(1)

sourceCpp("test.cpp")

x <- runif(134217728) # 1 gigabyte
df <- data.frame(x)

microbenchmark(test(x), write_fst(df, "/tmp/test.fst", compress = 0), times = 3)
Unit: seconds
                                         expr      min       lq     mean   median       uq      max neval
                                      test(x) 6.549581 7.262408 7.559021 7.975235 8.063740 8.152246     3
 write_fst(df, "/tmp/test.fst", compress = 0) 4.548579 4.570346 4.592398 4.592114 4.614307 4.636501     3

file.info("/tmp/test.fst")$size/1e6
# [1] 1073.742

file.info("/tmp/test.raw")$size/1e6
# [1] 1073.742


Benchmarking SSD write and read performance is a tricky business and hard to do right. There are many effects to take into account.

For example, many SSDs use techniques such as DRAM caching to intelligently accelerate transfer speeds. These techniques can inflate your measured write speed, especially when an identical dataset is written to disk multiple times, as in your example. To avoid this effect, each iteration of the benchmark should write a unique dataset to disk.

The block sizes of write and read operations are also important: the default physical sector size of SSDs is 4 kB. Writing smaller blocks hampers performance, but with fst I found that writing blocks larger than a few MB also lowers performance, due to CPU cache effects. Because fst writes its data to disk in relatively small chunks, it is usually faster than alternatives that write the data in a single large block.

To facilitate this block-wise writing to SSD, you could modify your code:

Rcpp::cppFunction('

  #include <fstream>
  #include <cstring>
  #include <iostream>

  #define BLOCKSIZE 262144 // 2^18 bytes per block

  long test_blocks(SEXP x, Rcpp::String path) {
    char* d = reinterpret_cast<char*>(REAL(x));

    std::ofstream outfile;
    outfile.open(path.get_cstring(), std::ios::out | std::ios::binary);

    long dl = Rf_xlength(x) * 8;
    long nr_of_blocks = dl / BLOCKSIZE;

    // write the bulk of the data in fixed-size blocks
    for (long block_nr = 0; block_nr < nr_of_blocks; block_nr++) {
      outfile.write(&d[block_nr * BLOCKSIZE], BLOCKSIZE);
    }

    // write the tail that does not fill a whole block
    long remaining_bytes = dl % BLOCKSIZE;
    outfile.write(&d[nr_of_blocks * BLOCKSIZE], remaining_bytes);

    outfile.close();

    return dl;
  }
')

Now we can compare the methods test (modified here to take a file path argument, like test_blocks), test_blocks and fst::write_fst in a single benchmark:

x <- runif(134217728) # 1 gigabyte
df <- data.frame(X = x)

fst::threads_fst(1)  # use fst in single-threaded mode

microbenchmark::microbenchmark(
  test(x, "test.bin"),
  test_blocks(x, "test.bin"),
  fst::write_fst(df, "test.fst", compress = 0),
  times = 10)
#> Unit: seconds
#>                                          expr      min       lq     mean   median       uq      max neval
#>                           test(x, "test.bin") 1.473615 1.506019 1.590430 1.600055 1.635883 1.765512    10
#>                    test_blocks(x, "test.bin") 1.018082 1.062673 1.134956 1.131631 1.204373 1.264220    10
#>  fst::write_fst(df, "test.fst", compress = 0) 1.127446 1.144039 1.249864 1.261269 1.327304 1.343248    10

As you can see, the modified method test_blocks is about 40 percent faster than the original method, and even slightly faster than the fst package. That is expected, because fst has some overhead of its own: it stores column and table information, possibly attributes, hashes and compression information.

Please note that on my system the difference between fst and your initial test method is much less pronounced, which again illustrates how challenging it is to use benchmarks to optimize a system.

