Serious performance issue with iterating simulations

  • A+
Category:Languages

I recently stumbled upon a performance problem while implementing a simulation algorithm. I managed to find the bottleneck function (signally, it's the internal call to arrayfun that slows everything down):

function sim = simulate_frequency(the_f,k,n)      r = rand(1,n); %      x = arrayfun(@(x) find(x <= the_f,1,'first'),r);     sim = (histcounts(x,[1:k Inf]) ./ n).';  end 

It is being used in other parts of code as follows:

h0 = zeros(1,sims);  for i = 1:sims     p = simulate_frequency(the_f,k,n);     h0(i) = max(abs(p - the_p)); end 

Here are some possible values:

% Test Case 1 sims = 10000; the_f = [0.3010; 0.4771; 0.6021; 0.6990; 0.7782; 0.8451; 0.9031; 0.9542; 1.0000]; k = 9; n = 95;  % Test Case 2 sims = 10000; the_f = [0.0413; 0.0791; 0.1139; 0.1461; 0.1760; 0.2041; 0.2304; 0.2552; 0.2787; 0.3010; 0.3222; 0.3424; 0.3617; 0.3802; 0.3979; 0.4149; 0.4313; 0.4471; 0.4623; 0.4771; 0.4913; 0.5051; 0.5185; 0.5314; 0.5440; 0.5563; 0.5682; 0.5797; 0.5910; 0.6020; 0.6127; 0.6232; 0.6334; 0.6434; 0.6532; 0.6627; 0.6720; 0.6812; 0.6901; 0.6989; 0.7075; 0.7160; 0.7242; 0.7323; 0.7403; 0.7481; 0.7558; 0.7634; 0.7708; 0.7781; 0.7853; 0.7923; 0.7993; 0.8061; 0.8129; 0.8195; 0.8260; 0.8325; 0.8388; 0.8450; 0.8512; 0.8573; 0.8633; 0.8692; 0.8750; 0.8808; 0.8864; 0.8920; 0.8976; 0.9030; 0.9084; 0.9138; 0.9190; 0.9242; 0.9294; 0.9344; 0.9395; 0.9444; 0.9493; 0.9542; 0.9590; 0.9637; 0.9684; 0.9731; 0.9777; 0.9822; 0.9867; 0.9912; 0.9956; 1.000]; k = 90; n = 95; 

The scalar sims must be in the range 1000 1000000. The vector of cumulated frequencies the_f never contains more than 100 elements. The scalar k represents the number of elements in the_f. Finally, the scalar n represents the number of elements in the empirical sample vector, and can even be very large (up to 10000 elements, as far as I can tell).

Any clue about how to improve the computation time of this process?

 


This seems to be slightly faster for me in the second test case, not the first. The time differences might be larger for longer the_f and larger values of n.

function sim = simulate_frequency(the_f,k,n)     r = rand(1,n); %      [row,col] = find(r <= the_f); % Implicit singleton expansion going on here!     [~,ind] = unique(col,'first');     x = row(ind);     sim = (histcounts(x,[1:k Inf]) ./ n).'; end 

I'm using implicit singleton expansion in r <= the_f, use bsxfun if you have an older version of MATLAB (but you know the drill).

Find then returns row and column to all the locations where r is larger than the_f. unique finds the indices into the result for the first element of each column.

Credit: Andrei Bobrov over on MATLAB Answers


Another option (derived from this other answer) is a bit shorter but also a bit more obscure IMO:

mask = r <= the_f; [x,~] = find(mask & (cumsum(mask,1)==1)); 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: