- A+

I recently stumbled upon a performance problem while implementing a simulation algorithm. I managed to find the bottleneck function (signally, it's the internal call to `arrayfun`

that slows everything down):

`function sim = simulate_frequency(the_f,k,n) r = rand(1,n); % x = arrayfun(@(x) find(x <= the_f,1,'first'),r); sim = (histcounts(x,[1:k Inf]) ./ n).'; end `

It is being used in other parts of code as follows:

`h0 = zeros(1,sims); for i = 1:sims p = simulate_frequency(the_f,k,n); h0(i) = max(abs(p - the_p)); end `

Here are some possible values:

`% Test Case 1 sims = 10000; the_f = [0.3010; 0.4771; 0.6021; 0.6990; 0.7782; 0.8451; 0.9031; 0.9542; 1.0000]; k = 9; n = 95; % Test Case 2 sims = 10000; the_f = [0.0413; 0.0791; 0.1139; 0.1461; 0.1760; 0.2041; 0.2304; 0.2552; 0.2787; 0.3010; 0.3222; 0.3424; 0.3617; 0.3802; 0.3979; 0.4149; 0.4313; 0.4471; 0.4623; 0.4771; 0.4913; 0.5051; 0.5185; 0.5314; 0.5440; 0.5563; 0.5682; 0.5797; 0.5910; 0.6020; 0.6127; 0.6232; 0.6334; 0.6434; 0.6532; 0.6627; 0.6720; 0.6812; 0.6901; 0.6989; 0.7075; 0.7160; 0.7242; 0.7323; 0.7403; 0.7481; 0.7558; 0.7634; 0.7708; 0.7781; 0.7853; 0.7923; 0.7993; 0.8061; 0.8129; 0.8195; 0.8260; 0.8325; 0.8388; 0.8450; 0.8512; 0.8573; 0.8633; 0.8692; 0.8750; 0.8808; 0.8864; 0.8920; 0.8976; 0.9030; 0.9084; 0.9138; 0.9190; 0.9242; 0.9294; 0.9344; 0.9395; 0.9444; 0.9493; 0.9542; 0.9590; 0.9637; 0.9684; 0.9731; 0.9777; 0.9822; 0.9867; 0.9912; 0.9956; 1.000]; k = 90; n = 95; `

The scalar `sims`

must be in the range `1000`

`1000000`

. The vector of cumulated frequencies `the_f`

never contains more than `100`

elements. The scalar `k`

represents the number of elements in `the_f`

. Finally, the scalar `n`

represents the number of elements in the empirical sample vector, and can even be very large (up to `10000`

elements, as far as I can tell).

Any clue about how to improve the computation time of this process?

This seems to be slightly faster for me in the second test case, not the first. The time differences might be larger for longer `the_f`

and larger values of `n`

.

`function sim = simulate_frequency(the_f,k,n) r = rand(1,n); % [row,col] = find(r <= the_f); % Implicit singleton expansion going on here! [~,ind] = unique(col,'first'); x = row(ind); sim = (histcounts(x,[1:k Inf]) ./ n).'; end `

I'm using implicit singleton expansion in `r <= the_f`

, use `bsxfun`

if you have an older version of MATLAB (but you know the drill).

Find then returns row and column to all the locations where `r`

is larger than `the_f`

. `unique`

finds the indices into the result for the first element of each column.

^{Credit: Andrei Bobrov over on MATLAB Answers}

Another option (derived from this other answer) is a bit shorter but also a bit more obscure IMO:

`mask = r <= the_f; [x,~] = find(mask & (cumsum(mask,1)==1)); `