- A+

I am trying to create a generator that returns numbers in a given range that pass a particular test given by a function `foo`

. However I would like the numbers to be tested in a random order. The following code will achieve this:

`from random import shuffle def MyGenerator(foo, num): order = list(range(num)) shuffle(order) for i in order: if foo(i): yield i `

**The Problem**

The problem with this solution is that sometimes the range will be quite large (`num`

might be of the order `10**8`

and upwards). This function can become slow, having such a large list in memory. I have tried to avoid this problem, with the following code:

`from random import randint def MyGenerator(foo, num): tried = set() while len(tried) <= num - 1: i = randint(0, num-1) if i in tried: continue tried.add(i) if foo(i): yield i `

This works well most of the time, since in most cases `num`

will be quite large, `foo`

will pass a reasonable number of numbers and the total number of times the `__next__`

method will be called will be relatively small (say, a maximum of 200 often much smaller). Therefore its reasonable likely we stumble upon a value that passes the `foo`

test and the size of `tried`

never gets large. (Even if it only passes 10% of the time, we wouldn't expect `tried`

to get larger than about 2000 roughly.)

However, when `num`

is small (close to the number of times that the `__next__`

method is called, or `foo`

fails most of the time, the above solution becomes very inefficient - randomly guessing numbers until it guesses one that isn't in `tried`

.

**My attempted solution...**

I was hoping to use some kind of function that maps the numbers `0,1,2,..., n`

onto themselves in a roughly random way. (This isn't being used for any security purposes and so doesn't matter if it isn't the most 'random' function in the world). The function here (Create a random bijective function which has same domain and range) maps signed 32-bit integers onto themselves, but I am not sure how to adapt the mapping to a smaller range. Given `num`

I don't even need a bijection on `0,1,..num`

just a value of `n`

larger than and 'close' to `num`

(using whatever definition of close you see fit). Then I can do the following:

`def mix_function_factory(num): # something here??? def foo(index): # something else here?? return foo def MyGenerator(foo, num): mix_function = mix_function_factory(num): for i in range(num): index = mix_function(i) if index <= num: if foo(index): yield index `

(so long as the bijection isn't on a set of numbers massively larger than `num`

the number of times `index <= num`

isn't True will be small).

**My Question**

Can you think of one of the following:

- A potential solution for
`mix_function_factory`

or even a few other potential functions for`mix_function`

that I could attempt to generalise for different values of`num`

? - A better way of solving the original problem?

Many thanks in advance....

The problem is basically generating a random permutation of the integers in the range `0..n-1`

.

Luckily for us, these numbers have a very useful property: they all have a distinct value modulo `n`

. If we can apply some mathemical operations to these numbers while taking care to keep each number distinct modulo `n`

, it's easy to generate a permutation that *appears* random. And the best part is that we don't need any memory to keep track of numbers we've already generated, because each number is calculated with a simple formula.

Examples of operations we can perform on every number `x`

in the range include:

**Addition**: We can add any integer`c`

to`x`

.**Multiplication**: We can multiply`x`

with any number`m`

that shares no prime factors with`n`

.

Applying just these two operations on the range `0..n-1`

already gives quite satisfactory results:

`>>> n = 7 >>> c = 1 >>> m = 3 >>> [((x+c) * m) % n for x in range(n)] [3, 6, 2, 5, 1, 4, 0] `

Looks random, doesn't it?

If we generate `c`

and `m`

from a random number, it'll actually *be* random, too. But keep in mind that there is no guarantee that this algorithm will generate all possible permutations, or that each permutation has the same probability of being generated.

# Implementation

The difficult part about the implementation is really just generating a suitable random `m`

. I used the prime factorization code from this answer to do so.

`import random # credit for prime factorization code goes # to https://stackoverflow.com/a/17000452/1222951 def prime_factors(n): gaps = [1,2,2,4,2,4,2,4,6,2,6] length, cycle = 11, 3 f, fs, next_ = 2, [], 0 while f * f <= n: while n % f == 0: fs.append(f) n /= f f += gaps[next_] next_ += 1 if next_ == length: next_ = cycle if n > 1: fs.append(n) return fs def generate_c_and_m(n, seed=None): # we need to know n's prime factors to find a suitable multiplier m p_factors = set(prime_factors(n)) def is_valid_multiplier(m): # m must not share any prime factors with n factors = prime_factors(m) return not p_factors.intersection(factors) # if no seed was given, generate random values for c and m if seed is None: c = random.randint(n) m = random.randint(1, 2*n) else: c = seed m = seed # make sure m is valid while not is_valid_multiplier(m): m += 1 return c, m `

Now that we can generate suitable values for `c`

and `m`

, creating the permutation is trivial:

`def random_range(n, seed=None): c, m = generate_c_and_m(n, seed) for x in range(n): yield ((x + c) * m) % n `

And your generator function can be implemented as

`def MyGenerator(foo, num): for x in random_range(num): if foo(x): yield x `