# Sample number with equal probability which is not part of a set

• A+
Category：Languages

I have a number `n` and a set of numbers `∈[1..n]*` with size `s` (which is substantially smaller than `n`). I want to sample a number `[1..n]` with equal probability, but the number is not allowed to be in the set.

I am trying to solve the problem in at worst `O(log n + s)`. I am not sure whether it's possible.

A naive approach is creating an array of numbers from 1 to n excluding all numbers in the set and then pick one array element. This will run in `O(n)` and is not an option.

Another approach may be just generating random numbers `∈[1..n]` and rejecting them if they are contained in the (hash)set. This has no theoretical bound as any number could be sampled multiple times even if it is in the set. But on average this might be a practical solution if `s` is substantially smaller than `n`.

I tried looking through typical algorithmic problems but couldn't find a matching one.

Say s is sorted. Generate a random number between 1 and n-s, call it k. We've chosen the k'th element of {1,...,n} - s. Now we need to find it.

Use binary search on s to find the count of the elements of s <= k. This takes O(log |s|). Add this to k. In doing so, we may have passed or arrived at additional elements of s. We can adjust for this by incrementing our answer for each such element that we pass, which we find by checking the next larger element of s from the point we found in our binary search.

E.g., n = 100, s = {1,4,5,22}, and our random number is 3. So our approach should return the third element of [2,3,6,7,...,21,23,24,...,100] which is 6. Binary search finds that 1 element is at most 3, so we increment to 4. Now we compare to the next larger element of s which is 4 so increment to 5. Repeating this finds 5 in so we increment to 6. We check s once more, see that 6 isn't in it, so we stop.

E.g., n = 100, s = {1,4,5,22}, and our random number is 4. So our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100] which is 7. Binary search finds that 2 elements are at most 4, so we increment to 6. Now we compare to the next larger element of s which is 5 so increment to 7. We check s once more, see that the next number is > 7, so we stop.

If we assume that "s is substantially smaller than n" means |s| <= log(n), then we will increment at most log(n) times, and in any case at most s times.

If s is not sorted then we can do the following. Create an array of bits of size s. Generate k. Parse s and do two things: 1) count the number of elements < k, call this r. At the same time, set the i'th bit to 1 if k+i is in s (0 indexed so if k is in s then the first bit is set).

Now, increment k a number of times equal to r plus the number of set bits is the array with an index <= the number of times incremented.

E.g., n = 100, s = {1,4,5,22}, and our random number is 4. So our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100] which is 7. We parse s and 1) note that 1 element is below 4 (r=1), and 2) set our array to [1, 1, 0, 0]. We increment once for r=1 and an additional two times for the two set bits, ending up at 7.

This is O(s) time, O(s) space.