I have a python list where elements can repeat.
>>> a = [1,2,2,3,3,4,5,6]
I want to get the first
n unique elements from the list. So, in this case, if i want the first 5 unique elements, they would be:
I have come up with a solution using generators:
def iterate(itr, upper=5): count = 0 for index, element in enumerate(itr): if index==0: count += 1 yield element elif element not in itr[:index] and count<upper: count += 1 yield element >>> i = iterate(a, 5) >>> [e for e in i] >>> [1,2,3,4,5]
I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient way?
I would use a
set to remember what was seen and return from the generator when you have
a = [1,2,2,3,3,4,5,6] def get_unique_N(iterable, N): """Yields (in order) the first N unique elements of iterable. Might yield less if data too short.""" seen = set() for e in iterable: if e in seen: continue seen.add(e) yield e if len(seen) == N: return k = get_unique_N([1,2,2,3,3,4,5,6], 4) print(list(k))
With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using
Your solution using
elif element not in itr[:index] and count<upper: uses
O(k) lookups - with
k being the length of the slice - using a set reduces this to
O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.
For 6 uniques (in longer list):
- you would have lookups of
- mine would have
5001*O(1)lookup + memory for