Numpy remove duplicate column values


I have a NumPy array as follows:

array([[ 6,  5],
       [ 6,  9],
       [ 7,  5],
       [ 7,  9],
       [ 8, 10],
       [ 9, 10],
       [ 9, 11],
       [10, 10]])

I want to pick rows such that the y coordinates (second column) are unique. If two rows share the same y coordinate, I want to keep the one with the smaller x coordinate (first column).

Expected output

array([[ 6,  5],
       [ 6,  9],
       [ 8, 10],
       [ 9, 11]])

Explanation

pick [6,5] over [7,5]

pick [6,9] over [7,9]

pick [8,10] over [9,10] and [10,10]

pick [9, 11]

Thanks

 


First, sort by the first column:

a = a[a[:, 0].argsort()] 

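Then use np.unique with return_index=True, which returns the index of the first occurrence of each unique y value. Because the rows are already sorted by x, that first occurrence is the row with the smallest x: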

a[np.unique(a[:, 1], return_index=True)[1]]

array([[ 6,  5],
       [ 6,  9],
       [ 8, 10],
       [ 9, 11]])
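The two steps can be combined into one function. This is a minimal sketch: the name argsort_unique is borrowed from the timings in this answer, and the body is an assumption about what that function contains, not a copy of it.

```python
import numpy as np

def argsort_unique(a):
    """Keep one row per distinct y (column 1), choosing the smallest x (column 0)."""
    # Sort rows by x so that, for every y, the row with the smallest x comes first.
    a_sorted = a[a[:, 0].argsort()]
    # For each distinct y, np.unique reports the index of its first occurrence.
    _, first_idx = np.unique(a_sorted[:, 1], return_index=True)
    return a_sorted[first_idx]

a = np.array([[6, 5], [6, 9], [7, 5], [7, 9],
              [8, 10], [9, 10], [9, 11], [10, 10]])
result = argsort_unique(a)
print(result)
```

Note that the rows come back ordered by y, because np.unique returns its unique values in sorted order.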

Some timings:

a = np.random.randint(1, 10, 10000).reshape(-1, 2)

In [45]: %timeit rows_by_unique_y(a)
3.83 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [46]: %timeit argsort_unique(a)
370 µs ± 8.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Yes, my approach needs an initial sort, but vectorized NumPy operations still beat iterating over the rows in Python.
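For comparison, an iterative version might look like the sketch below. It is a hypothetical baseline: loop_baseline is not from this thread, and it is not necessarily how rows_by_unique_y in the timings is written. It only illustrates the kind of per-row Python work that the vectorized version avoids.

```python
import numpy as np

def loop_baseline(a):
    """Pure-Python reference: smallest x for each distinct y."""
    best = {}  # maps y -> smallest x seen so far
    for x, y in a.tolist():
        if y not in best or x < best[y]:
            best[y] = x
    # Emit rows ordered by y so the output matches the np.unique-based result.
    return np.array([[best[y], y] for y in sorted(best)])

a = np.array([[6, 5], [6, 9], [7, 5], [7, 9],
              [8, 10], [9, 10], [9, 11], [10, 10]])
baseline = loop_baseline(a)
print(baseline)
```

Both versions should give the same rows on the example array; the loop simply does the bookkeeping one row at a time in the interpreter.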
