Numpy remove duplicate column values


I have a NumPy array as follows:

array([[ 6,  5],
       [ 6,  9],
       [ 7,  5],
       [ 7,  9],
       [ 8, 10],
       [ 9, 10],
       [ 9, 11],
       [10, 10]])

I want to pick rows so that the y coordinates (second column) are unique. If two rows have the same y coordinate, I want to keep the one with the smaller x coordinate (first column).

Expected output

array([[ 6,  5],
       [ 6,  9],
       [ 8, 10],
       [ 9, 11]])

Explanation

pick [6,5] over [7,5]

pick [8,10] over [9,10] and [10,10]

pick [9, 11]
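To pin down the rule, here is a minimal plain-Python reference sketch that is handy for checking faster versions against. The helper name `min_x_per_unique_y_loop` is hypothetical, and ordering the result by y is an assumption chosen to match the expected output.

```python
import numpy as np


def min_x_per_unique_y_loop(a):
    """Hypothetical reference helper: for each distinct y, keep the row with the smallest x.

    Rows are returned ordered by y.
    """
    best = {}  # maps y -> smallest x seen so far
    for x, y in a.tolist():
        if y not in best or x < best[y]:
            best[y] = x
    # Rebuild [x, y] rows in ascending y order
    return np.array([[best[y], y] for y in sorted(best)])


a = np.array([[6, 5], [6, 9], [7, 5], [7, 9], [8, 10], [9, 10], [9, 11], [10, 10]])
result = min_x_per_unique_y_loop(a)
print(result.tolist())  # [[6, 5], [6, 9], [8, 10], [9, 11]]
```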


First, sort by the first column:

a = a[a[:, 0].argsort()]
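The question's array happens to be sorted by x already. To show what this step does, the sketch below applies it to the same rows in reverse order. Passing `kind="stable"` is an optional addition, not part of the answer: it only keeps rows with equal x in their original relative order.

```python
import numpy as np

# The question's rows in reverse order, so the sort has work to do
a = np.array([[10, 10], [9, 11], [9, 10], [8, 10], [7, 9], [7, 5], [6, 9], [6, 5]])

# argsort on column 0 gives the row order that sorts by x; fancy indexing applies it
order = a[:, 0].argsort(kind="stable")
a_sorted = a[order]

print(order.tolist())     # [6, 7, 4, 5, 3, 1, 2, 0]
print(a_sorted.tolist())  # [[6, 9], [6, 5], [7, 9], [7, 5], [8, 10], [9, 11], [9, 10], [10, 10]]
```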

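Then use np.unique with return_index=True to get the index of the first occurrence of each y value. Because the rows are now sorted by x, the first occurrence of each y is the row with the smallest x: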

a[np.unique(a[:, 1], return_index=True)[1]]  # [1] picks the index array from the (values, indices) tuple

array([[ 6,  5],
       [ 6,  9],
       [ 8, 10],
       [ 9, 11]])
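A short sketch of why both steps matter, using the question's rows in reverse order as an assumed test input. With return_index=True, np.unique returns a tuple of (sorted unique values, first-occurrence indices). Without the preceding sort, "first occurrence" is simply "first in the array", not "smallest x".

```python
import numpy as np

a = np.array([[10, 10], [9, 11], [9, 10], [8, 10], [7, 9], [7, 5], [6, 9], [6, 5]])

# return_index=True makes np.unique return (unique values, indices of first occurrences)
values, first_idx = np.unique(a[:, 1], return_index=True)
print(values.tolist(), first_idx.tolist())  # [5, 9, 10, 11] [5, 4, 0, 1]

# Skipping the sort picks the first row seen for each y, which is not the smallest x here
unsorted_pick = a[first_idx]
print(unsorted_pick.tolist())  # [[7, 5], [7, 9], [10, 10], [9, 11]]

# Sorting by x first makes the first occurrence of each y its smallest-x row
a_sorted = a[a[:, 0].argsort()]
picked = a_sorted[np.unique(a_sorted[:, 1], return_index=True)[1]]
print(picked.tolist())  # [[6, 5], [6, 9], [8, 10], [9, 11]]
```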

Some timings on a 5000-row, 2-column array of random integers:

a = np.random.randint(1, 10, 10000).reshape(-1, 2)

%timeit rows_by_unique_y(a)
3.83 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit argsort_unique(a)
370 µs ± 8.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
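The body of `argsort_unique` is not shown above, and `rows_by_unique_y` is not defined here. The sketch below assumes `argsort_unique` simply wraps the two steps of this answer, and uses a hypothetical dict-based loop (`loop_min_x_per_y`) as a stand-in Python baseline. It runs outside IPython via the standard `timeit` module. The numbers it prints depend on the machine and will not reproduce the figures above.

```python
import timeit

import numpy as np


def argsort_unique(a):
    """Assumed wrapper around the answer's two steps (name taken from the timings)."""
    a = a[a[:, 0].argsort()]                            # sort rows by x
    return a[np.unique(a[:, 1], return_index=True)[1]]  # first row for each distinct y


def loop_min_x_per_y(a):
    """Hypothetical pure-Python baseline: smallest x for each y via a dict."""
    best = {}
    for x, y in a.tolist():
        if y not in best or x < best[y]:
            best[y] = x
    return np.array([[best[y], y] for y in sorted(best)])


rng = np.random.default_rng(0)
a = rng.integers(1, 10, 10000).reshape(-1, 2)  # 5000 rows, values 1..9 like randint(1, 10)

# Both approaches should agree before comparing speed
assert np.array_equal(argsort_unique(a), loop_min_x_per_y(a))

for fn in (loop_min_x_per_y, argsort_unique):
    per_call = min(timeit.repeat(lambda: fn(a), number=100, repeat=5)) / 100
    print(f"{fn.__name__}: {per_call * 1e6:.0f} µs per call")
```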

My approach does use an initial sort, but vectorized operations in NumPy still beat iterating in Python: roughly 10x faster in the timings above (370 µs vs. 3.83 ms).
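For comparison, a different vectorized technique (not the one in this answer) also works: a single `np.lexsort` ordering rows by y and then by x, followed by a boolean mask that keeps the first row of each run of equal y. It still sorts, so this is only an alternative sketch; no speed claim is made for it, and the helper name is hypothetical.

```python
import numpy as np


def min_x_per_y_lexsort(a):
    """Hypothetical alternative: lexsort by (y, then x), keep the first row of each y run."""
    # np.lexsort uses the LAST key as the primary key, so this orders by y, ties broken by x
    order = np.lexsort((a[:, 0], a[:, 1]))
    s = a[order]
    # A row starts a new y group if it is the first row or its y differs from the previous row
    is_first = np.ones(len(s), dtype=bool)
    is_first[1:] = s[1:, 1] != s[:-1, 1]
    return s[is_first]


a = np.array([[6, 5], [6, 9], [7, 5], [7, 9], [8, 10], [9, 10], [9, 11], [10, 10]])
out = min_x_per_y_lexsort(a)
print(out.tolist())  # [[6, 5], [6, 9], [8, 10], [9, 11]]
```

Like the np.unique version, the result comes out ordered by y.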