# Numpy remove duplicate column values

I have a NumPy array as follows:

```
array([[ 6,  5],
       [ 6,  9],
       [ 7,  5],
       [ 7,  9],
       [ 8, 10],
       [ 9, 10],
       [ 9, 11],
       [10, 10]])
```

I want to pick rows such that the y coordinates (second column) are unique. If several rows share the same y coordinate, I want to keep the one with the smallest x coordinate (first column).

Expected output

```
array([[ 6,  5],
       [ 6,  9],
       [ 8, 10],
       [ 9, 11]])
```

Explanation

pick `[6,5]` over `[7,5]`

pick `[6,9]` over `[7,9]`

pick `[8,10]` over `[9,10]` and `[10,10]`

pick `[9, 11]`

Thanks

First, sort by the first column:

``a = a[a[:, 0].argsort()] ``

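Then call `np.unique` on the second column with the `return_index` flag, which returns the index of the first occurrence of each unique value. Because the rows are already sorted by x, that first occurrence is the row with the smallest x for each y: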

```
a[np.unique(a[:, 1], return_index=True)[1]]

array([[ 6,  5],
       [ 6,  9],
       [ 8, 10],
       [ 9, 11]])
```
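The two steps can be wrapped in a single function. This is a minimal sketch: the name `argsort_unique` appears in the timings, but its definition is not shown, so the body here is an assumption about what it looks like.

```python
import numpy as np

def argsort_unique(a):
    # Assumed definition: sort rows by x (column 0) so that, for each y,
    # the row with the smallest x comes first.
    a_sorted = a[a[:, 0].argsort()]
    # return_index yields the index of the first occurrence of each unique y.
    _, first_idx = np.unique(a_sorted[:, 1], return_index=True)
    return a_sorted[first_idx]

a = np.array([[6, 5], [6, 9], [7, 5], [7, 9],
              [8, 10], [9, 10], [9, 11], [10, 10]])
result = argsort_unique(a)
print(result)
```

The rows come back ordered by y, because `np.unique` returns its unique values in sorted order.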

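Some timings, where `argsort_unique` is the approach above and `rows_by_unique_y` is presumably the iterative alternative it is compared against (its definition is not shown here):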

```
a = np.random.randint(1, 10, 10000).reshape(-1, 2)

In [45]: %timeit rows_by_unique_y(a)
3.83 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [46]: %timeit argsort_unique(a)
370 µs ± 8.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Yes, my approach uses an initial sort, but it is still about ten times faster on this input (370 µs versus 3.83 ms), because vectorized operations in NumPy beat iteration in Python.
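For comparison, a pure-Python loop over the rows might look like the sketch below. The name `loop_unique_y` is hypothetical, and this is not the `rows_by_unique_y` function from the timings, whose definition is not shown. It only illustrates the kind of per-row iteration that the vectorized version avoids.

```python
import numpy as np

def loop_unique_y(a):
    # Hypothetical baseline: track the smallest x seen for each y.
    best = {}  # maps y -> smallest x seen so far
    for x, y in a.tolist():
        if y not in best or x < best[y]:
            best[y] = x
    # Emit rows ordered by y, the same order np.unique would give.
    return np.array([[x, y] for y, x in sorted(best.items())])

a = np.array([[6, 5], [6, 9], [7, 5], [7, 9],
              [8, 10], [9, 10], [9, 11], [10, 10]])
result = loop_unique_y(a)
print(result)
```

Every row passes through the Python interpreter here, whereas the sort and `np.unique` calls do their work in compiled code.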