# Argsorting values in a list of lists

• A+
Category：Languages

I have a list of lists `A` of length `m`. Every list of `A` contains positive numbers from `{1, 2, ..., n}`. The following is an example, where `m = 3` and `n = 4`.

``A = [[1, 1, 3], [1, 2], [1, 1, 2, 4]] ``

I represent every number `x` in `A` as a pair `(i, j)` where `A[i][j] = x`. I would like to sort the numbers in `A` in non-decreasing order; breaking ties by lowest first index. That is, if `A[i1][j1] == A[i2][j2]`, then `(i1, j1)` comes before `(i2, j2)` iff `i1 <= i2`.

In the example, I would like to return the pairs:

``(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3) ``

which represents the sorted numbers

``1, 1, 1, 1, 1, 2, 2, 3, 4 ``

What I did is a naive approach that works as follows:

• First I sort every list in `A`.
• Then I iterate the numbers in `{1, 2, ..., n}` and the list `A` and add the pairs.

Code:

``for i in range(m):      A[i].sort() S = [] for x in range(1, n+1):     for i in range(m):         for j in range(len(A[i])):             if A[i][j] == x:                 S.append((i, j)) ``

I think this approach is not good. Can we do better?

### `list.sort`

You can generate a list of indexes and then call `list.sort` with a `key`:

``B = [(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)] B.sort(key=lambda ix: A[ix][ix]) ``

``print(B) [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)] ``

Note that on python-2.x where iterable unpacking in functions is supported, you can simplify the `sort` call a bit:

``B.sort(key=lambda (i, j): A[i][j]) ``

### `sorted`

This is an alternative to the version above, and generates two lists (one in memory which `sorted` then works on, to return another copy).

``B = sorted([        (i, j) for i, x in enumerate(A) for j, _ in enumerate(x)     ],      key=lambda ix: A[ix][ix] )  print(B) [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)] ``

### Performance

On popular demand, adding some timings and a plot. From the graph, we see that calling `list.sort` is more efficient than `sorted`. This is because `list.sort` performs an in-place sort, so there is no time/space overhead from creating a copy of the data which `sorted` has.

Functions

``def cs1(A):     B = [(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)]     B.sort(key=lambda ix: A[ix][ix])       return B  def cs2(A):     return sorted([            (i, j) for i, x in enumerate(A) for j, _ in enumerate(x)         ],          key=lambda ix: A[ix][ix]     )  def rorydaulton(A):      triplets = [(x, i, j) for i, row in enumerate(A) for j, x in enumerate(row)]     pairs = [(i, j) for x, i, j in sorted(triplets)]      return pairs  def jpp(A):     def _create_array(data):         lens = np.array([len(i) for i in data])         mask = np.arange(lens.max()) < lens[:,None]         out = np.full(mask.shape, max(map(max, data))+1, dtype=int)  # Pad with max_value + 1         out[mask] = np.concatenate(data)         return out      def _apply_argsort(arr):         return np.dstack(np.unravel_index(np.argsort(arr.ravel()), arr.shape))      return _apply_argsort(_create_array(A))[:sum(map(len, A))]  def agngazer(A):     idx = np.argsort(np.fromiter(chain(*A), dtype=np.int))     return np.array(        tuple((i, j) for i, r in enumerate(A) for j, _ in enumerate(r))     )[idx] ``

Performance Benchmarking Code

``from timeit import timeit from itertools import chain  import pandas as pd import numpy as np import matplotlib.pyplot as plt  res = pd.DataFrame(        index=['cs1', 'cs2', 'rorydaulton', 'jpp', 'agngazer'],        columns=[10, 50, 100, 500, 1000, 5000, 10000, 50000],        dtype=float )  for f in res.index:      for c in res.columns:         l = [[1, 1, 3], [1, 2], [1, 1, 2, 4]] * c         stmt = '{}(l)'.format(f)         setp = 'from __main__ import l, {}'.format(f)         res.at[f, c] = timeit(stmt, setp, number=30)  ax = res.div(res.min()).T.plot(loglog=True)  ax.set_xlabel("N");  ax.set_ylabel("time (relative)");  plt.show(); ``