Argsorting values in a list of lists

  • A+
Category:Languages

I have a list of lists A of length m. Every list of A contains positive numbers from {1, 2, ..., n}. The following is an example, where m = 3 and n = 4.

A = [[1, 1, 3], [1, 2], [1, 1, 2, 4]] 

I represent every number x in A as a pair (i, j) where A[i][j] = x. I would like to sort the numbers in A in non-decreasing order; breaking ties by lowest first index. That is, if A[i1][j1] == A[i2][j2], then (i1, j1) comes before (i2, j2) iff i1 <= i2.

In the example, I would like to return the pairs:

(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3) 

which represents the sorted numbers

1, 1, 1, 1, 1, 2, 2, 3, 4 

What I did is a naive approach that works as follows:

  • First I sort every list in A.
  • Then I iterate the numbers in {1, 2, ..., n} and the list A and add the pairs.

Code:

for i in range(m):      A[i].sort() S = [] for x in range(1, n+1):     for i in range(m):         for j in range(len(A[i])):             if A[i][j] == x:                 S.append((i, j)) 

I think this approach is not good. Can we do better?


list.sort

You can generate a list of indexes and then call list.sort with a key:

B = [(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)] B.sort(key=lambda ix: A[ix[0]][ix[1]]) 

print(B) [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)] 

Note that on python-2.x where iterable unpacking in functions is supported, you can simplify the sort call a bit:

B.sort(key=lambda (i, j): A[i][j]) 

sorted

This is an alternative to the version above, and generates two lists (one in memory which sorted then works on, to return another copy).

B = sorted([        (i, j) for i, x in enumerate(A) for j, _ in enumerate(x)     ],      key=lambda ix: A[ix[0]][ix[1]] )  print(B) [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)] 

Performance

On popular demand, adding some timings and a plot.

Argsorting values in a list of lists

From the graph, we see that calling list.sort is more efficient than sorted. This is because list.sort performs an in-place sort, so there is no time/space overhead from creating a copy of the data which sorted has.

Functions

def cs1(A):     B = [(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)]     B.sort(key=lambda ix: A[ix[0]][ix[1]])       return B  def cs2(A):     return sorted([            (i, j) for i, x in enumerate(A) for j, _ in enumerate(x)         ],          key=lambda ix: A[ix[0]][ix[1]]     )  def rorydaulton(A):      triplets = [(x, i, j) for i, row in enumerate(A) for j, x in enumerate(row)]     pairs = [(i, j) for x, i, j in sorted(triplets)]      return pairs  def jpp(A):     def _create_array(data):         lens = np.array([len(i) for i in data])         mask = np.arange(lens.max()) < lens[:,None]         out = np.full(mask.shape, max(map(max, data))+1, dtype=int)  # Pad with max_value + 1         out[mask] = np.concatenate(data)         return out      def _apply_argsort(arr):         return np.dstack(np.unravel_index(np.argsort(arr.ravel()), arr.shape))[0]      return _apply_argsort(_create_array(A))[:sum(map(len, A))]  def agngazer(A):     idx = np.argsort(np.fromiter(chain(*A), dtype=np.int))     return np.array(        tuple((i, j) for i, r in enumerate(A) for j, _ in enumerate(r))     )[idx] 

Performance Benchmarking Code

from timeit import timeit from itertools import chain  import pandas as pd import numpy as np import matplotlib.pyplot as plt  res = pd.DataFrame(        index=['cs1', 'cs2', 'rorydaulton', 'jpp', 'agngazer'],        columns=[10, 50, 100, 500, 1000, 5000, 10000, 50000],        dtype=float )  for f in res.index:      for c in res.columns:         l = [[1, 1, 3], [1, 2], [1, 1, 2, 4]] * c         stmt = '{}(l)'.format(f)         setp = 'from __main__ import l, {}'.format(f)         res.at[f, c] = timeit(stmt, setp, number=30)  ax = res.div(res.min()).T.plot(loglog=True)  ax.set_xlabel("N");  ax.set_ylabel("time (relative)");  plt.show(); 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: