Union of sublists with common elements

  • A+

Say I have for example the following nested list:

L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump'],      ['Sam','Suri','NewYork'],['Suri','Orlando','Canada']] 

How can I group these sublists, by getting the union of sublists which have a common element with at least another sublist within the group? So for the previous example the result should be:

[['John','Sayyed','Simon'] ,['bush','trump'],  ['Sam','Suri','NewYork','Orlando','Canada']] 

Thus the first two sublists are joined as they share 'John'. Could someone please share their valuable thoughts ?


You can use networkx for that. Generate a graph, and add your list as the graph edges using add_edges_from. Then use connected_components, which will precisely give you a list of sets of the connected components in the graph:

import networkx as nx   L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump']  G=nx.Graph() G.add_edges_from(L) list(nx.connected_components(G))  [{'John', 'Sayyed', 'Simon'}, {'bush', 'trump'}] 


In the case of having sublists with more than 2 elements, you can get all the length 2 combinations from each sublist and use these as the network edges:

from itertools import combinations, chain  L = [['John','Sayyed'], [ 'John' , 'Simon'] ,['bush','trump'],      ['Sam','Suri','NewYork'],['Suri','Orlando','Canada']]  L2_nested = [list(combinations(l,2)) for l in L] L2 = list(chain.from_iterable(L2_nested)) #[('John', 'Sayyed'), ('John', 'Simon'), ('bush', 'trump'), ('Sam', 'Suri')...  G=nx.Graph() G.add_edges_from(L2) list(nx.connected_components(G))  [{'John', 'Sayyed', 'Simon'}, {'bush', 'trump'}, {'Canada', 'NewYork', 'Orlando', 'Sam', 'Suri'}] 

More detailed explanation on connected components:

In graph theory, a connected component (or just component) of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph

So essentially, this code creates a graph, with edges from the list, where each edge is composed by two values u,v where u and v will be nodes connected by this edge.

And hence, the union of sublists with at least one sublist with a common element can be translated into a Graph Theory problem as all nodes that are reachable between each other through the existing paths. An example of this can be seen in the following graph, which has three connected components:

Union of sublists with common elements


:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: