# R: Find the “groups” of tuples


I am trying to find the "group" (`id3`) based on two variables (`id1`, `id2`):

```r
df = data.frame(id1 = c(1,1,2,2,3,3,4,4,5,5),
                id2 = c('a','b','a','c','c','d','x','y','y','z'),
                id3 = c(rep('group1',6), rep('group2',4)))
```

```
   id1 id2    id3
1    1   a group1
2    1   b group1
3    2   a group1
4    2   c group1
5    3   c group1
6    3   d group1
7    4   x group2
8    4   y group2
9    5   y group2
10   5   z group2
```

For example, `id1=1` is related to `a` and `b` of `id2`. But `id1=2` is also related to `a`, so both belong to one group (`id3=group1`). Since `id1=2` and `id1=3` share `id2=c`, `id1=3` also belongs to that group (`id3=group1`). The values of the tuple `((1,2,3),('a','b','c','d'))` appear nowhere else, so no other row belongs to that group (which is labeled `group1` generically).
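To make the linking rule concrete, the shared `id2` values can be inspected directly. This is only a small base-R illustration of the rule, not a solution; the names `by_id1`, `shared_12`, `shared_23` and `shared_34` are made up for this sketch.

```r
# Same id1/id2 pairs as in the example data
df <- data.frame(id1 = c(1,1,2,2,3,3,4,4,5,5),
                 id2 = c('a','b','a','c','c','d','x','y','y','z'),
                 stringsAsFactors = FALSE)

# List the id2 values attached to each id1 (list names are "1", "2", ...)
by_id1 <- split(df$id2, df$id1)

# id1 = 1 and id1 = 2 share a value, and so do id1 = 2 and id1 = 3,
# so all three end up in the same group
shared_12 <- intersect(by_id1[["1"]], by_id1[["2"]])
shared_23 <- intersect(by_id1[["2"]], by_id1[["3"]])

# id1 = 3 and id1 = 4 share nothing, so they belong to different groups
shared_34 <- intersect(by_id1[["3"]], by_id1[["4"]])
```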

My idea was to create a table based on `id3`, which would subsequently be populated in a loop.

```
solution = data.frame(id3 = c('group1', 'group2'), id1 = NA, id2 = NA)
group = 1

for (step in c(1:1000)) {  # run many steps to make sure to get all values
  solution$id1[group] =    # populate
  solution$id2[group] =    # populate

  if (fully populated) {
    group = group + 1
  }
}
```

I am struggling to see how to populate it.

Disclaimer: I asked a similar question here, but using names in `id2` led a lot of people to point me to fuzzy string procedures in R, which are not needed here, since an exact solution exists. I also include all the code I have tried since then in this post.

You can use `igraph` to find the connected components (clusters) of the network formed by the `id1`-`id2` pairs:

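```r
library(igraph)

# Undirected graph whose vertices are the id1 and id2 values
g <- graph_from_data_frame(df, directed = FALSE)

# Component membership, named by vertex (components() supersedes clusters())
cg <- components(g)$membership

# Look up by vertex name, not by position, so non-consecutive ids also work
df$id3 <- unname(cg[as.character(df$id1)])
df
```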

output:

```
   id1 id2 id3
1    1   a   1
2    1   b   1
3    2   a   1
4    2   c   1
5    3   c   1
6    3   d   1
7    4   x   2
8    4   y   2
9    5   y   2
10   5   z   2
```
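If installing `igraph` is not an option, the same grouping can probably be reproduced in base R by propagating labels until they stop changing, which is close in spirit to the loop idea from the question. The following is a minimal sketch under that assumption, not part of the original answer; `label` and `new_label` are names introduced here for illustration, and the final `paste0()` step only rebuilds the `group1`/`group2` naming used in the question.

```r
# Same id1/id2 pairs as in the question
df <- data.frame(id1 = c(1,1,2,2,3,3,4,4,5,5),
                 id2 = c('a','b','a','c','c','d','x','y','y','z'),
                 stringsAsFactors = FALSE)

# Start with one label per row, then repeatedly give every row the smallest
# label found among the rows sharing its id1 or its id2.
label <- seq_len(nrow(df))
repeat {
  new_label <- ave(label, df$id1, FUN = min)      # merge rows with equal id1
  new_label <- ave(new_label, df$id2, FUN = min)  # merge rows with equal id2
  if (identical(new_label, label)) break          # nothing changed: done
  label <- new_label
}

# Renumber the surviving labels as 1, 2, ... and name them like the question
df$id3 <- paste0("group", match(label, unique(label)))
df
```

Each pass can only lower a label, so the loop stops once every connected set of rows carries its smallest row number. On large data the graph-based approach is likely to be faster, since this sketch may need many passes for long chains of linked rows.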