R Find the “groups” of tuples


I am trying to find the "group" (id3) that each row belongs to, based on two variables (id1, id2):

    df = data.frame(id1 = c(1,1,2,2,3,3,4,4,5,5),
                    id2 = c('a','b','a','c','c','d','x','y','y','z'),
                    id3 = c(rep('group1',6), rep('group2',4)))

       id1 id2    id3
    1    1   a group1
    2    1   b group1
    3    2   a group1
    4    2   c group1
    5    3   c group1
    6    3   d group1
    7    4   x group2
    8    4   y group2
    9    5   y group2
    10   5   z group2

For example, id1=1 is related to a and b of id2. But id1=2 is also related to a, so both belong to one group (id3=group1). Since id1=2 and id1=3 share id2=c, id1=3 also belongs to that group (id3=group1). The values of the tuple ((1,2,3),('a','b','c','d')) appear nowhere else, so no other row belongs to that group (which is labeled group1 generically).

My idea was to create a table based on id3, which would subsequently be populated in a loop.

    solution = data.frame(id3 = c('group1', 'group2'), id1 = NA, id2 = NA)
    group = 1

    for (step in c(1:1000)) {  # run many steps to make sure to get all values
      solution$id1[group] = # populate
      solution$id2[group] = # populate

      if (fully populated) {
        group = group + 1
      }
    }

I am struggling to see how to populate.

Disclaimer: I asked a similar question here, but using names in id2 led many people to point me to fuzzy string matching procedures in R, which are not needed here, since an exact solution exists. I have also included all the code I have tried since then in this post.

You can use igraph: treat each (id1, id2) pair as an edge of an undirected graph and find its connected components (clusters).

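    library(igraph)

    # each (id1, id2) row becomes an undirected edge
    g <- graph_from_data_frame(df, directed = FALSE)
    # components() replaces the deprecated clusters()
    cg <- components(g)$membership
    # index by vertex name, not by position, so id1 need not be 1..n
    df$id3 <- cg[as.character(df$id1)]
    df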

output:

       id1 id2 id3
    1    1   a   1
    2    1   b   1
    3    2   a   1
    4    2   c   1
    5    3   c   1
    6    3   d   1
    7    4   x   2
    8    4   y   2
    9    5   y   2
    10   5   z   2
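
If you would rather stay close to the loop idea from the question and avoid an extra package, the same grouping can be sketched in base R. This is only a minimal sketch: `find_groups()` is a hypothetical helper name, not a function from any package. It gives every row its own label and then repeatedly spreads the smallest label among rows that share an id1 or an id2, stopping when nothing changes. The last step builds the `solution` table from the question, assuming a comma-separated string per group is an acceptable format.

```r
# Example data from the question, without id3 (that is what we compute).
df <- data.frame(
  id1 = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  id2 = c("a", "b", "a", "c", "c", "d", "x", "y", "y", "z")
)

# find_groups() is a hypothetical helper for illustration only.
find_groups <- function(id1, id2) {
  label <- seq_along(id1)                # every row starts in its own group
  repeat {
    previous <- label
    label <- ave(label, id1, FUN = min)  # rows sharing an id1 take the smallest label
    label <- ave(label, id2, FUN = min)  # rows sharing an id2 take the smallest label
    if (identical(label, previous)) break  # stop once a full pass changes nothing
  }
  # renumber the surviving labels as 1, 2, ... in order of first appearance
  match(label, unique(label))
}

df$id3 <- paste0("group", find_groups(df$id1, df$id2))

# One row per group, listing the distinct id1 and id2 values it contains.
solution <- aggregate(
  df[c("id1", "id2")],
  by = list(id3 = df$id3),
  FUN = function(x) paste(unique(x), collapse = ",")
)
```

The loop needs one pass per "hop" in the longest chain of shared ids, so for large data with long chains the igraph approach above is likely to be faster.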