R: Find the “groups” of tuples


I am trying to find the "group" (id3) based on two variables (id1 and id2). The expected groups are shown in the id3 column:

```r
df = data.frame(id1 = c(1,1,2,2,3,3,4,4,5,5),
                id2 = c('a','b','a','c','c','d','x','y','y','z'),
                id3 = c(rep('group1',6), rep('group2',4)))
```

```
   id1 id2    id3
1    1   a group1
2    1   b group1
3    2   a group1
4    2   c group1
5    3   c group1
6    3   d group1
7    4   x group2
8    4   y group2
9    5   y group2
10   5   z group2
```

For example, id1=1 is related to a and b in id2. But id1=2 is also related to a, so both belong to one group (id3=group1). Since id1=2 and id1=3 share id2=c, id1=3 belongs to that group as well (id3=group1). The values of the tuple ((1,2,3),('a','b','c','d')) appear nowhere else, so no other row belongs to that group (which is labeled group1 generically).
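The grouping described above amounts to finding connected components in a graph whose nodes are the id1 and id2 values and whose edges are the rows. As a minimal base-R sketch (no packages assumed), a union-find structure can merge the two nodes of every row and then label each row by the root of its group. The `id1_`/`id2_` prefixes and the helper `find_root` are illustrative choices, not part of the original question.

```r
# Base-R union-find sketch: every distinct id1 value and every distinct id2
# value is a node, and each row of df is an edge joining its two nodes.
df <- data.frame(id1 = c(1,1,2,2,3,3,4,4,5,5),
                 id2 = c('a','b','a','c','c','d','x','y','y','z'))

# Prefix the values so that, e.g., id1 = 1 and id2 = "1" stay different nodes
from  <- paste0("id1_", df$id1)
to    <- paste0("id2_", df$id2)
nodes <- unique(c(from, to))

# parent[i] is the index of node i's parent; each node starts as its own root
parent <- seq_along(nodes)

# Follow parent links until reaching a node that is its own parent (the root)
find_root <- function(i) {
  while (parent[i] != i) i <- parent[i]
  i
}

# Merge the groups of the two nodes joined by each row
for (k in seq_len(nrow(df))) {
  r1 <- find_root(match(from[k], nodes))
  r2 <- find_root(match(to[k], nodes))
  if (r1 != r2) parent[r2] <- r1
}

# Label each row by its root, renumbered in order of first appearance
roots  <- vapply(match(from, nodes), find_root, integer(1))
df$id3 <- paste0("group", match(roots, unique(roots)))
df
```

For the example data this gives group1 for the first six rows and group2 for the last four. The sketch skips path compression, which only starts to matter for much larger inputs.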

My idea was to create a table based on id3, which would subsequently be populated in a loop:

```r
solution = data.frame(id3 = c('group1', 'group2'), id1 = NA, id2 = NA)
group = 1
for (step in c(1:1000)) { # run many steps to make sure to get all values
  solution$id1[group] = # populate
  solution$id2[group] = # populate
  if (fully populated) {
    group = group + 1
  }
}
```

I am struggling to see how to populate the table.
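One way to fill in the loop idea is to iterate until nothing changes instead of running a fixed 1000 steps. The sketch below is a base-R label-propagation variant (a swapped-in technique, not the table-filling approach from the question): every row starts with its own label, and rows sharing an id1 or id2 value repeatedly adopt the smallest label among them.

```r
df <- data.frame(id1 = c(1,1,2,2,3,3,4,4,5,5),
                 id2 = c('a','b','a','c','c','d','x','y','y','z'))

label <- seq_len(nrow(df))   # every row starts in its own group
repeat {
  old <- label
  # rows sharing an id1 value take the smallest label among them
  label <- ave(label, df$id1, FUN = min)
  # rows sharing an id2 value do the same
  label <- ave(label, df$id2, FUN = min)
  # stop once a full pass changes nothing: the groups are complete
  if (identical(label, old)) break
}

# Renumber the surviving labels as group1, group2, ... by first appearance
df$id3 <- paste0("group", match(label, unique(label)))
df
```

Labels can only decrease, so the loop is guaranteed to stop; `ave()` does the per-group minimum in each pass.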


Disclaimer: I asked a similar question here, but using names in id2 led a lot of people to point me to fuzzy string matching procedures in R, which are not needed here, since an exact solution exists. I also include all the code I have tried since then in this post.

 


You can leverage igraph to find the connected components (clusters) of the network:

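```r
library(igraph)
# undirected graph; columns id1 and id2 give each edge's endpoints
g <- graph_from_data_frame(df, FALSE)
# component (cluster) number of every vertex
cg <- clusters(g)$membership
# id1 values 1..5 are vertices 1..5, so positional indexing works here
df$id3 <- cg[df$id1]
df
```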

output:

```
   id1 id2 id3
1    1   a   1
2    1   b   1
3    2   a   1
4    2   c   1
5    3   c   1
6    3   d   1
7    4   x   2
8    4   y   2
9    5   y   2
10   5   z   2
```
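A variant of the answer above, offered as a sketch that assumes a recent igraph release is installed: newer versions deprecate `clusters()` in favour of `components()`, and looking vertices up by name rather than by position keeps the result correct when id1 is not simply 1, 2, 3, and so on. The `id1_`/`id2_` prefixes are an assumption added here so that an id1 value and an id2 value with the same text do not collapse into one vertex.

```r
library(igraph)

df <- data.frame(id1 = c(1,1,2,2,3,3,4,4,5,5),
                 id2 = c('a','b','a','c','c','d','x','y','y','z'))

# Prefix the vertex names so id1 and id2 values can never collide
edges <- data.frame(from = paste0("id1_", df$id1),
                    to   = paste0("id2_", df$id2))
g <- graph_from_data_frame(edges, directed = FALSE)

# One component number per vertex, in the order of V(g)
memb <- components(g)$membership

# Look up each row's component by vertex name, not by position
comp <- memb[match(edges$from, V(g)$name)]

# Renumber as group1, group2, ... in order of first appearance in df
df$id3 <- paste0("group", match(comp, unique(comp)))
df
```

The renumbering step makes the labels independent of how igraph happens to number its components.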
