Making pairs of words based on one column


I want to make pairs of words based on the second column (the identifier). My file is similar to this example:

    A ID.1
    B ID.2
    C ID.1
    D ID.1
    E ID.2
    F ID.3

The result I want is:

    A C ID.1
    A D ID.1
    B E ID.2
    C D ID.1

Note that I don't want the same word pair again in the opposite order. In my real file some words appear more than once, with different identifiers.

I tried this code, which works but takes a long time (and I don't know whether it does redundant work):

    counter=2
    cat filtered_go_annotation.txt | while read f1 f2; do
        tail -n +$counter go_annotation.txt | grep $f2 | awk '{print "'$f1' " $1}'
        ((counter++))
    done > go_network2.txt

The 'tail' skips the lines that have already been read, so each word is only paired with the words on later lines.


In two steps:

    $ sort -k2 file > file.s
    $ join -j2 file.s{,} | awk '!(a[$2,$3]++ + a[$3,$2]++){print $2,$3,$1}'
    A C ID.1
    A D ID.1
    C D ID.1
    B E ID.2
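
If the sort/join pass is still too slow on a large file, a single-pass awk variant might be worth trying. This is only a sketch, not a tested replacement: it assumes two whitespace-separated columns (word, identifier) and that a word does not repeat under the same identifier. The file names `file` and `pairs.txt` are placeholders. Unlike the join version above, it keeps a pair once per identifier, so two words that share two identifiers would be printed twice.

```shell
# Sample input copied from the question: word, then identifier.
printf '%s\n' 'A ID.1' 'B ID.2' 'C ID.1' 'D ID.1' 'E ID.2' 'F ID.3' > file

# One pass: remember the words seen so far for each identifier and
# pair every new word with the earlier ones. A pair is emitted only
# when its second word arrives, so the reversed order never appears.
awk '
{
    id = $2
    for (i = 1; i <= n[id]; i++)      # words already seen for this id
        print words[id, i], $1, id
    words[id, ++n[id]] = $1           # store the current word
}' file > pairs.txt

cat pairs.txt
```

On the sample data this prints the same four lines as the join version (A C, A D and C D for ID.1, then B E for ID.2). Pairs come out in input order, so pipe through `sort` if a specific order is needed.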
