sed with simultaneous and sequential replace


I'm not sure it is possible to do what I want in sed (or awk, or any other bash tool):

I want to write a script that replaces : ) in a string with <happy> and ) : with <sad>. Each replacement on its own is easy with sed:

    echo "test : )" | sed 's/: )/<happy>/g'
    echo "test ) :" | sed 's/) :/<sad>/g'

Unfortunately, sometimes I have strings like these:

    I'm happy : ) : ) : )
    I'm sad ) : ) : ) :

In that case, the output should be:

    I'm happy <happy> <happy> <happy>
    I'm sad <sad> <sad> <sad>

But by combining the two commands above:

    echo "I'm happy : ) : ) : )" | sed 's/: )/<happy>/g' | sed 's/) :/<sad>/g'
    echo "I'm sad ) : ) : ) :" | sed 's/: )/<happy>/g' | sed 's/) :/<sad>/g'

I will get:

    I'm happy <happy> <happy> <happy>
    I'm sad ) <happy> <happy> :

One way to solve this would be to do both replacements at once, scanning the string from left to right. I tried something like sed 's/a/b/g;s/c/d/g', but that still applies one pattern after the other, so it doesn't solve the problem.
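For the sed part of the question, a left-to-right scan can be emulated inside sed itself by walking a cursor through the line. This is an illustrative sketch, not part of the original thread: it assumes GNU sed (writing a newline as \n in the replacement is a GNU extension), and the function name emoticons is made up for the example. The cursor is a newline, which can never occur inside a line that sed has read. At each position the script tries : ) first, then ) :, and otherwise moves the cursor one character forward.

```shell
#!/bin/sh
# Sketch: left-to-right replacement of ": )" and ") :" in one sed pass.
# Assumes GNU sed ("\n" in the replacement text is a GNU extension).
emoticons() {
  sed '
    # Put a cursor (a newline, which cannot occur inside a line) at the start.
    s/^/\n/
    :a
    # A happy face follows the cursor: replace it and move the cursor past it.
    s/\n: )/<happy>\n/
    ta
    # Otherwise, a sad face follows the cursor: same treatment.
    s/\n) :/<sad>\n/
    ta
    # Otherwise, step the cursor forward by one character.
    s/\n\(.\)/\1\n/
    ta
    # The cursor reached the end of the line: remove it.
    s/\n$//
  '
}

echo "I'm happy : ) : ) : )" | emoticons
echo "I'm sad ) : ) : ) :" | emoticons
```

Each replacement is emitted to the left of the cursor, so it is never rescanned. That is what makes the two substitutions behave as if they were applied simultaneously.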

 


With GNU awk for the 3rd arg to match():

    $ cat script1.awk
    BEGIN {
        map[": )"] = "<happy>"
        map[") :"] = "<sad>"
    }
    {
        # a[1] = text before the emoticon, a[2] = the emoticon, a[3] = text after it
        while ( match($0,/(.*)(: \)|\) :)(.*)/,a) ) {
            $0 = a[1] map[a[2]] a[3]
        }
        print
    }

    $ awk -f script1.awk file
    I'm happy <happy> <happy> <happy>
    I'm sad <sad> <sad> <sad>

With any awk:

    $ cat script2.awk
    BEGIN {
        map[": )"] = "<happy>"
        map[") :"] = "<sad>"
    }
    {
        # RSTART/RLENGTH locate the leftmost emoticon; splice in its replacement
        while ( match($0,/: \)|\) :/) ) {
            $0 = substr($0,1,RSTART-1) map[substr($0,RSTART,RLENGTH)] substr($0,RSTART+RLENGTH)
        }
        print
    }

    $ awk -f script2.awk file
    I'm happy <happy> <happy> <happy>
    I'm sad <sad> <sad> <sad>

Although both approaches produce the same output in this case, the first one actually works from the end of the string toward the front: the leading greedy .* swallows as much as it can, so the rightmost emoticon is the one captured. The second one works front to back. You can see the difference with this test:

    $ echo ': ) :' | awk -f script1.awk
    : <sad>

    $ echo ': ) :' | awk -f script2.awk
    <happy> :

You can also do a back-to-front pass with any awk after a small tweak, but I don't think that's what you really want anyway.
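One possible form of that tweak is sketched here as an assumption. It is not taken from the answer. The idea is to call match() repeatedly on shrinking suffixes of the line to find the emoticon that starts furthest to the right, replace it, and repeat. Only POSIX awk features are used (no third argument to match()). The function name backtofront is hypothetical.

```shell
#!/bin/sh
# Sketch: back-to-front replacement using only POSIX awk features.
# Each outer iteration finds the match that STARTS furthest to the right,
# replaces it, and then searches the updated line again.
backtofront() {
  awk '
    BEGIN {
      map[": )"] = "<happy>"
      map[") :"] = "<sad>"
    }
    {
      while (match($0, /: \)|\) :/)) {
        start = RSTART
        len = RLENGTH
        # Look for another match beginning after "start"; keep the last one.
        while (match(substr($0, start + 1), /: \)|\) :/)) {
          start = start + RSTART
          len = RLENGTH
        }
        $0 = substr($0, 1, start - 1) map[substr($0, start, len)] substr($0, start + len)
      }
      print
    }
  '
}

echo ': ) :' | backtofront
```

On ': ) :' this prints ': <sad>', the same result as script1.awk. Overlapping matches are found because each inner search restarts just one character after the previous match start.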


Edit: to build the regexp from the map instead of hard-coding it:

    $ cat tst.awk
    BEGIN {
        map[": )"] = "<happy>"
        map[") :"] = "<sad>"
        for (emoji in map) {
            # wrap every char except ^ in [...] so it matches literally,
            # then escape ^, which would mean negation inside a bracket
            gsub(/[^^]/,"[&]",emoji)
            gsub(/\^/,"\\^",emoji)
            # join the escaped keys into one alternation: key1|key2|...
            emojis = (emojis == "" ? "" : emojis "|") emoji
        }
    }
    {
        while ( match($0,emojis) ) {
            $0 = substr($0,1,RSTART-1) map[substr($0,RSTART,RLENGTH)] substr($0,RSTART+RLENGTH)
        }
        print
    }

    $ awk -f tst.awk file
    I'm happy <happy> <happy> <happy>
    I'm sad <sad> <sad> <sad>
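The while loops in these scripts rescan $0 from the beginning after every replacement. That is harmless here because neither <happy> nor <sad> contains : or ), but if a replacement ever contained one of the keys, the loop could rewrite its own output or never terminate. The variant below avoids this. It is offered as a sketch, not as the answer's code: it copies finished text into a separate output buffer and only searches the unprocessed remainder. It builds the regexp from the map by wrapping every character in a bracket expression, a simplified form of tst.awk's escaping that assumes no key contains ^ or a backslash. The function name replace_once is hypothetical.

```shell
#!/bin/sh
# Sketch: single left-to-right pass in POSIX awk. Finished text goes into
# "out", so replacement text is never searched again.
# Assumes no map key contains ^ or a backslash (see the bracket trick below).
replace_once() {
  awk '
    BEGIN {
      map[": )"] = "<happy>"
      map[") :"] = "<sad>"
      for (k in map) {
        key = k
        gsub(/./, "[&]", key)   # ": )" becomes "[:][ ][)]", a literal match
        re = (re == "" ? "" : re "|") key
      }
    }
    {
      out = ""
      rest = $0
      while (match(rest, re)) {
        # Copy the text before the match, then the replacement for the match.
        out = out substr(rest, 1, RSTART - 1) map[substr(rest, RSTART, RLENGTH)]
        # Keep scanning only the part after the consumed match.
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }
  '
}

echo "I'm happy : ) : ) : )" | replace_once
echo "I'm sad ) : ) : ) :" | replace_once
```

Like script2.awk, this works front to back, so ': ) :' becomes '<happy> :'. Each character of the input is examined by match() a bounded number of times instead of the whole line being rescanned after every replacement.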
