Reformat data using awk

  • A+
Category:Languages

I have a dataset that contains rows of UUIDs followed by locations and transaction IDs. The UUIDs are separated by a semi-colon (';') and the transactions are separated by tabs, like the following:

01234;LOC_1=ABC    LOC_1=BCD    LOC_2=CDE 56789;LOC_2=DEF    LOC_3=EFG 

I know all of the location codes in advance. What I want to do is transform this data into a format I can load into SQL/Postgres for analysis, like this:

01234;LOC_1=ABC 01234;LOC_1=BCD 01234;LOC_2=CDE 56789;LOC_2=DEF 56789;LOC_3=EFG 

I'm pretty sure I can do this easily using awk (or similar) by looking up location IDs from a file (ex. LOC_1) and matching any instance of the location ID and printing that out next to the UUID. I haven't been able to get it right yet, and any help is much appreciated!

My locations file is named location and my dataset is data. Note that I can edit the original file or write the results to a new file, either is fine.

 


You comment on knowing the locations and the mapping file makes me suspicious what your example seems to have done isn't exactly what is being asked - but it seems like you're wanting to reformat each set of tab delimited LOC= values into a row with their UUID in front.

If so, this will do the trick:

awk ' BEGIN {OFS=FS=";"} {split($2,locs,"/t"); for (n in locs) { print $1,locs[n]}}'  

Given:

$ cat -A data.txt  01234;LOC_1=ABC^ILOC_1=BCD^ILOC_2=CDE$  56789;LOC_2=DEF^ILOC_3=EFG$ 

Then:

$ awk ' BEGIN {OFS=FS=";"} {split($2,locs,"/t"); for (n in locs) { print $1,locs[n]}}' data.txt  01234;LOC_1=ABC  01234;LOC_1=BCD  01234;LOC_2=CDE  56789;LOC_2=DEF  56789;LOC_3=EFG 

The BEGIN {OFS=FS=";"} block sets the input and output delimiter to ;.

For each row, we then split the second field into an array named locs, splitting on tab, via - split($2,locs,"/t")

And then loop through locs printing the UUID and each loc value - for (n in locs) { print $1,locs[n]}

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: