Merging data based on matching first column in Python

  • A+
Category:Languages

I currently have two sets of data files that look like this:

File 1:

test1 ba ab cd dh gf test2 fa ab cd dh gf test3 rt ty er wq ee test4 er rt sf sd sa 

and in file 2:

test1 123 344 123 test1 234 567 787 test1 221 344 566 test3 456 121 677 

I would like to combine the files based on mathching rows in the first column (so that "tests" match up)

like so:

test1 ba ab cd dh gf 123 344 123 test1 ba ab cd dh gf 234 567 787 test1 ba ab cd dh gf 221 344 566 test3 rt ty er wq ee 456 121 677 

I have this Code

def combineFiles(file1,file2,outfile):       def read_file(file):          data = {}          for line in csv.reader(file):             data[line[0]] = line[1:]          return data       with open(file1, 'r') as f1, open(file2, 'r') as f2:          data1 = read_file(f1)          data2 = read_file(f2)          with open(outfile, 'w') as out:             wtr= csv.writer(out)             for key in data1.keys():                try:                   wtr.writerow(((key), ','.join(data1[key]), ','.join(data2[key])))                except KeyError:                   pass 

However the output ends up looking like this:

test1 ba ab cd dh gf 123 344 123 test3 er rt sf sd sa 456 121 677 

Can anyone help me with how to make the output so that test1 can be printed all three times?

Much Appreciated

 


While I would recommend Brad Solomon's approach as it's pretty succinct, you need a pretty small change in your code.

Since your second file is the one that has the "final say", you just need to create a dictionary for the first file. Then you can write the output file as you read from the second file, fetching values from the data1 dictionary as you go:

with open(file1, 'r') as f1, open(file2, 'r') as f2:     data1 = read_file(f1)     with open(outfile, 'w') as out:         wtr = csv.writer(out, delimiter=' ')         for line in csv.reader(f2, delimiter=' '):             # only write if there is a corresponding line in file1             if line[0] in data1:                 # as you write, get the corresponding file1 data                 wtr.writerow(line[0:] + data1[line[0]] + line[1:]) 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: