Union of two pandas DataFrames

  • A+
Category:Languages

Say I have two data frames:

df1:

  A 0 a 1 b 

df2:

  A 0 a 1 c 

I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:

  A  B 0 a  df1, df2 1 b  df1 2 c  df2 

I can get the concatenated data frame (df3) without duplicates as follows:

import pandas as pd df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True) 

I can't think of/find a method to have control over what element goes where. How can I add the extra column?

Thank you very much for any tips.

 


Merge with an indicator argument, and remap the result:

m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}  result = df1.merge(df2, on=['A'], how='outer', indicator='B') result['B'] = result['B'].map(m)  result    A         B 0  a  df1, df2 1  b       df1 2  c       df2 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: