Subtracting columns of two dataframes based on matching IDs in Pandas


I have two dataframes looking like

df1:

       ID     A    B     C    D
    0  'ID1'  0.5  2.1   3.5  6.6
    1  'ID2'  1.2  5.5   4.3  2.2
    2  'ID1'  0.7  1.2   5.6  6.0
    3  'ID3'  1.1  7.2  10.0  3.2

df2:

       ID     A    B    C    D
    0  'ID1'  1.0  2.0  3.3  4.4
    1  'ID2'  1.5  5.0  4.0  2.2
    2  'ID3'  0.6  1.2  5.9  6.2
    3  'ID4'  1.1  7.2  8.5  3.0

df1 can have multiple entries with the same ID, whereas each ID occurs only once in df2. Also, not every ID in df2 is necessarily present in df1. I don't think I can solve this with set_index(), because multiple rows in df1 can share an ID and the IDs in df1 and df2 are not aligned.

I want to create a new dataframe where I subtract the values in df2[['A','B','C','D']] from df1[['A','B','C','D']] based on matching the ID.

The resulting dataframe would look like:

df_new:

       ID      A     B    C     D
    0  'ID1'  -0.5   0.1  0.2   2.2
    1  'ID2'  -0.3   0.5  0.3   0.0
    2  'ID1'  -0.3  -0.8  2.3   1.6
    3  'ID3'   0.5   6.0  4.1  -3.0

I know how to do this with a loop, but since I'm dealing with very large amounts of data, that is not practical. What is the best way of approaching this with Pandas?


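You just need set_index and a subtraction. Pandas aligns the two frames on the index, so each duplicated ID in df1 is matched against the single row with that ID in df2; IDs that appear in only one frame produce NaN rows, which dropna removes: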

    (df1.set_index('ID') - df2.set_index('ID')).dropna(axis=0)
    Out[174]:
              A    B    C    D
    ID
    'ID1'  -0.5  0.1  0.2  2.2
    'ID1'  -0.3 -0.8  2.3  1.6
    'ID2'  -0.3  0.5  0.3  0.0
    'ID3'   0.5  6.0  4.1 -3.0
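Note that the result above comes back indexed by ID and grouped by it, not in df1's original row order. If the original order and a regular ID column matter, one possible variation is to look up the df2 row for each df1 row with reindex and subtract positionally. This is only a sketch under the question's sample data; the names `cols` and `lookup` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID1", "ID3"],
    "A": [0.5, 1.2, 0.7, 1.1],
    "B": [2.1, 5.5, 1.2, 7.2],
    "C": [3.5, 4.3, 5.6, 10.0],
    "D": [6.6, 2.2, 6.0, 3.2],
})
df2 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID3", "ID4"],
    "A": [1.0, 1.5, 0.6, 1.1],
    "B": [2.0, 5.0, 1.2, 7.2],
    "C": [3.3, 4.0, 5.9, 8.5],
    "D": [4.4, 2.2, 6.2, 3.0],
})

cols = ["A", "B", "C", "D"]  # hypothetical name: the columns to subtract

# One df2 row per df1 row, in df1's order. This relies on IDs being
# unique in df2; a df1 ID missing from df2 would give a row of NaN.
lookup = df2.set_index("ID")[cols].reindex(df1["ID"])

df_new = df1.copy()
# to_numpy() discards the index, so the subtraction is row-by-row.
df_new[cols] = df1[cols].to_numpy() - lookup.to_numpy()
# Drop df1 rows whose ID had no match in df2 (none in this sample).
df_new = df_new.dropna(subset=cols)

print(df_new)
```

Here df_new keeps df1's 0..3 index and its ID column, so the rows appear as ID1, ID2, ID1, ID3.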
