Only copy one key-column into merged DataFrame

  • A+
Category:Languages

Consider the following DataFrames:

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')}) df2 = pd.DataFrame({'c': list('abcd'), 'd': 'Alex'}) 

In this instance, df1['b'] and df2['c'] are the key columns. So when merging:

df1.merge(df2, left_on='b', right_on='c')    a  b  c     d 0  0  a  a  Alex 1  1  b  b  Alex 2  2  c  c  Alex 3  3  d  d  Alex 

I end up with both key columns in the resultant DataFrame when I only need one. I've been using:

df1.merge(df2, left_on='b', right_on='c').drop('c', axis='columns') 

Is there a way to only keep one key column?

 


One way is to set b and c as the index of your frames respectively, and use join followed by reset_index:

df1.set_index('b').join(df2.set_index('c')).reset_index()     b  a     d 0  a  0  Alex 1  b  1  Alex 2  c  2  Alex 3  d  3  Alex 

This will be faster than the merge/drop method on large dataframes, mostly because drop is slow. @Bill's method is faster than my suggestion, and @W-B & @PiRsquared easily outspeed the other suggestions:

import timeit  df1 = pd.concat((df1 for _ in range(1000))) df2 = pd.concat((df2 for _ in range(1000)))  def index_method(df1 = df1, df2 = df2):     return df1.set_index('b').join(df2.set_index('c')).reset_index()   def merge_method(df1 = df1, df2=df2):     return df1.merge(df2, left_on='b', right_on='c').drop('c', axis='columns')  def rename_method(df1 = df1, df2 = df2):     return df1.rename({'b': 'c'}, axis=1).merge(df2)  def index_method2(df1 = df1, df2 = df2):     return df1.join(df2.set_index('c'), on='b')  def assign_method(df1 = df1, df2 = df2):     return df1.set_index('b').assign(c=df2.set_index('c').d).reset_index()  def map_method(df1 = df1, df2 = df2):     return df1.assign(d=df1.b.map(dict(df2.values)))  >>> timeit.timeit(index_method, number=10) / 10 0.7853091600998596 >>> timeit.timeit(merge_method, number=10) / 10 1.1696729859002517 >>> timeit.timeit(rename_method, number=10) / 10 0.4291436871004407 >>> timeit.timeit(index_method2, number=10) / 10 0.5037374985004135 >>> timeit.timeit(assign_method, number=10) / 10 0.0038641377999738325 >>> timeit.timeit(map_method, number=10) / 10 0.006620216699957382 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: