pandas pivot and join in two dataframes

  • A+
Category:Languages

I have two dataFrames :

df1    mag   cat 0  101   A1 1  256   A2   2  760   A2 3  888   A3   ...  df2    A1    A2    A3    ... 0  E50R  AZ33  REZ3  1  T605  YYU6  YHG5 2  IR50  P0O9  BF53 3  NaN   YY9I  NaN 

And I would like to create a final DataFrame which looks like :

df    101   256   760   888  ... 0  E50R  AZ33  AZ33  REZ3 1  T605  YYU6  YYU6  YHG5 2  IR50  P0O9  P0O9  BF53 3  NaN   YY9I  YY9I  NaN 

I tried something with pivot, but it doesn't seem to do the job Could you help me ?

 


You can use a combination of GroupBy, numpy.repeat, itertools.chain:

from itertools import chain  # map cat to list of mag s = df1.groupby('cat')['mag'].apply(list)  # calculate indices for columns, including repeats cols_idx = np.repeat(range(len(df2.columns)), s.map(len))  # apply indexing res = df2.iloc[:, cols_idx]  # rename columns res.columns = list(chain.from_iterable(df2.columns.map(s.get)))  print(res)      101   256   760   888 0  E50R  AZ33  AZ33  REZ3 1  T605  YYU6  YYU6  YHG5 2  IR50  P0O9  P0O9  BF53 3   NaN  YY9I  YY9I   NaN 

Performance benchmarking

Some good and different solutions here, so you may be interested in performance. Wen's reindex solution is the clear winner.

%timeit wen(df1, df2)   # 632 µs per loop %timeit jpp(df1, df2)   # 2.55 ms per loop %timeit scb(df1, df2)   # 7.98 ms per loop %timeit abhi(df1, df2)  # 4.52 ms per loop 

Code:

def jpp(df1, df2):     s = df1.groupby('cat')['mag'].apply(list)     cols_idx = np.repeat(range(len(df2.columns)), s.map(len))     res = df2.iloc[:, cols_idx]     res.columns = list(chain.from_iterable(df2.columns.map(s.get)))         return res  def scb(df1, df2):     df_out = (df2.stack().reset_index()                  .merge(df1, left_on='level_1', right_on='cat')[['level_0','mag',0]])     return df_out.pivot('level_0','mag',0).reset_index(drop=True)      def abhi(df1, df2):     return df2.T.merge(df1, left_index=True, right_on='cat').drop('cat', axis=1).set_index('mag').T  def wen(df1, df2):     newdf=df2.reindex(columns=df1.cat)     newdf.columns=df1.mag     return newdf 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: