Best way to flatten a DataFrame based on the values in a column


I have to process a DataFrame with several hundred thousand rows, but it can be simplified as below:

    import pandas as pd

    df = pd.DataFrame([
        ('a', 1, 1), ('a', 0, 0), ('a', 0, 1),
        ('b', 0, 0), ('b', 1, 0), ('b', 0, 1),
        ('c', 1, 1), ('c', 1, 0), ('c', 1, 0)
    ], columns=['A', 'B', 'C'])

    print(df)

       A  B  C
    0  a  1  1
    1  a  0  0
    2  a  0  1
    3  b  0  0
    4  b  1  0
    5  b  0  1
    6  c  1  1
    7  c  1  0
    8  c  1  0

My goal is to flatten columns "B" and "C" according to the label each row has in column "A":

       A  B_1  B_2  B_3  C_1  C_2  C_3
    0  a    1    0    0    1    0    1
    3  b    0    1    0    0    0    1
    6  c    1    1    1    1    0    0

The code I wrote gives the result I want, but it is pretty slow because it uses a plain for loop over the unique labels. I think the solution is a vectorized approach that avoids the loop. Does anyone have an idea? My code is below.

    added_col = ['B_1', 'B_2', 'B_3', 'C_1', 'C_2', 'C_3']

    new_df = df.drop(['B', 'C'], axis=1).copy()
    new_df = new_df.iloc[[x for x in range(0, len(df), 3)], :]
    new_df = pd.concat([new_df, pd.DataFrame(columns=added_col)], sort=False)

    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for e, elem in new_df['A'].items():
        new_df.loc[e, added_col] = df[df['A'] == elem].loc[:, ['B', 'C']].T.values.flatten()

 


Here is one way:

    # create a row number by group
    df['rn'] = df.groupby('A').cumcount() + 1

    # pivot the table
    new_df = df.set_index(['A', 'rn']).unstack()

    # rename columns
    new_df.columns = [x + '_' + str(y) for (x, y) in new_df.columns]

    new_df.reset_index()
    #   A  B_1  B_2  B_3  C_1  C_2  C_3
    #0  a    1    0    0    1    0    1
    #1  b    0    1    0    0    0    1
    #2  c    1    1    1    1    0    0
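If every label has exactly the same number of rows, as in the sample data, a plain NumPy reshape might be another option worth timing against the pivot. This is only a sketch under that assumption: the function name `flatten_fixed_groups` and its parameters are made up for illustration and do not come from the question, and it raises an error instead of padding when group sizes differ.

```python
import pandas as pd


def flatten_fixed_groups(df, key="A", value_cols=("B", "C")):
    """Flatten value columns per label; assumes all groups have equal size."""
    value_cols = list(value_cols)
    sizes = df.groupby(key).size()
    if sizes.nunique() != 1:
        raise ValueError("every label must have the same number of rows")
    n = int(sizes.iloc[0])

    # A stable sort makes each label contiguous and keeps the original
    # row order inside each label.
    ordered = df.sort_values(key, kind="mergesort")
    n_groups = len(ordered) // n

    # (groups, rows per group, columns) -> (groups, columns, rows per group)
    block = ordered[value_cols].to_numpy().reshape(n_groups, n, len(value_cols))
    flat = block.transpose(0, 2, 1).reshape(n_groups, -1)

    names = [f"{col}_{i}" for col in value_cols for i in range(1, n + 1)]
    out = pd.DataFrame(flat, columns=names)
    # After sorting, every n-th row starts a new label.
    out.insert(0, key, ordered[key].to_numpy()[::n])
    return out


df = pd.DataFrame([
    ('a', 1, 1), ('a', 0, 0), ('a', 0, 1),
    ('b', 0, 0), ('b', 1, 0), ('b', 0, 1),
    ('c', 1, 1), ('c', 1, 0), ('c', 1, 0)
], columns=['A', 'B', 'C'])

result = flatten_fixed_groups(df)
print(result)
```

The pivot approach above also handles groups of different sizes (missing positions become NaN), so this variant only makes sense when the fixed group size is guaranteed.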
