Manipulate pandas dataframe to display desired output


I have the following DataFrame structure:

    profile_id  user   birthday
    123, 124    test1  day1
    131, 132    test2  day2

What I need to display is:

    profile_id  user   birthday
    123         test1  day1
    124         test1  day1
    131         test2  day2
    132         test2  day2

Each value in the profile_id column holds several ids separated by commas, and I need a separate row for each id.
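For anyone who wants to reproduce this, here is a minimal sketch of the frame above. It assumes every column holds plain strings, since the dtypes are not shown.

```python
import pandas as pd

# Sample data copied from the table above; string dtypes are an assumption.
df = pd.DataFrame({
    "profile_id": ["123, 124", "131, 132"],
    "user": ["test1", "test2"],
    "birthday": ["day1", "day2"],
})
print(df)
```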

 


One-liner

    df.loc[df.index.repeat(df.profile_id.str.count(', ') + 1)].assign(
        profile_id=', '.join(df.profile_id).split(', '))

      profile_id   user birthday
    0        123  test1     day1
    0        124  test1     day1
    1        131  test2     day2
    1        132  test2     day2

Broken down

    sep = ', '
    idx = df.index.repeat(df.profile_id.str.count(sep) + 1)
    new = sep.join(df.profile_id).split(sep)
    df.loc[idx].assign(profile_id=new)

      profile_id   user birthday
    0        123  test1     day1
    0        124  test1     day1
    1        131  test2     day2
    1        132  test2     day2

Numpy slice instead of loc

This version also produces a fresh index.

    import numpy as np
    import pandas as pd

    sep = ', '
    col = 'profile_id'
    p = df[col]
    # positional row numbers, each repeated once per id in that row
    i = np.arange(len(df)).repeat(p.str.count(sep) + 1)
    pd.DataFrame({
        col: sep.join(p).split(sep),
        **{c: df[c].values[i] for c in df if c != col}
    }, columns=df.columns)

      profile_id   user birthday
    0        123  test1     day1
    1        124  test1     day1
    2        131  test2     day2
    3        132  test2     day2
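As an alternative sketch, separate from the approaches above: on pandas 0.25 or later, `Series.str.split` followed by `DataFrame.explode` should give the same rows. The sample frame is rebuilt from the question, and its string dtypes are an assumption.

```python
import pandas as pd

# Sample frame rebuilt from the question; string dtypes are assumed.
df = pd.DataFrame({
    "profile_id": ["123, 124", "131, 132"],
    "user": ["test1", "test2"],
    "birthday": ["day1", "day2"],
})

out = (
    # turn each "a, b" string into a list ["a", "b"]
    df.assign(profile_id=df["profile_id"].str.split(", "))
      # one row per list element; other columns are repeated
      .explode("profile_id")
      # replace the repeated original labels with a fresh 0..n-1 index
      .reset_index(drop=True)
)
print(out)
```

Unlike the join-and-split trick, this does not depend on the separator never appearing at the start or end of a value, but it does require a pandas version that has `explode`.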
