Groupby and append lists and strings

  • A+
Category:Languages

I am trying to group-by the values in my "value_1" column. But my last column is made up of lists. When I try to group-by using my "value_1" column, the column made up of lists disappears.

Dataframe:

 value_1:        value_2:           value_3:               list:   american     california, nyc      walmart, kmart      [supermarket, connivence]   canadian         toronto            dunkinDonuts      [coffee]  american          texas                               [state]  canadian                             walmart          [supermarket]     ...              ...                 ...              .... 

My expected output is:

value_1:        value_2:              value_3:             list:  american   california, nyc, texas   walmart, kmart      [supermarket, connivence, state]  canadian         toronto         dunkinDonuts, walmart  [coffee, supermarket] 

Thanks!

 


Create dynamically dictionary by all columns with no list and value_1 and for list use lambda function with list comprehension with flatenning:

f1 = lambda x: ', '.join(x.dropna()) #alternative for join only strings #f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)]) f2 = lambda x: [z for y in x for z in y] d = dict.fromkeys(df.columns.difference(['value_1','list']), f1) d['list'] = f2   df = df.groupby('value_1', as_index=False).agg(d) print (df)      value_1                 value_2                value_3  / 0   american  california, nyc, texas         walmart, kmart    1   canadian                 toronto  dunkinDonuts, walmart                                    list   0  [supermarket, connivence, state]   1             [coffee, supermarket]   

Explanation:

f1 and f2 are lambda functions.

First remove missing values (if exist) and join strings with separator:

f1 = lambda x: ', '.join(x.dropna()) 

First get only strings values (omit missing values, because NaNs) and join strings with separator:

f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)]) 

First get all string values with filtering empty strings and join strings with separator:

f1 = lambda x: ', '.join([y for y in x if y != ''])  

Function f2 is for flatten lists, because after aggregation get nested lists like [['a','b'], ['c']]

f2 = lambda x: [z for y in x for z in y] 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: