How to append list elements over long format in Python Pandas

I have the following data:

study_id       list_value
1              ['aaa', 'bbb']
1              ['aaa']
1              ['ccc']
2              ['ddd', 'eee', 'aaa']
2              np.NaN
2              ['zzz', 'aaa', 'bbb']

How can I convert it into something like this?

study_id       list_value
1              ['aaa', 'bbb', 'ccc']
1              ['aaa', 'bbb', 'ccc']
1              ['aaa', 'bbb', 'ccc']
2              ['aaa', 'bbb', 'ddd', 'eee', 'zzz'] 
2              ['aaa', 'bbb', 'ddd', 'eee', 'zzz'] 
2              ['aaa', 'bbb', 'ddd', 'eee', 'zzz']  # order of list items doesn't matter

itertools.chain with GroupBy.transform
First, replace the NaNs in your column with empty lists using a list comprehension (messy, I know, but it is usually the fastest way to do this on an object column).

# Replace anything that is not a list (e.g. NaN) with an empty list.
df['list_value'] = [
    x if isinstance(x, list) else [] for x in df['list_value']
]

Next, group on study_id and flatten your lists inside GroupBy.transform and extract unique values using a set.

from itertools import chain

# Per group: flatten all lists and keep the unique values via a set.
df['list_value'] = df.groupby('study_id').list_value.transform(
    lambda x: [list(set(chain.from_iterable(x)))]
)
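To see what the lambda does for a single group, here is a minimal sketch in plain Python. The name `group_lists` is made up for illustration and stands in for the (already NaN-free) lists of study_id 1 from the question.

```python
from itertools import chain

# Hypothetical stand-in for the lists of one group (study_id 1 in the question).
group_lists = [['aaa', 'bbb'], ['aaa'], ['ccc']]

# chain.from_iterable concatenates the inner lists into one flat stream.
flattened = list(chain.from_iterable(group_lists))

# A set drops duplicates; sorted() is only used here to get a stable order.
unique_sorted = sorted(set(flattened))

print(flattened)      # ['aaa', 'bbb', 'aaa', 'ccc']
print(unique_sorted)  # ['aaa', 'bbb', 'ccc']
```

The answer's code uses `list(set(...))` instead of `sorted(set(...))`, so its element order is arbitrary, which is fine here because the question says order doesn't matter.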

As a last step, if you plan to mutate individual lists later, you may want to copy each one:

df['list_value'] = [x[:] for x in df['list_value']]

If not, every row in a group refers to the same list object, so a change to one list will be reflected in all rows of that group.
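The sharing problem can be reproduced without pandas. This is only an illustrative sketch; `shared`, `rows` and `independent` are hypothetical names, with `rows` playing the role of one group's cells.

```python
# One list object referenced by three "rows", as happens within a group.
shared = ['aaa', 'bbb']
rows = [shared, shared, shared]

rows[0].append('zzz')   # mutate through the first row only
print(rows[2])          # ['aaa', 'bbb', 'zzz'] -- the other rows changed too

# Shallow-copy each list (x[:]) so the rows no longer share one object.
independent = [x[:] for x in rows]
independent[0].append('qqq')
print(independent[2])   # ['aaa', 'bbb', 'zzz'] -- unaffected this time
```

A shallow copy is enough here because the list items are strings, which are immutable.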

Output of the steps above (the order inside each list is arbitrary because it comes from a set):

   study_id                 list_value
0         1            [aaa, ccc, bbb]
1         1            [aaa, ccc, bbb]
2         1            [aaa, ccc, bbb]
3         2  [bbb, ddd, eee, aaa, zzz]
4         2  [bbb, ddd, eee, aaa, zzz]
5         2  [bbb, ddd, eee, aaa, zzz]
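If you want something you can paste and run end to end, here is a self-contained sketch on the question's sample data. Note that this is not the transform-based approach above but a plain-dict alternative: it collects one set per study_id and then writes a fresh sorted list into every row. I chose it because it does not depend on how `transform` handles a one-element list return value, which I have not verified across pandas versions. `union_lists_per_group` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def union_lists_per_group(df, key='study_id', col='list_value'):
    """Return a Series where each row holds the sorted union of its group's lists."""
    unions = {}
    for k, values in zip(df[key], df[col]):
        bucket = unions.setdefault(k, set())
        if isinstance(values, list):  # skips NaN and any other non-list entry
            bucket.update(values)
    # sorted() builds a new list per row, so rows do not share list objects.
    return pd.Series([sorted(unions[k]) for k in df[key]], index=df.index)


# Sample data from the question, with one missing value in group 2.
df = pd.DataFrame({
    'study_id': [1, 1, 1, 2, 2, 2],
    'list_value': [['aaa', 'bbb'], ['aaa'], ['ccc'],
                   ['ddd', 'eee', 'aaa'], np.nan, ['zzz', 'aaa', 'bbb']],
})

df['list_value'] = union_lists_per_group(df)
print(df)
```

Because every row gets its own list, the extra copying step with `x[:]` is not needed with this variant.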

