Is there a quick way to produce the output below, please?
Input:
Code  Items
123   eq-hk
456   ca-eu; tp-lbe
789   ca-us
321   go-ch
654   ca-au; go-au
987   go-jp
147   co-ml; go-ml
258   ca-us
369   ca-us; ca-my
741   ca-us
852   ca-eu
963   ca-ml; co-ml; go-ml
Output:
Code  eq  ca     go  co  tp
123   hk
456       eu             lbe
789       us
321              ch
654       au     au
987              jp
147              ml  ml
258       us
369       us,my
741       us
852       eu
963       ml     ml  ml
I keep ending up with loops and very ugly code to make this work. Is there an elegant way to achieve it, please?
Thank you!
    import pandas as pd

    df = pd.DataFrame([
        ('123', 'eq-hk'),
        ('456', 'ca-eu; tp-lbe'),
        ('789', 'ca-us'),
        ('321', 'go-ch'),
        ('654', 'ca-au; go-au'),
        ('987', 'go-jp'),
        ('147', 'co-ml; go-ml'),
        ('258', 'ca-us'),
        ('369', 'ca-us; ca-my'),
        ('741', 'ca-us'),
        ('852', 'ca-eu'),
        ('963', 'ca-ml; co-ml; go-ml')],
        columns=['Code', 'Items'])

    # Get the item type list from each row, sum (concatenate) the lists,
    # convert to a set to remove duplicates, and sort for a stable column order.
    # Note the raw string and the backslash: r'(\w+)-', not '(/w+)-'.
    item_types = sorted(set(df['Items'].str.findall(r'(\w+)-').sum()))
    print(item_types)
    # ['ca', 'co', 'eq', 'go', 'tp']

    # Generate a column for each item type; values that share a type within
    # one row are joined with a comma, as in the requested output ("us,my").
    df1 = pd.DataFrame(df['Code'])
    for t in item_types:
        df1[t] = df['Items'].str.findall(r'%s-(\w+)' % t).apply(','.join)
    print(df1)
    # Output (column spacing approximate):
    #    Code  ca     co  eq  go  tp
    # 0  123              hk
    # 1  456   eu                 lbe
    # 2  789   us
    # 3  321                  ch
    # 4  654   au             au
    # 5  987                  jp
    # 6  147          ml      ml
    # 7  258   us
    # 8  369   us,my
    # 9  741   us
    # 10 852   eu
    # 11 963   ml     ml      ml
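If you would rather avoid the explicit loop over item types, one possible alternative is to extract every type-value pair into its own row and then reshape. This is only a sketch on a four-row subset of the data above; the names `pairs` and `wide` are made up for illustration, and it assumes every item has the form `type-value` with word characters only. A Code with no matching items would come back as a row of NaN after the `reindex`.

```python
import pandas as pd

df = pd.DataFrame(
    [("123", "eq-hk"), ("456", "ca-eu; tp-lbe"),
     ("369", "ca-us; ca-my"), ("963", "ca-ml; co-ml; go-ml")],
    columns=["Code", "Items"],
)

# One row per "type-value" pair, with the two parts as named columns.
pairs = (
    df.set_index("Code")["Items"]
      .str.extractall(r"(?P<type>\w+)-(?P<value>\w+)")
      .reset_index()
)

# Join repeated types per Code with a comma, then spread types into columns.
wide = (
    pairs.groupby(["Code", "type"])["value"]
         .agg(",".join)
         .unstack(fill_value="")
         .reindex(df["Code"])      # restore the original row order
         .reset_index()
)
wide.columns.name = None  # drop the leftover "type" label on the column axis
print(wide)
```

The columns come out in alphabetical order of item type; if you need the `eq, ca, go, co, tp` order from the question, select them explicitly, e.g. `wide[["Code", "eq", "ca", "go", "co", "tp"]]`.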