Split and sort value in dataframe

  • A+
Category:Languages

I have a raw data dataframe for example:

      Data 0   XZ_1A-2A 1   XZ_3C 2   XZ_4B-5A 3   XZ_18A-20C 

And I want to split (and delete XZ_) it to be

      Data        T1      T2 0   XZ_1A-2A      1A      2A 1   XZ_3C         3C 2   XZ_4B-5A      4B      5A 3   XZ_18A-20C   18A     20C 

Then sort by number

       T1      T2 0      2A      1A 1      3C 2      5A      4B 3     20C     18A      

Thanks for your help.


[fixed]

Update_1:
When input data:

      Data 0   XZ_17A-1A 1   XZ_5C 2   XZ_3A-28A 

and compile with

df= (df['Data'].str.split('_|-',expand=True)[[1,2]]).rename(columns={1:'T2',2:'T1'}).fillna('') asc=pd.DataFrame(np.sort(df[['T1','T2']],axis=1))  des= asc[asc.columns.values[::-1]] 

I got the output

     1    0 0   1A  17A 1   5C   2   3A  28A  

Update_2:
If there are more than 2 items (about 1~5) in simgle row, how to sort it?

       Data 0   XZ_17A-1A 1   XZ_5C 2   XZ_3A-28A 3   XZ_5A-19A-42C    4   XZ_3A-28A-41A-42A 

 


Try this:

df= df['Data'].str.split('_|-',expand=True)[[1,2]] 

Output:

     1     2 0   1A    2A 1   3C  None 2   4B    5A 3  18A   20C 

To get sorted value use below,

df= (df['Data'].str.split('_|-',expand=True)[[1,2]]).rename(columns={1:'T2',2:'T1'}).fillna('') asc= df.apply(np.sort,axis=1) asc=pd.DataFrame(np.sort(df[['T1','T2']],axis=1)) #alternative way des= asc[asc.columns.values[::-1]] print des 

Output:

    T1    T2 0   2A    1A 1   3C      2   5A    4B 3  20C   18A 

Explanation:

a) after cleanup the data, sort the dataframe by ascending order based on row values using np.sort,axis=1

b) To get descending order reverse the column order.

c) use fillna to get exact result.

Edit:

df= (df['Data'].str.split('_|-',expand=True)[[1,2]]).rename(columns={1:'T2',2:'T1'})  df['n1']=df['T1'].str.extract('(/d+)').astype(float) df['n2']=df['T2'].str.extract('(/d+)').astype(float) res=pd.DataFrame() res['result'] =df.apply(lambda x: [x['T1'],x['T2']] if x['n1']>x['n2'] else [x['T2'],x['T1']],axis=1).fillna('') res[['T1','T2']]=res['result'].astype(str).str.replace("/[|/]|'",'').str.split(',',expand=True) 

Output:

       result   T1     T2 0   [17A, 1A]  17A     1A 1  [5C, None]   5C       2   [28A, 3A]  28A     3A 

For multiple column see below example,

df= (df['Data'].str.split('_|-',expand=True)[[1,2]]).rename(columns={1:'T2',2:'T1'}).fillna('') df['n1']=df['T1'].str.extract('(/d+)').astype(float) df['n2']=df['T2'].str.extract('(/d+)').astype(float) df['n3']=[432,4,15] res=pd.DataFrame() res['result'] =df.apply(lambda x: sorted([x['n1'],x['n2'],x['n3']],reverse=True),axis=1) res[['T1','T2','T3']]=res['result'].astype(str).str.replace("/[|/]|'",'').str.split(',',expand=True) 

Input:

    T2   T1    n1    n2   n3 0  17A   1A   1.0  17.0  432 1   5C        NaN   5.0    4 2   3A  28A  28.0   3.0   15 

Output:

             result    T1     T2    T3 0  [432, 17.0, 1.0]   432   17.0   1.0 1     [nan, 5.0, 4]   nan    5.0     4 2   [28.0, 15, 3.0]  28.0     15   3.0 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: