Pandas dataframe get value of last nonzero column


I have a pandas dataframe which contains 3 columns, each containing a site that a user has visited during a session.

In some cases, a user may not have visited 3 sites in a single session. This is shown by a 0, denoting that no site was visited in that slot.

    import pandas as pd

    df = pd.DataFrame(data=[[5, 8, 1], [8, 0, 0], [1, 17, 0]],
                      columns=['site1', 'site2', 'site3'])
    print(df)

       site1  site2  site3
    0      5      8      1
    1      8      0      0
    2      1     17      0

In the example above, user 0 has visited sites 5, 8 and 1. User 1 has visited site 8 only, and user 2 has visited sites 1 and 17.

I would like to create a new column, last_site, which shows the last site visited by the user in that session.

The result I want is this:

       site1  site2  site3  last_site
    0      5      8      1          1
    1      8      0      0          8
    2      1     17      0         17

How can I do this in a concise way using pandas?

 


Replace the 0 values with missing values, forward fill along each row, and then select the last column with iloc:

    import numpy as np

    df['last'] = df.replace(0, np.nan).ffill(axis=1).iloc[:, -1].astype(int)
    print(df)

       site1  site2  site3  last
    0      5      8      1     1
    1      8      0      0     8
    2      1     17      0    17
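The one-liner above can be unpacked into its three steps to make the intermediate results easier to inspect. This is only an illustrative sketch using the question's sample data; the names masked, filled and last_site are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[5, 8, 1], [8, 0, 0], [1, 17, 0]],
                  columns=['site1', 'site2', 'site3'])

# Step 1: treat 0 as "no visit" by turning it into a missing value.
masked = df.replace(0, np.nan)

# Step 2: carry the last real value rightwards along each row.
filled = masked.ffill(axis=1)

# Step 3: the rightmost column now holds the last nonzero value per row.
last_site = filled.iloc[:, -1].astype(int)
print(last_site.tolist())  # [1, 8, 17]
```

Note that a row consisting only of zeros would still be NaN after step 2, so the final astype(int) would raise an error. If such sessions can occur, one option is to call fillna(0) before the cast.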

If performance is important, it is possible to use numpy instead:

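    # Use only the site columns, so a previously added 'last' column is ignored
    a = df[['site1', 'site2', 'site3']].to_numpy()
    m = a != 0

    # Reverse the columns so argmax finds the first nonzero from the right
    df['last'] = a[np.arange(m.shape[0]), m.shape[1] - m[:, ::-1].argmax(1) - 1]
    print(df)

       site1  site2  site3  last
    0      5      8      1     1
    1      8      0      0     8
    2      1     17      0    17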
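The argmax trick relies on every row containing at least one nonzero value: for an all-zero row, argmax returns 0 and the expression simply picks the last column. A possible way to make that case explicit is sketched below. The helper name last_nonzero, its default parameter, and the extra all-zero row are assumptions added for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def last_nonzero(frame, cols, default=0):
    """Hypothetical helper: last nonzero value per row of frame[cols]."""
    a = frame[cols].to_numpy()
    m = a != 0
    # Reverse the columns so argmax finds the first True from the right.
    pos_from_right = m[:, ::-1].argmax(axis=1)
    idx = m.shape[1] - 1 - pos_from_right
    picked = a[np.arange(a.shape[0]), idx]
    # argmax returns 0 for an all-False row, so handle empty sessions explicitly.
    return np.where(m.any(axis=1), picked, default)

# Sample data from the question plus one made-up all-zero session.
df = pd.DataFrame(data=[[5, 8, 1], [8, 0, 0], [1, 17, 0], [0, 0, 0]],
                  columns=['site1', 'site2', 'site3'])
df['last_site'] = last_nonzero(df, ['site1', 'site2', 'site3'])
print(df['last_site'].tolist())  # [1, 8, 17, 0]
```

Passing the column list explicitly keeps the result independent of any extra columns already added to the frame.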
