How to calculate the average of the three most recent non-NaN values using Python


I have a dataframe df that looks like the following. For each row, I want to calculate the average of the last three non-NaN columns. If a row has fewer than three non-missing columns, the average should be missing.

    name  day1  day2  day3  day4  day5  day6  day7
    A     1     1     nan   2     3     0     3
    B     nan   nan   nan   nan   nan   nan   3
    C     1     1     0     1     1     1     1
    D     1     1     0     1     nan   1     4

The expected output should look like the following:

    name  day1  day2  day3  day4  day5  day6  day7  expected
    A     1     1     nan   2     3     0     3     2         <- 1/3 * (day5 + day6 + day7)
    B     nan   nan   nan   nan   nan   nan   3     nan       <- fewer than 3 non-missing
    C     1     1     0     1     1     1     1     1         <- 1/3 * (day5 + day6 + day7)
    D     1     1     0     1     nan   1     4     2         <- 1/3 * (day4 + day6 + day7)

I know how to calculate the average of the last three columns and how to count the non-missing observations among them:

    df.iloc[:, -3:].mean(axis=1)     # average of the last three columns (NaN is skipped)
    df.iloc[:, -3:].count(axis=1)    # number of non-NaN values in the last three columns

If a row has fewer than three non-missing observations, I know how to set the average to missing using the condition df.iloc[:, 1:8].count(axis=1) < 3.
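To illustrate where this partial approach falls short, here is a minimal sketch on the sample data. It assumes that name is the index rather than a regular column (so every column is a day column); the variable names last3, naive_mean, n_valid and too_few are only illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question; "name" is assumed to be the index.
df = pd.DataFrame(
    [[1, 1, nan, 2, 3, 0, 3],
     [nan, nan, nan, nan, nan, nan, 3],
     [1, 1, 0, 1, 1, 1, 1],
     [1, 1, 0, 1, nan, 1, 4]],
    index=pd.Index(list("ABCD"), name="name"),
    columns=[f"day{i}" for i in range(1, 8)],
)

last3 = df.iloc[:, -3:]              # day5, day6, day7
naive_mean = last3.mean(axis=1)      # mean() skips NaN by default
n_valid = last3.count(axis=1)        # non-NaN count within the last three columns

# Blank out rows with fewer than three non-missing values overall.
too_few = df.count(axis=1) < 3
naive_mean = naive_mean.where(~too_few)

print(naive_mean)
```

Row D shows the problem: day5 is missing, so the mean is taken over day6 and day7 only and gives 2.5 instead of the desired 2, which would also need day4.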

But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?

 


A vectorized solution using justify (a helper function that has to be defined separately; it is not part of NumPy or pandas):

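    import numpy as np

    N = 3  # number of most recent non-NaN entries to average
    # Right-justify each row so the NaNs move to the left, then average the last N columns.
    # Assumes df holds only the numeric day columns (e.g. name is the index).
    avg = np.mean(justify(df.values, invalid_val=np.nan, axis=1, side='right')[:, -N:], axis=1)
    df['expected'] = avg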
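Since justify is not defined in the answer, the following is a self-contained sketch of the same idea. The helper justify_right is a hypothetical, simplified stand-in written for this example (2-D float arrays, NaN as the only invalid value, right side only), not the original justify function. It also assumes that name is the index of df.

```python
import numpy as np
import pandas as pd


def justify_right(a):
    """Push the non-NaN values of each row of a 2-D float array to the
    right, keeping their order; the freed cells on the left become NaN."""
    mask = ~np.isnan(a)
    # Sorting booleans puts False before True, so each row's True cells
    # end up on the right, with the same count per row as in `mask`.
    justified_mask = np.sort(mask, axis=1)
    out = np.full(a.shape, np.nan)
    # Boolean indexing walks both arrays row by row, so values stay in
    # their original row and in their original left-to-right order.
    out[justified_mask] = a[mask]
    return out


nan = np.nan
# Sample data from the question; "name" is assumed to be the index.
df = pd.DataFrame(
    [[1, 1, nan, 2, 3, 0, 3],
     [nan, nan, nan, nan, nan, nan, 3],
     [1, 1, 0, 1, 1, 1, 1],
     [1, 1, 0, 1, nan, 1, 4]],
    index=pd.Index(list("ABCD"), name="name"),
    columns=[f"day{i}" for i in range(1, 8)],
)

N = 3  # number of most recent non-NaN entries to average
values = df.to_numpy(dtype=float)
# If a row has fewer than N valid values, a NaN remains among its last N
# cells, and a plain mean then yields NaN for that row, as required.
df["expected"] = justify_right(values)[:, -N:].mean(axis=1)

print(df["expected"])
```

Using a plain mean (rather than np.nanmean) is deliberate here: it is what turns rows with fewer than N valid values into NaN without a separate masking step.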
