I have a dataframe df that looks like the following. For each row, I want to calculate the average of the last 3 non-NaN columns. If a row has fewer than three non-missing columns, the average should be missing.
name  day1  day2  day3  day4  day5  day6  day7
A     1     1     nan   2     3     0     3
B     nan   nan   nan   nan   nan   nan   3
C     1     1     0     1     1     1     1
D     1     1     0     1     nan   1     4
The expected output should look like the following:
name  day1  day2  day3  day4  day5  day6  day7  expected
A     1     1     nan   2     3     0     3     2         < 1/3 * (day5 + day6 + day7)
B     nan   nan   nan   nan   nan   nan   3     nan       < less than 3 nonmissing
C     1     1     0     1     1     1     1     1         < 1/3 * (day5 + day6 + day7)
D     1     1     0     1     nan   1     4     2         < 1/3 * (day4 + day6 + day7)
I know how to calculate the average of the last three columns and how to count the non-missing observations in them:
df.iloc[:, -3:].mean(axis=1)   # average of the last three columns
df.iloc[:, -3:].count(axis=1)  # number of non-NaN values in the last three columns
If a row has fewer than 3 non-missing observations, I know how to find it so the average can be set to missing, using df.iloc[:, 1:].count(axis=1) < 3.
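A minimal sketch of these steps on the sample data above, assuming name is an ordinary column at position 0 (the frame construction and the variable names last3_mean, last3_count and too_few are only for illustration). Note that mean skips NaN by default, so this averages whatever happens to be present in the last three columns rather than the last three non-missing values:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question; `name` is assumed to be a regular column.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "day1": [1, nan, 1, 1],
    "day2": [1, nan, 1, 1],
    "day3": [nan, nan, 0, 0],
    "day4": [2, nan, 1, 1],
    "day5": [3, nan, 1, nan],
    "day6": [0, nan, 1, 1],
    "day7": [3, 3, 1, 4],
})

# Average of the last three columns (NaN is skipped by default).
last3_mean = df.iloc[:, -3:].mean(axis=1)    # A: 2.0, B: 3.0, C: 1.0, D: 2.5

# Number of non-NaN values in the last three columns.
last3_count = df.iloc[:, -3:].count(axis=1)  # A: 3, B: 1, C: 3, D: 2

# Rows with fewer than 3 non-missing day values overall.
too_few = df.iloc[:, 1:].count(axis=1) < 3   # only B is True
```

Row D shows the gap: the result is 2.5 (day6 and day7 only) instead of the wanted 2, because day4 is never picked up.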
But I am struggling to find a way to calculate the average of the last three nonmissing columns. Can anyone teach me how to solve this please?
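Here is a vectorized approach using justify, a custom helper function (not part of NumPy) that pushes the non-NaN values of each row to one side: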

N = 3  # last N entries for averaging
# Right-justify the non-NaN values in each row, then average the last N columns
avg = np.mean(justify(df.values, invalid_val=np.nan, axis=1, side='right')[:, -N:], axis=1)
df['expected'] = avg
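Since justify is not a NumPy built-in, here is a self-contained sketch of the same idea. The helper name justify_right, the construction of the sample frame, and keeping name as the index (so that only the numeric day columns reach NumPy) are assumptions for illustration, not the original helper:

```python
import numpy as np
import pandas as pd


def justify_right(a, invalid_val=np.nan):
    """Push the non-NaN entries of each row of a 2D float array to the right."""
    mask = ~np.isnan(a)
    # Sorting each boolean row puts False first and True last, which marks
    # the right-aligned slots that the valid entries should occupy.
    justified_mask = np.sort(mask, axis=1)
    out = np.full(a.shape, invalid_val, dtype=float)
    # Boolean indexing walks both arrays in row-major order, so each row's
    # valid values keep their original left-to-right order.
    out[justified_mask] = a[mask]
    return out


nan = np.nan
# Sample data from the question, with `name` assumed to be the index.
df = pd.DataFrame(
    {
        "day1": [1, nan, 1, 1],
        "day2": [1, nan, 1, 1],
        "day3": [nan, nan, 0, 0],
        "day4": [2, nan, 1, 1],
        "day5": [3, nan, 1, nan],
        "day6": [0, nan, 1, 1],
        "day7": [3, 3, 1, 4],
    },
    index=pd.Index(["A", "B", "C", "D"], name="name"),
)

N = 3  # last N non-NaN entries to average
values = df.to_numpy(dtype=float)
# np.mean propagates NaN, so rows with fewer than N valid values come out NaN.
df["expected"] = np.mean(justify_right(values)[:, -N:], axis=1)
```

On the sample data this gives 2, NaN, 1 and 2 for rows A to D. Using np.mean rather than np.nanmean is deliberate: any NaN left in the last N slots means the row had fewer than N valid values, and the NaN result marks it as missing.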