How to efficiently fillna(0) if series is all-nan, or else remaining non-nan entries are zero?

Given that I have a pandas Series, I want to fill the NaNs with zero if either all the values are NaN or if all the values are either zero or NaN.

For example, I would want to fill the NaNs in the following Series with zeroes.

    0      0
    1      0
    2    NaN
    3    NaN
    4    NaN
    5    NaN
    6    NaN
    7    NaN
    8    NaN

But, I would not want to fillna(0) the following Series:

    0      0
    1      0
    2      2
    3      0
    4    NaN
    5    NaN
    6    NaN
    7    NaN
    8    NaN

I was looking at the documentation, and it seems I could use pandas.Series.value_counts to confirm that the values are only 0 and NaN, and then simply call fillna(0). In other words, I want to check whether set(s.unique().astype(str)).issubset(['0.0','nan']) and, if so, call fillna(0); otherwise leave the Series alone.

Considering how powerful pandas is, it seems like there may be a better way to do this. Does anyone have any suggestions for doing this cleanly and efficiently?

Potential solution, thanks to cᴏʟᴅsᴘᴇᴇᴅ:

    if s.dropna().eq(0).all():
        s = s.fillna(0)


You can test whether every value is either 0 or NaN by combining a comparison with 0 and isna, and then fill:

    if ((s == 0) | (s.isna())).all():
        s = pd.Series(0, index=s.index)
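As a minimal sketch, the mask check above can be wrapped in a small helper and run against the two example Series from the question. The function name `fill_if_only_zero_or_nan` and the variable names are hypothetical, chosen only for illustration. The sketch uses `fillna(0)` rather than building a new Series, on the assumption that keeping the original float dtype is preferable.

```python
import numpy as np
import pandas as pd


def fill_if_only_zero_or_nan(s: pd.Series) -> pd.Series:
    """Hypothetical helper: fill NaNs with 0 only when every entry is 0 or NaN."""
    only_zero_or_nan = ((s == 0) | s.isna()).all()
    # Return the Series unchanged when any other value is present.
    return s.fillna(0) if only_zero_or_nan else s


# The two examples from the question, plus an all-NaN Series.
fill_me = pd.Series([0, 0] + [np.nan] * 7)
leave_me = pd.Series([0, 0, 2, 0] + [np.nan] * 5)
all_nan = pd.Series([np.nan] * 3)

filled = fill_if_only_zero_or_nan(fill_me)
untouched = fill_if_only_zero_or_nan(leave_me)
filled_all_nan = fill_if_only_zero_or_nan(all_nan)
```

Note that `pd.Series(0, index=s.index)` produces an integer Series, whereas `fillna(0)` on a float Series keeps it as float; either is fine if only the values matter.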

Or compare unique values:

    if pd.Series(s.unique()).fillna(0).eq(0).all():
        s = pd.Series(0, index=s.index)

The solution from @cᴏʟᴅsᴘᴇᴇᴅ (thank you): drop the NaNs with dropna and compare the remaining values with 0:

    if s.dropna().eq(0).all():
        s = pd.Series(0, index=s.index)

The solution from the question: the unique values need to be converted to strings first, because NaN does not compare equal to itself:

    if set(s.unique().astype(str)).issubset(['0.0','nan']):
        s = pd.Series(0, index=s.index)
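The following sketch illustrates why the string conversion is involved, and one caveat that seems worth knowing: the check depends on the string form `'0.0'`, so it assumes a float Series. The helper name `string_check` and the sample Series are hypothetical.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to itself, which is why comparing raw values is awkward.
nan_equals_itself = np.nan == np.nan


def string_check(s: pd.Series) -> bool:
    """Hypothetical wrapper around the string-based check from the question."""
    return set(s.unique().astype(str)).issubset(['0.0', 'nan'])


float_series = pd.Series([0, 0, np.nan, np.nan])  # float64 because of the NaNs
int_series = pd.Series([0, 0])                    # int64, no NaNs at all

float_result = string_check(float_series)  # unique values stringify to '0.0' and 'nan'
int_result = string_check(int_series)      # the unique value stringifies to '0', not '0.0'
dropna_result = bool(int_series.dropna().eq(0).all())
```

With an integer Series of zeros, the string check returns False even though every value is 0, while the `dropna().eq(0).all()` variant still returns True. The numeric checks may therefore be the safer choice when the dtype is not known in advance.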

Timings:

    s = pd.Series(np.random.choice([0,np.nan], size=10000))

    In [68]: %timeit ((s == 0) | (s.isna())).all()
    The slowest run took 4.85 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000 loops, best of 3: 574 µs per loop

    In [69]: %timeit pd.Series(s.unique()).fillna(0).eq(0).all()
    1000 loops, best of 3: 587 µs per loop

    In [70]: %timeit s.dropna().eq(0).all()
    The slowest run took 4.65 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000 loops, best of 3: 774 µs per loop

    In [71]: %timeit set(s.unique().astype(str)).issubset(['0.0','nan'])
    The slowest run took 5.78 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000 loops, best of 3: 157 µs per loop
