Sum a column by ID, but skip the first instance?

  • A+
Category:Languages

I have a dataframe like the following.

A = [{'ID':1, 'Period':1, 'Variable':21}, {'ID':1,'Period':2, 'Variable':12},        {'ID':2, 'Period':2, 'Variable':14}, {'ID':2, 'Period':3, 'Variable':18}]  df = pd.DataFrame(A) 

I would essentially like to do an operation like df.groupby('ID').sum() to get the sum of the Variable column, but I need to skip the first period observed for a particular ID. So, for ID=1, I am dropping the observation at period 1, but for ID=2, I am dropping the observation at period 2.

How can I do this?

 


You can slice within each group to ignore the first row and call sum:

In[46]: df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())  Out[46]:  ID 1    12 2    18 Name: Variable, dtype: int64 

If you want all the columns:

In[47]: df.groupby('ID').apply(lambda x: x.iloc[1:].sum())  Out[47]:      ID  Period  Variable ID                       1    1       2        12 2    2       3        18 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: