- A+

Category：Languages

I am trying to apply a function, cumulatively, to values that lie within a window defined by 'start' and 'finish' columns. So, 'start' and 'finish' define the intervals where the value is 'active'; for each row, I want to get a sum of all 'active' values at the time.

Here is a 'bruteforce' example that does what I am after - is there a more elegant, faster or more memory efficient way of doing this?

`df = pd.DataFrame(data=[[1,3,100], [2,4,200], [3,6,300], [4,6,400], [5,6,500]], columns=['start', 'finish', 'val']) df['dummy'] = 1 df = df.merge(df, on=['dummy'], how='left') df = df[(df['start_y'] <= df['start_x']) & (df['finish_y'] > df['start_x'])] val = df.groupby('start_x')['val_y'].sum() `

Originally, df is:

` start finish val 0 1 3 100 1 2 4 200 2 3 6 300 3 4 6 400 4 5 6 500 `

The result I am after is:

`1 100 2 300 3 500 4 700 5 1200 `

`numba`

`from numba import njit @njit def pir_numba(S, F, V): mn = S.min() mx = F.max() out = np.zeros(mx) for s, f, v in zip(S, F, V): out[s:f] += v return out[mn:] pir_numba(*[df[c].values for c in ['start', 'finish', 'val']]) `

`np.bincount`

`s, f, v = [df[col].values for col in ['start', 'finish', 'val']] np.bincount([i - 1 for r in map(range, s, f) for i in r], v.repeat(f - s)) array([ 100., 300., 500., 700., 1200.]) `

### Comprehension

This depends on the `index`

being unique

`pd.Series({ (k, i): v for i, s, f, v in df.itertuples() for k in range(s, f) }).sum(level=0) 1 100 2 300 3 500 4 700 5 1200 dtype: int64 `

With no dependence on `index`

`pd.Series({ (k, i): v for i, (s, f, v) in enumerate(zip(*map(df.get, ['start', 'finish', 'val']))) for k in range(s, f) }).sum(level=0) `