- A+

## Objective and Motivation

I've seen this kind of question several times over and have seen many other questions that involve some element of this. Most recently, I had to spend a bit of time explaining this concept in comments while looking for an appropriate canonical Q&A. I did not find one and so I thought I'd write one.

This question usually arises with respect to a specific operation but equally applies to most arithmetic operations.

- How do I subtract a
`Series`

from every column in a`DataFrame`

? - How do I add a
`Series`

from every column in a`DataFrame`

? - How do I multiply a
`Series`

from every column in a`DataFrame`

? - How do I divide a
`Series`

from every column in a`DataFrame`

?

## The Question

Given a `Series`

`s`

and `DataFrame`

`df`

. How do I operate on each column of `df`

with `s`

?

`df = pd.DataFrame( [[1, 2, 3], [4, 5, 6]], index=[0, 1], columns=['a', 'b', 'c'] ) s = pd.Series([3, 14], index=[0, 1]) `

When I attempt to add them, I get all `np.nan`

`df + s a b c 0 1 0 NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN `

What I thought I should get is

` a b c 0 4 5 6 1 18 19 20 `

Please bear the preamble. It's important to address some higher level concepts first. Since my motivation is to share knowledge and teach, I wanted to make this as clear as possible.

It is helpful to create a mental model of what `Series`

and `DataFrame`

objects are.

# Anatomy of a `Series`

A `Series`

should be thought of as an enhanced dictionary. This isn't always a perfect analogy, but we'll start here. Also, there are other analogies that you can make but I am targetting a dictionary in order to demonstrate the purpose of this post.

`index`

These are the keys that we can reference to get at the corresponding values. When the elements of the index are unique, the comparison to a dictionary becomes very close.

`values`

These are the corresponding values that are keyed by the index.

# Anatomy of a `DataFrame`

A `DataFrame`

should be thought of as a dictionary of `Series`

or a `Series`

of `Series`

. In this case the keys are the column names and the values are the columns themselves as `Series`

objects. Each `Series`

agrees to share the same `index`

which is the index of the `DataFrame`

.

`columns`

These are the keys that we can reference to get at the corresponding `Series`

.

`index`

This the the index that all of the `Series`

values agree to share.

## Note: RE: `columns`

and `index`

objects

They are the same kind of things. A `DataFrame`

s `index`

can be used as another `DataFrame`

s `columns`

. In fact, this happens when you do `df.T`

to get a transpose.

`values`

This is a 2 dimensional array that contains the data in a `DataFrame`

. The reality is that `values`

is **NOT** what is stored inside the `DataFrame`

object. (Well sometimes it is, but I'm not about to try to describe the block mananager). The point is, it is better to think of this as access to a 2 dimensional array of the data.

# Define Sample Data

These are sample `pandas.Index`

objects that can be used as the `index`

of a `Series`

or `DataFrame`

or can be used as the `columns`

of a `DataFrame`

`idx_lower = pd.Index([*'abcde'], name='lower') idx_range = pd.RangeIndex(5, name='range') `

These are sample `pandas.Series`

objects that use the `pandas.Index`

objects above

`s0 = pd.Series(range(10, 15), idx_lower) s1 = pd.Series(range(30, 40, 2), idx_lower) s2 = pd.Series(range(50, 10, -8), idx_range) `

These are sample `pandas.DataFrame`

objects that use the `pandas.Index`

objects above

`df0 = pd.DataFrame(100, index=idx_range, columns=idx_lower) df1 = pd.DataFrame( np.arange(np.product(df0.shape)).reshape(df0.shape), index=idx_range, columns=idx_lower ) `

`Series`

on `Series`

When operating on two `Series`

, the alignment is obvious. You align the `index`

of one `Series`

with the `index`

of the other.

`s1 + s0 lower a 40 b 43 c 46 d 49 e 52 dtype: int64 `

Which is the same as when I randomly shuffle one before I operate. The indices will sitll align.

`s1 + s0.sample(frac=1) lower a 40 b 43 c 46 d 49 e 52 dtype: int64 `

And is **NOT** the case when instead I operate with the values of the shuffled `Series`

. In this case, Pandas doesn't have the `index`

to align with and therefore operates from a positions.

`s1 + s0.sample(frac=1).values lower a 42 b 42 c 47 d 50 e 49 dtype: int64 `

Add a scalar

`s1 + 1 lower a 31 b 33 c 35 d 37 e 39 dtype: int64 `

`DataFrame`

on `DataFrame`

Similar is true when operating between two `DataFrame`

s

The alignment is obvious and does what we think it should do

`df0 + df1 lower a b c d e range 0 100 101 102 103 104 1 105 106 107 108 109 2 110 111 112 113 114 3 115 116 117 118 119 4 120 121 122 123 124 `

Shuffle second `DataFrame`

on both axes. The `index`

and `columns`

will still align and give us the same thing.

`df0 + df1.sample(frac=1).sample(frac=1, axis=1) lower a b c d e range 0 100 101 102 103 104 1 105 106 107 108 109 2 110 111 112 113 114 3 115 116 117 118 119 4 120 121 122 123 124 `

Same shuffling but add the array and not the `DataFrame`

. No longer aligned and will get different results.

`df0 + df1.sample(frac=1).sample(frac=1, axis=1).values lower a b c d e range 0 123 124 121 122 120 1 118 119 116 117 115 2 108 109 106 107 105 3 103 104 101 102 100 4 113 114 111 112 110 `

Add 1 dimensional array. Will align with columns and broadcast across rows.

`df0 + [*range(2, df0.shape[1] + 2)] lower a b c d e range 0 102 103 104 105 106 1 102 103 104 105 106 2 102 103 104 105 106 3 102 103 104 105 106 4 102 103 104 105 106 `

Add a scalar. Nothing to align with so broadcasts to everything

`df0 + 1 lower a b c d e range 0 101 101 101 101 101 1 101 101 101 101 101 2 101 101 101 101 101 3 101 101 101 101 101 4 101 101 101 101 101 `

`DataFrame`

on `Series`

If `DataFrame`

s are to be though of as dictionaries of `Series`

and `Series`

are to be thought of as dictionaries of values, then it is natural that when operating between a `DataFrame`

and `Series`

that they should be aligned by their "keys".

`s0: lower a b c d e 10 11 12 13 14 df0: lower a b c d e range 0 100 100 100 100 100 1 100 100 100 100 100 2 100 100 100 100 100 3 100 100 100 100 100 4 100 100 100 100 100 `

And when we operate, the `10`

in `s0['a']`

gets added to the entire column of `df0['a']`

`df0 + s0 lower a b c d e range 0 110 111 112 113 114 1 110 111 112 113 114 2 110 111 112 113 114 3 110 111 112 113 114 4 110 111 112 113 114 `

### Heart of the issue and point of the post

What about if I want `s2`

and `df0`

?

`s2: df0: | lower a b c d e range | range 0 50 | 0 100 100 100 100 100 1 42 | 1 100 100 100 100 100 2 34 | 2 100 100 100 100 100 3 26 | 3 100 100 100 100 100 4 18 | 4 100 100 100 100 100 `

When I operate, I get the all `np.nan`

as cited in the question

`df0 + s2 a b c d e 0 1 2 3 4 range 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN `

This does not produce what we wanted. Because Pandas is aligning the `index`

of `s2`

with the `columns`

of `df0`

. The `columns`

of the result includes a union of the `index`

of `s2`

and the `columns`

of `df0`

.

We could fake it out with tricky transposition

`(df0.T + s2).T lower a b c d e range 0 150 150 150 150 150 1 142 142 142 142 142 2 134 134 134 134 134 3 126 126 126 126 126 4 118 118 118 118 118 `

But it turns out Pandas has a better solution. There are operation methods that allow us to pass an `axis`

argument to specify the axis to align with.

`-`

`sub`

`+`

`add`

`*`

`mul`

`/`

`div`

`**`

`pow`

And so the answer is simply

`df0.add(s2, axis='index') lower a b c d e range 0 150 150 150 150 150 1 142 142 142 142 142 2 134 134 134 134 134 3 126 126 126 126 126 4 118 118 118 118 118 `

Turns out `axis='index'`

is synomynous with `axis=0`

.

As is `axis='columns'`

synomynous with `axis=1`

`df0.add(s2, axis=0) lower a b c d e range 0 150 150 150 150 150 1 142 142 142 142 142 2 134 134 134 134 134 3 126 126 126 126 126 4 118 118 118 118 118 `

### Rest of the operations

`df0.sub(s2, axis=0) lower a b c d e range 0 50 50 50 50 50 1 58 58 58 58 58 2 66 66 66 66 66 3 74 74 74 74 74 4 82 82 82 82 82 `

`df0.mul(s2, axis=0) lower a b c d e range 0 5000 5000 5000 5000 5000 1 4200 4200 4200 4200 4200 2 3400 3400 3400 3400 3400 3 2600 2600 2600 2600 2600 4 1800 1800 1800 1800 1800 `

`df0.div(s2, axis=0) lower a b c d e range 0 2.000000 2.000000 2.000000 2.000000 2.000000 1 2.380952 2.380952 2.380952 2.380952 2.380952 2 2.941176 2.941176 2.941176 2.941176 2.941176 3 3.846154 3.846154 3.846154 3.846154 3.846154 4 5.555556 5.555556 5.555556 5.555556 5.555556 `

`df0.pow(1 / s2, axis=0) lower a b c d e range 0 1.096478 1.096478 1.096478 1.096478 1.096478 1 1.115884 1.115884 1.115884 1.115884 1.115884 2 1.145048 1.145048 1.145048 1.145048 1.145048 3 1.193777 1.193777 1.193777 1.193777 1.193777 4 1.291550 1.291550 1.291550 1.291550 1.291550 `