# How do I operate on a DataFrame with a Series for every column

• A+
Category：Languages

## Objective and Motivation

I've seen this kind of question several times over and have seen many other questions that involve some element of this. Most recently, I had to spend a bit of time explaining this concept in comments while looking for an appropriate canonical Q&A. I did not find one and so I thought I'd write one.

This question usually arises with respect to a specific operation but equally applies to most arithmetic operations.

• How do I subtract a `Series` from every column in a `DataFrame`?
• How do I add a `Series` from every column in a `DataFrame`?
• How do I multiply a `Series` from every column in a `DataFrame`?
• How do I divide a `Series` from every column in a `DataFrame`?

## The Question

Given a `Series` `s` and `DataFrame` `df`. How do I operate on each column of `df` with `s`?

``df = pd.DataFrame(     [[1, 2, 3], [4, 5, 6]],     index=[0, 1],     columns=['a', 'b', 'c'] )  s = pd.Series([3, 14], index=[0, 1]) ``

When I attempt to add them, I get all `np.nan`

``df + s      a   b   c   0   1 0 NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN ``

What I thought I should get is

``    a   b   c 0   4   5   6 1  18  19  20 ``

Please bear the preamble. It's important to address some higher level concepts first. Since my motivation is to share knowledge and teach, I wanted to make this as clear as possible.

It is helpful to create a mental model of what `Series` and `DataFrame` objects are.

# Anatomy of a `Series`

A `Series` should be thought of as an enhanced dictionary. This isn't always a perfect analogy, but we'll start here. Also, there are other analogies that you can make but I am targetting a dictionary in order to demonstrate the purpose of this post.

## `index`

These are the keys that we can reference to get at the corresponding values. When the elements of the index are unique, the comparison to a dictionary becomes very close.

## `values`

These are the corresponding values that are keyed by the index.

# Anatomy of a `DataFrame`

A `DataFrame` should be thought of as a dictionary of `Series` or a `Series` of `Series`. In this case the keys are the column names and the values are the columns themselves as `Series` objects. Each `Series` agrees to share the same `index` which is the index of the `DataFrame`.

## `columns`

These are the keys that we can reference to get at the corresponding `Series`.

## `index`

This the the index that all of the `Series` values agree to share.

## Note: RE: `columns` and `index` objects

They are the same kind of things. A `DataFrame`s `index` can be used as another `DataFrame`s `columns`. In fact, this happens when you do `df.T` to get a transpose.

## `values`

This is a 2 dimensional array that contains the data in a `DataFrame`. The reality is that `values` is NOT what is stored inside the `DataFrame` object. (Well sometimes it is, but I'm not about to try to describe the block mananager). The point is, it is better to think of this as access to a 2 dimensional array of the data.

# Define Sample Data

These are sample `pandas.Index` objects that can be used as the `index` of a `Series` or `DataFrame` or can be used as the `columns` of a `DataFrame`

``idx_lower = pd.Index([*'abcde'], name='lower') idx_range = pd.RangeIndex(5, name='range') ``

These are sample `pandas.Series` objects that use the `pandas.Index` objects above

``s0 = pd.Series(range(10, 15), idx_lower) s1 = pd.Series(range(30, 40, 2), idx_lower) s2 = pd.Series(range(50, 10, -8), idx_range) ``

These are sample `pandas.DataFrame` objects that use the `pandas.Index` objects above

``df0 = pd.DataFrame(100, index=idx_range, columns=idx_lower) df1 = pd.DataFrame(     np.arange(np.product(df0.shape)).reshape(df0.shape),     index=idx_range, columns=idx_lower ) ``

## `Series` on `Series`

When operating on two `Series`, the alignment is obvious. You align the `index` of one `Series` with the `index` of the other.

``s1 + s0  lower a    40 b    43 c    46 d    49 e    52 dtype: int64 ``

Which is the same as when I randomly shuffle one before I operate. The indices will sitll align.

``s1 + s0.sample(frac=1)  lower a    40 b    43 c    46 d    49 e    52 dtype: int64 ``

And is NOT the case when instead I operate with the values of the shuffled `Series`. In this case, Pandas doesn't have the `index` to align with and therefore operates from a positions.

``s1 + s0.sample(frac=1).values  lower a    42 b    42 c    47 d    50 e    49 dtype: int64 ``

``s1 + 1  lower a    31 b    33 c    35 d    37 e    39 dtype: int64 ``

## `DataFrame` on `DataFrame`

Similar is true when operating between two `DataFrame`s
The alignment is obvious and does what we think it should do

``df0 + df1  lower    a    b    c    d    e range                          0      100  101  102  103  104 1      105  106  107  108  109 2      110  111  112  113  114 3      115  116  117  118  119 4      120  121  122  123  124 ``

Shuffle second `DataFrame` on both axes. The `index` and `columns` will still align and give us the same thing.

``df0 + df1.sample(frac=1).sample(frac=1, axis=1)  lower    a    b    c    d    e range                          0      100  101  102  103  104 1      105  106  107  108  109 2      110  111  112  113  114 3      115  116  117  118  119 4      120  121  122  123  124 ``

Same shuffling but add the array and not the `DataFrame`. No longer aligned and will get different results.

``df0 + df1.sample(frac=1).sample(frac=1, axis=1).values  lower    a    b    c    d    e range                          0      123  124  121  122  120 1      118  119  116  117  115 2      108  109  106  107  105 3      103  104  101  102  100 4      113  114  111  112  110 ``

Add 1 dimensional array. Will align with columns and broadcast across rows.

``df0 + [*range(2, df0.shape[1] + 2)]  lower    a    b    c    d    e range                          0      102  103  104  105  106 1      102  103  104  105  106 2      102  103  104  105  106 3      102  103  104  105  106 4      102  103  104  105  106 ``

``df0 + 1  lower    a    b    c    d    e range                          0      101  101  101  101  101 1      101  101  101  101  101 2      101  101  101  101  101 3      101  101  101  101  101 4      101  101  101  101  101 ``

## `DataFrame` on `Series`

If `DataFrame`s are to be though of as dictionaries of `Series` and `Series` are to be thought of as dictionaries of values, then it is natural that when operating between a `DataFrame` and `Series` that they should be aligned by their "keys".

``s0: lower    a    b    c    d    e         10   11   12   13   14  df0: lower    a    b    c    d    e range                          0      100  100  100  100  100 1      100  100  100  100  100 2      100  100  100  100  100 3      100  100  100  100  100 4      100  100  100  100  100 ``

And when we operate, the `10` in `s0['a']` gets added to the entire column of `df0['a']`

``df0 + s0  lower    a    b    c    d    e range                          0      110  111  112  113  114 1      110  111  112  113  114 2      110  111  112  113  114 3      110  111  112  113  114 4      110  111  112  113  114 ``

### Heart of the issue and point of the post

What about if I want `s2` and `df0`?

``s2:               df0:               |    lower    a    b    c    d    e range        |    range                          0      50    |    0      100  100  100  100  100 1      42    |    1      100  100  100  100  100 2      34    |    2      100  100  100  100  100 3      26    |    3      100  100  100  100  100 4      18    |    4      100  100  100  100  100 ``

When I operate, I get the all `np.nan` as cited in the question

``df0 + s2          a   b   c   d   e   0   1   2   3   4 range                                         0     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ``

This does not produce what we wanted. Because Pandas is aligning the `index` of `s2` with the `columns` of `df0`. The `columns` of the result includes a union of the `index` of `s2` and the `columns` of `df0`.

We could fake it out with tricky transposition

``(df0.T + s2).T  lower    a    b    c    d    e range                          0      150  150  150  150  150 1      142  142  142  142  142 2      134  134  134  134  134 3      126  126  126  126  126 4      118  118  118  118  118 ``

But it turns out Pandas has a better solution. There are operation methods that allow us to pass an `axis` argument to specify the axis to align with.

`-` `sub`
`+` `add`
`*` `mul`
`/` `div`
`**` `pow`

And so the answer is simply

``df0.add(s2, axis='index')  lower    a    b    c    d    e range                          0      150  150  150  150  150 1      142  142  142  142  142 2      134  134  134  134  134 3      126  126  126  126  126 4      118  118  118  118  118 ``

Turns out `axis='index'` is synomynous with `axis=0`.
As is `axis='columns'` synomynous with `axis=1`

``df0.add(s2, axis=0)  lower    a    b    c    d    e range                          0      150  150  150  150  150 1      142  142  142  142  142 2      134  134  134  134  134 3      126  126  126  126  126 4      118  118  118  118  118 ``

### Rest of the operations

``df0.sub(s2, axis=0)  lower   a   b   c   d   e range                     0      50  50  50  50  50 1      58  58  58  58  58 2      66  66  66  66  66 3      74  74  74  74  74 4      82  82  82  82  82 ``

``df0.mul(s2, axis=0)  lower     a     b     c     d     e range                               0      5000  5000  5000  5000  5000 1      4200  4200  4200  4200  4200 2      3400  3400  3400  3400  3400 3      2600  2600  2600  2600  2600 4      1800  1800  1800  1800  1800 ``

``df0.div(s2, axis=0)  lower         a         b         c         d         e range                                                   0      2.000000  2.000000  2.000000  2.000000  2.000000 1      2.380952  2.380952  2.380952  2.380952  2.380952 2      2.941176  2.941176  2.941176  2.941176  2.941176 3      3.846154  3.846154  3.846154  3.846154  3.846154 4      5.555556  5.555556  5.555556  5.555556  5.555556 ``

``df0.pow(1 / s2, axis=0)  lower         a         b         c         d         e range                                                   0      1.096478  1.096478  1.096478  1.096478  1.096478 1      1.115884  1.115884  1.115884  1.115884  1.115884 2      1.145048  1.145048  1.145048  1.145048  1.145048 3      1.193777  1.193777  1.193777  1.193777  1.193777 4      1.291550  1.291550  1.291550  1.291550  1.291550 ``