Pythonic way to fill rows with date range


I am working on a problem that requires me to fill in rows for the missing dates (i.e., the dates between the two date columns of a pandas DataFrame). Please see the example below. My current approach, described further down, uses pandas.

Input Data Example (which has around 25000 rows):

A  | B  | C  | Date1     | Date2
a1 | b1 | c1 | 1Jan1990  | 15Aug1990  <- this row should be repeated for all dates between the two dates
.......................
a3 | b3 | c3 | 11May1986 | 11May1986  <- this row should NOT be repeated. Just 1 entry since both dates are same.
.......................
a5 | b5 | c5 | 1Dec1984  | 31Dec2017  <- this row should be repeated for all dates between the two dates
..........................
..........................

Output Expected:

A  | B  | C  | Month | Year
a1 | b1 | c1 | 1     | 1990  <- Since date 1 column for this row was Jan 1990
a1 | b1 | c1 | 2     | 1990
.......................
.......................
a1 | b1 | c1 | 7     | 1990
a1 | b1 | c1 | 8     | 1990  <- Since date 2 column for this row was Aug 1990
..........................
a3 | b3 | c3 | 5     | 1986  <- only 1 row since two dates in input dataframe were same for this row.
...........................
a5 | b5 | c5 | 12    | 1984  <- since date 1 column for this row was Dec 1984
a5 | b5 | c5 | 1     | 1985
..........................
..........................
a5 | b5 | c5 | 11    | 2017
a5 | b5 | c5 | 12    | 2017  <- Since date 2 column for this row was Dec 2017

I know of a more traditional way to achieve this (my current approach):

  • Iterate over each row.
  • Get the days difference between two date columns.
  • If the date is the same in both columns, include just one row for that month and year in the output dataframe.
  • If the dates differ (diff > 0), get every (month, year) combination between the two dates and append one row per combination to the new dataframe.

Since the input data has around 25000 rows, I expect the output to be very large, so I am looking for a more Pythonic way to achieve this (if possible, one that is also faster than the iterative approach)!

 


It looks to me like the best tool to use here is PeriodIndex (to generate the months and years between dates).
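As a minimal sketch of what this gives for a single pair of dates, the snippet below builds a monthly PeriodIndex with pd.period_range, which is the function current pandas versions use to create a PeriodIndex from a start and an end. The dates are those of the first example row written in ISO form, and the variable name months is only for illustration.

```python
import pandas as pd

# Monthly periods covering one pair of dates (first example row, ISO format).
months = pd.period_range(start='1990-01-01', end='1990-08-15', freq='M')

print(len(months))         # number of months spanned, both endpoints included
print(list(months.month))  # month number of each period
print(list(months.year))   # year of each period
```

Both endpoints are included, so a row whose two dates fall in the same month yields exactly one period.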

However, PeriodIndex can only operate on one row at a time. So, if we are going to use PeriodIndex, every row has to be processed individually. That unfortunately means looping through the rows of the DataFrame:

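import pandas as pd

df = pd.DataFrame([('a1','b1','c1','1Jan1990','15Aug1990'),
                   ('a3','b3','c3','11May1986','11May1986'),
                   ('a5','b5','c5','1Dec1984','31Dec2017')],
                  columns=['A','B','C','Date1','Date2'])

result = []
for tup in df.itertuples():
    # Monthly periods from Date1 to Date2, inclusive. pd.period_range replaces
    # the PeriodIndex(start=..., end=...) form, which newer pandas versions
    # no longer accept.
    index = pd.period_range(start=tup.Date1, end=tup.Date2, freq='M')
    # Repeat the row's values once per month so the data matches the index length.
    new_df = pd.DataFrame([(tup.A, tup.B, tup.C)] * len(index), index=index)
    new_df['Month'] = new_df.index.month
    new_df['Year'] = new_df.index.year
    result.append(new_df)

# Stack the per-row frames into a single DataFrame.
result = pd.concat(result, axis=0)
print(result)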

yields

          0   1   2  Month  Year
1990-01  a1  b1  c1      1  1990    <--- Beginning of row 1
1990-02  a1  b1  c1      2  1990
1990-03  a1  b1  c1      3  1990
1990-04  a1  b1  c1      4  1990
1990-05  a1  b1  c1      5  1990
1990-06  a1  b1  c1      6  1990
1990-07  a1  b1  c1      7  1990
1990-08  a1  b1  c1      8  1990    <--- End of row 1
1986-05  a3  b3  c3      5  1986    <--- Beginning and End of row 2
1984-12  a5  b5  c5     12  1984    <--- Beginning row 3
1985-01  a5  b5  c5      1  1985
1985-02  a5  b5  c5      2  1985
1985-03  a5  b5  c5      3  1985
1985-04  a5  b5  c5      4  1985
...      ..  ..  ..    ...   ...
2017-09  a5  b5  c5      9  2017
2017-10  a5  b5  c5     10  2017
2017-11  a5  b5  c5     11  2017
2017-12  a5  b5  c5     12  2017    <--- End of row 3

[406 rows x 5 columns]

Note that you may not really need to define Month and Year columns

new_df['Month'] = new_df.index.month
new_df['Year'] = new_df.index.year

since you already have a PeriodIndex which makes computing months and years very easy.
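For instance, here is a rough sketch of reading the month and year from the index on demand. The demo frame is a made-up stand-in for a small slice of result, and the names demo, rows_1985 and with_cols are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for part of `result`: one source row expanded
# over a monthly PeriodIndex.
idx = pd.period_range(start='1984-12-01', end='1985-03-31', freq='M')
demo = pd.DataFrame({'A': 'a5', 'B': 'b5', 'C': 'c5'}, index=idx)

# Filter by year straight from the index, with no stored Year column.
rows_1985 = demo[demo.index.year == 1985]

# Materialise Month and Year columns only at the point they are needed.
with_cols = demo.assign(Month=demo.index.month, Year=demo.index.year)
print(with_cols)
```

Leaving the two columns out of the loop keeps each intermediate frame smaller, which may matter when the expanded output is large.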
