How would I add 1 year to a column?
I’ve tried using map
and apply but I failed miserably.
I also wonder why pl.date()
accepts integers while it advertises that it only accepts str
or pli.Expr
.
A small hack workaround is:
col = pl.col('date').dt df = df.with_column(pl.when(pl.col(column).is_not_null()) .then(pl.date(col.year() + 1, col.month(), col.day())) .otherwise(pl.date(col.year() + 1,col.month(), col.day())) .alias("date"))
but this won’t work for months or days. I can’t just add a number or I’ll get a:
> thread 'thread '<unnamed>' panicked at 'invalid or out-of-range date<unnamed>', ' panicked at '/github/home/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/date.rsinvalid or out-of-range date:', 173:/github/home/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/date.rs51 :note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Most likely because day and month cycle while year goes to infinity.
I could also do this:
df = df.with_column( pl.when(col.month() == 1) .then(pl.date(col.year(), 2, col.day())) .when(col.month() == 2) .then(pl.date(col.year(), 3, col.day())) .when(col.month() == 3) .then(pl.date(col.year(), 4, col.day())) .when(col.month() == 4) .then(pl.date(col.year(), 5, col.day())) .when(col.month() == 5) .then(pl.date(col.year(), 6, col.day())) .when(col.month() == 6) .then(pl.date(col.year(), 7, col.day())) .when(col.month() == 7) .then(pl.date(col.year(), 8, col.day())) .when(col.month() == 8) .then(pl.date(col.year(), 9, col.day())) .when(col.month() == 9) .then(pl.date(col.year(), 10, col.day())) .when(col.month() == 10) .then(pl.date(col.year(), 11, col.day())) .when(col.month() == 11) .then(pl.date(col.year(), 12, col.day())) .otherwise(pl.date(col.year() + 1, 1, 1)) .alias("valid_from") )
Advertisement
Answer
Polars allows to do addition and subtraction with python’s timedelta
objects. However above week units things get a bit more complicated as we have to take different days of the month and leap years into account.
For this polars has offset_by
under the dt
namespace.
(pl.DataFrame({ "dates": pl.date_range(datetime(2000, 1, 1), datetime(2026, 1, 1), "1y") }).with_columns([ pl.col("dates").dt.offset_by("1y").alias("dates_and_1_yr") ]))
shape: (27, 2) ┌─────────────────────┬─────────────────────┐ │ dates ┆ dates_and_1_yr │ │ --- ┆ --- │ │ datetime[ns] ┆ datetime[ns] │ ╞═════════════════════╪═════════════════════╡ │ 2000-01-01 00:00:00 ┆ 2001-01-01 00:00:00 │ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ │ 2001-01-01 00:00:00 ┆ 2002-01-01 00:00:00 │ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ │ 2002-01-01 00:00:00 ┆ 2003-01-01 00:00:00 │ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ │ 2003-01-01 00:00:00 ┆ 2004-01-01 00:00:00 │ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ │ ... ┆ ... │ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ │ 2023-01-01 00:00:00 ┆ 2024-01-01 00:00:00 │ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ │ 2024-01-01 00:00:00 ┆ 2025-01-01 00:00:00 │ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ │ 2025-01-01 00:00:00 ┆ 2026-01-01 00:00:00 │ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ │ 2026-01-01 00:00:00 ┆ 2027-01-01 00:00:00 │ └─────────────────────┴─────────────────────┘