say that i have a df in the following format:
year 2016 2017 2018 2019 2020 min max avg month 2021-01-01 284 288 311 383 476 284 476 357.4 2021-02-01 301 315 330 388 441 301 441 359.6 2021-03-01 303 331 341 400 475 303 475 375.4 2021-04-01 283 300 339 419 492 283 492 372.6 2021-05-01 287 288 346 420 445 287 445 359.7 2021-06-01 283 292 340 424 446 283 446 359.1 2021-07-01 294 296 360 444 452 294 452 370.3 2021-08-01 294 315 381 445 451 294 451 375.9 2021-09-01 288 331 405 464 459 288 464 385.6 2021-10-01 327 349 424 457 453 327 457 399.1 2021-11-01 316 351 413 469 471 316 471 401.0 2021-12-01 259 329 384 467 465 259 467 375.7
and i would like to get the difference of the 2020 column by using df['delta'] = df['2020'].diff()
this will obviously return NaN for the first value in the column. how can i make it so that it automatically interprets that diff as the difference between the FIRST value of 2020 and the LAST value of 2019?
Advertisement
Answer
If you want only for 2020:
df["delta"] = pd.concat([df["2019"], df["2020"]]).diff().tail(len(df))
Prints:
year 2016 2017 2018 2019 2020 min max avg delta 0 2021-01-01 284 288 311 383 476 284 476 357.4 9.0 1 2021-02-01 301 315 330 388 441 301 441 359.6 -35.0 2 2021-03-01 303 331 341 400 475 303 475 375.4 34.0 3 2021-04-01 283 300 339 419 492 283 492 372.6 17.0 4 2021-05-01 287 288 346 420 445 287 445 359.7 -47.0 5 2021-06-01 283 292 340 424 446 283 446 359.1 1.0 6 2021-07-01 294 296 360 444 452 294 452 370.3 6.0 7 2021-08-01 294 315 381 445 451 294 451 375.9 -1.0 8 2021-09-01 288 331 405 464 459 288 464 385.6 8.0 9 2021-10-01 327 349 424 457 453 327 457 399.1 -6.0 10 2021-11-01 316 351 413 469 471 316 471 401.0 18.0 11 2021-12-01 259 329 384 467 465 259 467 375.7 -6.0