How to split into train and test sets by month

I have a dataframe structured like this:

 Time            Z            X          Y  

01-01-18         1           20         10
02-01-18        20            4         15
03-01-18        34           16         21
04-01-18        67           38          8
05-01-18        89           10         18
06-01-18        45           40          4
07-01-18        22           10         13
08-01-18         1           46         11
...
24-12-20        56           28          9
25-12-20         6           14         22
26-12-20         9            5         40
27-12-20        56           11         10
28-12-21        78           61         35
29-12-21        33           23         29
30-12-21         2           35         12
31-12-21         0           31          7

I have data for every day of every month from 2018 to 2021, around 50k observations in total.

How can I group all the data belonging to the same month and perform a train-test split for each month, i.e. one split for all the January data, one for all the February data, and so on?

Answer

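Try this to extract the month into its own column (it assumes Time holds strings in dd-mm-yy format). You can then group by that column and split each group separately:

df['month'] = df.Time.apply(lambda x: x.split('-')[1])  # month as a string, e.g. '01'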
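Once the month column exists, the per-month split could look roughly like the sketch below. It is only an illustration under a few assumptions: the sample rows are made up, the helper name split_by_month and its parameters test_frac and seed are hypothetical, and the split is a plain random 80/20 sample within each month (all Januaries from every year end up in the same group).

```python
import pandas as pd

# Made-up sample rows in the question's dd-mm-yy format (assumption).
df = pd.DataFrame({
    "Time": ["01-01-18", "02-01-18", "03-01-18", "04-01-18", "05-01-18",
             "01-02-18", "02-02-18", "03-02-18", "04-02-18", "05-02-18"],
    "Z": [1, 20, 34, 67, 89, 45, 22, 1, 56, 6],
    "X": [20, 4, 16, 38, 10, 40, 10, 46, 28, 14],
    "Y": [10, 15, 21, 8, 18, 4, 13, 11, 9, 22],
})

# Month as a two-character string, e.g. "01" for January.
df["month"] = df["Time"].str.split("-").str[1]


def split_by_month(frame, test_frac=0.2, seed=0):
    """Hypothetical helper: return {month: (train, test)} for each month."""
    splits = {}
    for month, group in frame.groupby("month"):
        # Randomly pick the test rows, then keep the remaining rows for training.
        test = group.sample(frac=test_frac, random_state=seed)
        train = group.drop(test.index)
        splits[month] = (train, test)
    return splits


splits = split_by_month(df)
train_jan, test_jan = splits["01"]  # all January rows, split into train and test
```

If scikit-learn is available, train_test_split from sklearn.model_selection could replace the sample/drop pair inside the loop. A random split ignores time order, so for forecasting tasks a chronological cut within each month may be more appropriate.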

User contributions licensed under: CC BY-SA