I have almost finished my time series model: I collected enough data and am now stuck at hyperparameter optimization.
After lots of googling I found a new and promising library called ultraopt, but the problem is how large a fragment of my total data (~150 GB) I should use for hyperparameter tuning. I also want to try lots of algorithms and combinations; is there any faster and easier way?
Or is there any math involved? Something like:
mydata = 100% of the dataset
run hyperparameter optimization on 5% of mydata
reuse the optimized hyperparameters to train on the remaining 95%  # something like this
so that I get a result similar to using the full dataset for optimization. Is there any shortcut for this?
I am using Python 3.7; CPU: AMD Ryzen 5 3400G, GPU: AMD Vega 11, RAM: 16 GB.
Answer
Hyperparameter tuning is typically done on the validation set of a train-validation-test split, where the splits contain roughly 70%, 10%, and 20% of the entire dataset, respectively. Random search is a reasonable baseline, while Bayesian optimization with Gaussian processes has been shown to be more compute-efficient. scikit-optimize is a good package for this.
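Here is a minimal sketch of Bayesian hyperparameter search with scikit-optimize's BayesSearchCV. The estimator, search space, and data are placeholders (I'm assuming a gradient boosting regressor and random data just for illustration); substitute your own time series model, features, and a subsample of your dataset. TimeSeriesSplit is used so the cross-validation folds respect temporal ordering.

```python
# Sketch only: GradientBoostingRegressor and the random X, y are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit
from skopt import BayesSearchCV
from skopt.space import Real, Integer

# Placeholder data: replace with a subsample of your real dataset.
X = np.random.rand(1000, 10)
y = np.random.rand(1000)

# Search space for the placeholder model's hyperparameters.
search_space = {
    "n_estimators": Integer(50, 500),
    "max_depth": Integer(2, 8),
    "learning_rate": Real(1e-3, 0.3, prior="log-uniform"),
}

opt = BayesSearchCV(
    estimator=GradientBoostingRegressor(),
    search_spaces=search_space,
    n_iter=30,                       # number of hyperparameter settings tried
    cv=TimeSeriesSplit(n_splits=3),  # folds that respect temporal order
    n_jobs=-1,
    random_state=0,
)
opt.fit(X, y)

print("Best score:", opt.best_score_)
print("Best hyperparameters:", opt.best_params_)
```

Once the search finishes on the subsample, `opt.best_params_` can be used to refit the model on the rest of your data, which is essentially the "optimize on a small fraction, train on the remainder" workflow you describe in the question.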