
How to do hyperparameter optimization on large data?

I have almost finished my time series model and collected enough data, but now I am stuck at hyperparameter optimization.

After lots of googling I found a new and promising library called ultraopt, but the problem is: how large a fragment of my total data (~150 GB) should I use for hyperparameter tuning? I also want to try lots of algorithms and combinations; is there any faster and easier way?

Or is there any math involved? Something like:

mydata = 100% of the data

run hyperparameter optimization on 5% of mydata

apply the optimized hyperparameters (scaled somehow, * or +) to the remaining 95%  # something like this

so that the result is similar to optimizing on the full data at once. Is there any shortcut for this?

I am using Python 3.7, CPU: AMD Ryzen 5 3400G, GPU: AMD Vega 11, RAM: 16 GB.


Answer

Hyperparameter tuning is typically done on the validation set of a train-val-test split, where each split contains something along the lines of 70%, 10%, and 20% of the entire dataset, respectively. As a baseline, random search can be used, while Bayesian optimization with Gaussian processes has been shown to be more compute-efficient. scikit-optimize is a good package for this.
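As a minimal sketch, assuming scikit-optimize is installed and a scikit-learn style estimator (the GradientBoostingRegressor, the search ranges, and the synthetic X/y below are placeholders, not your actual model or data), you could run the Bayesian search on a subsample that fits in memory, using TimeSeriesSplit so the folds respect temporal order:

# Sketch: Bayesian hyperparameter search with scikit-optimize's BayesSearchCV.
# The estimator, search ranges, and synthetic X/y are placeholders --
# swap in your own model and a subsample of your real data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit
from skopt import BayesSearchCV
from skopt.space import Integer, Real

# Stand-in for a subsample of the full dataset that fits in memory.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))
y = rng.normal(size=5000)

search = BayesSearchCV(
    estimator=GradientBoostingRegressor(),
    search_spaces={
        "n_estimators": Integer(50, 500),
        "max_depth": Integer(2, 8),
        "learning_rate": Real(1e-3, 0.3, prior="log-uniform"),
    },
    n_iter=30,                       # number of hyperparameter settings evaluated
    cv=TimeSeriesSplit(n_splits=3),  # keeps folds in temporal order
    n_jobs=-1,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)

The best parameters found on the subsample can then be used to refit the final model on the full training data.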
