I have a df with growth variables, and often some initial values are 0, which produces infinite growth values when a value moves from zero to non-zero.
For example:
... some variables ...   var1   var2   var1_growth   var2_growth
                         0      0      NaN           NaN
                         0      1      NaN           inf
                         1      2      inf           1
                         1.5    2.2    0.5           0.1
                         ...
When I run PanelOLS, I get this error message:
ValueError: array must not contain infs or NaNs
Is there a way to ignore these entries to continue with the regression without having to drop them and create a different dataset?
If not, what would be the best way to proceed? Should I drop all rows with 'inf' values in both columns? Is there an easy way to do this? Thanks.
Answer
No, you can't ignore these entries. This issue needs to be handled before fitting the model; otherwise the model cannot be fitted.
Depending on your data and application, different methods may be preferable for handling these NaN and inf values. One example, based on code posted in this SO question:
import numpy as np
df = df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)  # Replace inf and -inf with NaN, then drop every row that contains a NaN (axis=0 drops rows; axis=1 would drop columns).
In this case, we are removing all rows that have inf or NaN values.
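If only the growth columns matter, the same idea can be restricted to them. The following is a minimal sketch, not a definitive recipe: the toy DataFrame simply mirrors the table in the question, and the names growth_cols, clean, and clean_all are made up for illustration. It shows two variants, dropping a row when either growth column is non-finite and dropping it only when both are.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the table in the question (an assumption for illustration).
df = pd.DataFrame({
    "var1": [0, 0, 1, 1.5],
    "var2": [0, 1, 2, 2.2],
    "var1_growth": [np.nan, np.nan, np.inf, 0.5],
    "var2_growth": [np.nan, np.inf, 1.0, 0.1],
})

# Hypothetical list of the columns that may contain inf/NaN.
growth_cols = ["var1_growth", "var2_growth"]

# Treat +inf and -inf as missing values.
no_inf = df.replace([np.inf, -np.inf], np.nan)

# Variant 1: drop a row if ANY of the growth columns is missing.
clean = no_inf.dropna(subset=growth_cols)

# Variant 2: drop a row only if ALL of the growth columns are missing.
clean_all = no_inf.dropna(subset=growth_cols, how="all")
```

Note that neither variant modifies df itself, so the original dataset stays intact. Only the first variant guarantees that the growth columns are free of NaN and inf, which is what the error message asks for, so clean is the one to pass on to the regression.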