How to deal with “ValueError: array must not contain infs or NaNs” while running regressions in python

I have a DataFrame with growth variables. Some initial values are often 0, which produces infinite growth values when a variable moves from zero to a non-zero value.

For example:

.. some variables... var1   var2   var1_growth  var2_growth
                      0      0        NaN          NaN
                      0      1        NaN          inf
                      1      2        inf           1
                     1.5    2.2       0.5          0.1
...

When I run PanelOLS, I get this error message:

ValueError: array must not contain infs or NaNs

Is there a way to ignore these entries to continue with the regression without having to drop them and create a different dataset?

If not, what would be the best way to proceed? Should I drop all rows with ‘inf’ values in both columns? Is there an easy way to do this? Thanks.


Answer

No, you can’t ignore these entries. This issue needs to be handled before fitting the model; otherwise the model cannot be fit.

The best way to handle these NaN and inf values depends on your data and application. One option, taken from a related SO question:

df.replace([np.inf, -np.inf], np.nan).dropna(axis=0) # Replace inf and -inf with NaN, then drop every row that contains a NaN (axis=1 would drop columns instead).

In this case, we are removing all rows that have inf or NaN values.
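
If only some columns enter the regression, a possible variation is to filter on those columns alone and leave the original DataFrame untouched. The sketch below is only an illustration under assumptions: the toy values mirror the table in the question, and `model_cols`, `finite_rows` and `clean` are hypothetical names, not part of any library.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the table in the question (values are illustrative).
df = pd.DataFrame({
    "var1": [0.0, 0.0, 1.0, 1.5],
    "var2": [0.0, 1.0, 2.0, 2.2],
    "var1_growth": [np.nan, np.nan, np.inf, 0.5],
    "var2_growth": [np.nan, np.inf, 1.0, 0.1],
})

# Hypothetical list of the columns that will be used in the regression.
model_cols = ["var1_growth", "var2_growth"]

# True for rows where every model column is finite (neither NaN nor +/-inf).
finite_rows = np.isfinite(df[model_cols]).all(axis=1)

# Filtered copy to pass to the model; df itself is not modified.
clean = df.loc[finite_rows]
print(clean)
```

Here only the last row survives, because it is the only one where both growth columns are finite. Whether dropping these rows is appropriate, rather than, say, redefining the growth measure, still depends on the application.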
