Does Pandas have an equivalent of R’s NA (meaning "not available")? If not, what is the convention for representing a missing value, as opposed to NaN, which represents a mathematically impossible value such as a divide by zero?
Answer
Currently there is no dedicated NA value available in Pandas or NumPy; NaN is used to denote missing data. From the section “Working with missing data” in the Pandas manual (http://pandas.pydata.org/pandas-docs/stable/missing_data.html):
The choice of using NaN internally to denote missing data was largely for simplicity and performance reasons. It differs from the MaskedArray approach of, for example, scikits.timeseries. We are hopeful that NumPy will soon be able to provide a native NA type solution (similar to R) performant enough to be used in pandas.
Also, this part of the documentation (http://pandas.pydata.org/pandas-docs/stable/gotchas.html#nan-integer-na-values-and-na-type-promotions) provides more details on the trade-offs in this choice of NA representation.
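To make the convention concrete, here is a minimal sketch of how NaN behaves as the missing-value marker. It assumes a reasonably recent pandas in which `isna` is available (older releases spell it `isnull`); the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# An integer Series with no missing values.
s = pd.Series([1, 2, 3], dtype="int64")

# Reindexing to a label that does not exist introduces a missing value.
# Because NaN is a float, the whole Series is promoted to float64.
s_missing = s.reindex([0, 1, 2, 3])
print(s_missing.dtype)

# isna() flags the missing entry; NaN is the marker being tested for.
print(s_missing.isna().tolist())

# Reductions skip NaN by default (skipna=True).
print(s_missing.sum())

# None in numeric data is also stored as NaN.
s_none = pd.Series([1.0, None])
print(s_none.isna().tolist())
```

The promotion from `int64` to `float64` is one of the trade-offs that the gotchas page discusses: an integer column cannot hold NaN, so introducing a missing value changes its dtype.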