
Lookup Values by Corresponding Column Header in Pandas 1.2.0 or newer

The operation pandas.DataFrame.lookup is “Deprecated since version 1.2.0”, and has since invalidated a lot of previous answers.

This post attempts to function as a canonical resource for looking up corresponding row/column pairs in pandas versions 1.2.0 and newer.

Standard LookUp Values With Default Range Index

Given the following DataFrame:


I would like to be able to look up the corresponding value in the column specified in Col:

I would like my result to look like:
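To make the goal concrete, here is a minimal sketch on a hypothetical frame. The values and the row-by-row loop are illustrative assumptions, not the original example. Each row of the result should hold the value from the column named in Col:

```python
import pandas as pd

# Hypothetical data: Col names the column to read from in each row.
df = pd.DataFrame({
    "Col": ["B", "A", "A", "B"],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
})

# Slow but explicit reference: read one cell per row by label.
expected = [df.at[i, c] for i, c in zip(df.index, df["Col"])]
print(expected)
```

The loop is only a reference for what the lookup should return; it is not meant as the recommended approach.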


Standard LookUp Values With a Non-Default Index

Non-Contiguous Range Index

Given the following DataFrame:


I would like to preserve the index but still find the correct corresponding Value:


MultiIndex


I would like to preserve the index but still find the correct corresponding Value:


LookUp with Default For Unmatched/Not-Found Values

Given the following DataFrame


I would like to look up the corresponding value if one exists; otherwise I’d like it to default to 0.


LookUp with Missing Values in the lookup Col

Given the following DataFrame:


I would like any NaN values in Col to result in a NaN value in Val



Answer

Standard LookUp Values With Any Index

The documentation on Looking up values by index/column labels recommends using NumPy indexing via factorize and reindex as the replacement for the deprecated DataFrame.lookup.


df


factorize is used to encode the values in the column as an “enumerated type”.
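A minimal sketch of this step, assuming a small hypothetical DataFrame (the values are illustrative). pd.factorize returns an integer code per row plus the unique labels in order of first appearance:

```python
import pandas as pd

# Hypothetical data: Col names the column to read from in each row.
df = pd.DataFrame({
    "Col": ["B", "A", "A", "B"],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
})

# idx: one integer code per row; cols: the unique labels, in order seen.
idx, cols = pd.factorize(df["Col"])
print(idx)
print(cols)
```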


Notice that B corresponds to 0 and A corresponds to 1. reindex is used to ensure that columns appear in the same order as the enumeration:
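As a sketch on the same kind of hypothetical data, reindexing on the factorized labels puts the columns in code order, so that column position 0 is B and position 1 is A:

```python
import pandas as pd

# Hypothetical data: Col names the column to read from in each row.
df = pd.DataFrame({
    "Col": ["B", "A", "A", "B"],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
})

idx, cols = pd.factorize(df["Col"])

# Keep only the referenced columns, ordered to match the codes in idx.
reordered = df.reindex(columns=cols)
print(reordered)
```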


We need to create an appropriate range indexer compatible with NumPy indexing.

The standard approach is to use np.arange based on the length of the DataFrame:


Now NumPy indexing will work to select values from the DataFrame:
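Putting the pieces together, a minimal sketch of the full lookup on hypothetical data. The row positions come from np.arange and the column positions from the factorize codes:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Col names the column to read from in each row.
df = pd.DataFrame({
    "Col": ["B", "A", "A", "B"],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
})

idx, cols = pd.factorize(df["Col"])
rows = np.arange(len(df))  # positional row indexer: 0 .. len(df) - 1

# Pair each row position with its column code and pick one cell per row.
df["Val"] = df.reindex(columns=cols).to_numpy()[rows, idx]
print(df)
```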


*Note: This approach will always work regardless of type of index.
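For instance, a sketch with an assumed non-contiguous integer index (the labels 0, 2, 8, 9 are made up for illustration). Because the row indexer is positional, the index labels never enter the NumPy lookup and are preserved on assignment:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-contiguous index.
df = pd.DataFrame(
    {"Col": ["B", "A", "A", "B"], "A": [1, 2, 3, 4], "B": [5, 6, 7, 8]},
    index=[0, 2, 8, 9],
)

idx, cols = pd.factorize(df["Col"])
df["Val"] = df.reindex(columns=cols).to_numpy()[np.arange(len(df)), idx]
print(df)
```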

MultiIndex
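The same code can be sketched against a MultiIndex. The level values below are assumed for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index with four rows.
mi = pd.MultiIndex.from_product([["C", "D"], ["E", "F"]])
df = pd.DataFrame(
    {"Col": ["B", "A", "A", "B"], "A": [1, 2, 3, 4], "B": [5, 6, 7, 8]},
    index=mi,
)

idx, cols = pd.factorize(df["Col"])
df["Val"] = df.reindex(columns=cols).to_numpy()[np.arange(len(df)), idx]
print(df)
```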


Why use np.arange and not df.index directly?

Standard Contiguous Range Index


In this case only, there is no error, as the result from np.arange is the same as df.index.

df:


Non-Contiguous Range Index Error

Raises IndexError:
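A sketch of the failure, assuming a hypothetical frame whose index labels (0, 2, 8, 9) exceed the number of rows. Using the labels as row positions asks NumPy for rows that do not exist:

```python
import pandas as pd

# Hypothetical data with a non-contiguous index.
df = pd.DataFrame(
    {"Col": ["B", "A", "A", "B"], "A": [1, 2, 3, 4], "B": [5, 6, 7, 8]},
    index=[0, 2, 8, 9],
)

idx, cols = pd.factorize(df["Col"])
values = df.reindex(columns=cols).to_numpy()  # shape (4, 2)

raised = False
try:
    # Index labels used as positions: 8 and 9 are out of bounds for 4 rows.
    values[df.index.to_numpy(), idx]
except IndexError as exc:
    raised = True
    print(type(exc).__name__, exc)
```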


MultiIndex Error


Raises IndexError:


LookUp with Default For Unmatched/Not-Found Values

There are a few approaches.

First let’s look at what happens by default if there is a non-corresponding value:


If we look at why the NaN values are introduced, we will find that when factorize goes through the column it will enumerate all groups present regardless of whether they correspond to a column or not.

For this reason, when we reindex the DataFrame we will end up with the following result:
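A sketch of this behaviour, assuming a hypothetical frame where the last row of Col names a column C that does not exist. reindex creates C filled with NaN, and the lookup then returns NaN for that row:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "C" is not one of the column headers.
df = pd.DataFrame({
    "Col": ["B", "A", "A", "C"],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
})

idx, cols = pd.factorize(df["Col"])  # cols includes the unmatched label "C"
reindexed = df.reindex(columns=cols)  # column "C" is created, all NaN
print(reindexed)

vals = reindexed.to_numpy()[np.arange(len(df)), idx]
print(vals)
```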


If we want to specify a default value, we can specify the fill_value argument of reindex which allows us to modify the behaviour as it relates to missing column values:


This means that we can do:
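A minimal sketch with a default of 0, on hypothetical data where the last row of Col names a non-existent column C. Passing fill_value=0 to reindex fills the newly created column with 0 instead of NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "C" is not one of the column headers.
df = pd.DataFrame({
    "Col": ["B", "A", "A", "C"],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
})

idx, cols = pd.factorize(df["Col"])

# The unmatched column "C" is created with 0 in every row.
filled = df.reindex(columns=cols, fill_value=0)
df["Val"] = filled.to_numpy()[np.arange(len(df)), idx]
print(df)
```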


df:


*Notice the dtype of the column is int, since NaN was never introduced, and, therefore, the column type was not changed.


LookUp with Missing Values in the lookup Col

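factorize has a default na_sentinel=-1, meaning that when NaN values appear in the column being factorized, the resulting idx value is -1. (In pandas 2.0 and later the na_sentinel parameter was replaced by use_na_sentinel, but missing values are still encoded as -1 by default.)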


This -1 means that, by default, we’ll be pulling from the last column when we reindex. Notice that cols still contains only the values B and A, meaning that we will end up with the value from A in Val for the last row.
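A sketch of the pitfall, assuming a hypothetical frame whose last Col entry is NaN. The code -1 is a valid NumPy position (the last column), so the lookup silently returns a value from A rather than failing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has no column name in Col.
df = pd.DataFrame({
    "Col": ["B", "A", "A", np.nan],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
})

idx, cols = pd.factorize(df["Col"])  # NaN is encoded as -1, not added to cols
print(idx)
print(cols)

# Position -1 selects the last reindexed column, which here is "A".
vals = df.reindex(columns=cols).to_numpy()[np.arange(len(df)), idx]
print(vals)
```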

The easiest way to handle this is to fillna Col with some value that cannot be found in the column headers.

Here I use the empty string '':


Now when I reindex, the '' column will contain NaN values meaning that the lookup produces the desired result:
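A minimal sketch of the fillna approach on hypothetical data. It assumes '' is not a real column header, so the placeholder gets its own code and an all-NaN column after reindexing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has no column name in Col.
df = pd.DataFrame({
    "Col": ["B", "A", "A", np.nan],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
})

# Replace NaN with a label that matches no column before factorizing.
idx, cols = pd.factorize(df["Col"].fillna(""))

# The "" column does not exist, so reindex fills it with NaN.
df["Val"] = df.reindex(columns=cols).to_numpy()[np.arange(len(df)), idx]
print(df)
```

Because NaN is introduced, Val comes back as a float column in this sketch.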


df:

User contributions licensed under: CC BY-SA