
Make Pandas `df.get()` behave gracefully whether column or row key is incorrect

I have a Pandas dataframe and a function that pulls entries from the dataframe. If the requested entry is not present in the dataframe (whether because the requested column does not exist, the requested row/index does not exist, or both), I would like to return the string 'entry not found' instead of raising an error.

import pandas as pd

df = pd.DataFrame({'col1': [12, 13, 14, 15], 'col2': [16, 15, 14, 13]})

Ideally, I would like to write my query function as

def query(col, idx):
    return df.get(col, idx, 'entry not found')

Unfortunately, DataFrame.get() only accepts a key and a default (df.get(key, default)), so there is no way to pass a row label as well. I came up with the following alternatives instead.
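For reference, a minimal sketch of what the two `get` methods do on their own, using the same `df`: `DataFrame.get` looks up a column label and falls back to its default, while `Series.get` looks up a row label and falls back to its default. The variable names `col` and `missing` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 13, 14, 15], 'col2': [16, 15, 14, 13]})

# Existing column: DataFrame.get returns the whole column as a Series.
col = df.get('col1')
print(type(col).__name__)  # Series

# Missing column: DataFrame.get returns the default instead of raising.
missing = df.get('col5', 'entry not found')
print(missing)  # entry not found

# Series.get also takes (key, default), but here the key is a row label.
print(col.get(2, 'entry not found'))   # 14
print(col.get(24, 'entry not found'))  # entry not found
```

This is why the alternatives below behave differently: whatever the first `get` returns (a Series or the default string) is what the next step operates on.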

def query1(col, idx):
    return df[col, idx]

def query2(col, idx):
    return df[col].get(idx, 'entry not found')

def query3(col, idx):
    return df.get(col, 'entry not found')[idx]

def query4(col, idx):
    return df.get(col, 'entry not found').get(idx, 'entry not found')

Only query2 and query4 work if the user asks for a row that doesn’t exist:

# User asks for a row that doesn't exist.
query1('col1', 24) # KeyError
query2('col1', 24) # 'entry not found'
query3('col1', 24) # KeyError: 24 (raised from ValueError: 24 is not in range)
query4('col1', 24) # 'entry not found'

Conversely, only query3 (kind of) works if the user asks for a column that doesn't exist:

# User asks for a column that doesn't exist.
query1('col5', 3) # KeyError
query2('col5', 3) # KeyError
query3('col5', 3) # Returns 'r' (the 4th character of 'entry not found')
query4('col5', 3) # AttributeError: 'str' object has no attribute 'get'
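Both tables can be reproduced with a small harness (a sketch; the helper `outcome` and the `results` dict are hypothetical names) that calls each query and records either its return value or the name of the exception it raised. The exact exception class for `query3` on a missing row may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 13, 14, 15], 'col2': [16, 15, 14, 13]})

def query1(col, idx):
    return df[col, idx]

def query2(col, idx):
    return df[col].get(idx, 'entry not found')

def query3(col, idx):
    return df.get(col, 'entry not found')[idx]

def query4(col, idx):
    return df.get(col, 'entry not found').get(idx, 'entry not found')

def outcome(func, col, idx):
    """Return func's result, or the name of the exception it raised."""
    try:
        return func(col, idx)
    except Exception as exc:  # broad on purpose: we only report the type
        return type(exc).__name__

cases = {'missing row': ('col1', 24), 'missing column': ('col5', 3)}
results = {}
for label, (col, idx) in cases.items():
    for func in (query1, query2, query3, query4):
        results[(label, func.__name__)] = outcome(func, col, idx)
        print(f"{label:15} {func.__name__}: {results[(label, func.__name__)]!r}")
```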

How can I obtain the desired behavior? Is there a way to do this without a heavy try: ... except: ... block?


Answer

What about using get twice:

def lookup(col, idx):
    """
    `col` is the column indexer, `idx` is the row indexer.
    """
    return df.get(col, {}).get(idx, "entry not found")
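A self-contained usage sketch of `lookup` against the question's `df`, covering all four combinations of present and missing labels:

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 13, 14, 15], 'col2': [16, 15, 14, 13]})

def lookup(col, idx):
    """`col` is the column indexer, `idx` is the row indexer."""
    return df.get(col, {}).get(idx, "entry not found")

print(lookup('col1', 2))   # 14 (both labels exist)
print(lookup('col1', 24))  # entry not found (row missing)
print(lookup('col5', 3))   # entry not found (column missing)
print(lookup('col5', 24))  # entry not found (both missing)
```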

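The first get looks up the column col:

  • if it exists, it returns that column as a Series (df[col])
  • if it doesn't, it returns the empty dict {}, which also has a .get method, so the second call still works

The second get then looks up the row label idx:

  • if the column was found, Series.get returns the value at that label (equivalent to df.loc[idx, col]), or "entry not found" if the label is absent
  • if the column was missing, {}.get(idx, "entry not found") always returns "entry not found"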
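If you prefer the two checks spelled out, an equivalent sketch (not part of the original answer) tests the column and row labels explicitly and then reads the cell with `DataFrame.at`. The name `lookup_checked` and its `default` parameter are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'col1': [12, 13, 14, 15], 'col2': [16, 15, 14, 13]})

def lookup_checked(col, idx, default="entry not found"):
    # Check the column label and the row label separately, so either
    # (or both) being absent falls through to the default.
    if col in df.columns and idx in df.index:
        return df.at[idx, col]  # scalar access: (row label, column label)
    return default

print(lookup_checked('col2', 0))   # 16
print(lookup_checked('col1', 24))  # entry not found
print(lookup_checked('col5', 3))   # entry not found
```

Like the chained get, this avoids try/except; it trades the dict trick for two explicit membership tests and makes the fallback value configurable.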
User contributions licensed under: CC BY-SA