Skip to content
Advertisement

Python Pandas – Lookup a variable column depending on another column’s value

I’m trying to use the value of one cell to find the value of a cell in another column. The first cell value (‘source’) dictates which column to lookup.

import pandas as pd

df = pd.DataFrame({'A': ['John', 'Andrew', 'Bob', 'Fred'], 'B': [
                  'Fred', 'Simon', 'Andrew', 'Andrew'], 'source': ['A', 'B', 'A', 'B']}, )

print(df)

        A       B source
0    John    Fred      A
1  Andrew   Simon      B
2     Bob  Andrew      A
3    Fred  Andrew      B

My required output value in the ‘output’ column is a lookup of the ‘source’:

        A       B source  output
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew

Failed attempts

df['output'] = df[df['source']]

This results in a ValueError: Wrong number of items passed 4, placement implies 1 because the df['source'] passes in a Series, not a string. I tried converting to a string using:

df['output'] = df[df['source'].convertDTypes(convert_string=True)]

which gave error AttributeError: ‘Series’ object has no attribute ‘convertDTypes’.

Working solution

I found a solution might by iterating through the rows using:

for index, row in df.iterrows():
    column = df.loc[index, 'source']
    df.at[index, 'output'] = df.loc[index, column]

However, this post suggests iterating is a bad idea. The code doesn’t seem very elegant, either.

I feel I’ve missed something basic here; this really should not be that hard.

Advertisement

Answer

Let us do numpy way since lookup will not longer work in the future version

df['new'] = df.values[df.index,df.columns.get_indexer(df.source)]
df
Out[339]: 
        A       B source     new
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement