Is it possible to access a list stored in a dataframe in a vectorized manner?

Considering a dataframe like so:

import pandas as pd

data = {
    'lists': [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    'indexes': [0, 1, 2]
}
df = pd.DataFrame(data=data)
       lists  indexes
0  [0, 1, 2]        0
1  [3, 4, 5]        1
2  [6, 7, 8]        2

I want to create a new column 'extracted_values' that holds, for each row, the element of 'lists' at the position given by 'indexes' (row 0: [0, 1, 2] at index 0 -> 0, row 1: [3, 4, 5] at index 1 -> 4, and so on):

       lists  indexes    extracted_values
0  [0, 1, 2]        0                   0
1  [3, 4, 5]        1                   4
2  [6, 7, 8]        2                   8

Doing it with iterrows() is extremely slow, as I work with dataframes containing several million rows.

I have tried the following:

df['extracted_value'] = df['lists'][df['indexes']]

But it results in:

       lists  indexes extracted_value
0  [0, 1, 2]        0       [0, 1, 2]
1  [3, 4, 5]        1       [3, 4, 5]
2  [6, 7, 8]        2       [6, 7, 8]

The following just results in extracted_value containing the whole list:

df['extracted_value'] = df['lists'][0]

Thank you for your help.

Answer

What you tried was almost right. You only need to wrap the lookup in pd.DataFrame.apply and set the axis argument to 1, so that the function is applied to each row:

df['extracted_values'] = df.apply(lambda x: x['lists'][x['indexes']], axis=1)
df

       lists  indexes  extracted_values
0  [0, 1, 2]        0                 0
1  [3, 4, 5]        1                 4
2  [6, 7, 8]        2                 8
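
Note that apply with axis=1 still calls the lambda once per row in Python, so it is not vectorized in the NumPy sense and may remain slow on several million rows. The sketch below shows two possible alternatives on the same sample data. The first is a plain list comprehension over the two columns. The second uses NumPy integer indexing and assumes that every list has the same length, which is true for the sample but is not stated in the question. The column name 'extracted_numpy' is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'lists': [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    'indexes': [0, 1, 2],
})

# Option 1: list comprehension over both columns.
# Still a Python-level loop, but without the per-row overhead of apply.
df['extracted_values'] = [lst[i] for lst, i in zip(df['lists'], df['indexes'])]

# Option 2: NumPy integer indexing.
# Assumption: all lists have the same length, so they stack into a 2-D array.
arr = np.array(df['lists'].tolist())        # shape (n_rows, list_length)
row_numbers = np.arange(len(df))            # 0, 1, ..., n_rows - 1
df['extracted_numpy'] = arr[row_numbers, df['indexes'].to_numpy()]

print(df)
```

If the lists have different lengths, the second option does not apply as written and the list comprehension is the safer choice.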