Considering a dataframe like so:
```python
import pandas as pd

data = {
    'lists': [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    'indexes': [0, 1, 2]
}
df = pd.DataFrame(data=data)
```
```
       lists  indexes
0  [0, 1, 2]        0
1  [3, 4, 5]        1
2  [6, 7, 8]        2
```
I want to create a new column 'extracted_values' holding, for each row, the element of 'lists' at the position given by 'indexes' (for the list [0, 1, 2], indexes = 0 gives 0, indexes = 1 gives 1, and so on):
```
       lists  indexes  extracted_values
0  [0, 1, 2]        0                 0
1  [3, 4, 5]        1                 4
2  [6, 7, 8]        2                 8
```
Doing it with iterrows() is extremely slow, as I work with dataframes containing several million rows.
I have tried the following:
```python
df['extracted_value'] = df['lists'][df['indexes']]
```
But it results in:
```
       lists  indexes  extracted_value
0  [0, 1, 2]        0        [0, 1, 2]
1  [3, 4, 5]        1        [3, 4, 5]
2  [6, 7, 8]        2        [6, 7, 8]
```
The following just results in extracted_value containing the whole list:
```python
df['extracted_value'] = df['lists'][0]
```
Thank you for your help.
Answer
What you tried was almost right. You only need to wrap the lookup in pd.DataFrame.apply and set the axis argument to 1, so that the function is applied to each row:
```python
df['extracted_values'] = df.apply(lambda x: x['lists'][x['indexes']], axis=1)
df
```

```
       lists  indexes  extracted_values
0  [0, 1, 2]        0                 0
1  [3, 4, 5]        1                 4
2  [6, 7, 8]        2                 8
```
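Since the question mentions several million rows, it may be worth knowing that apply with axis=1 still calls a Python function once per row. Two alternatives are sketched below; neither is part of the answer above, and the column name extracted_values_np is only illustrative. The first zips the two columns in a list comprehension and is typically faster than apply. The second assumes every list has the same length, which holds for the sample data but may not hold for yours.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'lists': [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    'indexes': [0, 1, 2],
})

# Option 1: iterate over both columns in lockstep.
# Works for lists of any length, as long as each index is valid for its list.
df['extracted_values'] = [lst[i] for lst, i in zip(df['lists'], df['indexes'])]

# Option 2 (assumption: all lists have the same length).
# Stack the lists into a 2-D array, then pick one element per row
# by pairing each row number with that row's index.
stacked = np.array(df['lists'].tolist())
rows = np.arange(len(df))
df['extracted_values_np'] = stacked[rows, df['indexes'].to_numpy()]
```

Which option is faster depends on the data, so it is worth timing both on a sample of the real dataframe.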