Select variable number of tokens from pandas column based on tuples in another column

Question

I have a data frame with two columns: sentence, containing text, and selector, containing arrays of tuples of varying lengths. Consider the following data frame as an example:

I now want to select the words from sentence at the positions indicated by the second element in each tuple (ignoring the 10 in each tuple). E.g. for the first row, I

Accepted Answer

One solution:

    df["selected_tokens"] = [
        [sent[s] for _, s, _ in select]
        for sent, select in zip(df["sentence"].str.split(), df["selector"])
    ]
    print(df["selected_tokens"])

Output

    0                          [KEEP]
    1                     [SOME, THE]
    2               [KEEP, OF, WORDS]
    3    [SOME, THE, FROM, SENTENCE.]
    Name: selected_tokens, dtype: object

An alternative solution is to use numpy to take advantage of its advanced indexing features:

    import numpy as np

    sentences = df["sentence"].str.split().apply(np.array)
    indices = [[s[1] for s in select] for select in df["selector"]]
    df["selected_tokens"] = [sentence[i] for sentence, i in zip(sentences, indices)]
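For anyone who wants to try both approaches end to end, here is a minimal self-contained sketch. The question's example data frame is not reproduced here, so the sentences and the three-element tuples below (a constant 10, a word position, and a placeholder 0) are made up for illustration; only the column names sentence and selector come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: these sentences and (10, position, 0) tuples are
# invented for illustration and are NOT the question's original frame.
df = pd.DataFrame({
    "sentence": ["alpha beta gamma", "one two three four"],
    "selector": [[(10, 1, 0)], [(10, 0, 0), (10, 3, 0)]],
})

# Split each sentence on whitespace once and reuse the result.
tokens = df["sentence"].str.split()

# List-comprehension approach: take the word at the middle element of each tuple.
by_list = [
    [sent[s] for _, s, _ in select]
    for sent, select in zip(tokens, df["selector"])
]

# NumPy approach: index an array of words with a list of positions,
# then convert back to a plain list so both results are comparable.
by_numpy = [
    np.array(sent)[[s for _, s, _ in select]].tolist()
    for sent, select in zip(tokens, df["selector"])
]

df["selected_tokens"] = by_list
print(df["selected_tokens"].tolist())  # [['beta'], ['one', 'four']]
```

Both variants should give the same tokens. The list comprehension keeps plain Python lists in the column, while the numpy version produces arrays unless they are converted back with tolist() as done here.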

Answer