Looking for the quickest way to find the value of a column given an index for all rows

I’m looking for a more efficient way to grab a certain value at a certain index for every row in a Pandas DataFrame than the one I have currently. Here’s the code I have so far. The .apply() function with the lambda syntax is super slow, so I would love a more optimised version.

Let me know if you need more info/context, thanks!

def find_session_time_given_index(df, value):
    """
    Finds the minutes_in_session value where its corresponding index is the same as the value
    where the last true exists in the column last_true_index. Used in modify_events_table() in a .apply.
    :param df: pd.DataFrame
    :param value: row in .apply()
    :return: int
    """
    last_true = value["last_true_index"]
    return list(df[df["index"] == last_true]["minutes_in_session"])[0]

df["last_true_index"] = df["index"].mul(df["same_user_new_session"]).cummax()
df["minutes_of_last_true_index"] = df.apply(
    lambda x: find_session_time_given_index(df, x), axis=1
)

EDIT: Here’s an example output:

index	timestamp	user_id	last_true_index	minutes_in_session	minutes_of_last_true_index
0	Time1	1	0	100	100
1	Time2	1	0	101	100
11	Time3	2	11	54	54
12	Time4	2	11	55	54

The minutes_of_last_true_index is the column to be populated.

Answer

Try merging the original df with itself, using last_true_index and index as the keys:

# Self-merge: match last_true_index against the "index" column (pandas imported as pd).
lookup = df[['index', 'minutes_in_session']]
df_new = (pd.merge(df, lookup, left_on='last_true_index', right_on='index', suffixes=('', '_'))
          .drop(columns=['index_'])
          .rename(columns={'minutes_in_session_': 'minutes_of_last_true_index'}))
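For a quick check, here is a minimal, self-contained sketch of the self-merge idea run against the sample rows from the question. It assumes that "index" is an ordinary column with unique values; the names lookup, minutes_by_index and mapped are made up for illustration. Renaming the lookup columns before merging is just one way to avoid suffix handling, and the Series.map line is an alternative lookup, not part of the merge approach itself.

```python
import pandas as pd

# Sample rows copied from the question's example table.
df = pd.DataFrame({
    "index": [0, 1, 11, 12],
    "timestamp": ["Time1", "Time2", "Time3", "Time4"],
    "user_id": [1, 1, 2, 2],
    "last_true_index": [0, 0, 11, 11],
    "minutes_in_session": [100, 101, 54, 55],
})

# Lookup table: one row per "index" value with its minutes_in_session,
# renamed up front so the merge needs no suffixes.
lookup = df[["index", "minutes_in_session"]].rename(
    columns={
        "index": "last_true_index",
        "minutes_in_session": "minutes_of_last_true_index",
    }
)

# A left merge keeps every row of df, in its original order.
df_new = df.merge(lookup, on="last_true_index", how="left")

# Alternative (assumption: "index" values are unique): look the values up
# through a Series keyed by "index".
minutes_by_index = df.set_index("index")["minutes_in_session"]
mapped = df["last_true_index"].map(minutes_by_index)

print(df_new)
```

Both variants avoid the row-wise .apply() and should give the same minutes_of_last_true_index column as long as every last_true_index value appears exactly once in "index".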

Answer