I’m looking for a more efficient way to grab a certain value at a certain index for every row in a Pandas DataFrame than the one I have currently. Here’s the code I have so far. The .apply() function with the lambda syntax is super slow, so I would love a more optimised version.
Let me know if you need more info/context, thanks!
    def find_session_time_given_index(df, value):
        """
        Finds the minutes_in_session value where its corresponding index is the same as
        the value where the last true exists in the column last_true_index.
        Used in modify_events_table() in a .apply.

        :param df: pd.DataFrame
        :param value: row in .apply()
        :return: int
        """
        last_true = value["last_true_index"]
        return list(df[df["index"] == last_true]["minutes_in_session"])[0]


    df["last_true_index"] = df["index"].mul(df["same_user_new_session"]).cummax()
    df["minutes_of_last_true_index"] = df.apply(
        lambda x: find_session_time_given_index(df, x), axis=1
    )
EDIT: Here’s an example output:
index | timestamp | user_id | last_true_index | minutes_in_session | minutes_of_last_true_index |
---|---|---|---|---|---|
0 | Time1 | 1 | 0 | 100 | 100 |
1 | Time2 | 1 | 0 | 101 | 100 |
11 | Time3 | 2 | 11 | 54 | 54 |
12 | Time4 | 2 | 11 | 55 | 54 |
The minutes_of_last_true_index column is the one to be populated.
Answer
Try merging the original df with itself, using last_true_index and index as keys:
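    # Assumes the "index" values are the DataFrame's own index, so reset_index()
    # exposes them as an "index" column on the right-hand side.
    df_new = pd.merge(
        df,
        df['minutes_in_session'].reset_index(),
        left_on='last_true_index',
        right_on='index',
        suffixes=('', '_'),
    )
    # Drop the helper key and give the looked-up column its final name.
    df_new = (
        df_new.drop(columns=['index'])
        .rename(columns={'minutes_in_session_': 'minutes_of_last_true_index'})
    )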
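If index is a regular column, as it is in the question's code (df["index"]), a lookup with Series.map might be a simpler alternative to the merge. This is only a sketch, not the answer's method: it assumes the values in the index column are unique, and the sample frame below is rebuilt by hand from the example table in the question (timestamps omitted).

```python
import pandas as pd

# Sample data rebuilt from the example table in the question (assumption:
# "index" is an ordinary column, not the DataFrame's index).
df = pd.DataFrame({
    "index": [0, 1, 11, 12],
    "user_id": [1, 1, 2, 2],
    "last_true_index": [0, 0, 11, 11],
    "minutes_in_session": [100, 101, 54, 55],
})

# Build a lookup Series: "index" value -> minutes_in_session.
# Series.map needs this lookup to have unique labels.
lookup = df.set_index("index")["minutes_in_session"]

# Vectorised lookup: each last_true_index is replaced by the matching minutes value.
df["minutes_of_last_true_index"] = df["last_true_index"].map(lookup)

print(df)
```

Unlike the row-wise .apply(), this builds the lookup once and resolves every row in a single vectorised call, and it leaves the row order and the original columns untouched.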