Looking for the quickest way to find the value of a column given an index for all rows

I’m looking for a more efficient way to grab a certain value at a certain index for every row in a Pandas DataFrame than the one I have currently. Here’s the code I have so far. The .apply() call with a lambda is very slow, so I would love a more optimised version.

Let me know if you need more info/context, thanks!

def find_session_time_given_index(df, value):
    """
    Finds the minutes_in_session value where its corresponding index is the same as the value
    where the last true exists in the column last_true_index. Used in modify_events_table() in a .apply.
    :param df: pd.DataFrame
    :param value: row in .apply()
    :return: int
    """
    last_true = value["last_true_index"]
    return list(df[df["index"] == last_true]["minutes_in_session"])[0]

df["last_true_index"] = df["index"].mul(df["same_user_new_session"]).cummax()
df["minutes_of_last_true_index"] = df.apply(
    lambda x: find_session_time_given_index(df, x), axis=1
)

EDIT: Here’s an example output:

index  timestamp  user_id  last_true_index  minutes_in_session  minutes_of_last_true_index
0      Time1      1        0                100                 100
1      Time2      1        0                101                 100
11     Time3      2        11               54                  54
12     Time4      2        11               55                  54

The minutes_of_last_true_index is the column to be populated.


Answer

Try merging the original df with itself, using last_true_index and index as keys:

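import pandas as pd

# Assumes "index" is a regular column, as in the question's df["index"]
lookup = df[['index', 'minutes_in_session']]
df_new = pd.merge(df, lookup, left_on='last_true_index', right_on='index', suffixes=('', '_'))
# Drop the duplicated key from the right side and give the looked-up value its final name
df_new = (df_new.drop(columns=['index_'])
          .rename(columns={'minutes_in_session_': 'minutes_of_last_true_index'}))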
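As an alternative to the merge, the same lookup can be sketched with Series.map, which avoids building a second frame. This is a different technique from the merge above, not a restatement of it. The sample data below is reconstructed from the example output in the question. The same_user_new_session values are an assumption chosen to reproduce that output, and minutes_by_index is a hypothetical helper name.

```python
import pandas as pd

# Sample data reconstructed from the question's example output.
# The same_user_new_session values are assumed, not taken from the question.
df = pd.DataFrame({
    "index": [0, 1, 11, 12],
    "timestamp": ["Time1", "Time2", "Time3", "Time4"],
    "user_id": [1, 1, 2, 2],
    "same_user_new_session": [False, False, True, False],
    "minutes_in_session": [100, 101, 54, 55],
})

# Same computation as in the question: carry forward the last "index"
# value at which same_user_new_session was True.
df["last_true_index"] = df["index"].mul(df["same_user_new_session"]).cummax()

# Lookup Series keyed by the "index" column: index value -> minutes_in_session.
# Assumes the values in "index" are unique.
minutes_by_index = df.set_index("index")["minutes_in_session"]

# Vectorised lookup: translate each last_true_index into its minutes value.
df["minutes_of_last_true_index"] = df["last_true_index"].map(minutes_by_index)

print(df)
```

On this sample the new column comes out as 100, 100, 54, 54, matching the example in the question. Any last_true_index value with no matching index would become NaN rather than raising an error, so it may be worth checking for missing values afterwards.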
User contributions licensed under: CC BY-SA