Skip to content
Advertisement

Alternating column values

I am working on a project where my dataset looks like bellow:

Origin Destination Num_Trips
Hamburg Frankfurt 2
Hamburg Cologne 1
Cologne Hamburg 3
Frankfurt Hamburg 5

I am interested only on one way either “Hamburg – Frankfurt” or “Frankfurt – Hamburg” and add them as number of trips made between this two locations. How can i do this in pandas so that i have one of them in my dataset with the total number of trips made between the two points either side?

Final Table:

Origin Destination Num_Trips
Hamburg Frankfurt 7
Hamburg Cologne 4

Thanks :)

Advertisement

Answer

Here’s a simple solution to your problem –

data = {
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5]
}

df = pd.DataFrame(data)

df["Key"] = df[["Origin", "Destination"]].apply(lambda x: "|".join(set(x)), axis=1)
# Origin    Destination Num_Trips   Key
# Hamburg   Frankfurt   2           Frankfurt|Hamburg
# Hamburg   Cologne     1           Cologne|Hamburg
# Cologne   Hamburg     3           Cologne|Hamburg
# Frankfurt Hamburg     5           Frankfurt|Hamburg

df.groupby("Key").agg({"Origin": "first", 
                       "Destination": "first", 
                       "Num_Trips": sum}).reset_index(drop=True)

#   Origin  Destination Num_Trips
# 0 Hamburg Cologne     4
# 1 Hamburg Frankfurt   7
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement