I am working on a project where my dataset looks like bellow:
Origin | Destination | Num_Trips |
---|---|---|
Hamburg | Frankfurt | 2 |
Hamburg | Cologne | 1 |
Cologne | Hamburg | 3 |
Frankfurt | Hamburg | 5 |
I am interested only on one way either “Hamburg – Frankfurt” or “Frankfurt – Hamburg” and add them as number of trips made between this two locations. How can i do this in pandas so that i have one of them in my dataset with the total number of trips made between the two points either side?
Final Table:
Origin | Destination | Num_Trips |
---|---|---|
Hamburg | Frankfurt | 7 |
Hamburg | Cologne | 4 |
Thanks :)
Advertisement
Answer
Here’s a simple solution to your problem –
data = { "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"], "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"], "Num_Trips": [2, 1, 3, 5] } df = pd.DataFrame(data) df["Key"] = df[["Origin", "Destination"]].apply(lambda x: "|".join(set(x)), axis=1) # Origin Destination Num_Trips Key # Hamburg Frankfurt 2 Frankfurt|Hamburg # Hamburg Cologne 1 Cologne|Hamburg # Cologne Hamburg 3 Cologne|Hamburg # Frankfurt Hamburg 5 Frankfurt|Hamburg df.groupby("Key").agg({"Origin": "first", "Destination": "first", "Num_Trips": sum}).reset_index(drop=True) # Origin Destination Num_Trips # 0 Hamburg Cologne 4 # 1 Hamburg Frankfurt 7