I am working on a project where my dataset looks like this:
Origin | Destination | Num_Trips |
---|---|---|
Hamburg | Frankfurt | 2 |
Hamburg | Cologne | 1 |
Cologne | Hamburg | 3 |
Frankfurt | Hamburg | 5 |
I am only interested in the route, not the direction: “Hamburg – Frankfurt” and “Frankfurt – Hamburg” should count as the same route, with their trips added together. How can I do this in pandas so that each pair of locations appears only once in my dataset, with the total number of trips made between the two points in either direction?
Final Table:
Origin | Destination | Num_Trips |
---|---|---|
Hamburg | Frankfurt | 7 |
Hamburg | Cologne | 4 |
Thanks :)
Answer
Here’s a simple solution to your problem:
```python
import pandas as pd

data = {
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5]
}

df = pd.DataFrame(data)

# Build a direction-independent key; sorted() gives a stable order,
# whereas iterating over a set does not guarantee one.
df["Key"] = df[["Origin", "Destination"]].apply(lambda x: "|".join(sorted(x)), axis=1)
#       Origin Destination  Num_Trips                Key
#      Hamburg   Frankfurt          2  Frankfurt|Hamburg
#      Hamburg     Cologne          1    Cologne|Hamburg
#      Cologne     Hamburg          3    Cologne|Hamburg
#    Frankfurt     Hamburg          5  Frankfurt|Hamburg

# Keep the first Origin/Destination seen for each key and sum the trips.
df.groupby("Key").agg({"Origin": "first",
                       "Destination": "first",
                       "Num_Trips": "sum"}).reset_index(drop=True)

#     Origin Destination  Num_Trips
# 0  Hamburg     Cologne          4
# 1  Hamburg   Frankfurt          7
```
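On larger tables the row-wise `apply` may become slow. One possible alternative, sketched here on the assumption that it is acceptable for each pair to be listed in alphabetical order in the result (so `Cologne | Hamburg` rather than `Hamburg | Cologne`), is to sort the two columns with NumPy and group on them directly. The helper name `undirected_trips` is made up for this illustration.

```python
import numpy as np
import pandas as pd


def undirected_trips(df):
    """Sum Num_Trips over both directions of each Origin/Destination pair."""
    # Sort the two endpoints within each row so A->B and B->A look identical.
    pairs = np.sort(df[["Origin", "Destination"]].to_numpy(), axis=1)
    normalized = df.assign(Origin=pairs[:, 0], Destination=pairs[:, 1])
    # Group on the normalized pair and add up the trips.
    return normalized.groupby(["Origin", "Destination"], as_index=False)["Num_Trips"].sum()


df = pd.DataFrame({
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5],
})

result = undirected_trips(df)
print(result)
```

This avoids the temporary `Key` column, at the cost of no longer preserving which direction appeared first in the original data.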