I have a dataframe like the one below:

Country | Country code | Port Name |
---|---|---|
China | CN | Yantian |
China | CN | Shekou |
China | CN | Quanzhou |
United Kingdom | UK | Plymouth |
United Kingdom | UK | Cardiff |
United Kingdom | UK | Bird port |
I want to remove the duplicates in the first two columns and keep only the first occurrence of each value, like this:

Country | Country code | Port Name |
---|---|---|
China | CN | Yantian |
 |  | Shekou |
 |  | Quanzhou |
United Kingdom | UK | Plymouth |
 |  | Cardiff |
 |  | Bird port |
I have tried df.drop_duplicates, but it drops the entire rows.
Answer
You could use the pd.Series.duplicated method:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        [
            ['China', 'CN', 'Yantian'],
            ['China', 'CN', 'Shekou'],
            ['China', 'CN', 'Quanzhou'],
            ['United Kingdom', 'UK', 'Plymouth'],
            ['United Kingdom', 'UK', 'Cardiff'],
            ['United Kingdom', 'UK', 'Bird port']
        ],
        columns=['Country', 'Country code', 'Port Name']
    )

    # Replace every repeated value in each column with NaN, keeping the first occurrence.
    # Using .loc avoids chained assignment, which may not modify df itself.
    for col in ['Country', 'Country code']:
        df.loc[df[col].duplicated(), col] = np.nan

    print(df)
prints
index | Country | Country code | Port Name |
---|---|---|---|
0 | China | CN | Yantian |
1 | NaN | NaN | Shekou |
2 | NaN | NaN | Quanzhou |
3 | United Kingdom | UK | Plymouth |
4 | NaN | NaN | Cardiff |
5 | NaN | NaN | Bird port |
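
The question asks for blank cells rather than NaN. Here is a minimal sketch of one possible variant, assuming an empty string is an acceptable blank and that a row counts as a repeat only when both Country and Country code match an earlier row. The names key_cols and repeated are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['China', 'CN', 'Yantian'],
        ['China', 'CN', 'Shekou'],
        ['China', 'CN', 'Quanzhou'],
        ['United Kingdom', 'UK', 'Plymouth'],
        ['United Kingdom', 'UK', 'Cardiff'],
        ['United Kingdom', 'UK', 'Bird port']
    ],
    columns=['Country', 'Country code', 'Port Name']
)

# Illustrative name: the columns whose repeated values should be blanked
key_cols = ['Country', 'Country code']

# True for every row whose (Country, Country code) pair already appeared earlier
repeated = df.duplicated(subset=key_cols)

# Assumption: an empty string stands in for a "blank" cell
df.loc[repeated, key_cols] = ''

print(df)
```

Like the loop above, this blanks repeats wherever they occur in the frame, not only adjacent ones, so it fits data that is already grouped by country.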