Skip to content
Advertisement

How to filter based on id which contains specific name in another column in an excel file using Python

I have a excel file containing three columns as shown below,

ID Name Date
1136 GG Daubringen 2019-04-08
1136 Manheinm 2020-06-16
1136 Manheinm Streitkof 2020-07-24
1136 Staufenberg 2020-08-15
1136 Stürcken 2021-03-05
1136 Stürcken (FTTH) 2021-09-13
1234 Lerbeck 2019-04-18
1234 BAd oyehausen 2020-06-26
1234 Werre Park 2020-07-14
1234 Werre Park (FTTH) 2020-08-25
1234 Werre Park (FTTH) 2021-03-15
1234 Bad oyehausen 2021-09-23

Is it possible to filter out the ID for which the name starts without (FTTH) and end the name without (FTHH), For example the first 1st ID 1136 has the name doesn’t start with (FTTH) but ends with (FTTH), but I want to filter out the ID which doesn’t start with (FTTH) and also doesn’t end with (FTTH) as in the ID 1234 , is it possible to filter it using python ??

Expecting the result to be like,

ID
1234

Advertisement

Answer

You can aggregate GroupBy.first GroupBy.last first and then filter indices for Series.str.startswith and Series.str.endswith:

df1 = df.groupby('ID')['Name'].agg(['first','last'])

i = df1.index[~df1['first'].str.startswith('(FTTH)') & ~df1['last'].str.endswith('(FTTH)')]
print (i)
#Int64Index([1234], dtype='int64', name='ID')

If need values in list:

L = i.tolist()
[1234]

If need DataFrame:

out = i.to_frame(index=False)
print (out)
     ID
0  1234

If need first without (FTTH) and last with (FTTH) use:

i = df1.index[~df1['first'].str.startswith('(FTTH)') & df1['last'].str.endswith('(FTTH)')]
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement