I’m a beginner on python and I would like to get the first and last value of the column date always that the mac_address be the same, for example:
I’ve ordered my dataframe by mac_address, date with the next line:
df = df.sort_values(by=['mac_address', 'date'], ascending=(True, True))
And the data are:
router mac_address date 589455 15001391 00:00:34:1a:03:e8 2021-01-01 22:09:34 590067 17091211 00:00:34:1a:03:e8 2021-01-01 22:10:54 590136 17091236 00:00:34:1a:03:e8 2021-01-01 22:11:04 ..... ..... ..... 635434 15001391 00:00:78:01:0d:11 2021-01-02 00:14:54 636479 17091211 00:00:78:01:0d:11 2021-01-02 00:16:17 949873 17091172 00:00:af:82:56:93 2021-01-02 11:26:39 950699 17091251 00:00:af:82:56:93 2021-01-02 11:27:59 950700 17091253 00:00:af:82:56:93 2021-01-02 11:28:59 950702 17091257 00:00:af:82:56:93 2021-01-02 11:29:59 950703 17091258 00:00:af:82:56:93 2021-01-02 11:30:59 619384 17091174 00:01:09:d2:09:e0 2021-01-01 23:34:32 365351 17091211 00:01:d2:7c:4e:32 2021-01-01 14:27:58 109858 17091236 00:02:75:86:4e:34 2021-01-01 05:50:47 110281 17091211 00:02:75:86:4e:34 2021-01-01 05:50:54
NOTE: the date column has the format “2021-01-01 05:50:54” and the differents mac address that appears are variable in the number of times
And I would like to have two outputs like this:
first output:
589455 15001391 00:00:34:1a:03:e8 2021-01-01 22:09:34 590136 17091236 00:00:34:1a:03:e8 2021-01-01 22:11:04 635434 15001391 00:00:78:01:0d:11 2021-01-02 00:14:54 636479 17091211 00:00:78:01:0d:11 2021-01-02 00:16:17 ..... ..... 949873 17091172 00:00:af:82:56:93 2021-01-02 11:26:39 950703 17091258 00:00:af:82:56:93 2021-01-02 11:30:59 619384 17091174 00:01:09:d2:09:e0 2021-01-01 23:34:32 365351 17091211 00:01:d2:7c:4e:32 2021-01-01 14:27:58
second output: Just considered the data that have a first and the last value, and not be considered the mac_adress that only once appears
589455 15001391 00:00:34:1a:03:e8 22:09:34 590136 17091236 00:00:34:1a:03:e8 22:11:04 635434 15001391 00:00:78:01:0d:11 00:14:54 636479 17091211 00:00:78:01:0d:11 00:16:17 ..... ..... 949873 17091172 00:00:af:82:56:93 11:26:39 950703 17091258 00:00:af:82:56:93 11:30:59
I don’t know if I’m complicating or this task is easier than I see, but I’ve passed the last 48 hours without any favorable result. Can you help me please? Thanks a lot
Advertisement
Answer
As your data are already sorted by mac address and dates, you don’t need to use groupby
.
df1 = df.loc[(df['mac_address'].ne(df['mac_address'].shift())) | (df['mac_address'].ne(df['mac_address'].shift(-1)))]
First output:
>>> df1 router mac_address date 589455 15001391 00:00:34:1a:03:e8 2021-01-01 22:09:34 590136 17091236 00:00:34:1a:03:e8 2021-01-01 22:11:04 635434 15001391 00:00:78:01:0d:11 2021-01-02 00:14:54 636479 17091211 00:00:78:01:0d:11 2021-01-02 00:16:17 949873 17091172 00:00:af:82:56:93 2021-01-02 11:26:39 950703 17091258 00:00:af:82:56:93 2021-01-02 11:30:59 619384 17091174 00:01:09:d2:09:e0 2021-01-01 23:34:32 365351 17091211 00:01:d2:7c:4e:32 2021-01-01 14:27:58 109858 17091236 00:02:75:86:4e:34 2021-01-01 05:50:47 110281 17091211 00:02:75:86:4e:34 2021-01-01 05:50:54
Second output:
>>> df1.loc[df1.duplicated('mac_address', keep=False)] router mac_address date 589455 15001391 00:00:34:1a:03:e8 2021-01-01 22:09:34 590136 17091236 00:00:34:1a:03:e8 2021-01-01 22:11:04 635434 15001391 00:00:78:01:0d:11 2021-01-02 00:14:54 636479 17091211 00:00:78:01:0d:11 2021-01-02 00:16:17 949873 17091172 00:00:af:82:56:93 2021-01-02 11:26:39 950703 17091258 00:00:af:82:56:93 2021-01-02 11:30:59 109858 17091236 00:02:75:86:4e:34 2021-01-01 05:50:47 110281 17091211 00:02:75:86:4e:34 2021-01-01 05:50:54