Skip to content
Advertisement

Convert a list of strings in pandas into a list of date, and filter the value accordingly

Now I have a dataframe similar as follows

business_id                  date
0   --0r8K_AQ4FZfLsX3ZYRDA  [2017-09-03 17:13:59]
1   --0zrn43LEaB4jUWTQH_Bg  [2010-10-08 22:21:20, 2010-11-01 21:29:14, 2...
2   --164t1nclzzmca7eDiJMw  [2010-02-26 02:06:53, 2010-02-27 08:00:09, 2...
3   --2aF9NhXnNVpDV0KS3xBQ  [2014-11-03 16:35:35, 2015-01-30 18:16:03, 2...
4   --2mEJ63SC_8_08_jGgVIg  [2010-12-15 17:10:46, 2013-12-28 00:27:54, 2...
... ... ...
997 -SjRCXID7eXewqloY3V86w  [2015-12-13 02:48:00, 2016-01-21 22:31:31, 2...
998 -Sjrz1Mt9RY4r6ibxzGs0Q  [2016-08-08 19:23:27, 2016-08-15 16:03:29, 2...
999 -Sk9ZND7V2x8RuauMH0FRw  [2010-09-05 02:04:25, 2010-10-15 22:48:00, 2...
1000    -SkNedh2bJHPOcKfoFlTvg  [2013-09-01 02:54:45, 2013-10-22 16:59:13, 2...
1001    -SkwKPbo5oK1-NtKkupNvw  [2010-09-11 20:23:45, 2011-05-26 16:24:35, 2...

What I am trying to do is to

  • Convert all the values in the list to date
  • Filter the value which are only later than 2018-01-01

In the first step, what I tried to do is to use a apply function so that I can cover all elements in the list:

def convert_to_date(d): 
    pd.to_datetime(d, format='%Y-%m-%d %H:%M:%S')

checkin_data['date'].apply(convert_to_date) 

However, the result was like this

0       None
1       None
2       None
3       None
4       None
        ... 
997     None
998     None
999     None
1000    None
1001    None
Name: date, Length: 1002, dtype: object

How should I fix it? Thank you for your help!

Advertisement

Answer

Add return for avoid missing values and filter greater values in boolean indexing:

print (checkin_data)
              business_id                                        date
0  --0r8K_AQ4FZfLsX3ZYRDA                       [2022-09-03 17:13:59]
1  --0zrn43LEaB4jUWTQH_Bg  [2018-10-08 22:21:20, 2010-11-01 21:29:14]
2  --164t1nclzzmca7eDiJMw  [2019-02-26 02:06:53, 2030-02-27 08:00:09]
3  --2aF9NhXnNVpDV0KS3xBQ  [2014-11-03 16:35:35, 2015-01-30 18:16:03]
4  --2mEJ63SC_8_08_jGgVIg  [2010-12-15 17:10:46, 2013-12-28 00:27:54]

def convert_to_date(d): 
    x = pd.to_datetime(d)
    return x[x > '2018-01-01'].tolist()


checkin_data['date'] = checkin_data['date'].apply(convert_to_date) 
print (checkin_data)
              business_id                                        date
0  --0r8K_AQ4FZfLsX3ZYRDA                       [2022-09-03 17:13:59]
1  --0zrn43LEaB4jUWTQH_Bg                       [2018-10-08 22:21:20]
2  --164t1nclzzmca7eDiJMw  [2019-02-26 02:06:53, 2030-02-27 08:00:09]
3  --2aF9NhXnNVpDV0KS3xBQ                                          []
4  --2mEJ63SC_8_08_jGgVIg                                          []
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement