How to replace rows which do not follow a specific schema-pattern? [closed]




I would like to delete all the rows that do not follow this pattern:

01-12-2002 12:00:00

My column has dtype('O') (object) and I would like to convert it to datetime, but unfortunately some rows contain text. My idea was to exclude all the rows which do not follow that pattern and are not digits (using a regex, I would say \w+-\w+-\w+\s\w+-\w+-\w+).

However, it seems that the pattern above does not work when applied to the column.

I would appreciate it if you could tell me how to fix the pattern above in order to exclude (or just replace with null values) the rows that do not match it.

Answer

Try .str.match:

import pandas as pd

# sample data
df = pd.DataFrame({'your_column': ['01-12-2002 12:00:00', 'This 01-12-2002 12:00:00',
                                   'Another row', '01-12-2002 12:00:01']})

# different pattern than yours, notice the two `:`
# (raw string so that \w and \s reach the regex engine unchanged)
df.loc[df['your_column'].str.match(r'^\w+-\w+-\w+\s\w+:\w+:\w+$')]

Output:

           your_column
0  01-12-2002 12:00:00
3  01-12-2002 12:00:01
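
If you would rather keep every row and replace the non-matching values with nulls, something along these lines might work. This is only a sketch: the stricter digit-only pattern, the `cleaned` and `parsed` column names, and the day-first format `%d-%m-%Y` are my assumptions (the sample `01-12-2002` could just as well be month-first, so adjust the format to your data).

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({'your_column': ['01-12-2002 12:00:00', 'This 01-12-2002 12:00:00',
                                   'Another row', '01-12-2002 12:00:01']})

# Assumed stricter pattern: digits only, dd-mm-yyyy hh:mm:ss
pattern = r'^\d{2}-\d{2}-\d{4}\s\d{2}:\d{2}:\d{2}$'

# na=False treats missing values as non-matching
mask = df['your_column'].str.match(pattern, na=False)

# Option 1: keep all rows, replace non-matching values with NaN
df['cleaned'] = df['your_column'].where(mask)

# Option 2: let to_datetime do the filtering; values that do not fit
# the (assumed) day-first format become NaT instead of raising
df['parsed'] = pd.to_datetime(df['your_column'],
                              format='%d-%m-%Y %H:%M:%S',
                              errors='coerce')

print(df)
```

Option 2 skips the regex entirely, which may be enough if the end goal is just a datetime column; `df.dropna(subset=['parsed'])` would then drop the rows that failed to parse.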


Source: stackoverflow