Skip to content
Advertisement

DataFrame contains a column of dates which are having these types: “‘5-15-2019′” and 05152021.I want to extract pattern of it

DataFrame contains dates which are having these types: “21-10-2021” and 29052021.I want to extract pattern of it. for example ‘5-15-2019’,it needs to produce ‘%d-%m-%Y’ ‘05152021’ it needs to produce ‘%d%m%Y’

i tried in this way:

search6=[]
for val in list(df.apply(lambda x:re.search('(?:[1-9]|[12][0-9]|3[01])[-](?:[1-9]|10|11|12])[-]d{2,4}',str(x)))):
if val:
li=val.group()
search6.append(li)
print(search6)

output: i got a list of those patterns.i need to get pattern ‘%d-%m-%Y’ and Similarly i need to get pattern for ‘%d%m%Y’ also.how i need to do it? can any body help me.Thank you

Advertisement

Answer

You can use the internal pandas method pandas._libs.tslibs.parsing.guess_datetime_format. Be careful, this is not part of the public API, so the function might change without any warning in the future.

option 1
from pandas._libs.tslibs.parsing import guess_datetime_format
s = pd.Series(['21-10-2021', '29052021', '5-15-2019', '05152021', '20000101', '01-01-2001'])

s.map(lambda x: guess_datetime_format(x, dayfirst=True))
option 2

....YYYY dates are not supported. For those you need to cheat by adding dashes temporarily:

def parse(x):
    out = guess_datetime_format(x, dayfirst=True)
    if out is None and x.isdigit() and len(x)==8:
        out = (guess_datetime_format(f'{x[:2]}-{x[2:4]}-{x[4:]}',
                                     dayfirst=True)
               .replace('-', '')
              )
    return out

s.map(parse)

Example:

         date   option1   option2
0  21-10-2021  %d-%m-%Y  %d-%m-%Y
1    29052021      None    %d%m%Y
2   5-15-2019  %m-%d-%Y  %m-%d-%Y
3    05152021      None    %m%d%Y
4    20000101    %Y%m%d    %Y%m%d
5  01-01-2001  %d-%m-%Y  %d-%m-%Y
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement