DataFrame contains dates which are having these types: “21-10-2021” and 29052021.I want to extract pattern of it. for example ‘5-15-2019’,it needs to produce ‘%d-%m-%Y’ ‘05152021’ it needs to produce ‘%d%m%Y’
i tried in this way:
search6=[] for val in list(df.apply(lambda x:re.search('(?:[1-9]|[12][0-9]|3[01])[-](?:[1-9]|10|11|12])[-]d{2,4}',str(x)))): if val: li=val.group() search6.append(li) print(search6)
output: i got a list of those patterns.i need to get pattern ‘%d-%m-%Y’ and Similarly i need to get pattern for ‘%d%m%Y’ also.how i need to do it? can any body help me.Thank you
Advertisement
Answer
You can use the internal pandas method pandas._libs.tslibs.parsing.guess_datetime_format
. Be careful, this is not part of the public API, so the function might change without any warning in the future.
option 1
from pandas._libs.tslibs.parsing import guess_datetime_format s = pd.Series(['21-10-2021', '29052021', '5-15-2019', '05152021', '20000101', '01-01-2001']) s.map(lambda x: guess_datetime_format(x, dayfirst=True))
option 2
....YYYY
dates are not supported. For those you need to cheat by adding dashes temporarily:
def parse(x): out = guess_datetime_format(x, dayfirst=True) if out is None and x.isdigit() and len(x)==8: out = (guess_datetime_format(f'{x[:2]}-{x[2:4]}-{x[4:]}', dayfirst=True) .replace('-', '') ) return out s.map(parse)
Example:
date option1 option2 0 21-10-2021 %d-%m-%Y %d-%m-%Y 1 29052021 None %d%m%Y 2 5-15-2019 %m-%d-%Y %m-%d-%Y 3 05152021 None %m%d%Y 4 20000101 %Y%m%d %Y%m%d 5 01-01-2001 %d-%m-%Y %d-%m-%Y