I'm running the code below to clean text:
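```python
import pandas as pd

def not_regex(pattern):
    # Match any single character at which `pattern` does not start a match
    return r"((?!{}).)".format(pattern)

tmp = pd.DataFrame(['No one has a European accent either @', 'That the kid reminds me of Kevin'])
# Raw string, so that \b is the regex word boundary rather than a backspace character
tmp[0].str.replace(not_regex(r'(\b[-/]\b|[a-zA-Z0-9])'), ' ')
```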
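A side note on the pattern itself, independent of the warning: in a plain (non-raw) Python string, `"\b"` is the backspace character `"\x08"`, not the regex word-boundary assertion, so the "keep `-` or `/` between word characters" part of the pattern never fires. The sketch below uses only the standard `re` module; the sample text `"state-of-the-art & 24/7"` is a made-up input chosen to show the difference.

```python
import re

def not_regex(pattern):
    # Same helper as in the question: any single character at which
    # `pattern` does not start a match (a negative-lookahead construct)
    return r"((?!{}).)".format(pattern)

# Raw string: \b is the regex word-boundary assertion
keep = not_regex(r"(\b[-/]\b|[a-zA-Z0-9])")
# Plain string: "\b" is the backspace character "\x08", not a word boundary
broken = not_regex("(\b[-/]\b|[a-zA-Z0-9])")

text = "state-of-the-art & 24/7"  # hypothetical sample input
print(re.sub(keep, " ", text))    # state-of-the-art   24/7
print(re.sub(broken, " ", text))  # state of the art   24 7
```

With the raw string, hyphens and slashes between word characters survive; with the plain string they are replaced like any other symbol.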
It then emits this warning:
```
<ipython-input-8-ef8a43f91dbd>:9: FutureWarning: The default value of regex will change from True to False in a future version.
  tmp[0].str.replace(not_regex('(\b[-/]\b|[a-zA-Z0-9])'), ' ')
```
Could you please explain the reason for this warning?
Answer
See the pandas 1.2.0 release notes:
> The default value of `regex` for `Series.str.replace()` will change from `True` to `False` in a future release. In addition, single character regular expressions will *not* be treated as literal strings when `regex=True` is set (GH24804)
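Why the default matters: with `regex=True` the pattern is interpreted as a regular expression, with `regex=False` it is matched character for character. The minimal sketch below passes `regex` explicitly in both calls, so it behaves the same whichever default your pandas version uses; the series contents and the replacement `"X"` are arbitrary illustrative values.

```python
import pandas as pd

s = pd.Series(["1+1=2", "11+1=12"])

# regex=True: "1+1" means "one or more '1' followed by '1'", i.e. at least "11"
as_regex = s.str.replace("1+1", "X", regex=True)
# regex=False: "1+1" is the literal three-character substring
as_literal = s.str.replace("1+1", "X", regex=False)

print(as_regex.tolist())    # ['1+1=2', 'X+1=12']
print(as_literal.tolist())  # ['X=2', '1X=12']
```

Because the same call can produce different results depending on `regex`, pandas warns whenever you rely on the default instead of stating it.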
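In other words, state explicitly whether the pattern is a regular expression by passing `regex=True` (or `regex=False` for a literal match), and the warning goes away: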
```python
# e.g. strip every run of non-digit characters; `repl` is a required argument
dframe['colname'] = dframe['colname'].str.replace(r'\D+', '', regex=True)
```
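Applied to the question's code, a minimal sketch looks like this. It assumes the pattern is meant as a regular expression (it uses lookaheads and character classes), so it passes `regex=True`; this also keeps the behaviour stable in pandas 2.0 and later, where the default did become `False` and the original call would search for the pattern as literal text. The `warnings` bookkeeping is only there to show that no `FutureWarning` is raised.

```python
import warnings

import pandas as pd

def not_regex(pattern):
    # Any single character at which `pattern` does not start a match
    return r"((?!{}).)".format(pattern)

tmp = pd.DataFrame(['No one has a European accent either @',
                    'That the kid reminds me of Kevin'])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Explicit regex=True: no reliance on the changing default
    cleaned = tmp[0].str.replace(not_regex(r'(\b[-/]\b|[a-zA-Z0-9])'), ' ', regex=True)

future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
print(cleaned.tolist())
print(len(future_warnings))  # 0
```

The `@` is replaced by a space, while letters, digits and spaces are left as they were.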