I want to replace multiple matching strings in each dataframe of a list. I cannot get the matches replaced in place; instead, my attempt produces additional rows.
Here’s the example data:
import pandas as pd
import re
from scipy import linalg

nm = ['sr', 'pop15', 'pop75', 'dpi', 'ddpi']
df_tbl = pd.DataFrame(linalg.circulant(nm))
ls_comb = [df_tbl.loc[0:i] for i in range(0, len(df_tbl))]

extract_text = ['dpi', 'pop15']
clean_text = ['np.log(dpi)', 'np.log(pop15)']

# Pull the name inside the parentheses, e.g. 'dpi' from 'np.log(dpi)'
cl_text = [re.search(r'(?<=\()[^\^\)]+', i).group(0) for i in clean_text]
int_text = list(set(extract_text).intersection(cl_text))
I know that int_text is the same as extract_text, but in some instances I may only have one np.log for clean_text, so I just left this as is as I would be using int_text to filter.
And what I have tried:
[
    i.apply(
        lambda x: [
            re.sub(rf"\b{ext_t}\b", cln_t, val)
            for val in x
            for ext_t, cln_t in zip(int_text, clean_text)
        ]
    )
    for i in ls_comb
]
It produces the following:
[    0     1            2      3              4
 0  sr  ddpi  np.log(dpi)  pop75          pop15
 1  sr  ddpi          dpi  pop75  np.log(pop15),
                0     1            2            3              4
 0             sr  ddpi  np.log(dpi)        pop75          pop15
 1             sr  ddpi          dpi        pop75  np.log(pop15)
 2          pop15    sr         ddpi  np.log(dpi)          pop75
 3  np.log(pop15)    sr         ddpi          dpi          pop75,
                0              1            2            3              4
 0             sr           ddpi  np.log(dpi)        pop75          pop15
 1             sr           ddpi          dpi        pop75  np.log(pop15)
 2          pop15             sr         ddpi  np.log(dpi)          pop75
 3  np.log(pop15)             sr         ddpi          dpi          pop75
 4          pop75          pop15           sr         ddpi    np.log(dpi)
 5          pop75  np.log(pop15)           sr         ddpi            dpi,
 . . .
However, this produces additional rows. I expect a clean result like this:
[    0     1            2      3              4
 0  sr  ddpi  np.log(dpi)  pop75  np.log(pop15),
                0     1            2            3              4
 0             sr  ddpi  np.log(dpi)        pop75  np.log(pop15)
 1  np.log(pop15)    sr         ddpi  np.log(dpi)          pop75,
 . . .
Answer
import pandas as pd
from scipy import linalg

nm = ['sr', 'pop15', 'pop75', 'dpi', 'ddpi']
df_tbl = pd.DataFrame(linalg.circulant(nm))

extract_text = ['dpi', 'pop15']
clean_text = ['np.log(dpi)', 'np.log(pop15)']

# Replace whole-cell matches of each name with its np.log(...) form
df_tbl.replace(extract_text, clean_text, inplace=True)
print(df_tbl)
Output:
               0              1              2              3              4
0             sr           ddpi    np.log(dpi)          pop75  np.log(pop15)
1  np.log(pop15)             sr           ddpi    np.log(dpi)          pop75
2          pop75  np.log(pop15)             sr           ddpi    np.log(dpi)
3    np.log(dpi)          pop75  np.log(pop15)             sr           ddpi
4           ddpi    np.log(dpi)          pop75  np.log(pop15)             sr
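The question asks about a list of dataframes, so here is a minimal sketch of the same replace call applied to every element of ls_comb. The extra rows in the original attempt most likely come from the nested comprehension, which emits one result per value for each (pattern, replacement) pair, so every column doubles in length. With two lists and no regex, DataFrame.replace matches whole cell values, which is why 'ddpi' is left alone. The name ls_clean is made up for this example, and the circulant table is rebuilt in plain Python as a stand-in for linalg.circulant so the snippet does not need scipy.

```python
import pandas as pd

nm = ['sr', 'pop15', 'pop75', 'dpi', 'ddpi']
n = len(nm)

# Stand-in for scipy.linalg.circulant(nm): entry (i, j) is nm[(i - j) % n]
df_tbl = pd.DataFrame([[nm[(i - j) % n] for j in range(n)] for i in range(n)])

# Same list of growing slices as in the question
ls_comb = [df_tbl.loc[0:i] for i in range(len(df_tbl))]

extract_text = ['dpi', 'pop15']
clean_text = ['np.log(dpi)', 'np.log(pop15)']

# replace() without inplace returns a new dataframe, so the row count
# of each element is unchanged and the slices of df_tbl are not mutated
ls_clean = [df.replace(extract_text, clean_text) for df in ls_comb]

print(ls_clean[1])
```

Returning new dataframes instead of using inplace=True also avoids modifying slices taken from df_tbl.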