I want to replace multiple matching strings in each DataFrame in a list. I cannot get the matches to be replaced in place; instead, my attempt produces additional rows.
Here’s the example data:
import pandas as pd
import re
from scipy import linalg

# Column names arranged as a circulant matrix of strings
nm=['sr', 'pop15', 'pop75', 'dpi', 'ddpi']
df_tbl=pd.DataFrame(linalg.circulant(nm))

# List of DataFrames holding the first 1, 2, ..., n rows
ls_comb = [df_tbl.loc[0:i] for i in range(0, len(df_tbl))]

extract_text=['dpi', 'pop15']
clean_text=['np.log(dpi)', 'np.log(pop15)']
# Pull out the name inside the parentheses of each replacement string
cl_text=[re.search(r'(?<=\()[^\^\)]+', i).group(0) for i in clean_text]
int_text=list(set(extract_text).intersection(cl_text))
I know that int_text is the same as extract_text here, but in some instances clean_text may contain only one np.log entry, so I left this step in because I use int_text to filter.
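One caveat with this step: int_text is built from a set, so its order is not guaranteed, and zipping it with clean_text could pair a name with the wrong replacement. A minimal sketch of one way around that, assuming every entry in clean_text wraps exactly one name in parentheses, is to build the pairs directly from clean_text. The name `mapping` is introduced here for illustration only.

```python
import re

extract_text = ['dpi', 'pop15']
clean_text = ['np.log(dpi)', 'np.log(pop15)']

# Map the name inside the parentheses to its full replacement string,
# so each name stays paired with its own replacement.
mapping = {}
for cln in clean_text:
    match = re.search(r'\(([^)]+)\)', cln)
    # Keep only names that also appear in extract_text (the filtering step).
    if match and match.group(1) in extract_text:
        mapping[match.group(1)] = cln

print(mapping)
# {'dpi': 'np.log(dpi)', 'pop15': 'np.log(pop15)'}
```

If clean_text holds only one np.log entry, the dictionary simply contains that single pair.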
And what I have tried:
[
    i.apply(
        lambda x: [
            re.sub(rf"\b{ext_t}\b", cln_t, val)
            for val in x
            for ext_t, cln_t in zip(int_text, clean_text)
        ]
    )
    for i in ls_comb
]
It produces the following:
[    0     1            2      3              4
 0  sr  ddpi  np.log(dpi)  pop75          pop15
 1  sr  ddpi          dpi  pop75  np.log(pop15),
                0     1            2            3              4
 0             sr  ddpi  np.log(dpi)        pop75          pop15
 1             sr  ddpi          dpi        pop75  np.log(pop15)
 2          pop15    sr         ddpi  np.log(dpi)          pop75
 3  np.log(pop15)    sr         ddpi          dpi          pop75,
                0              1            2            3              4
 0             sr           ddpi  np.log(dpi)        pop75          pop15
 1             sr           ddpi          dpi        pop75  np.log(pop15)
 2          pop15             sr         ddpi  np.log(dpi)          pop75
 3  np.log(pop15)             sr         ddpi          dpi          pop75
 4          pop75          pop15           sr         ddpi    np.log(dpi)
 5          pop75  np.log(pop15)           sr         ddpi            dpi,
.
.
.
However, this produces additional rows. I expect a clean result like this:
[    0     1            2      3              4
 0  sr  ddpi  np.log(dpi)  pop75  np.log(pop15),
                0     1            2            3              4
 0             sr  ddpi  np.log(dpi)        pop75  np.log(pop15)
 1  np.log(pop15)    sr         ddpi  np.log(dpi)          pop75,
.
.
.
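The extra rows appear to come from the nested comprehension in the attempt: for every value it emits one result per (name, replacement) pair, so a column of n values becomes 2n entries. Below is a sketch of one way to keep the regex approach while returning exactly one result per cell. It assumes the pairs are held in a dict (called `mapping` here, a name introduced for illustration) and uses a small stand-in frame rather than the circulant data.

```python
import re
import pandas as pd

# Hypothetical name-to-replacement pairs.
mapping = {'dpi': 'np.log(dpi)', 'pop15': 'np.log(pop15)'}

# One pattern matching any key as a whole word; \b keeps 'ddpi' from matching 'dpi'.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, mapping)) + r')\b')

def replace_names(val):
    # A single pass per cell: each match is looked up in the mapping.
    return pattern.sub(lambda m: mapping[m.group(0)], val)

# Hypothetical stand-in data with the same first two rows as the example.
df = pd.DataFrame([['sr', 'ddpi', 'dpi', 'pop75', 'pop15'],
                   ['pop15', 'sr', 'ddpi', 'dpi', 'pop75']])
ls_comb = [df.loc[0:i] for i in range(len(df))]

# Series.map returns one value per input value, so each frame keeps its shape.
result = [d.apply(lambda col: col.map(replace_names)) for d in ls_comb]
print(result[1])
```

Because every cell is mapped to a single string, each frame in `result` has the same number of rows as the corresponding frame in `ls_comb`.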
Answer
import pandas as pd
from scipy import linalg

nm=['sr', 'pop15', 'pop75', 'dpi', 'ddpi']
df_tbl=pd.DataFrame(linalg.circulant(nm))

extract_text=['dpi', 'pop15']
clean_text=['np.log(dpi)', 'np.log(pop15)']
df_tbl.replace(extract_text, clean_text, inplace=True)
print(df_tbl)
Output:
               0              1              2              3              4
0             sr           ddpi    np.log(dpi)          pop75  np.log(pop15)
1  np.log(pop15)             sr           ddpi    np.log(dpi)          pop75
2          pop75  np.log(pop15)             sr           ddpi    np.log(dpi)
3    np.log(dpi)          pop75  np.log(pop15)             sr           ddpi
4           ddpi    np.log(dpi)          pop75  np.log(pop15)             sr
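The code above replaces values in the full frame, while the question asks about a list of frames. A sketch along the same lines for the list case follows. It assumes whole-cell matches are sufficient (without regex=True, replace compares entire cell values, so 'ddpi' is left alone), and it builds the table with a plain list comprehension as a stand-in for linalg.circulant.

```python
import pandas as pd

extract_text = ['dpi', 'pop15']
clean_text = ['np.log(dpi)', 'np.log(pop15)']

nm = ['sr', 'pop15', 'pop75', 'dpi', 'ddpi']
# Stand-in for linalg.circulant(nm): entry (i, j) is nm[(i - j) % n].
n = len(nm)
df_tbl = pd.DataFrame([[nm[(i - j) % n] for j in range(n)] for i in range(n)])
ls_comb = [df_tbl.loc[0:i] for i in range(len(df_tbl))]

# Without inplace=True, replace returns a new frame and leaves each slice untouched.
ls_clean = [d.replace(extract_text, clean_text) for d in ls_comb]
print(ls_clean[1])
```

Each frame in `ls_clean` keeps the row count of its source frame, which matches the expected result in the question.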