I am working on the code below, which trims the address column. To do this I converted the DataFrame to a list of dictionaries and created an empty list, final, to collect the preprocessed values. See the code:
import pandas as pd

data = {'id': ['001', '002', '003'],
        'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                    "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                    "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]
        }
df = pd.DataFrame(data)
df_dict = df.to_dict('records')

final = []
for row in df_dict:
    add = row["address"]
    # print(add.split("\n") , len(add.split("\n")))
    if len(add.split("\n")) > 3:
        # more than three lines: keep only the last three
        target = add.split("\n")
        target = target[-3:]
        target = '\n'.join(target)
    else:
        # three lines or fewer: keep the address unchanged
        target = add.split("\n")
        target = '\n'.join(target)
    final.append(target)
    print(target)
After preprocessing, I append each result to the list final. Now I want to update df_dict with the values in final and convert df_dict back to a pandas DataFrame.
Sample output:
id  address
1   290 Valley Dr.\nCasper, WY 82604\nUSA
2   1180 Shelard Tower\nMinneapolis, MN 55426\nUSA
3   145 S. Durbin\nCasper, WY 82601\nUSA
Your help will be greatly appreciated.
Thanks in advance
Answer
You can operate directly on your df using str.rsplit and apply to re-join the last 3 segments:
import pandas as pd

data = {'id': [1, 2, 3],
        'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                    "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                    "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]
        }

df = pd.DataFrame(data).set_index('id')
df['address'] = df['address'].str.rsplit('\n', n=3).apply(lambda x: '\n'.join(x[-3:]))
print(df)
Output:
    address
id
1   290 Valley Dr.\nCasper, WY 82604\nUSA
2   1180 Shelard Tower\nMinneapolis, MN 55426\nUSA
3   145 S. Durbin\nCasper, WY 82601\nUSA
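As a possible alternative to apply, the same trimming can probably be done with the string accessor alone. The sketch below assumes every address is a newline-separated string; the name trimmed is only illustrative and does not come from the original answer.

```python
import pandas as pd

data = {'id': [1, 2, 3],
        'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                    "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                    "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]}

df = pd.DataFrame(data).set_index('id')

# Split each address into a list of lines, keep the last three
# list items, then join them back with newlines.
# `trimmed` is a hypothetical name used only for this sketch.
trimmed = df['address'].str.split('\n').str[-3:].str.join('\n')
df['address'] = trimmed
```

Addresses that already have three lines or fewer pass through unchanged, because slicing with [-3:] never raises on a shorter list.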
Edit: converting your df to a dict and back is a waste of resources. You can implement your calculations in a function and use apply:
df = pd.DataFrame(data).set_index('id')

def many_calculations(row):
    row['address'] = "\n".join(row['address'].rsplit('\n', 3)[-3:])
    # add further calculations for your row here
    return row

df = df.apply(many_calculations, axis=1)
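If the dictionary round trip from the question is still wanted, one way to do it is sketched below. It assumes final holds exactly one processed value per record, in the same order as df_dict; the name result is hypothetical and only used here for illustration.

```python
import pandas as pd

data = {'id': ['001', '002', '003'],
        'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                    "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                    "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]}

df = pd.DataFrame(data)
df_dict = df.to_dict('records')

# Same preprocessing as in the question: keep at most the last
# three lines of each address.
final = ['\n'.join(row['address'].split('\n')[-3:]) for row in df_dict]

# Write each processed value back into its record (assumes the
# two lists are aligned by position).
for row, target in zip(df_dict, final):
    row['address'] = target

# A list of record dicts converts straight back to a DataFrame.
result = pd.DataFrame(df_dict)
```

This keeps the structure of the original loop, at the cost of the extra conversion that the answer advises against.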