I am working on the code below, which slices the address column. For this I have converted the dataframe to a dictionary and created an empty list, final, to collect the preprocessed values. See the code:
    import pandas as pd

    data = {'id': ['001', '002', '003'],
            'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                        "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                        "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]
            }
    df = pd.DataFrame(data)
    df_dict = df.to_dict('records')

    final = []
    for row in df_dict:
        add = row["address"]
        # print(add.split("\n"), len(add.split("\n")))
        if len(add.split("\n")) > 3:
            # more than three lines: keep only the last three
            target = add.split("\n")
            target = target[-3:]
            target = '\n'.join(target)
        else:
            # three lines or fewer: keep the address unchanged
            target = add.split("\n")
            target = '\n'.join(target)
        final.append(target)
        print(target)
After preprocessing, I append each result to the list final. Now I want to update df_dict with the values in final and convert df_dict back to a pandas dataframe.
Sample output:

    id  address
    1   290 Valley Dr.\nCasper, WY 82604\nUSA
    2   1180 Shelard Tower\nMinneapolis, MN 55426\nUSA
    3   145 S. Durbin\nCasper, WY 82601\nUSA
Your help will be greatly appreciated.
Thanks in advance
Answer
You can operate directly on your df, using str.rsplit to split each address into lines and apply to re-join the last 3 segments:
    import pandas as pd

    data = {'id': [1, 2, 3],
            'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                        "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                        "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]
            }

    df = pd.DataFrame(data).set_index('id')
    df['address'] = df['address'].str.rsplit('\n', n=3).apply(lambda x: '\n'.join(x[-3:]))
    print(df)
Output:
                                               address
    id
    1            290 Valley Dr.\nCasper, WY 82604\nUSA
    2   1180 Shelard Tower\nMinneapolis, MN 55426\nUSA
    3             145 S. Durbin\nCasper, WY 82601\nUSA
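The same result can probably be reached without apply by chaining the .str accessor. The sketch below is a variation on the snippet above, not something from the original answer; it assumes every address is a string with lines separated by "\n". Slicing with [-3:] leaves addresses of three lines or fewer untouched, so no length check is needed.

```python
import pandas as pd

data = {
    'id': [1, 2, 3],
    'address': [
        "William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
        "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
        "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA",
    ],
}
df = pd.DataFrame(data).set_index('id')

# Split each address into a list of lines, keep the last three
# elements of each list, then join them back with newlines.
df['address'] = df['address'].str.split('\n').str[-3:].str.join('\n')

print(df['address'].tolist())
```

Whether this is faster than apply on real data is something to measure rather than assume; the main benefit is that the intent reads left to right.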
Edit: converting your df to a dict and back is a waste of resources. You can implement your calculations in a function and use apply:
    df = pd.DataFrame(data).set_index('id')

    def many_calculations(row):
        # keep only the last three lines of the address
        row['address'] = "\n".join(row['address'].rsplit('\n', 3)[-3:])
        # add further calculations for your row here
        return row

    df = df.apply(many_calculations, axis=1)
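If you still need the list of records from the question (for example because other code consumes df_dict), a minimal sketch of the literal request could look like the following. It assumes final holds one entry per record, in the same order as df_dict; the name result is only illustrative.

```python
import pandas as pd

data = {
    'id': ['001', '002', '003'],
    'address': [
        "William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
        "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
        "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA",
    ],
}
df = pd.DataFrame(data)
df_dict = df.to_dict('records')

# Keep the last three lines of each address. Slicing with [-3:]
# also covers addresses that have three lines or fewer.
final = ['\n'.join(row['address'].split('\n')[-3:]) for row in df_dict]

# Write each processed value back into its record (same order as final).
for row, target in zip(df_dict, final):
    row['address'] = target

# A list of dicts converts straight back into a dataframe.
result = pd.DataFrame(df_dict)
print(result)
```

If the records themselves are not needed afterwards, assigning the list directly with df['address'] = final should give the same column without the round trip.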