I am working on the code below, which trims the address column down to its last three lines. To do this I converted the DataFrame to a list of dictionaries and created an empty list, final, to collect the preprocessed values. See the code:
import pandas as pd
data = {'id':  ['001', '002', '003'],
        'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                    "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                    "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]
}
df = pd.DataFrame(data)
df_dict = df.to_dict('records')
final = []
for row in df_dict:
    add = row["address"]
    # print(add.split("\n") , len(add.split("\n")))
    if len(add.split("\n")) > 3:
        target = add.split("\n")
        target = target[-3:]
        target = '\n'.join(target)
    else:
        target = add.split("\n")
        target = '\n'.join(target)
    final.append(target)
    print(target)
After preprocessing, I append each result to the final list. Now I want to update df_dict with the values in final and convert df_dict back to a pandas DataFrame.
Sample output:
id  address
1   290 Valley Dr.\nCasper, WY 82604\nUSA
2   1180 Shelard Tower\nMinneapolis, MN 55426\nUSA
3   145 S. Durbin\nCasper, WY 82601\nUSA
Your help will be greatly appreciated.
Thanks in advance
Answer
You can operate directly on your df, using str.rsplit to split each address and apply to re-join the last 3 segments:
import pandas as pd
data = {'id':  [1, 2, 3],
        'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                    "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                    "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]
}
df = pd.DataFrame(data).set_index('id')
df['address'] = df['address'].str.rsplit('\n', n=3).apply(lambda x: '\n'.join(x[-3:]))
print(df)
Output:
id  address
1   290 Valley Dr.\nCasper, WY 82604\nUSA
2   1180 Shelard Tower\nMinneapolis, MN 55426\nUSA
3   145 S. Durbin\nCasper, WY 82601\nUSA
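As a variation on the line above, the same trimming can probably be done with the .str accessor alone, without a Python lambda. This is only a sketch under the assumption that every address is a non-null string; the intermediate name parts is introduced here for illustration.

```python
import pandas as pd

data = {
    'id': [1, 2, 3],
    'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"],
}
df = pd.DataFrame(data).set_index('id')

# 'parts' is an illustrative name: each element is the list of lines of one address.
parts = df['address'].str.split('\n')

# .str[-3:] slices each list to its last three items (or fewer, if the
# address is shorter); .str.join glues them back together with newlines.
df['address'] = parts.str[-3:].str.join('\n')
print(df)
```

Because slicing with [-3:] returns the whole list when it has three or fewer items, no length check is needed.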
Edit: converting your df to a dict and back is a waste of resources. You can implement your calculations in a function and use apply:
df = pd.DataFrame(data).set_index('id')
def many_calculations(row):
    row['address'] = "\n".join(row['address'].rsplit('\n', 3)[-3:])
    # add further calculations for your row here
    return row
df = df.apply(many_calculations, axis=1)
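If you still prefer the list-of-records loop from the question, a minimal sketch of writing the result back and rebuilding the DataFrame might look like this. It assumes the records come from df.to_dict('records') as in the question; the name result is made up for illustration. The separate final list is not needed, since each record can be updated in place.

```python
import pandas as pd

data = {
    'id': ['001', '002', '003'],
    'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"],
}
df = pd.DataFrame(data)
df_dict = df.to_dict('records')  # list of {'id': ..., 'address': ...} dicts

for row in df_dict:
    # Keep only the last three lines; [-3:] leaves shorter addresses unchanged,
    # so the if/else from the question is not required.
    row['address'] = '\n'.join(row['address'].split('\n')[-3:])

# 'result' is an illustrative name for the rebuilt DataFrame.
result = pd.DataFrame(df_dict)
print(result)
```

This gives the same addresses as the apply version, but it loops in plain Python, so it is likely to be slower on large frames.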