
saving appended list/dictionary to pandas dataframe

I am working on code like the one below, which slices the address column. For this I convert the DataFrame to a list of dictionaries and create an empty list, final, to collect the preprocessed values. See the code:

import pandas as pd

data = {'id':  ['001', '002', '003'],
        'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                    "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                    "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]
}

df = pd.DataFrame(data)
df_dict = df.to_dict('records')

final = []
for row in df_dict:
    add = row["address"]
    # print(add.split("\n") , len(add.split("\n")))
    if len(add.split("\n")) > 3:
        target = add.split("\n")
        target = target[-3:]
        target = '\n'.join(target)
    else:
        target = add.split("\n")
        target = '\n'.join(target)
    final.append(target)
    print(target)

After preprocessing, I append each result to the list final. Now I want to update df_dict with the values in final and convert df_dict back to a pandas DataFrame.

Sample output:

id  address
1   290 Valley Dr.\nCasper, WY 82604\nUSA
2   1180 Shelard Tower\nMinneapolis, MN 55426\nUSA
3   145 S. Durbin\nCasper, WY 82601\nUSA

Your help will be greatly appreciated.

Thanks in advance


Answer

You can operate directly on your df, using str.rsplit to split each address and apply to re-join the last 3 segments:

import pandas as pd

data = {'id':  [1, 2, 3],
        'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                    "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                    "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]
}

df = pd.DataFrame(data).set_index('id')
df['address'] = df['address'].str.rsplit('\n', n=3).apply(lambda x: '\n'.join(x[-3:]))
print(df)

Output:

                                           address
id
1            290 Valley Dr.\nCasper, WY 82604\nUSA
2   1180 Shelard Tower\nMinneapolis, MN 55426\nUSA
3             145 S. Durbin\nCasper, WY 82601\nUSA
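
As a possible variation on the same idea, the split, slice, and join steps can all be done with the .str accessor, without a lambda. This is only a sketch: the name trimmed is made up for illustration, and it assumes every address is a string with lines separated by "\n".

```python
import pandas as pd

data = {'id': [1, 2, 3],
        'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                    "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                    "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]
}

df = pd.DataFrame(data).set_index('id')

# Split each address into a list of lines, keep at most the last 3,
# and join them back with newlines. "trimmed" is an illustrative name.
trimmed = df['address'].str.split('\n').str[-3:].str.join('\n')
df['address'] = trimmed
```

Slicing with [-3:] leaves addresses that already have three lines or fewer unchanged, so no length check is needed.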

Edit: converting your df to a dict and back is a waste of resources. You can implement your calculations in a function and use apply:

df = pd.DataFrame(data).set_index('id')

def many_calculations(row):
    row['address'] = "\n".join(row['address'].rsplit('\n', 3)[-3:])
    # add further calculations for your row here
    return row

df = df.apply(many_calculations, axis=1)
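
If you do need to keep the list of dictionaries from the question, for example because other code consumes it, a minimal sketch of the literal request could look like this. It updates each record in place and rebuilds the DataFrame with pd.DataFrame. The names records and result are illustrative and not from the original code.

```python
import pandas as pd

data = {'id': ['001', '002', '003'],
        'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA",
                    "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA",
                    "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA"]
}

df = pd.DataFrame(data)
records = df.to_dict('records')  # list of {'id': ..., 'address': ...} dicts

for row in records:
    # Keep only the last 3 lines; shorter addresses pass through unchanged,
    # so the if/else from the question is not required.
    row['address'] = '\n'.join(row['address'].split('\n')[-3:])

# Build the DataFrame back from the updated records.
result = pd.DataFrame(records)
```

Writing the new value into each record removes the need for a separate final list. If you already have such a list in row order, assigning it with df['address'] = final should also work.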
User contributions licensed under: CC BY-SA